Cache providers commonly support a few standard topologies:
Standalone or In-Process Cache
With an in-process (standalone) cache, the cached entries are local to a single instance of the application. If the application runs as multiple instances, there are as many caches as there are instances. Each cache may hold data in a different state, which can lead to inconsistency; with a cache eviction policy in place the caches become eventually consistent, but that can still be a problem for applications with a high volume of cache reads and writes. A standalone cache is a good choice for small applications, or where the cached data is pre-fetched and seldom changes. Standalone caches are therefore good candidates for read-only caches of pre-fetched data. The Google Guava library is a good example of a standalone cache.
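To make the "one cache per instance" point concrete, here is a minimal sketch of an in-process LRU cache (Guava's `Cache` is Java; this is a language-neutral illustration, and the class and key names are made up for the example):

```python
from collections import OrderedDict

class InProcessCache:
    """Minimal in-process LRU cache; each application instance owns its own copy."""
    def __init__(self, max_size=128):
        self.max_size = max_size
        self._store = OrderedDict()

    def get(self, key):
        if key not in self._store:
            return None
        self._store.move_to_end(key)  # mark as recently used
        return self._store[key]

    def put(self, key, value):
        self._store[key] = value
        self._store.move_to_end(key)
        if len(self._store) > self.max_size:
            self._store.popitem(last=False)  # evict the least recently used entry

# Two application instances each hold an independent cache:
a, b = InProcessCache(), InProcessCache()
a.put("user:1", "alice")
print(a.get("user:1"))  # alice
print(b.get("user:1"))  # None -- instance b never saw the write
```

The last two lines show exactly the inconsistency described above: a write to one instance is invisible to the other until that instance loads or evicts the entry itself.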
Distributed or Partitioned Cache
In distributed (partitioned) mode, the cache is deployed on a cluster of nodes that together offer a single logical view of the cache. Because there is always a single cluster-wide state, the cache is never inconsistent. A distributed cache uses a hashing algorithm to determine where in the cluster each entry should be stored. The algorithm is configured with the number of copies of each entry to maintain cluster-wide; this number represents a trade-off between performance and data safety. The more copies you maintain, the lower the write performance, but the lower the risk of losing data to a server outage. Because the hash algorithm locates entries directly, there is no need to multicast requests or maintain expensive metadata. A put operation therefore makes as many remote calls as there are copies. A get also issues one call per copy, but the calls are made in parallel and whichever response returns first is accepted, so a get effectively costs one remote round trip. A good example of a distributed cache is Memcached.
Replicated Cache
In replicated mode the cache is also deployed on a cluster of nodes, but any entry added to one cache instance is replicated to every other instance in the cluster and can then be retrieved locally from any of them. Replication can be synchronous or asynchronous. Synchronous replication blocks the caller of a write operation such as put until the modification has been replicated successfully to all nodes in the cluster. Asynchronous replication performs the replication in the background, so the write returns immediately; it is faster because the caller is not blocked. However, when a synchronous write returns successfully, the caller knows for sure that the modification has been applied to every cache instance, which is not guaranteed with asynchronous replication. Products like Terracotta and Infinispan can be configured to run in replicated mode.
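The sync/async trade-off can be sketched in a few lines (a toy model, not any provider's actual implementation: each "node" is just a dictionary, and replication to real remote nodes would involve network calls):

```python
import queue
import threading

class ReplicatedCache:
    """Toy replicated cache: one dict per node, sync or async write replication."""
    def __init__(self, node_count, synchronous=True):
        self.nodes = [dict() for _ in range(node_count)]
        self.synchronous = synchronous
        self._pending = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def _drain(self):
        # background worker that applies queued writes to every node
        while True:
            key, value = self._pending.get()
            for node in self.nodes:
                node[key] = value
            self._pending.task_done()

    def put(self, key, value):
        if self.synchronous:
            # caller blocks until every node has the entry
            for node in self.nodes:
                node[key] = value
        else:
            # returns immediately; replication happens in the background
            self._pending.put((key, value))

    def get_from(self, node_index, key):
        # any node can serve the read locally
        return self.nodes[node_index].get(key)

sync_cache = ReplicatedCache(3, synchronous=True)
sync_cache.put("k", "v")
print(sync_cache.get_from(2, "k"))  # v -- guaranteed visible on every node
```

With `synchronous=False` the put returns before all nodes have the entry, so a read from another node may briefly return nothing: exactly the window of inconsistency described above.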
Here is a comparison of the above topologies:
|                   | Replicated Cache | Partitioned Cache | Standalone         |
|-------------------|------------------|-------------------|--------------------|
| Fault Tolerance   | Extremely high   | Extremely high    | No fault tolerance |
| Read Performance  | Extremely fast   | Extremely fast    | Instant            |
| Write Performance | Fast             | Extremely fast    | Instant            |
| Typical Uses      | Metadata         | Read-write caches | Local data         |
There are other topologies that are essentially hybrids of the above approaches; these are very specific to individual cache providers. Next, I'll try to share my thoughts on different cache access patterns.