Wednesday, June 25, 2014

Cache access patterns

Cache access patterns refer to how an application accesses the cache. There are a few primary strategies that all major cache providers (Ehcache, Infinispan, Coherence, etc.) support.

Cache aside
In this access pattern, the application first searches for the requested data in the cache, and if the data does not exist there, it is the responsibility of the application to fetch the data from the datasource and put it into the cache. Thus, in the cache-aside pattern, the application code directly uses the cache by invoking its API to add any missing data.
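The pattern can be sketched as follows; the `datasource` function here is a hypothetical stand-in for a real database query, and the cache is just an in-memory map rather than any particular provider's API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Cache-aside: the application code owns both the lookup and the update.
class CacheAside {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> datasource; // stand-in for a DB query

    CacheAside(Function<String, String> datasource) {
        this.datasource = datasource;
    }

    public String get(String key) {
        String value = cache.get(key);     // 1. look in the cache first
        if (value == null) {
            value = datasource.apply(key); // 2. on a miss, hit the datasource
            if (value != null) {
                cache.put(key, value);     // 3. the *application* updates the cache
            }
        }
        return value;
    }
}
```

The key point is that the cache never talks to the datasource itself; the application mediates every miss.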

Read Through
In this access pattern, the application requests data from the cache, and if the data exists in the cache, it is returned to the application. If the data does not exist in the cache (a cache miss), it is the responsibility of the cache provider to check for the data in the datasource. If the data exists in the datasource, the cache provider fetches it, updates the cache, and finally returns the data to the application.
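A minimal sketch of the idea; the `CacheLoader` interface here is hypothetical (loosely modeled on what providers expose), and the important difference from cache-aside is that the caller only ever calls `get()`:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// The loader is registered with the cache, not called by the application.
interface CacheLoader<K, V> {
    V load(K key); // fetches the value from the underlying datasource
}

// Read-through: on a miss the cache itself consults the loader.
class ReadThroughCache<K, V> {
    private final Map<K, V> store = new ConcurrentHashMap<>();
    private final CacheLoader<K, V> loader;

    ReadThroughCache(CacheLoader<K, V> loader) {
        this.loader = loader;
    }

    public V get(K key) {
        // computeIfAbsent: the cache, not the caller, loads missing values
        return store.computeIfAbsent(key, loader::load);
    }
}
```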

Write Through
In this access pattern, whenever the application updates any data in the cache, the operation is not complete until the cache provider writes the data directly into the underlying datasource. The cache is therefore always in sync with the underlying datasource. This pattern is easy to implement, but the disadvantage is that write operations are slower, because the datasource needs to be accessed for every write.

Write Behind

This access pattern is similar to write-through, the only difference being that the data is updated in the datasource asynchronously. Writes to the datasource can be configured to take place at a specific time, say after one hour, at midnight, or at weekends to avoid peak hours. While this pattern is harder to implement, the write operation is very fast because it is not subject to datasource latency. Another big advantage of this pattern is that many writes can be grouped into a single transaction, which further reduces latency. The biggest challenge of this pattern is that the write to the datasource happens after the write to the cache, outside the cache transaction. So there is always a risk of failure, and transaction rollback has to be handled very carefully; cache providers use compensating actions like retry counts to deal with it. Another big challenge for write-behind is that there is a time gap between the cache transaction and the actual datasource transaction, which may lead to out-of-order updates. Proper ordering of update actions is required to mitigate this.
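As a rough sketch of the idea (not any provider's actual API): puts complete against the cache immediately, while a queue collects the pending writes for a later batched flush. The `datasource` map is a stand-in for the real backing store, and `flush()` would normally run on a timer:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Write-behind: puts return immediately; a background flush drains the
// queued writes to the datasource in one batch later.
class WriteBehindCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final BlockingQueue<Map.Entry<K, V>> pending = new LinkedBlockingQueue<>();
    private final Map<K, V> datasource; // stand-in for the real backing store

    WriteBehindCache(Map<K, V> datasource) {
        this.datasource = datasource;
    }

    public void put(K key, V value) {
        cache.put(key, value);              // fast: cache only
        pending.add(Map.entry(key, value)); // datasource write is deferred
    }

    public V get(K key) {
        return cache.get(key);
    }

    // Normally scheduled (e.g. every N seconds, or during off-peak hours).
    public void flush() {
        List<Map.Entry<K, V>> batch = new ArrayList<>();
        pending.drainTo(batch); // group many writes into one batch
        for (Map.Entry<K, V> e : batch) {
            datasource.put(e.getKey(), e.getValue());
        }
    }
}
```

Note how the risks discussed above show up directly in the sketch: if the process dies between `put()` and `flush()`, the queued writes are lost, which is exactly why providers add retries and ordering guarantees.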

A few other cache access patterns exist that are specific to individual cache providers; they are basically hybrid approaches built from the four patterns above.
In the next post I'll try to share my knowledge on cache providers and try to do a comparison of the products.

Monday, June 23, 2014

Caching Topologies

There are a few topologies commonly used by cache providers. They are:-

Standalone or in-process Cache
While using an in-process/standalone cache, the cache elements are local to a single instance of an application. For applications with multiple instances, there will be as many caches as application instances. Each individual cache may hold data in a different state, which may lead to inconsistency. However, with a cache eviction policy in place, the data in the caches will become eventually consistent. Still, this could pose a problem for applications with a high volume of cache requests and transactions. A standalone cache is a good choice for small applications or where the data in the cache is pre-fetched and seldom changes. So, standalone caches are good candidates for a read-only cache with pre-fetched data. The Google Guava library is a good example of a standalone cache.

Distributed Caching or Partitioned Cache
In distributed or partitioned mode, the cache is deployed on a cluster of multiple nodes and offers a single logical view of the cache. Since there is always a single state of the cache cluster, it is never inconsistent. A distributed cache uses a hash algorithm to determine where in the cluster entries should be stored. The hashing algorithm is configured with the number of copies of each cache entry to be maintained cluster-wide. The number of copies represents the tradeoff between performance and safety of data: the more copies you maintain, the lower the performance, but also the lower the risk of losing data due to server outages. The hash algorithm helps in locating entries without resorting to multicasting requests or maintaining expensive metadata. During a put operation, the number of remote calls equals the number of copies maintained. A get operation also issues as many remote calls as there are copies, but they are made in parallel and whichever call returns first is accepted, so a get effectively costs at most one remote round-trip. A good example of a distributed cache is Memcached.
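The key-to-node mapping can be sketched as below. This is a deliberately simplified modulo scheme just to show how a key's hash plus the configured copy count determines the owning nodes; real providers use consistent hashing so that nodes can join and leave without remapping most keys:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified owner selection for a partitioned cache: the key's hash picks a
// primary node, and the configured number of copies picks the backup nodes.
class PartitionedOwners {
    static List<Integer> ownersFor(Object key, int numNodes, int numCopies) {
        int primary = Math.floorMod(key.hashCode(), numNodes);
        List<Integer> owners = new ArrayList<>();
        for (int i = 0; i < numCopies; i++) {
            owners.add((primary + i) % numNodes); // backups on the next nodes
        }
        return owners;
    }
}
```

Every node runs the same function, so any node can locate any entry locally, without multicasting a "who has this key?" request.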

Replicated Cache

In replicated mode, the cache is likewise deployed on a cluster of multiple nodes, but any entry added to one cache instance is replicated to all other cache instances in the cluster and can be retrieved locally from any instance. Replication can be synchronous or asynchronous. Synchronous replication blocks the caller on a write operation like put until the modification has been replicated successfully to all nodes in the cluster. Asynchronous replication performs replication in the background (the write operation returns immediately). Asynchronous replication is faster since the caller is not blocked; however, when a synchronous replication returns successfully, the caller knows for sure that all modifications have been applied to all cache instances, which is not the case with asynchronous replication. Products like Terracotta and Infinispan can be configured to run in replicated mode.

Here is a comparison of the above topologies:-

                     Replicated Cache               Partitioned Cache    Standalone Cache
Fault Tolerance      Extremely high                 Extremely high       No fault tolerance
Read Performance     Extremely fast (local reads)   Extremely fast       Extremely fast
Write Performance    Slow (writes go to all nodes)  Extremely fast       Extremely fast
Typical Uses         Read-heavy caches              Read-write caches    Local data

There are some other topologies that are basically hybrids of the above approaches; those are very specific to individual cache providers. Next, I'll try to share my thoughts on the different cache access patterns.

Tuesday, June 17, 2014

How does a cache work

In this post, I’ll try to share my understanding of how a cache works and how a caching framework can be integrated into an application. From an implementation perspective, a cache is nothing but an interface with methods to get/put/remove a cached object. A sample cache interface is as follows:-

public interface Cache<K, V> {

    /**
     * Returns a cached value associated with a key.
     * @param key the key
     * @return the cached value
     */
    public V get(final K key);

    /**
     * Puts a value associated with a key in the cache.
     * @param key the key
     * @param value the value to cache
     * @return the cached value
     */
    public V put(final K key, final V value);

    /**
     * Removes a cached object.
     * @param key the key
     */
    public void remove(K key);
}

A concrete implementation adds many more methods as well. A cache always stores data as key-value pairs in memory. Applications send the key to retrieve a value from the cache. Caching follows a very simple workflow: the cache looks up a value using the key and, if the value is available, returns it; otherwise it looks up the value in the primary storage (database, files, etc.), updates the cache, and then returns the value to the user.
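That workflow can be sketched as a minimal in-memory implementation of the interface. The interface is repeated here so the sketch stands alone, and the `primaryStorage` function is a hypothetical stand-in for a database or file lookup:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

interface Cache<K, V> { // the sample interface from above, repeated
    V get(K key);
    V put(K key, V value);
    void remove(K key);
}

// Miss handling folded into get(): look up, fall back to primary storage,
// update the cache, return the value.
class SimpleCache<K, V> implements Cache<K, V> {
    private final Map<K, V> store = new ConcurrentHashMap<>();
    private final Function<K, V> primaryStorage; // stand-in for DB/file lookup

    SimpleCache(Function<K, V> primaryStorage) {
        this.primaryStorage = primaryStorage;
    }

    @Override
    public V get(K key) {
        V value = store.get(key);
        if (value == null) {                   // cache miss
            value = primaryStorage.apply(key); // fetch from primary storage
            if (value != null) {
                store.put(key, value);         // update the cache before returning
            }
        }
        return value;
    }

    @Override
    public V put(K key, V value) {
        store.put(key, value);
        return value;
    }

    @Override
    public void remove(K key) {
        store.remove(key);
    }
}
```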

Typically, we use caching frameworks like Ehcache, which integrate with the application to get data from the cache. Applications should not put data directly into the cache and should let the framework handle it. A typical example of how an application interacts with a cache framework is shown below:-

Cache Integration architectures

A caching framework could be standalone or distributed (I’ll cover it in a separate post). Distributed caching is used in enterprise applications. Distributed caching enables cached data to be shared across multiple nodes thus providing better performance, high availability and scalability. A cache framework can be integrated with an enterprise application in a number of ways. I'll try to cover a few of them here. 

Cache Integration using an independent Cache Server
This integration style uses an independent cache server spanning multiple nodes; the application servers connect to the cache server. The advantage of this architecture is that it can be scaled vertically as well as horizontally without any dependency on the application server. Memcached is used in enterprise applications following this architecture. The diagram below is an example of such an architecture.

Cache Integration using an Application Server

In this architecture, applications use the cache associated with the application server; the cache is "tied" to the application server. WebSphere DynaCache is an example of this architecture. This type of architecture requires very little configuration, as the application server takes responsibility for it.

Cache Integration using an Independent Server and a local cache

In this architecture type, a local cache resides in the application server while the main cache server stands independent of it. The cached data is effectively moved closer to the application that needs it, which further improves performance. ObjectStore is an example that utilizes this kind of architecture.
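The lookup path of such a "near cache" can be sketched as below. The `remote` map is a stand-in for a client talking to the independent cache server; the point is the local-first lookup order:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Near-cache lookup: check the small in-process cache first, fall back to
// the independent cache server, and keep a local copy for the next request.
class NearCache<K, V> {
    private final Map<K, V> local = new ConcurrentHashMap<>();
    private final Map<K, V> remote; // stand-in for a cache-server client

    NearCache(Map<K, V> remote) {
        this.remote = remote;
    }

    public V get(K key) {
        V value = local.get(key);      // 1. cheapest: in-process lookup
        if (value == null) {
            value = remote.get(key);   // 2. network hop to the cache server
            if (value != null) {
                local.put(key, value); // 3. pull the data closer for next time
            }
        }
        return value;
    }
}
```

A real implementation also has to invalidate the local copies when the server-side entry changes, which is a large part of why so few products offer this architecture.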

Among all the above architectures, the first is simple and my preferred approach. The second depends on the application server, and not all application servers on the market come with a caching solution; so if my application uses the second approach and I later consider changing the application server, my choices will be limited. Although the last approach is a very elegant solution for caching, there are very few such products available on the market and they are very expensive too.
In the next post, I'll try to share my knowledge on distributed caching topologies.

Wednesday, June 4, 2014

Cache Replacement Algorithms

For the last few days, I have been studying cache replacement algorithms. After going through a lot of documents and papers, I thought of summarizing what I have read and understood. Although there are many more algorithms, I am summarizing only a few of the popular ones.

LRU (Least Recently Used)
In case of a cache hit, the cache returns a reference to the object and the object is moved to the first position in the cache (the most recently used position). On a cache miss, the object is read from the primary storage (or some other source) and put into the first position of the cache. If the cache is full, the last entry (the least recently used one) is removed before the new object is added.
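This workflow maps almost directly onto Java's `LinkedHashMap` in access-order mode, so a minimal LRU cache is only a few lines:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// LRU via LinkedHashMap's access-order mode: every get() moves the entry to
// the "most recently used" end, and the eldest (least recently used) entry
// is evicted automatically once capacity is exceeded.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // true = order by access, not insertion
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // drop the least recently used entry
    }
}
```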

LRU2 (Least Recently Used 2)
This algorithm is a refinement of LRU. It replaces the object whose penultimate (second-to-last) access is least recent among all penultimate accesses. The objective is not to evict an object just because it was accessed only once, but to measure how frequently an object is accessed over time (its popularity) and to evict the less popular ones.

MRU (Most Recently Used)
The MRU algorithm is most useful in situations where the older an item is, the more likely it is to be accessed. When we need to get rid of an object in the cache, we evict the one most recently accessed. This algorithm has very specific uses; for example, in a photo gallery, users who have already viewed a photo are less likely to view it again.

2Q
2Q is similar to LRU2 but improves upon it, making it a better and adaptive cache replacement algorithm. When an object is first hit, 2Q places it in a special buffer, which is managed as a FIFO queue. If the object is referenced again while in the special buffer, it is moved into another list, which is managed as LRU. If an object is not referenced again while in the special buffer, it is removed from the buffer.

LFU (Least Frequently Used)
In this algorithm, each object in the cache has a counter associated with it that counts how many times the object has been accessed. As a replacement policy, the objects accessed the fewest times are the first to be replaced; an object that is accessed many times will stay in the cache.
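A minimal sketch of the counter idea follows. The linear scan for a victim is fine for illustration only; real implementations use frequency lists or heaps to avoid it:

```java
import java.util.HashMap;
import java.util.Map;

// LFU sketch: each entry keeps an access counter; on eviction, the entry
// with the smallest counter goes first.
class LfuCache<K, V> {
    private final int capacity;
    private final Map<K, V> values = new HashMap<>();
    private final Map<K, Integer> counts = new HashMap<>();

    LfuCache(int capacity) {
        this.capacity = capacity;
    }

    public V get(K key) {
        if (values.containsKey(key)) {
            counts.merge(key, 1, Integer::sum); // count every access
        }
        return values.get(key);
    }

    public void put(K key, V value) {
        if (!values.containsKey(key) && values.size() >= capacity) {
            K victim = counts.entrySet().stream() // least frequently used
                    .min(Map.Entry.comparingByValue())
                    .get().getKey();
            values.remove(victim);
            counts.remove(victim);
        }
        values.put(key, value);
        counts.putIfAbsent(key, 0);
    }
}
```

One known weakness shows up even in this sketch: an object that was popular long ago keeps its high count and can linger in the cache after it has stopped being accessed.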

FIFO (First-In First-Out)
FIFO is considered the oldest cache replacement algorithm. It simply maintains a list of all objects in the cache such that the head of the list is the oldest arrival and the tail is the most recent arrival. FIFO performs poorly compared to LRU and is seldom used today in its original form.

Second Chance
Second Chance (SC) is an enhancement to FIFO. In SC, a reference bit is maintained for each object in the cache while the objects are kept in a FIFO queue. When a new object is fetched from primary storage, it is appended at the tail of the queue and its reference bit is set to zero. When an object in the cache is accessed, its reference bit is set to 1. The replacement policy checks the object at the head of the FIFO queue and replaces it if its reference bit is zero; otherwise the object at the head is moved to the tail and its reference bit is reset to zero. The policy keeps checking objects at the head until it finds one with a zero reference bit, which is then replaced. SC has the overhead of moving objects from the head to the tail of the queue, which makes it somewhat inefficient.

CLOCK
CLOCK is functionally very similar to SC. The difference is that CLOCK uses a circular queue instead of a FIFO queue, and the replacement algorithm cycles through the queue like the arm of a clock. The clock hand points to an object, and whenever a replacement is needed, the policy starts checking the objects pointed to by the hand, in the same way as SC. CLOCK eliminates the need to move an object from head to tail, thus providing better performance.
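The hand sweep described above can be sketched with a fixed-size array standing in for the circular queue (slot indices and method names here are illustrative, not from any particular implementation):

```java
// CLOCK sketch: objects sit in a fixed circular array, each with a reference
// bit. The "hand" sweeps the circle; a set bit buys the object one more pass
// (the bit is cleared), while a clear bit marks the victim slot.
class ClockCache {
    private final Object[] slots;
    private final boolean[] referenced;
    private int hand = 0;

    ClockCache(int capacity) {
        slots = new Object[capacity];
        referenced = new boolean[capacity];
    }

    // Called on a cache hit at slot i: just set the reference bit.
    public void touch(int i) {
        referenced[i] = true;
    }

    // Find a slot for a new object, evicting whatever the hand settles on.
    // New objects start with a zero reference bit, as in SC.
    public int insert(Object o) {
        while (referenced[hand]) {
            referenced[hand] = false;       // second chance: clear and move on
            hand = (hand + 1) % slots.length;
        }
        int victim = hand;                  // bit was clear: replace this slot
        slots[victim] = o;
        hand = (hand + 1) % slots.length;
        return victim;
    }
}
```

Unlike SC, nothing is ever moved between positions; only the hand index and the bits change.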

ARC (Adaptive Replacement Cache)
ARC is considered one of the best cache replacement algorithms. ARC uses two lists, L1 and L2. The first contains objects that have been seen only once recently, while the second contains objects that have been seen at least twice recently. So the list L1 captures recency while the list L2 captures frequency. The replacement policy is based on constant reshuffling of the L1 and L2 lists. ARC also adapts to the workload; it is fast and self-tuning.

CAR (Clock with Adaptive Replacement)
CAR is inspired by the Adaptive Replacement Cache (ARC) algorithm and inherits all the advantages of ARC, including its high performance. CAR maintains four doubly linked lists: T1, T2, B1 and B2. The clocks T1 and T2 contain the objects that are in the cache, while the lists B1 and B2 maintain history information about recently evicted objects. The clock T1 captures “recency” while the clock T2 captures “frequency.” The lists B1 and B2 are simple LRU lists: objects evicted from T1 are placed on B1, and those evicted from T2 are placed on B2.

Sunday, June 1, 2014

Caching basics

Caching is the technique of temporarily storing frequently used data that is expensive to fetch every time. Fetching data from a cache is much faster than fetching it from the source.

Cache Hit:

Simply put, if the cache contains the data being looked up, it is known as a cache hit.
The percentage of requests that result in cache hits is known as the hit rate or hit ratio of the cache.

Cache Miss:
When the data being looked up is not available in the cache, it is known as a cache miss. In case of a cache miss, the data is fetched from the primary storage and put into the cache.
A cache miss has two scenarios:

Free Space available
In this case the object will be fetched from primary storage and added to the cache.

No Free Space available
In this case the object will be retrieved from the primary storage and will replace an existing entry in the cache. Now, the entry to be replaced is governed by a cache replacement algorithm (LRU, MRU, LRU2 etc, details of which I’ll share in a later post).

Storage Cost
When a cache miss occurs, the data is fetched from the primary storage and placed in the cache. Data can also be prefetched at application startup. What is important here is how much space the cache should use (the space-versus-time tradeoff).

Miss Penalty
When we need to load the data, we also need to know how long the load takes. This is known as the retrieval cost, or miss penalty.

Replacement Policy:
When a cache miss happens, the cache evicts entries in order to make room for the newly fetched data from the primary storage. The heuristic used to select the entry to evict is known as the replacement policy.

Average memory-access time = Hit time + Miss rate x Miss penalty

Hit time – time to access the cache
Hit ratio -- percentage of time the data is found in the cache
Miss ratio -- (1 - hit ratio)

Miss penalty – Time to access the primary storage
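As a quick worked example of the formula above, with made-up numbers (1 ns hit time, 90% hit ratio so a 10% miss rate, 100 ns miss penalty):

```java
// Average memory-access time = Hit time + Miss rate x Miss penalty
class Amat {
    static double averageAccessTime(double hitTime, double missRate, double missPenalty) {
        return hitTime + missRate * missPenalty;
    }

    public static void main(String[] args) {
        // 1 + 0.1 * 100 = 11 ns on average
        System.out.println(averageAccessTime(1.0, 0.1, 100.0));
    }
}
```

Note how the miss penalty dominates: even with a 90% hit ratio, the average access time is eleven times the hit time, which is why replacement policies matter so much.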