Wednesday, June 25, 2014

Cache access patterns

Cache access patterns describe how an application accesses the cache. All major cache providers (EHCache, Infinispan, Coherence, etc.) support a few primary strategies.

Cache aside
In this access pattern, the application first looks for the requested data in the cache. If the data is not there, it is the application's responsibility to fetch it from the datasource and put it into the cache. Thus, in the cache-aside pattern, the application code uses the cache directly, invoking its API to add any missing data.
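
As a rough illustration, the lookup described above could be sketched like this in Python. This is a minimal sketch, not any provider's API: the function name `get_with_cache_aside` is hypothetical, and plain dictionaries stand in for both the cache and the datasource.

```python
def get_with_cache_aside(key, cache, datasource):
    """Cache-aside lookup: the application talks to both cache and datasource.

    `cache` and `datasource` are plain dicts here (an assumption made for
    illustration; a real application would call a cache client and a database).
    """
    # 1. Look in the cache first.
    if key in cache:
        return cache[key]

    # 2. Cache miss: the application itself fetches from the datasource.
    value = datasource.get(key)

    # 3. The application itself puts the missing data into the cache.
    if value is not None:
        cache[key] = value
    return value
```

Note that all three steps live in application code; the cache knows nothing about the datasource.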

Read Through
In this access pattern, the application requests data from the cache, and if the data exists in the cache, it is returned to the application. If the data does not exist in the cache (a cache miss), it is the responsibility of the cache provider to look for it in the datasource. If the data exists in the datasource, the cache provider fetches it, updates the cache and finally returns the data to the application.
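
The difference from cache aside is easier to see in code. The sketch below is only an assumption of how a read-through cache might be structured: the class name `ReadThroughCache` and the `loader` callback are hypothetical and do not correspond to any particular provider's API.

```python
class ReadThroughCache:
    """Read-through sketch: the cache, not the application, loads on a miss."""

    def __init__(self, loader):
        # `loader` is a hypothetical callback: key -> value, or None if the
        # datasource has no such entry.
        self._loader = loader
        self._store = {}

    def get(self, key):
        # Cache hit: return straight away.
        if key in self._store:
            return self._store[key]

        # Cache miss: the cache itself goes to the datasource.
        value = self._loader(key)
        if value is not None:
            self._store[key] = value  # update the cache before returning
        return value
```

The application only ever calls `get`; the datasource access is hidden behind the loader that was configured when the cache was created.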

Write Through
In this access pattern, whenever the application updates data in the cache, the operation is not complete until the cache provider has also written the data to the underlying datasource. The cache is therefore always in sync with the underlying datasource. This pattern is easy to implement, but the disadvantage is that writes are slower, because the datasource has to be accessed for every write.
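
A minimal sketch of this behaviour follows. The class name `WriteThroughCache` and the `writer` callback are hypothetical, and writing to the datasource before updating the in-memory entry is a design choice of this sketch (so that a failed datasource write leaves the cache unchanged), not something every provider necessarily does.

```python
class WriteThroughCache:
    """Write-through sketch: put() returns only after the datasource write."""

    def __init__(self, writer):
        # `writer` is a hypothetical callback (key, value) that persists the
        # entry synchronously and raises an exception on failure.
        self._writer = writer
        self._store = {}

    def put(self, key, value):
        # Synchronous datasource write: this is where the extra latency comes from.
        self._writer(key, value)
        # Only reached if the write succeeded, so cache and datasource stay in sync.
        self._store[key] = value

    def get(self, key):
        return self._store.get(key)
```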

Write Behind

This access pattern is similar to Write Through, the only difference being that the data is written to the datasource asynchronously. Writes to the datasource can be configured to take place at a specific time, for example after one hour, at midnight, or at weekends to avoid peak hours. While this pattern is harder to implement, the write operation is very fast from the application's point of view, because it does not wait on the datasource. Another big advantage of this pattern is that many updates can be grouped into a single transaction, which reduces the number of round trips to the datasource. The biggest challenge of this pattern is that the write to the datasource happens after the write to the cache, so the data is written to the datasource outside the application's transaction. There is always a risk that the datasource write fails after the cache write has already succeeded, and such failures and rollbacks have to be handled carefully. Cache providers use compensating actions, such as retrying up to a configured retry count, to deal with them. Another big challenge for the write-behind pattern is that the time gap between the cache write and the actual datasource write may lead to out-of-order updates. Updates must be applied to the datasource in the proper order to mitigate this.
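
The batching and retry ideas can be sketched as follows. This is a simplified, single-threaded illustration under several assumptions: the names `WriteBehindCache`, `batch_writer`, `max_retries` and `flush` are hypothetical, and `flush()` is called explicitly here, whereas a real provider would trigger it from a timer or background thread. Pending updates are coalesced per key and flushed in the order of each key's latest update, which is one possible way to keep the ordering well defined.

```python
from collections import OrderedDict


class WriteBehindCache:
    """Write-behind sketch: put() is fast, the datasource is updated later."""

    def __init__(self, batch_writer, max_retries=3):
        # `batch_writer` is a hypothetical callback that takes a list of
        # (key, value) pairs and persists them in one go, raising on failure.
        self._batch_writer = batch_writer
        self._max_retries = max_retries
        self._store = {}
        self._pending = OrderedDict()  # key -> latest value not yet persisted

    def put(self, key, value):
        # Only memory is touched here, so the caller does not wait on the datasource.
        self._store[key] = value
        self._pending[key] = value
        self._pending.move_to_end(key)  # keep order of most recent updates

    def get(self, key):
        return self._store.get(key)

    def pending_count(self):
        return len(self._pending)

    def flush(self):
        """Write all pending updates as one batch; retry a limited number of times."""
        if not self._pending:
            return True
        batch = list(self._pending.items())
        for _ in range(self._max_retries):
            try:
                self._batch_writer(batch)
            except Exception:
                continue  # compensating action: try again
            self._pending.clear()
            return True
        return False  # still pending; a later flush can try again
```

Until `flush()` succeeds, the cache and the datasource disagree, which is exactly the window in which the failure and ordering problems described above can occur.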

A few other cache access patterns exist that are specific to particular cache providers; they are basically hybrids of the four patterns above.
In the next post I'll try to share what I know about cache providers and compare the products.
