Tuesday, June 17, 2014

How does a cache work

In this post, I’ll try to share my understanding of how a cache works and how a caching framework can be integrated into an application. From an implementation perspective, a cache is nothing but an interface with methods to get, put, and remove a cached object. A sample cache interface is as follows:-

public interface Cache<K, V> {

    /**
     * Returns the cached value associated with a key,
     * or null if the key is not present.
     * @param key the key
     * @return the cached value, or null on a cache miss
     */
    V get(K key);

    /**
     * Puts a value in the cache under the given key.
     * @param key the key
     * @param value the value to cache
     * @return the previous value associated with the key, or null
     */
    V put(K key, V value);

    /**
     * Removes a cached entry.
     * @param key the key
     */
    void remove(K key);
}
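As a quick illustration (not part of any particular framework), a minimal thread-safe implementation of this interface can simply delegate to a map. The interface is repeated here so the snippet compiles on its own:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// The Cache interface from above, repeated so this snippet is self-contained.
interface Cache<K, V> {
    V get(K key);
    V put(K key, V value);
    void remove(K key);
}

/** A minimal, thread-safe in-memory cache backed by a ConcurrentHashMap. */
class SimpleCache<K, V> implements Cache<K, V> {

    private final Map<K, V> store = new ConcurrentHashMap<>();

    @Override public V get(K key) { return store.get(key); }

    /** Returns the previous value mapped to the key, or null (same as Map.put). */
    @Override public V put(K key, V value) { return store.put(key, value); }

    @Override public void remove(K key) { store.remove(key); }
}
```

Real caches layer eviction policies, expiry, and statistics on top of this, but the key-value contract stays the same.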


A concrete implementation adds many more methods as well. A cache stores data as key-value pairs in memory, and applications send the key to retrieve a value from the cache. Caching follows a very simple workflow: look up the value using the key and, if it is available, return it; otherwise look up the value in the primary storage (database, files etc.), update the cache, and then return the value to the user.
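The lookup-then-load workflow above can be sketched as a small read-through helper. The `primaryStore` function here is just an illustrative stand-in for whatever the real backing store (database, file system) would be:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Sketch of the read-through workflow: check the cache first,
 *  fall back to the primary store on a miss, then populate the cache. */
class ReadThroughExample {

    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> primaryStore; // e.g. a database lookup

    ReadThroughExample(Function<String, String> primaryStore) {
        this.primaryStore = primaryStore;
    }

    String get(String key) {
        String value = cache.get(key);       // 1. look up in the cache
        if (value == null) {                 // 2. cache miss
            value = primaryStore.apply(key); // 3. load from primary storage
            if (value != null) {
                cache.put(key, value);       // 4. populate the cache for next time
            }
        }
        return value;                        // 5. return to the caller
    }
}
```

The second request for the same key is served entirely from memory, which is where the performance win comes from.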



Typically, we use caching frameworks like Ehcache, which integrate with the application to fetch data from the cache. Applications should not put data directly into the cache and should let the framework handle it. A typical example of how an application interacts with a caching framework is as follows:-




Cache Integration architectures

A caching framework can be standalone or distributed (I'll cover distributed caching in a separate post). Distributed caching is common in enterprise applications: it enables cached data to be shared across multiple nodes, thus providing better performance, high availability and scalability. A caching framework can be integrated with an enterprise application in a number of ways; I'll try to cover a few of them here.

Cache Integration using an independent Cache Server
This integration style uses an independent cache server spanning multiple nodes, and the application servers connect to it. The advantage of this architecture is that the cache can be scaled vertically as well as horizontally without any dependency on the application server. Memcached is commonly used in enterprise applications following this architecture. The diagram below is an example of such an architecture.




Cache Integration using an Application Server

In this architecture, applications use the cache bundled with the application server; the cache is "tied" to the application server. WebSphere DynaCache is an example of this architecture. This type of integration requires very little configuration, as the application server takes responsibility for it.

Cache Integration using an Independent Server and a local cache


In this architecture, a local cache resides in the application server while the main cache server stands independent of it. The cached data is moved closer to the application that needs it, which further improves performance. ObjectStore is an example of a product that utilizes this kind of architecture.
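The local-plus-remote arrangement can be sketched as a two-level cache that consults the fast in-process cache first and falls back to the shared remote tier. Both tiers are modeled here as plain maps purely for illustration; in a real deployment the remote tier would be a network client to the cache server:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch of a two-level cache: a small local (near) cache inside the
 *  application server, backed by a shared remote cache server. */
class TwoLevelCache<K, V> {

    private final Map<K, V> localCache = new ConcurrentHashMap<>(); // in-process, fast
    private final Map<K, V> remoteCache;                            // shared, slower

    TwoLevelCache(Map<K, V> remoteCache) {
        this.remoteCache = remoteCache;
    }

    V get(K key) {
        V value = localCache.get(key);      // fast path: local hit
        if (value == null) {
            value = remoteCache.get(key);   // slower: ask the remote tier
            if (value != null) {
                localCache.put(key, value); // pull the data closer for next time
            }
        }
        return value;
    }

    void put(K key, V value) {
        remoteCache.put(key, value); // write to the shared tier for other nodes
        localCache.put(key, value);  // and keep a local copy
    }
}
```

Note that keeping the local copies consistent with the remote tier (invalidation) is the hard part of this design, which is one reason such products are rarer and pricier.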

Among the above architectures, the first is simple and my preferred approach. The second depends on the application server: not all application servers on the market come with a caching solution, so if my application uses this approach and I later consider changing application servers, my choices will be limited. Although the last approach is a very elegant caching solution, there are very few such products available in the market and they are very expensive too.
In the next post, I'll try to share my knowledge on distributed caching topologies.
