Friday, July 11, 2014

From distributed caching to in-memory data grid

Distributed caching has been around since the late 1990s and has evolved continually since then. Distributed caching frameworks gained many new features, but simply accessing key/value pairs from memory did not meet customers' expectations. Storing data as key/value pairs adds little value by itself; processing that data does. Customers therefore wanted to move computation closer to the data itself. Data grids addressed this requirement by adding the ability to run computations on the cached data. For example, data grids provide features such as querying in-memory data with SQL-like syntax, data indexing, map-reduce based processing, and support for various complex data models (document, relational, etc.).

Distributed Caching

Since the late 1990s, distributed caching technologies have evolved continuously by adding more and more features. The first generation of distributed caches provided simple cache clusters with a sophisticated hashing algorithm to keep track of the data. The next generation added advanced features such as high availability through partitioned or replicated architectures, ACID transactions, distributed locking, asynchronous events, and active backups.
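
To make the "hashing algorithm to keep track of the data" concrete, here is a minimal sketch in Python. It assumes consistent hashing, which is one common way to map keys to cache nodes; the post does not say which algorithm any particular product uses. The names HashRing, stable_hash, points_per_node and the node names are hypothetical and chosen only for illustration.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """Map a string to a stable 64-bit integer (MD5 is used only to spread keys)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring that maps cache keys to node names."""

    def __init__(self, nodes, points_per_node=100):
        self._points_per_node = points_per_node
        self._ring = []  # sorted list of (point_on_ring, node_name)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node owns several points on the ring to even out the load.
        for i in range(self._points_per_node):
            bisect.insort(self._ring, (stable_hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        # Dropping a node only removes its own points from the ring.
        self._ring = [(point, owner) for point, owner in self._ring if owner != node]

    def node_for(self, key):
        # A key belongs to the first point at or after its hash, wrapping around.
        if not self._ring:
            raise LookupError("the ring has no nodes")
        index = bisect.bisect_left(self._ring, (stable_hash(key),))
        return self._ring[index % len(self._ring)][1]


ring = HashRing(["node-a", "node-b", "node-c"])
print("user:42 is stored on", ring.node_for("user:42"))
```

The point of the design is that every client can compute the owner of a key locally, and when a node leaves the cluster only the keys it owned move to another node.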

Data Grid

Although distributed caching has evolved and matured over the years, what was missing was the ability to run computations on the cached data in memory, where the data lives. Data has also become more complex over the years. New requirements kept coming up: dynamic scalability, database-like persistence, map-reduce based processing, SQL-like querying, and support for different data models such as document, JSON, and relational. Data grids addressed these requirements and also provided additional features such as monitoring and management, policy and security enforcement, quality-of-service support, and easy integration with existing enterprise applications. The growing adoption of data grids is bringing more sophisticated requirements and higher customer expectations.
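
The difference between fetching data and moving the computation to the data can be sketched in a few lines of Python. This is only an illustration under simple assumptions: Partition is a hypothetical stand-in for one grid node, everything runs in a single process, and no real data grid API is being shown.

```python
class Partition:
    """Hypothetical stand-in for one grid node holding a slice of the data."""

    def __init__(self, entries):
        self._entries = dict(entries)

    def get_all(self):
        # Cache-style access: ship every entry back to the caller.
        return dict(self._entries)

    def execute(self, task):
        # Grid-style access: run the task next to the data, return only its result.
        return task(self._entries)


def total_over(threshold):
    """Build a task that sums the order amounts above a threshold on one partition."""
    def task(entries):
        return sum(amount for amount in entries.values() if amount > threshold)
    return task


partitions = [
    Partition({"order:1": 120, "order:2": 30}),
    Partition({"order:3": 75, "order:4": 200}),
]

# Cache style: pull all entries to the client, then compute there.
all_entries = {}
for partition in partitions:
    all_entries.update(partition.get_all())
client_side_total = sum(amount for amount in all_entries.values() if amount > 50)

# Grid style: each partition computes a partial sum (the "map" step),
# and the client only combines the small partial results (the "reduce" step).
partials = [partition.execute(total_over(50)) for partition in partitions]
grid_side_total = sum(partials)

print(client_side_total, grid_side_total)
```

Both approaches give the same answer, but in the second one only one number per partition travels over the network instead of every entry.
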
The important thing to note here is that data grids are not backed by any specification or industry standard, so their growth is driven entirely by customer requirements. A few popular data grid products in terms of customer adoption are:
  • VMware GemFire
  • Oracle Coherence
  • GigaSpaces XAP

Together with the previous four posts, I have tried to cover as much as possible about caching. There are still many specific areas that I'll address going forward, such as a comparison of tools. Please comment if you find anything missing or need more information or explanation.
