Hazelcast In Memory Data Grid: a quick Intro

In today’s distributed computing, low response times when processing event flows are a strict requirement in many contexts, for example the financial industry, Internet services, and health care. Such a requirement pushes current technological limits towards a different model of computation: in-memory processing, in which data never leaves memory until it has been fully processed. This means that clusters of in-memory processors can share data and work on it collaboratively: imagine a distributed map-reduce computation with the data stored in main memory (a fast but volatile medium) instead of on disk (a slow but durable medium).

That is the scenario Hazelcast targets: it is an In-Memory Data Grid (IMDG) offering a permissive licensing scheme and many powerful features, such as clustering, sharding, and Java-compliant APIs for the Map, Set and List data structures.

Let’s dig into Hazelcast’s main features.

  • Distributed Data Structures. A comprehensive set of standard-compliant data structures, natively distributed at Hazelcast’s core. Examples are Set, Map, Queue, List and MultiMap (not defined in the Java collections library, but a widely used compound data structure). These implementations are thread-safe and can be dropped into your Java code seamlessly, apart from MultiMap, which has no standardized interface.
  • Distributed Events. Applications can subscribe to core events (e.g. modification of a data structure) and be notified asynchronously when they occur, no matter where in the cluster they were generated. An ExecutorService can be used to offload the application when handling a received event requires a long-running computation.
  • Distributed Computing. A distributed implementation of ExecutorService allows application code to be executed across the cluster. For example, given a serializable Callable, Hazelcast can be instructed to execute the code on (i) a specific cluster member, (ii) the member owning a chosen key, (iii) a member that Hazelcast picks, or (iv) all or a subset of the cluster members.
  • Distributed Query. Since the data structures are natively distributed and data is sharded, operations like a full scan over a map’s values are no longer efficient (all the data would have to be moved to the local member before being accessed). Hazelcast provides SQL-like support for running queries in a distributed fashion and retrieving locally only small chunks of data (just the entries that satisfy the criteria).
  • Distributed Transactions. The distributed data structures can be used in transactional contexts simply by using the transactional interfaces provided by Hazelcast and declaring a transactional context; commit and rollback can then be used to confirm or abort the operation.
  • Integrated Clustering. Clustering is a powerful out-of-the-box feature provided natively by Hazelcast. Members use multicast to discover each other, so no intervention or configuration is needed to start a Hazelcast cluster: zero-configuration clustering means that Hazelcast instances created in different processes across the network will discover each other and form a communication group.
  • Integrated Sharding. In a cluster, data is distributed almost evenly: each member holds roughly D/N of the data, where N is the cluster size (number of nodes) and D is the total amount of data maintained. Members that join dynamically are loaded with about the same share of data, plus some backups.
  • Serialization. Before being transferred over the network, or simply used at the core, everything is serialized by Hazelcast’s serialization framework. In practice, different types of serialization perform differently, and Hazelcast offers several serialization options so that you can pick the best one for the application’s needs.
  • Security. SSL can be used for network communication. Hazelcast also offers more in terms of security configuration: it allows client and member connections to be intercepted in order to enforce the desired security policy. This is achieved by means of custom interceptors registered by the application and managed by Hazelcast’s core.
  • Management. Hazelcast provides monitoring features through JMX. Different aspects can be monitored for each data structure, as well as for the cluster itself. This monitoring support lets you use any JMX monitoring tool already in place, or pick a new one seamlessly. There is also a bundled monitoring tool, the Hazelcast Management Center, which can be installed easily (it is shipped as a WAR inside Hazelcast’s ZIP archive) and used, above all, to monitor cluster health.
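
The sharding arithmetic in the list above can be illustrated with a small plain-Java sketch that does not use Hazelcast at all. It assumes a simplified model: keys are hashed into a fixed number of partitions (271 here, Hazelcast’s default partition count) and partitions are handed out to members round-robin. The class and method names (ShardingSketch, partitionOf, ownerOf, entriesPerMember) are hypothetical, and Hazelcast’s real hashing and partition assignment differ; the point is only that each of N members ends up with roughly D/N entries.

```java
import java.util.Arrays;

public class ShardingSketch {

    // Assumption: a fixed partition count, set to Hazelcast's default of 271.
    static final int PARTITION_COUNT = 271;

    // Map a key to a partition id in [0, PARTITION_COUNT).
    // Simplification: uses Object.hashCode(), not Hazelcast's own hashing.
    static int partitionOf(Object key) {
        return Math.floorMod(key.hashCode(), PARTITION_COUNT);
    }

    // Assign partitions to members round-robin (a simplified stand-in
    // for the real partition table).
    static int ownerOf(int partitionId, int memberCount) {
        return partitionId % memberCount;
    }

    // Count how many of the D = totalEntries keys land on each of the
    // N = memberCount members.
    static int[] entriesPerMember(int totalEntries, int memberCount) {
        int[] counts = new int[memberCount];
        for (int i = 0; i < totalEntries; i++) {
            String key = "key-" + i;
            counts[ownerOf(partitionOf(key), memberCount)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int totalEntries = 100_000; // D
        for (int members = 2; members <= 4; members++) { // N
            int[] counts = entriesPerMember(totalEntries, members);
            System.out.println(members + " members, D/N = "
                    + (totalEntries / members) + ", actual: "
                    + Arrays.toString(counts));
        }
    }
}
```

Running main with a growing member count shows each member’s share shrinking towards D/N, which is the effect a newly joined member has in the model above. Backups are deliberately left out of the sketch.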

For a more comprehensive introduction to Hazelcast, the Reference Manual and the Get Started sections are a good next step.

 

In conclusion, this post introduced Hazelcast as a valuable tool for accommodating modern distributed computing paradigms. It listed Hazelcast’s features and gave a quick explanation of each one. This post did not aim to provide an in-depth treatment, but rather a complete and immediate understanding of Hazelcast and its features.
