Distributed Computing · NoSQL · Persistence

A Polyglot NoSQL Data Store: OrientDB

It’s very rare to find a NoSQL data store that provides polyglot persistence capabilities (M. Fowler has written a comprehensive introduction to the concept). Within the NoSQL product landscape, OrientDB stands out as an integrated storage solution for:

  • Document stores (like the better-known MongoDB and Couchbase),
  • Object DB (it can directly store Java POJOs), and
  • Graph DB (like the well-regarded Neo4j, and also Titan),

with a very permissive licensing scheme based on Apache 2 (almost all features can be used in production without any fee).

OrientDB is a project that started some time ago (in the late 2000s) and has grown a lot over the last few years, to the point of becoming a popular NoSQL data store in the open source community. As said, it integrates a polyglot persistence layer in a single solution, with a query language that is a SQL dialect with ad-hoc extensions for graph and object databases. It provides out-of-the-box features such as Master-Master or Active-Active replication, which make it easy to scale both reads and writes; replica auto-discovery is another clustering feature, and a really important one in today’s deployment scenarios. Behind the scenes, OrientDB relies on Hazelcast to make replication and clustering easy to deploy and maintain: that’s another good point, since we’ve already seen and proven Hazelcast’s capabilities.

To better explore and evaluate OrientDB, let’s write down a list of its features.

  • ACID Transactionality. Such a strict level of transactionality is normally found only in traditional relational databases.
  • Record, User and Role Security. A highly customizable level of authentication and authorization, allowing fine tuning for the system’s use cases.
  • SQL-compliant dialect. Something really desirable, especially for people who come from traditional relational storage and cannot afford to learn new query languages (e.g. the MongoDB query language or Neo4j’s Cypher), which are often counterintuitive.
  • Embedded and Server deployments. OrientDB can live in its own process, listening on a network socket for incoming requests, or it can live in the same JVM as the client application and be used through its APIs.
  • Data Sharding. Data is stored and then distributed cleverly (e.g., EMEA customers are stored in a specific records cluster, separate from the US one) to allow very fast queries and efficient storage.
  • Replica Auto-Discovery. At start-up, database cluster members discover each other seamlessly, without any human intervention or configuration (this is largely inherited from Hazelcast).
  • Native REST Services/APIs. A rich set of REST APIs is provided to access the data stored in OrientDB databases or records clusters. This is a very intriguing feature, normally offered out-of-the-box by market-leading NoSQL data stores.
  • Multi-Master Replication. Data is mirrored among the database replicas; not literally, but in the sense that in practice each replica can serve both write and read requests.
  • Visualization Tools. OrientDB Studio is a comprehensive set of tools, running as a web app, that makes it easy to query and visualize data (a very nice Graph Visualizer is bundled).
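To make the REST and sharding items above a bit more concrete, here is a minimal sketch that only builds the URL for a SQL query over HTTP, without sending it. It assumes the `/query/<database>/sql/<query-text>/<limit>` path layout and the default HTTP port 2480 as I understand them from the OrientDB documentation, so check both against the version you run. The database name `demo`, the cluster name `customer_emea` and the `country` field are hypothetical.

```python
from urllib.parse import quote


def build_query_url(host, database, sql, limit=20, port=2480):
    """Build a GET URL of the assumed form /query/<database>/sql/<query-text>/<limit>."""
    # Percent-encode everything (including '/') so the query stays a single path segment.
    encoded_sql = quote(sql, safe="")
    encoded_db = quote(database, safe="")
    return f"http://{host}:{port}/query/{encoded_db}/sql/{encoded_sql}/{limit}"


# Hypothetical names: EMEA customers are assumed to live in their own records cluster.
emea_query = "SELECT FROM cluster:customer_emea WHERE country = 'IT'"
url = build_query_url("localhost", "demo", emea_query)
print(url)
```

Targeting a single records cluster in the query is what lets a sharded layout pay off: only the EMEA records are scanned, not the US ones. A real request would also need credentials, which this sketch leaves out.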

It’s a very comprehensive set of features that anyone would like to have in their own persistence layer; moreover, it comes free of charge, given the licensing scheme.

It’s now time for a quick look at performance. Even in this field, OrientDB seems ready to take on competitors like MongoDB and Neo4j, for document-oriented and graph data storage respectively. In fact, benchmarks of the latest stable version highlight impressive scalability in endurance tests, and a paper from the Tokyo Institute of Technology and IBM Research shows unexpected performance: OrientDB appears to be 10x faster than Neo4j in all the benchmarked scenarios, according to this independent third-party source.

For readers who want to dig into the comparisons above, two interesting comparison sheets are available: OrientDB vs MongoDB and OrientDB vs Neo4j.

OrientDB innovated its storage engine (hopefully, another post will dig into these aspects) in order to achieve both scalability and high performance. Storing data efficiently is a key aspect, and Orient Technologies’ engineers understood that, creating a performant storage layer able to accommodate several data representation schemes/models while preserving efficiency and elasticity (e.g., relationships can be light- or heavy-weight, according to the specific storage type and needs).
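The light- versus heavy-weight distinction can be pictured with a toy in-memory model. This is only an illustration of the idea as I understand it, not OrientDB’s actual storage format or API: a lightweight relationship is kept as a direct link inside the source vertex, while a heavyweight one becomes a record of its own that can carry properties. All class names, function names and record ids below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    rid: str
    out_links: list = field(default_factory=list)  # lightweight: ids of target vertices
    out_edges: list = field(default_factory=list)  # heavyweight: ids of edge records


@dataclass
class Edge:
    rid: str
    source: str
    target: str
    properties: dict = field(default_factory=dict)


def link_lightweight(src, dst):
    """Store the relationship as a bare link: no extra record, no properties."""
    src.out_links.append(dst.rid)


def link_heavyweight(src, dst, edge_rid, **properties):
    """Store the relationship as its own record, which can hold properties."""
    edge = Edge(edge_rid, src.rid, dst.rid, properties)
    src.out_edges.append(edge.rid)
    return edge


alice = Vertex("#10:0")
bob = Vertex("#10:1")
link_lightweight(alice, bob)                                # cheap, but carries no data
knows = link_heavyweight(alice, bob, "#11:0", since=2012)   # one more record, with data
```

The trade-off in this sketch is the usual one: the lightweight form saves a record per relationship, while the heavyweight form costs an extra lookup but lets the relationship be queried and annotated on its own.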

OrientDB definitely deserves at least a try, a run… I believe that before long the statements in this post will sound more familiar and much more trustworthy!

PS I almost forgot: OrientDB is entirely written in Java, so it’s portable to almost any platform that runs a JVM!

