An insightful presentation of a re-design which allows HDFS to scale out elegantly.
HopFS: Scaling hierarchical file system metadata using NewSQL databases Niazi et al., FAST 2017
If you’re working with big data and Hadoop, this one paper could repay your investment in The Morning Paper many times over (ok, The Morning Paper is free – but you do pay with your time to read it). You know that moment when you’re working on a code base and someone says “why don’t we replace all this complex home-grown code and infrastructure with this pre-existing solution…?” That!
Here’s the big idea – for large HDFS installations, the single node in-memory metadata service is the bottleneck. So why not replace the implementation with a NewSQL database and spread the load across multiple nodes? In this instance, MySQL Cluster was used, but that’s pluggable. Of course, there will be some important design issues to address, but the authors do a neat job of solving these. I…
View original post 1,448 more words