With GemFire XD, Pivotal Closes the Big Data Analytics Loop

By Jeff Kelly | March 19, 2013

The team at Pivotal made a handful of announcements today focused on the Big Data and analytics layer of its enterprise platform play. Chief among them is Pivotal HD 2, the latest version of its Hadoop distribution, which incorporates HAWQ, a modified version of the Greenplum database, for SQL analytics. Version 2 is based on Hadoop 2.2, which as you’ll remember is the re-architected version of Apache Hadoop that leverages YARN to enable multiple types of applications (not just MapReduce applications) to run on top of HDFS.

Pivotal also announced the general availability of GemFire XD, which allows its in-memory GemFire database, like HAWQ, to run on Pivotal HD. Finally, the company announced that it has integrated GraphLab, a library of graph analytics algorithms, and MADlib, the machine learning library, with HAWQ, making it easier for Data Scientists to perform large-scale analytics on Hadoop.

The GemFire XD announcement is a particularly important one for Pivotal, in my opinion. The company is betting big on its grand vision of a data layer that, among other things, allows for not just Big Data application development but also what it calls Fast Data applications, including transactional applications. The vision is a single data layer that closes the analytics loop: real-time analytics drive intelligent transactional workloads, while large-scale Big Data analytics handle historical analysis and model building that learns from and feeds the front-end transactions. David Floyer and I developed the model below to illustrate the general concept (not Pivotal’s vision specifically) of a closed-loop Big Data architecture.

Integration of Big Streams, Big Systems & Big Data for Implementation of Enterprise-wide Integrated Applications; Source: Wikibon 2013
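
To make the closed-loop idea concrete, here is a minimal sketch in Python. It is purely illustrative and has nothing to do with how GemFire XD, HAWQ, or Pivotal HD are actually implemented: the ClosedLoop class, its method names, and the simple threshold "model" are all hypothetical stand-ins. The fast path scores each transaction as it arrives and records it; the batch path rebuilds the model from the accumulated history, and the updated model then governs the next transaction.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class ClosedLoop:
    """Toy closed-loop pipeline (hypothetical, for illustration only)."""
    threshold: float = 100.0                      # the current "model"
    history: list = field(default_factory=list)   # stand-in for the historical store

    def handle_transaction(self, amount: float) -> bool:
        """Fast path: score in real time, then record for later analysis."""
        flagged = amount > self.threshold
        self.history.append(amount)
        return flagged

    def retrain(self, multiplier: float = 2.0) -> float:
        """Batch path: derive a new threshold from all historical data."""
        if self.history:
            self.threshold = multiplier * mean(self.history)
        return self.threshold


loop = ClosedLoop()
first = loop.handle_transaction(150.0)    # scored by the initial model
for amount in (200.0, 250.0):
    loop.handle_transaction(amount)
loop.retrain()                            # model rebuilt from the history
second = loop.handle_transaction(150.0)   # same amount, scored by the updated model
```

The same transaction amount gets a different answer before and after retraining, which is the whole point of closing the loop: what the batch side learns changes how the transactional side behaves.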

GemFire XD is the final piece of the Big Data closed-loop puzzle for Pivotal, with Hadoop and HAWQ being the other two. With all three in place, the data fabric layer of Pivotal’s platform is capable of Big Data analytics, Fast Data analytics, and everything in between. The addition of GraphLab and MADlib should make Pivotal HD even more attractive to enterprise Data Scientists.

But that still leaves plenty of work on the cloud and application layers of Pivotal’s platform. Pivotal still needs to show it can attract developers to the platform, because application developers are the ones who put those theoretical models built by Data Scientists into production. On this front, a lot will depend on how well the company integrates the Spring Framework for Java application development with the data layer. And then there’s the cloud layer. Pivotal must seamlessly integrate its PaaS, which the company is touting as a multi-cloud PaaS, with the data fabric layer above it.