The data platform is becoming an application platform driven by analytics and artificial intelligence. This is fueling a major transition as organizations rapidly discover they must do many things differently.
To better understand the significant dynamics shaping this transformation, George Gilbert, my colleague and fellow analyst here at theCUBE Research, and I reached out to Bob Muglia, an industry leader who has a deep understanding of the technology trends worth watching in this evolving framework.
Muglia was the president of the precursor to Microsoft Corp.’s Enterprise Division, growing the business to $14 billion. After two years at Juniper Networks Inc., he left in 2014 to become CEO of Snowflake Inc., then a two-year-old tech startup with virtually no revenue. Under Muglia’s leadership, Snowflake grew into a key technology player, and he prepared the company for an IPO that remains the largest debut of an enterprise software company in history.
Muglia left Snowflake in 2019 and now works as an advisor, consultant, and investor. He is also the author of “The Datapreneurs: The Promise of AI and the Creators Building Our Future.”
George and I sat down for an extended conversation with Muglia for an episode of our series, “The Road to Intelligent Data Apps,” theCUBE’s continuing conversation about the next data platform, a modern, emerging framework where the leading vendors are Databricks, Snowflake, AWS, Azure, and Google.
Watch the full discussion: The Ultimate Insider’s Guide to the Modern Data Stack with Bob Muglia
The Search for Consistency in Data Lakehouse Architecture
“What we have right now is a little bit of a ‘Beta versus VHS’ situation in terms of the fact that customers are putting their data into a given type of data lake,” Muglia said. “The metadata that turns these files into tables is all different in its structure. It means that these systems don’t work together the way they should.”
As Muglia pointed out in previous conversations with analysts at SiliconANGLE and theCUBE Research, data lakes are maturing, yet there remains a lack of consistent governance. As data is added, the transaction consistency model must be effective in examining data from both a file view and a table view.
The three most popular open table formats for data lakehouse architecture are Apache Hudi, Delta Lake, which is used by Databricks, and Apache Iceberg, the open-source format of choice for Snowflake and Google. The problem is that these options are significantly incompatible with one another.
One solution on the horizon is XTable, an open-source lightweight translation layer that was accepted as an Apache Software Foundation incubating project this month. The goal behind XTable is to allow users to seamlessly translate metadata between the source and target table formats without the headache of rewriting or duplicating data files.
“That allows you to have data in one of these formats like Delta or Iceberg and then it will convert it to another format so it can be used by a different vendor,” Muglia explained. “It’s unclear right now whether that’s going to solve the problem or not. It is a mess for customers; this is an unfortunate situation right now.”
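The idea of translating metadata while leaving data files alone can be illustrated with a minimal Python sketch. This is a toy model, not XTable's actual API: the `TableMetadata` class, the `translate_metadata` function, and the bucket paths are all hypothetical names made up for illustration, and real table formats carry far richer metadata (snapshots, partitions, statistics).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableMetadata:
    """Toy stand-in for a table format's metadata (hypothetical, heavily simplified)."""
    table_format: str   # e.g. "DELTA" or "ICEBERG"
    schema: tuple       # column names
    data_files: tuple   # paths to the underlying data files


def translate_metadata(source: TableMetadata, target_format: str) -> TableMetadata:
    """Describe the same table in another format's metadata.

    Only the metadata record changes. The tuple of data files is carried
    over unchanged, so no data file is rewritten or duplicated.
    """
    return TableMetadata(
        table_format=target_format,
        schema=source.schema,
        data_files=source.data_files,
    )


# Hypothetical table with two data files (paths are illustrative only).
delta_table = TableMetadata(
    table_format="DELTA",
    schema=("order_id", "amount"),
    data_files=(
        "s3://example-bucket/orders/part-0001.parquet",
        "s3://example-bucket/orders/part-0002.parquet",
    ),
)

iceberg_view = translate_metadata(delta_table, "ICEBERG")
print(iceberg_view.table_format)
print(iceberg_view.data_files == delta_table.data_files)
```

In this sketch both metadata records point at the same files, which is the property that lets a second engine read the table without a copy. Whether that is enough to resolve the incompatibilities in practice is, as Muglia notes, still an open question.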
An important part of the dialogue around this issue is that while the industry is used to working with tables, tables do not provide an effective structure for storing semantics, the meaning behind the data, and relationships between entities. Users need something more granular, and this is where Muglia believes the relational knowledge graph can provide a better solution. Note that Muglia is currently a board member for RelationalAI, provider of an AI coprocessor based on knowledge graph technology. If you’re smart, you’ll keep RelationalAI on your radar screen, as Muglia tends to make some solid bets on where he puts his attention.
“I continue to believe that the industry needs a solution for a knowledge graph database,” Muglia said. “Database semantics that we need for the semantic layer just don’t exist in the modern data stack. When we have that, I think these solutions will begin to emerge much more rapidly.”
Major Platforms Jockey for Position in Intelligent Data
A quest for database semantics in the modern stack has gained more urgency over the past year with the explosion of AI use cases. Interest in incorporating AI models into the business is at an all-time high. Muglia believes that the future of data apps will be tied to a shift from a code-first approach to a model-driven framework.
“A language model is not a genie or something; it can’t find out things out of thin air,” Muglia said. “If you have a business term that you use in your company that doesn’t have generally accepted meaning to other customers, that must be defined somehow in order to have the large language model do the right thing. It ultimately needs to be expressed inside these semantic models.”
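Muglia's point about company-specific terms can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the glossary entries, the `SEMANTIC_MODEL` dictionary, and the `build_prompt` helper are hypothetical, and a real semantic layer would be far more structured than a lookup table.

```python
# Hypothetical business glossary: terms and definitions are invented for illustration.
SEMANTIC_MODEL = {
    "active customer": "a customer with at least one paid order in the last 90 days",
    "net revenue": "gross order amount minus refunds and discounts",
}


def build_prompt(question: str, semantic_model: dict) -> str:
    """Prepend definitions of any business terms that appear in the question.

    The language model cannot infer a company-specific meaning on its own,
    so the definition is supplied explicitly alongside the question.
    """
    lowered = question.lower()
    definitions = [
        f"- {term}: {meaning}"
        for term, meaning in semantic_model.items()
        if term in lowered
    ]
    if not definitions:
        return question
    return "Definitions:\n" + "\n".join(definitions) + "\n\nQuestion: " + question


prompt = build_prompt(
    "How many active customers did we have last quarter?", SEMANTIC_MODEL
)
print(prompt)
```

Only the term that actually appears in the question is attached, so the model receives the company's meaning of "active customer" without unrelated definitions.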
In the meantime, the existing intelligent data platforms continue to announce new enhancements and services. We asked Muglia for thoughts in terms of current or expected solutions for the application platform and here is what he shared:
Databricks. Databricks has emerged as a place for developers to build models, and the acquisition of MosaicML should provide further progress for the company, according to Muglia.
Snowflake. Muglia characterized Snowflake as an “advanced enterprise development environment for modern data applications” and noted its focus on shipping Snowpark Container Services as a set of interfaces for applications built on its platform.
Google. Google has a “lot of good tools,” according to Muglia, yet he wonders if the company can put those pieces together in a way that makes it easy for the customer.
AWS. Asked if AWS could move to a data-centric architecture, Muglia was doubtful, saying, “It’s not in their DNA.”
Microsoft. Muglia was intrigued by Microsoft’s position, as the company looks to capitalize on the launch of its Fabric unified analytics solution that leverages Azure Data Factory, Azure Synapse Analytics and Microsoft Power BI.
“I think they are the 800-pound gorilla in the room, to be honest with you,” Muglia said. “They’ve built a good product in Fabric. They’ve not released Power BI Copilot yet. It will be really interesting to see what they do there.”
As always, Bob Muglia brought deep insights and decades of experience to the conversation about the evolution of the modern data platform. If you’ve not yet read his book, “The Datapreneurs: The Promise of AI and the Creators Building Our Future,” add it to your ‘must read’ list.
Featured image credit: ThisisEngineering
See more of our coverage on the evolution of the modern data platform here:
Breaking Analysis: Bob Muglia on Uber for Everyone, how Modern Data Apps Will Evolve