241 | Breaking Analysis | The Emerging Data Stack Brings Opportunities and Risk for Buyers and Sellers

The so-called modern data stack is getting a facelift and perhaps a complete body makeover. As the point of control shifts from the DBMS to the governance layer, we cite three dynamics that highlight a reshaping of today’s data landscape: 1) Key data players are disrupting the established norm as they expand their aspirations; 2) Data platform vendors used to compete mainly with each other, but as they pursue TAM expansion they enter new competitive environments up the stack; and 3) These market and stack dislocations cause confusion for customers, which presents both opportunities and risks.

In this Breaking Analysis we review our learnings from Supercloud 7, Get Ready for the Next Data Platform, which featured the top voices and thought leaders in data. We’ll present a view of the shifting data stack as we see it today, review some data points from a recent ETR survey and close with some final thoughts on what to look for going forward.

Shifting Points of Control in the Data Stack

Our analysis coming out of Supercloud 7 provided several insights from the community that reinforced many key points of our premise. Specifically, we see today’s modern data stack, typified by cloud infrastructure and the separation of compute from storage, evolving in critical ways that will impact customer decisions in the near to mid-term. Leveraging survey research from ETR that we introduced last week, we explored the sentiments of joint Databricks and Snowflake customers, going deeper into customer perspectives and future plans around open table formats, governance and generative AI. The following comments summarize our current views.

Key Takeaways:

  • Shifting Points of Control: We believe the control point, traditionally centered in the DBMS, is moving toward the governance layer. This shift is catalyzing dislocations within the industry and will affect customers’ spending patterns.
  • Governance Layer Dynamics: Established players with decades-long experience are now facing new open source dynamics highlighted by Unity and Polaris, two emerging governance solutions vying for leadership.
  • Open vs. Proprietary Formats: Organizations are managing a blend of open and proprietary table formats, adding layers of complexity to governance solutions. This mix includes cloud vendor governance, emerging open-source governance, and multi-faceted catalog solutions. Iceberg appears to have the early lead but adoption is still nascent.
  • TAM Expansion Impacts: Data platforms are moving beyond traditional metrics, analytics, and dashboards, aiming to build intelligent data apps and construct digital representations of businesses. This necessitates engagement with operational data from legacy systems (e.g., Salesforce, Oracle, SAP), thereby opening up new opportunities and competitive pressures.
  • Data Pipelines and the Harmonization Layer: Data pipelines have played an important role in consolidating data, but complexity has proved problematic for many customers. In our view, complexity is actually increasing, which necessitates new thinking around how to address governance and open formats.

Bottom Line:

The modern data stack is undergoing a significant transformation, with control points shifting towards governance layers, and data platform vendors, specifically Databricks and Snowflake, attempting to expand their TAM. As these platforms move up the stack, they face new competition, particularly from hyperscalers and legacy software vendors. The complexity of many open and proprietary data and governance choices highlights the importance of data harmonization. We believe that organizations must navigate these changes carefully to harness the full potential of their data assets; however, the path today is uncertain due to a lack of clear standards.

Watch this conversation George Gilbert had with Muralidhar Krishnaprasad of Salesforce to better understand the increasing levels of competition Databricks and Snowflake face as they move up the stack: Building a Metadata-Centric Platform for Intelligent Applications.

Conflicting Priorities and Personas in Data Governance

In last week’s Breaking Analysis, we introduced a flash survey conducted with ETR, based on data from 105 joint Databricks and Snowflake accounts. The survey aimed to uncover prevailing sentiments regarding security, governance, and tool selection in data management. We use the following slide from that survey to highlight the diverse and often conflicting priorities that organizations face as they navigate the complexities of modern data governance.

Key Takeaways:

  • Security and Governance are Fundamental: A significant majority (86% for security and 70% for governance) of respondents prioritize security and governance above all else. Our view is this inclination tends to favor more integrated platforms like Snowflake, which require customers to put their data into Snowflake to take advantage of the most comprehensive governance solutions.
  • Avoiding Lock-In: Conversely, a substantial cohort is focused on avoiding vendor lock-in at all costs, aligning more with Databricks’ open-source ethos.
  • Consolidation vs. Flexibility: There is a stark divide, with 45% of respondents indicating a preference for consolidating data into a single tech stack, even at the expense of flexibility. Meanwhile, others prioritize the freedom for analysts to choose their tools, highlighting a fundamental tension within organizations.
  • Persona Alignment Challenges: The survey data underscores the internal conflicts between different personas within organizations, each with distinct priorities. Aligning these personas through governance and reorganization is a critical but challenging task. Lack of alignment will, in our view, expose firms to greater risk.
  • On-Prem vs. Cloud: A notable 39% plan to keep core intellectual property data on-premises for the next year, while others advocate for robust data warehousing systems that minimize the need for open table formats.
  • Data Rebels and Innovation: A segment of respondents, referred to as “data rebels,” prioritize rapid innovation over stringent data security and governance. Notably, these data rebels were the most open-minded about moving off Snowflake and onto Databricks.

Bottom Line:

The survey and our analysis reveal a landscape fraught with conflicting priorities and personas, complicating the path towards cohesive data governance. Organizations must navigate these tensions, balancing the need for security and governance with the desire for flexibility and innovation. As data platforms like Snowflake and Databricks continue to evolve, the industry must address these challenges head-on to achieve harmonized and effective data management strategies. Organizations must evaluate the quality, efficacy and maturity of open source governance solutions and develop strategies that align with their existing governance approach. Nearly 30% of respondents in the survey cited comfort with managing their data silos. We generally believe this approach is suboptimal for putting data at the core of operations; however, it may bring time-to-market advantages for individual business units and will likely remain a viable strategy.

The Evolution and Fragmentation of the Modern Data Stack

As we examine the emerging data stack, it’s evident to us that the so-called modern data stack is evolving rapidly, introducing new complexities and competitive dynamics. While foundational elements like cloud infrastructure and data warehouses are well-established, the layers above are where significant action and innovation are unfolding. The following points summarize our thinking on how the data stack is evolving and the changes it portends.

Key Takeaways:

  • Cloud Infrastructure: AWS set the gold standard for cloud infrastructure. Competitors like Google, Microsoft, and Oracle are advancing by learning from AWS’s strengths and weaknesses and developing differentiated strategies at the infrastructure level. Regardless, this layer of the stack is fairly well understood and mature.
  • Data Warehousing and Pipelines: Snowflake has cemented its place as the leader in cloud DBMS, while Databricks has dominated the data pipeline space with Spark and other tooling.
  • Open Table Formats: While it is still early days, interest in adopting open table formats, particularly Iceberg, is on the rise, with 70% of respondents indicating a shift towards this format.
  • Governance Layer: The governance layer is becoming the new strategic control point, moving beyond the traditional DBMS. Key players are attempting to make this the new ‘moat’ in our view. This includes Databricks’ Unity Catalog and Snowflake’s Polaris, which must coexist with a variety of solutions from Google, Microsoft, AWS, Informatica, Collibra, Alation and others. The governance landscape remains highly fragmented, with a plethora of solutions and a complex ecosystem of partnerships and standards. In addition, solutions like Microsoft Purview are attempting to become the ‘catalog of catalogs,’ leaving the governance wars to others.
  • Semantic Layer: For lack of a better term, we often refer to this as the semantic layer, which involves data harmonization to support creating digital representations of business entities. This layer is still nascent, with significant development needed to achieve a mature and functional state. We believe that full realization of this layer is still years away, but the industry is attempting to create this harmonization capability. We note that there is a metrics layer that could possibly be drawn on the above graphic below the governance box, hence the arrows from governance as it touches pieces below and above.
  • Intelligent Data Apps and Products: The upper layers of the stack, namely data products, agents, and intelligent apps, are seeing new competition as data platforms expand their TAM. Players like Palantir, Salesforce, and Microsoft are advancing their capabilities in this space, creating rich metadata and unified data environments. As Databricks and Snowflake expand their aspirations, they increasingly run into these traditional software companies with products that contain business logic and critical data. Being able to connect to this data is fundamental to building intelligent data apps and these legacy firms are unlikely to cede the market to Databricks and Snowflake.

Bottom Line:

The modern data stack is undergoing a significant transformation, characterized by increasing fragmentation and complexity, particularly in the governance and semantic layers. While foundational elements are established, the competitive landscape is intensifying as companies like Snowflake and Databricks expand their capabilities and face new challengers in the upper layers of the stack. Organizations must navigate these dynamics carefully, leveraging robust governance frameworks and strategic partnerships to harness the full potential of their data ecosystems.

Watch this conversation with visionary data leader Zhamak Dehghani on what’s missing in the emerging data stack.

The Journey Ahead and the Role of Hyperscalers

The transformation of the data landscape is a journey that won’t be completed overnight. As industry leaders like Molham Aref and Zhamak Dehghani have pointed out, this evolution is expected to take three to five years, with numerous challenges and missing pieces along the way. Moreover, we believe the hyperscalers, with their resources and advanced capabilities in machine learning and AI, will play a crucial role in shaping this future.

Key Takeaways:

  • Three to Five Year Journey: The path to a mature data governance framework is long and complex. Key industry figures anticipate significant developments over the next few years but acknowledge the current gaps and challenges.
  • Hyperscalers as Major Players: Over a third of surveyed Databricks and Snowflake accounts recognize the strong ML and AI capabilities of hyperscalers and indicate a leaning in this direction. This positions them as significant influencers and potential disruptors in the data platform ecosystem.
  • Stickiness of Data Platforms: Core data platforms remain deeply entrenched and difficult to displace. While there may be optimization shifts in data engineering and pipeline workloads, the core functionalities are likely to remain stable.
  • Cost and ROI Dynamics: The cost factor currently influences decision-making, but the emergence of AI-driven ROI could alter the landscape significantly, driving further investment and adoption.
  • Confusion and Chaos as Opportunities: The current state of flux presents both risks and opportunities. Companies that navigate this chaos effectively can capitalize on new market opportunities and drive significant value.
  • Future Outlook and Supercloud 8: The ongoing data evolution will continue to be a focal point in upcoming industry discussions, such as Supercloud 8. Innovation and acquisitions by hyperscalers, along with new players transitioning from on-prem to hybrid models, will shape the competitive dynamics.

Bottom Line:

We continue to believe the journey towards a fully realized new modern data stack is ongoing, marked by a blend of opportunities and risks. Hyperscalers, with their advanced capabilities, will be pivotal players in this evolution along with Databricks, Snowflake and their respective ecosystems. The entrenched nature of core data platforms, coupled with shifting cost dynamics and the potential for AI-driven ROI, will influence strategic decisions for customers and shape spending patterns. As the industry navigates this increasingly complex landscape, those who can cut through the noise and leverage data to their advantage will emerge as leaders in the next phase of innovation.

What do you think? How are you handling governance and security of your data? Do you lean toward more integrated and closed platforms like Snowflake because they are ‘safer’ or do you feel that open formats are the way to go and you can manage the governance concerns over time? And where do the hyperscalers fit in your plans?

Please let us know how you’re thinking about the future of data in your organization.
