With Rob Strechay and Erik Bradley
Three main pressure points are transforming the modern data landscape: 1) Increased interest in adopting open table formats to allow any compute to operate on any data; 2) The point of control is shifting from the DBMS to the governance layer; and 3) AI is enabling the emergence of a new class of intelligent data apps that promise to radically automate business processes and drive unprecedented levels of productivity.
Databricks and Snowflake are two firms locked in what sometimes appears to be an internecine battle. But our research shows that their fiercely competitive nature is in fact adding value for customers in ways that increase innovation while at the same time supporting corporate edicts around AI safety, governance and data security.
In this Breaking Analysis, we co-author with our partners at ETR and share fresh results from a survey of more than 100 joint Databricks and Snowflake accounts. The data we present will show that customers remain conflicted on how to best rationalize their need to balance data trust with the desire to move fast and innovate. And as such, the war of the data roses may not be a zero sum game.
Survey Respondent Profiles and Some Quick Takes
Let’s first set the table and share the demographics and intent of the survey with some relevant tidbits.
Our goal was to survey joint Databricks/Snowflake customers to assess how they’re using each platform, their attitudes toward the vision of each company and how they were thinking about governing open data generally. An N of 105 is not as big as ETR’s quarterly survey, which is in the thousands, but it didn’t need to be for this instance. In this survey we captured 29 of the F500 and 50 G2000 respondents, across a mix of industries, with a clear bias toward larger firms with bigger budgets.
Some other quick tidbits: Ninety-six percent (96%) of these joint customers are deeply involved in data platform decision-making. The other 4% weren’t involved in decision-making, but they used the platforms on a regular basis. There were a lot of other data platforms in use at these firms, including Azure Synapse (37%), Amazon Redshift (35%), Google BigQuery (32%) and a number of data mesh-like lakehouse platforms (22%) such as Starburst/Trino, Dremio, Presto, Ezmeral, etc.
In addition, 80% of the respondents are using some type of on-prem and hybrid platforms. The dominant ones were Microsoft SQL Server, Oracle, MongoDB, Couchbase, SAP HANA, several MySQL variants, Teradata and PostgreSQL.
Security and governance are primary decision points for these customers, with 70% saying they don’t make platform decisions without considering governance. And then a big number: 48% of the respondents said they plan to shift, in some way, their Databricks or Snowflake AI, ML and data mix and spending.
The prominence of BigQuery is notable. We weren’t surprised with the amount of Microsoft or Redshift, but the presence of BigQuery stands out. Recent ETR survey data shows momentum for Google, not just in cloud, but also on the data stack and in ML/AI in particular.
Work Running on Databricks and Snowflake Platforms
Let’s take a look at what these joint customers are doing with their data platforms.
We asked customers what they were doing on these platforms and the use cases are shown above. Not surprisingly, Snowflake is more prominent in data warehousing and storage, while Databricks’ historical strength in AI/ML shows up significantly, more so even than Snowflake’s lead in data warehousing, despite Snowflake’s history there. While the emerging control point of data governance catalogs is in an early transition, the use of these respective platforms for governance catalog management is comparable. While the point of control is shifting toward governance catalogs, the source of value is moving toward building data apps. And this new locus of value we discussed in our intro, while favoring Databricks somewhat, is the next big race to value. There’s a slight advantage there for Databricks, but other than the two areas we’ve highlighted it’s pretty neck and neck.
And, not Or
ETR’s quarterly survey of nearly 1,800 respondents shows that the overlap in accounts between Databricks and Snowflake is significant. That survey comprises approximately 300 Databricks and 500 Snowflake accounts. The data from that survey shows about 60% of Databricks accounts also run Snowflake and about 40% of Snowflake accounts have Databricks installed. As such, the idea that the competition between Databricks and Snowflake is a zero sum game may be overstated. Instead, despite fierce competition, among these respondents at least, we believe these companies are currently complementary players and likely to remain in place for some time.
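As a rough sanity check on the overlap figures quoted above, the short sketch below converts the approximate account counts and percentages into implied numbers of joint accounts. The variable and function names are ours, chosen for illustration, and the inputs are the rounded figures from the text, so the results are estimates rather than exact survey counts.

```python
# Rounded figures quoted in the text; treat all results as approximations.
databricks_accounts = 300                     # approx. Databricks accounts in the survey
snowflake_accounts = 500                      # approx. Snowflake accounts in the survey
share_of_databricks_with_snowflake = 0.60     # "about 60%"
share_of_snowflake_with_databricks = 0.40     # "about 40%"


def implied_overlap(accounts: int, share: float) -> int:
    """Return the number of joint accounts implied by a base and an overlap share."""
    return round(accounts * share)


overlap_from_databricks = implied_overlap(
    databricks_accounts, share_of_databricks_with_snowflake
)
overlap_from_snowflake = implied_overlap(
    snowflake_accounts, share_of_snowflake_with_databricks
)

print(overlap_from_databricks)  # 180
print(overlap_from_snowflake)   # 200
```

Both directions land in the same range of roughly 180 to 200 joint accounts, which is what we would expect if the two rounded percentages describe the same overlapping group.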
Customers Somewhat Databricks-Leaning in Gen AI
Because we feel AI is enabling a new wave of data apps, and is an important new vector of value for customers, we wanted to gauge how respondents think about the generative AI capabilities of Databricks and Snowflake. The chart below asks respondents to what degree they align with Databricks’ and Snowflake’s approach and how they see the hyperscalers, specifically as it relates to building their Gen AI applications.
The data above shows the percent of customers on a scale of 1-5 that chose 4 or 5 – i.e. agree or strongly agree – with the conditioned response statements. And you can see more than 65% lean Databricks, about half Snowflake, but interestingly more than one third said the hyperscalers have more capabilities than either of these firms.
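For readers unfamiliar with this style of reporting, the sketch below shows how an "agree or strongly agree" percentage is typically computed from 1-5 scale responses. The function name and the sample responses are hypothetical and are not drawn from the survey; the sketch only illustrates the arithmetic.

```python
def top_two_box(responses: list[int]) -> float:
    """Percent of responses that are 4 or 5 on a 1-5 agreement scale."""
    if not responses:
        raise ValueError("at least one response is required")
    agree = sum(1 for r in responses if r >= 4)
    return 100.0 * agree / len(responses)


# Hypothetical responses for illustration only, NOT survey data.
sample = [5, 4, 4, 3, 2, 5, 1, 4, 3, 5]
print(top_two_box(sample))  # 60.0
```

Because each statement is scored independently, the percentages for different statements do not need to sum to 100%.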
Of course, these are joint Databricks and Snowflake customers so it’s no surprise those two are favored but the prominence of the hyperscalers is notable.
There are two additional points we’d make: 1) The question was written specifically about generative AI applications. So just knowing what we know about these two companies, it’s not surprising to see this particular question, as worded, lean towards Databricks more than Snowflake, because of Databricks’ historical strength in machine learning and AI; and 2) These companies are so busy competing with each other and get so many headlines that we sometimes forget about a common “frenemy” in the hyperscalers. Here, 34% of their joint customers actually said the hyperscalers have more capabilities and, by inference, could be leaning more toward cloud vendors for building intelligent data apps.
Placing Bets on ML/AI Dominance
We further pushed the respondents to tell us, if they had to back a company to dominate AI, where they would place their bets. Highlighted in red is the percent of customers that feel each firm is somewhat likely to dominate, and in green the percent that feel it is very likely to dominate. Add the reds and the greens and it’s 48% for Databricks and 21% for Snowflake, with 30% saying they’re equally matched.
The reality is that the green highlights tell us that neither firm is likely to dominate, given the competitive nature of the market, its fast pace and the resources of hyperscalers. But the perception among these joint customers is Databricks at this time has the edge. Again, we note that the question was asked specifically about ML/AI, so it’s not surprising that customers are Databricks leaning. The fact that 30% of the respondents are “undecided” implies there’s enough of a swing vote here that it’s unlikely either player will emerge as a single dominant force in the ML/AI space.
Not a Zero Sum Game
We kept poking at this issue to see if customers are planning to shift spending to/from these platforms, and if so how. Remember we’ve been reporting for a while that anecdotal customer data tells us that some customers are moving some work, particularly the data engineering and data prep work, outside of Snowflake because they feel it’s too expensive. The chart explores how customers are thinking about changing horses and to what degree.
The question posed was: “Do you have any plans to consolidate your AI and data ops onto either Databricks or Snowflake within the next 24 months?” You can see above, 28% plan to make some form of shift toward Databricks and 19% plan to make some form of shift to Snowflake, with 44% saying they have no plans to change. Notably, only 4% of the respondents said they were phasing out Snowflake and completely moving to Databricks and only 2% said they’re completely phasing out Databricks and moving to Snowflake.
You may be wondering if this data flies in the face of our politically correct premise that it’s not a zero sum game between these two platforms. Perhaps. These two companies are clearly going after similar budgets. To this end we would make the following points:
- To our earlier commentary, this is a small sample. But it is statistically significant that there is some movement.
- The 24% moving more toward Databricks and 17% toward Snowflake shows a clear leaning.
- However it’s not definitive and the complete moves away from either platform are nominal.
- More than half the sample has no plans to change or is unsure, so there is plenty of time to sway buyer behavior in this fast-moving market.
Key Issues Driving Sentiment – Security and Governance Above All
Next we wanted to assess the degree to which customers agree with a variety of statements that we posed. In the interest of time we’re just going to focus on a few items and let our audience digest the full scope of data at their leisure. The chart below shows the respondents indicating a 4 or 5 meaning they agreed or strongly agreed with the statement listed.
We call your attention to the following:
- The first two bars, 86% and 76% respectively, said security and governance were the first order decision points. In this era of AI safety, security, privacy and governance are in the driver’s seat.
- The next two bars, 54% and 50%, tell us that customers don’t want to be locked in and they plan to use open table formats.
- That data is juxtaposed with the next bar at 45% representing those respondents who say they want to consolidate their data into a single stack and are willing to sacrifice tools diversity.
- All the way to the right we have this smaller 14% saying security is less important to us than rapid innovation. And we call this the “Swing Vote,” which we’ll explain in detail in the next section.
Importantly, this analysis reveals a nuanced landscape of preferences among data platform customers. The survey provides a rich source of sentiment data at this point in time and allows us to identify key factors driving user decisions. Preferences in the data tool market are complex, multifaceted and often counterpoised. By understanding the motivations and priorities of different user groups, sellers can tailor their strategies to win over the crucial buyer cohorts that can further drive adoption.
Factor Analysis: Why the 14% ‘Swing Vote’ Matters
This next section underscores where theCUBE Research and ETR really differentiate from your everyday analyst boutique and so-called survey house, not only because of access to vibrant communities but also because deep domain expertise from theCUBE Research, combined with the data science expertise of ETR, allows us to perform relevant and deeper analysis.
The ETR data team noticed two clusters of people in the survey. We’re calling them Cluster 1 and Cluster 2. Cluster 1 prioritizes security and governance over innovation and scores low on Cluster 2. Cluster 2 prioritizes innovation over security and governance and scores low on Cluster 1. ETR then analyzed which factors were actually swinging people to move.
What you’re seeing above is that the 14% we highlighted before, who said that security is not as important as rapid innovation, were much more likely to lean towards Databricks. Conversely, the people who were much more worried about security and consolidating their tech stack were much more likely to move towards Snowflake.
The takeaway for Snowflake is that it needs to find a way to address those people who aren’t security and governance first, but are more interested in a rapid pace of innovation, because right now that’s the area where its sentiment isn’t aligned to the marketplace. The reverse is true for Databricks. To the extent customers are bound by security, privacy and governance edicts, Databricks must prove that open source tooling can match the deep integration of Snowflake. That 14% cohort, what we sometimes call the “Data Rebels,” are willing to take a chance and can be pioneers to prove out open source governance models.
Open Table Format Adoption is Low but Shows Promise
Let’s shift gears and get into some of the data on open table formats, a key discussion point in the data community over the past couple of years. We asked respondents how they’re adopting open table format and we show that below.
While only 15% are using open table formats today, 70% indicate they are already using them, intend to use them in the near future, or plan to evaluate and may use them.
Mix of Open Table Formats…Iceberg Shows the Most Potential
Drilling down, below we show which formats are in use today and what’s planned in the next 6-12 months.
Our analysis suggests that while excitement surrounds open-source technologies like Iceberg, adoption may be slower than expected. Large organizations, which comprise a significant portion of the market, tend to be cautious in embracing open-source solutions.
In particular, the data shows that adoption of open table formats is all over the map with “Other” as the largest category. In there is Parquet and very possibly some answers that weren’t specific to any format, like “AWS” for example. Hive stands out, despite all the criticisms of it being designed for batch, having latency issues and all its reported limitations. Hive is considered inexpensive and it can handle lots of data. It confirms there’s still a lot of Hadoop out there.
The other takeaway is that plans to use Iceberg exceed all other formats, and that is validation of all the enthusiasm we hear from customers around the format. It’s clearly catching the attention of data communities and is a big reason why Databricks acquired Tabular for an estimated $2B or more. Tabular, for those who don’t know, is the company whose founders created Apache Iceberg. Databricks is on a mission to technically integrate all these open table formats. To our earlier point, the Achilles’ heel of open table formats is governance. A splintered market creates governance complexity, and to the degree that Databricks can unify these formats it can make interest in open data a tailwind.
Governance Concerns Keep the Market Fragmented
While the allure of open table formats is strong, when we ask customers ‘how you’re going to govern the data that resides in Iceberg?’ for example, they really don’t have a clear handle on it. So we wanted to go deeper and ask some questions around this topic which we show below.
Above is another set of conditioned response questions that force respondents into buckets. It’s no surprise based on the previous conversation that 37% of respondents said governance comes before open adoption. Twenty-nine percent (29%) of respondents said they pick the right platform for the right job and know how to manage silos. Another 26% say we prefer proprietary because it’s more integrated and trusted, and then just under 10% say they’re all in on open table formats and will figure out the governance over time. These are the Data Rebels and, as explained earlier, they are a small but potentially important cohort.
Key Takeaways:
- Despite enthusiasm and announcements from Snowflake and Databricks, adoption of open-source technologies may be gradual.
- Iceberg has the highest plans for evaluation and adoption, but its current usage lags behind other options.
- Governance of data in open-source formats like Iceberg remains a critical concern for customers.
Market Movement:
- The market is evolving towards open-source technologies, with Databricks heavily invested in this vision.
- Snowflake is also committed to supporting Iceberg, driven by community demand.
- The adoption curve for open-source technologies often follows a slow arc, as seen with other technologies like OpenTelemetry.
Our research indicates that while open-source technologies like Iceberg hold promise, their adoption will likely be gradual. Vendors and customers must address governance concerns and other challenges to fully realize the potential of these solutions.
More fragmented and slower moving adoption of open table formats would be more advantageous to Snowflake.
Governance Catalogs Become the New Point of Control
We’re going to go into a bit of detail here for those that don’t follow this market closely. It’s important in our view to cover what’s happening with governance catalogs because as we said it’s an emerging point of control.
What is a Governance Catalog?
A governance catalog is a repository that captures and manages metadata about an organization’s data assets. A governance catalog has technical metadata, which contains things like names of tables, columns and data types. This is important for technical people when they’re doing data integration and transformation. A catalog can also contain operational and business metadata describing lineage, business rules and things like role based access controls.
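To make the distinction between metadata types concrete, here is a minimal, hypothetical sketch of what a single catalog entry might hold. The class names, field names and sample values are our own illustrative assumptions and do not reflect the actual data models of Unity, Polaris, Horizon or any other product.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    """Technical metadata for one column (hypothetical shape)."""
    name: str
    data_type: str


@dataclass
class CatalogEntry:
    """One table's record in an illustrative governance catalog."""
    # Technical metadata: table name, columns and data types.
    table_name: str
    columns: list[Column]
    # Operational and business metadata: lineage, rules and access controls.
    upstream_tables: list[str] = field(default_factory=list)         # lineage
    business_rules: list[str] = field(default_factory=list)
    role_grants: dict[str, set[str]] = field(default_factory=dict)   # role -> privileges

    def can(self, role: str, privilege: str) -> bool:
        """Simple role-based access check against the recorded grants."""
        return privilege in self.role_grants.get(role, set())


# Hypothetical example entry; all values are made up for illustration.
orders = CatalogEntry(
    table_name="sales.orders",
    columns=[Column("order_id", "BIGINT"), Column("amount", "DECIMAL(10,2)")],
    upstream_tables=["raw.orders_landing"],
    business_rules=["amount must be non-negative"],
    role_grants={"analyst": {"SELECT"}},
)

print(orders.can("analyst", "SELECT"))  # True
print(orders.can("analyst", "DELETE"))  # False
```

In this sketch the first two fields are what a data engineer needs for integration and transformation, while the remaining fields carry the higher-value governance information such as lineage and role based access.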
What’s happening is while the point of control is shifting to governance catalogs, the value isn’t necessarily going with it because much of this functionality is being open sourced as we saw recently with both Snowflake and Databricks at their customer conferences.
In June of this year, at its Summit, Snowflake announced that it was open sourcing Polaris, a technical metadata catalog. Horizon, Snowflake’s built-in governance solution remains closed source and contains all the high value governance capabilities like role based access controls and compliance features. In response to Polaris, Databricks one week later at its Summit, announced it was open sourcing Unity – which includes the entire scope of a governance catalog including the technical, operational and business metadata. It also announced the week of Snowflake’s conference that it was acquiring Tabular.
Unity has the Early Advantage but It’s not Definitive
Polaris Adoption
Despite all the noise around open sourcing these governance platforms this past June, a big chunk of these joint customers is not adopting from either of these two firms…at least at this time. We asked customers, did you know Polaris was open source?
- Thirty-nine percent (39%) said ‘no.’
- Eight percent (8%) said they’re excited to learn more about Polaris.
- Sixty-nine percent (69%) said they’re unlikely to use Polaris.
These are important KPIs because if Snowflake can convince a larger number of customers to use Polaris, it will increase its chances of getting folks to stay or come into Snowflake to use Horizon. Remember, Horizon requires you to be inside of Snowflake. Polaris doesn’t.
Unity Adoption
Unity was announced by Databricks at its summit in 2021 and has been iterating since that time.
- Forty-four percent (44%) of the respondents were not aware that Unity was open sourced in June.
- Forty-seven percent (47%) are currently using Unity, so pretty decent adoption.
- Among these 47%, 43% plan to increase their use of Unity.
- Among those not using Unity, 24% plan to increase its usage.
- Forty percent (40%) of the entire survey base said they’re unlikely to use Unity.
We’re throwing a lot of numbers here, but the bottom line is that despite all the hoopla about open sourcing these governance platforms, a big chunk of these joint customers are not adopting open sourced governance tools from either of these firms, at least at this time. There are a lot of governance options out there. AWS has Glue and also offers DataZone, which has governance capabilities; there are also Google, Alation, Collibra and IBM; and Microsoft is Switzerland with Purview – essentially trying to be a catalog of catalogs, which is interesting.
The point is, based on this data, the market for governance remains fragmented and that is an advantage for Snowflake. To the extent that governance remains risky, Snowflake’s value proposition of – bring it all together into Snowflake and we’ll do the hard part – will remain compelling.
The Field is Crowded and the Stakes are High
Both Snowflake and Databricks are delivering significant value to their joint customers. We’re seeing Snowflake perceived as the safe bet and Databricks as the leader in AI innovation. The big question is the degree to which open source governance is going to actually mature, and how well each firm can deliver on the AI developer experience.
The ideology of ‘don’t bet against open source’ will be tested in this market because governance across a broad estate is such a hard problem.
You have what we called the “swing votes” from the Data Rebels that prioritize innovation first. This could be a small but vocal minority, as they say. So that’s something that we’re going to watch. Open table formats, as we said, are alluring, but the jury is still out on how to govern them and how well open source is going to be able to cover those customers’ needs, especially those of the large financial institutions and large healthcare companies, which are big cohorts for both Databricks and Snowflake. While Databricks and Snowflake are locked in this battle, as we said, the hyperscalers are a wild card with lots of capabilities, lots of GPUs and big balance sheets. And they house much of the world’s critical data.
Our data-first, collaborative research reveals insights at a snapshot in time for the Databricks and Snowflake landscape.
Key Customer Feedback
- Both Databricks and Snowflake offer unique value propositions, and their joint customers see benefits in using both.
- The rivalry between the two companies will drive innovation and benefit end users.
- Both companies are well-positioned in the larger data market, with Databricks and Snowflake ranking third and seventh, respectively, in ETR’s Net Score, a measure of spending momentum.
- While Databricks has the lead in ML/AI, Snowflake debuted recently in that sector with a 54% Net Score, well above the 40% mark that is considered an elevated level.
Bottom Line
Our research suggests that the competition between Databricks and Snowflake may be beneficial to the industry, driving innovation and delivering value to customers. Rather than a zero-sum game, the focus on each other may elevate the overall market, ultimately benefiting end users.
Databricks’ message of “don’t give your data to any vendor, including us” is resonating with customers and pressuring Snowflake to open up its platform. On the other hand, governance remains a sticking point, and market fragmentation, confusion and risk will confer advantage to Snowflake’s integrated approach.
These two leaders have set the mark for the modern data platform. At Supercloud 7 next week in Palo Alto, we’ll hear from data leaders including execs from Databricks and Snowflake. We’ll test our assumptions, share more insights and further advance our scenarios.
Please let us know what you think, how you see the market shaping and where you’re placing your data bets.
Image DALLE