The challenges of legacy data warehouses have been well documented. Built on rigid infrastructure and managed by specialized gatekeepers, data warehouses of the past were, as one financial customer once told us, “like a snake swallowing a basketball.”
Slide from Wikibon’s 2014 Big Data Capital Markets Event:
The amount of data ingested into a data warehouse overwhelmed the system. Every time Intel came out with a new microprocessor, practitioners would “chase the chips” in an effort to try and compress the overly restrictive elapsed time to insights. This cycle repeated itself for decades.
Cloud data warehouses generally and Snowflake specifically changed all this. Not only were resources virtually infinite, but the ability to separate compute from storage permanently altered the cost, performance, scale and value equation. But as data makes its way into the cloud and is increasingly democratized as a shared resource across clouds – and at the edge – practitioners must bring a SecDevOps mindset to securing their cloud data warehouses.
This Breaking Analysis takes a closer look at the fundamentals of securing Snowflake, an important topic as data becomes more accessible and available to a growing ecosystem of users, customers and partners. To do so we welcome two guests to this episode. Ben Herzberg is an experienced hacker, developer and an expert in several aspects of data security. Yoav Cohen is a technology visionary who currently serves as CTO at Satori Cyber.
These two individuals have co-authored the book shown above called Snowflake Security. The work is a comprehensive guide to what you need to know as a data practitioner using Snowflake. It’s packed with great information, best practices and practical advice and insights – all in one place.
Security and Data Practices are Colliding
Before we get into the discussion let’s share some ETR survey data to set the context. We’re seeing cybersecurity and data colliding in an important way.
Below are some data points from ETR’s latest drilldown survey. ETR asked more than 1,200 respondents – CIOs, CISOs and IT professionals – which organizational priorities would be most important in 2022. The top seven are shown in the diagram.
It’s no surprise that security is #1, although, as we shared in our predictions post, the magnitude of its relative importance varies with the degree of expertise within the organization. The delta is not as significant in large companies, for example.
Analytics and data are prominent in the list and we’ve tied these two domains together. We’re highlighting a term our two guests have used called DataSecOps, which to us is the idea that you bring agile DevOps practices to data operations and build in security, from the start, as part of the full cycle of managing the creation, use, access, protection and recovery of data.
As Yoav Cohen points out, it’s also significant that Cloud Migration was the #2 priority on the list as that’s driving changes in operational models. According to Cohen:
This definitely aligns with what we’re seeing on the ground in the market. In the diagram you have cybersecurity and data warehousing. In the middle you have cloud migration. That’s basically what’s pushing companies to invest in security and data and warehousing, because the cloud changed the game for cybersecurity. The tools that we used before are not the same tools that we need to use now. And also, it unlocks a lot of performance value and capabilities around data warehousing. So, all of that comes together to a big trend in the industry for investment, for replacement, and definitely we’re seeing that on the Snowflake platform, which is doing really, really well recently.
Listen to Yoav Cohen comment on the connection between cyber, analytics/data and cloud.
Why are we Always Talking About Snowflake?
Let’s share one more graphic before we dive in with Ben and Yoav. Of course, Snowflake is a hot company; everyone knows that and it shows in its financials. The ETR survey data below tells a similarly compelling story.
The chart above is from the most recent ETR January survey. The blue line at the top represents Snowflake’s Net Score or spending momentum. The darker line at the bottom represents the company’s presence or pervasiveness in the survey sample. There were 165 Snowflake customers that responded to this survey. Ten percent of companies within the Fortune 500 were in that sample and around 4% of Global 2000 companies were in the Snowflake data set. Just under 30% were C-suite execs, about 20% were analysts, engineers or data specialists, and around 50% were in VP, director or manager roles. The sample covers a very broad mix of industries, with a bias toward larger companies.
The top blue line in the graph is derived using simple math from the data in the inserted box. ETR asks customers each quarter: 1/ Are you adopting Snowflake new in 2022? That’s the 27% lime green; 2/ Will you be spending 6% or more on Snowflake relative to 2021? That’s the 57% forest green; 3/ Is your spending flat? That’s 15% of respondents in the gray; 4/ Is your spending down by 6% or worse? Only 1% is in the pink; and 5/ Are you leaving the platform/defecting? That’s the bright red at 0%.
No defections.
Subtract the reds from the greens and you get Net Score, which calculates out to 83% for Snowflake in this past survey. What’s remarkable is that Snowflake has held this elevated score for more than 12 quarterly surveys. It is in the stratosphere among the many thousands of companies that ETR tracks. Remember as well, anything above 40% on the vertical axis is considered elevated Net Score and Snowflake is glued to the ceiling.
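The Net Score arithmetic described above can be reproduced in a few lines. This is an illustrative sketch only: the function and parameter names are ours, not ETR’s, and the inputs are the percentages quoted from the survey breakdown.

```python
def net_score(adopting: float, increasing: float, flat: float,
              decreasing: float, leaving: float) -> float:
    """Net Score as described in the text: greens minus reds, in percentage points.

    `flat` does not enter the subtraction; it is accepted only so the five
    response categories can be checked to sum to 100%.
    """
    total = adopting + increasing + flat + decreasing + leaving
    if abs(total - 100) > 1e-9:
        raise ValueError(f"response shares should sum to 100, got {total}")
    return (adopting + increasing) - (decreasing + leaving)


# Snowflake's shares from the survey breakdown quoted above.
snowflake = net_score(adopting=27, increasing=57, flat=15, decreasing=1, leaving=0)
print(snowflake)  # 83
```

Note that the flat-spending share is excluded from the subtraction, which is why a large gray segment pulls a Net Score down even when nobody is cutting spend.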
The greenish brown line in the graph shows the company’s market presence in the data set. It continues to grow and the green shaded area emphasizes that its pace this last quarter is accelerating.
Snowflake is becoming ubiquitous and customers are becoming intimately familiar with its platform. Snowflake is scaling like we’ve never seen before and is building a hard to penetrate fortress with its product, ecosystem and execution.
Broadly, Ben Herzberg attributes this momentum to five main factors:
- Simplicity and brand promise. Snowflake performs as advertised, out of the box;
- Very rich support of capabilities and features needed in a cloud data warehouse;
- Multi-cloud support reduces dependencies on a single cloud;
- It’s fast and scalable with no worries about infrastructure and heavy maintenance lifting;
- Fast pace of innovation – e.g. many security and governance features, moves to support unstructured data, etc.
Listen to Ben Herzberg talk about why Snowflake customers are rapidly adopting the platform.
Next we pivot to the deeper dive in Snowflake security. We asked the experts to comment on several questions, summarized below:
Question #1. Snowflake already gets high marks on security so why does there need to be a book on the subject?
The answer comes down to the need to understand the nuances of Snowflake in the cloud’s shared responsibility model and how to apply best practices in this environment. According to Cohen:
Snowflake is investing in and putting a lot of emphasis on security. However, it’s connected to the cloud, and like any other cloud service, there is a shared responsibility model between Snowflake and its customers when it comes to fully securing their data cloud. So Snowflake can build amazing features, but then customers have to really adopt them, implement them in the best way. One of the things that we’ve seen by working with Snowflake customers is that we typically interact with data engineers, but then they have to implement security features and security capabilities. We thought writing a book about the topic would help these customers to understand the features better, benefit from them better and really structure their implementation and decide what’s most important to implement at every step of their journey.
Listen to Yoav Cohen explain why he and his colleague wrote the book on Snowflake Security.
Question #2. What are the basic fundamentals of securing Snowflake?
We wanted to explore this topic because in a world of flexible and globally distributed data, where democratization is a major theme, how do you really make sure that only those folks who should have access do have access?
According to Ben Herzberg, it comes down to many common-sense items, like limiting network access with a few simple commands, which will significantly lower your security risk and improve your compliance posture. Beyond that, understand which applications access Snowflake and how they gain access. If it’s by password, rethink that and use a key instead. Are users accessing Snowflake with usernames and passwords? Change that to an identity system like Okta. The book goes into these fundamentals and many other areas in great detail, from configuring to monitoring and auditing Snowflake security.
Listen to Ben Herzberg talk about the fundamentals of Snowflake security.
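One of the fundamentals above, replacing an application’s password with a key, can be sketched as follows. This is a minimal illustration, not something taken from the book or the interview: it assumes a public key has already been generated in PEM format, the user name `etl_service` and the key material are placeholders, and the helper names are our own. The only Snowflake-specific piece is the `ALTER USER ... SET RSA_PUBLIC_KEY` statement, which takes the key body without the PEM header and footer lines.

```python
import re

# Unquoted identifier: letter or underscore first, then letters, digits, _ or $.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")
_BASE64 = re.compile(r"[A-Za-z0-9+/=]+")


def public_key_body(pem_text: str) -> str:
    """Return the base64 body of a PEM public key, without header/footer lines."""
    lines = [line.strip() for line in pem_text.strip().splitlines()]
    body = "".join(l for l in lines if l and not l.startswith("-----"))
    if not _BASE64.fullmatch(body):
        raise ValueError("PEM text does not contain valid base64 key material")
    return body


def set_rsa_public_key_sql(user: str, pem_text: str) -> str:
    """Build an ALTER USER statement that registers a public key for a user."""
    if not _IDENTIFIER.fullmatch(user):
        raise ValueError(f"unexpected user name: {user!r}")
    return f"ALTER USER {user} SET RSA_PUBLIC_KEY='{public_key_body(pem_text)}';"


# Placeholder key material for illustration, NOT a real key.
EXAMPLE_PEM = """-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8A
EXAMPLEONLYNOTAREALKEY
-----END PUBLIC KEY-----"""

print(set_rsa_public_key_sql("etl_service", EXAMPLE_PEM))
```

The sketch only builds the statement as text so it can be reviewed before anyone runs it; the private half of the key stays with the application and never appears in SQL.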
Question #3. Don’t these fundamentals apply to any environment…what’s unique to Snowflake?
The answer, according to Cohen, is yes and no. Sure, basic security hygiene is important in all environments, but as data moves to cloud data warehouses generally and Snowflake specifically, policies are becoming more dynamic. More sophistication around authorization and more fine-grained controls are necessary and available in Snowflake. Here’s Cohen’s explanation in detail:
A couple things to consider. First of all, we love to say that it’s 80% good security hygiene. You have to make sure that your basics are locked and tightly configured and that brings a lot of value. But two points to consider, first of all, all of these types of [standard] controls are pretty static in the sense that once you get in, you get in, and then you have pretty broad access; and we can talk about authorization concepts. But these [standard practices] are really static gatekeepers around your data. Once you have access, then it’s really free for all. When you compare it to other types of environments and what we’re seeing in other domains, maybe a move to more dynamic type of controls, elevated access or elevated additional authentication steps before you get elevated access. And what we’re thinking is that beyond those static controls, the market is going to move towards implementing more dynamic, more fine-grain control, especially because in Snowflake, but any other data warehouse or large-scale data store, which becomes an aggregation point of data in the company, and we work with really big companies, and they bring in data from multiple jurisdiction from across the world, so they can get an overview of the business and run the business in a much more efficient way, but that really creates a pressure point when it comes to securing that data.
Question #4. Coming back to the Snowflake specifics and the shared responsibility model. Snowflake talks about a three layered security approach: Network, identity access & encryption. Can we dig into each of these areas and better understand the responsibilities of the Snowflake customer?
Configuring Network Security – What’s the Starting Point?
Let’s start with the network. The customer is responsible for things like setting up the DNS and deciding the level of public Internet access for other apps and users. Herzberg says there are two high level areas Snowflake customers should be focused on with regard to network security:
- Setting the policy to limit network access to your account;
- Considering whether to configure the network with a private link to the cloud environment.
Listen to Ben Herzberg explain the basics of the network shared responsibility model for Snowflake.
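The first item, a policy that limits network access to the account, can be sketched as a pair of Snowflake statements: `CREATE NETWORK POLICY` and `ALTER ACCOUNT SET NETWORK_POLICY`. The Python below only generates and validates those statements; the policy name `corp_only` and the address ranges (taken from documentation-reserved blocks) are hypothetical, the helper names are ours, and the sketch assumes IPv4 ranges.

```python
import ipaddress
import re
from typing import Iterable, List

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def _ip_list(ranges: Iterable[str]) -> str:
    """Validate IPv4 addresses/CIDR ranges and render them as a quoted SQL list."""
    # IPv4Network raises ValueError on malformed input or when host bits are set.
    normalized = [str(ipaddress.IPv4Network(r)) for r in ranges]
    return ", ".join(f"'{n}'" for n in normalized)


def network_policy_sql(name: str, allowed: Iterable[str],
                       blocked: Iterable[str] = ()) -> List[str]:
    """Build the statements that create a network policy and apply it account-wide."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unexpected policy name: {name!r}")
    allowed_sql = _ip_list(allowed)
    if not allowed_sql:
        raise ValueError("at least one allowed range is required")
    create = f"CREATE NETWORK POLICY {name} ALLOWED_IP_LIST = ({allowed_sql})"
    blocked_sql = _ip_list(blocked)
    if blocked_sql:
        create += f" BLOCKED_IP_LIST = ({blocked_sql})"
    return [create + ";", f"ALTER ACCOUNT SET NETWORK_POLICY = {name};"]


# Hypothetical office ranges, with one blocked address inside the first range.
for statement in network_policy_sql(
    "corp_only",
    allowed=["192.0.2.0/24", "198.51.100.0/24"],
    blocked=["192.0.2.13"],
):
    print(statement)
```

Validating the ranges before the statements are built catches typos early, which matters because an account-level policy affects every user and application that connects.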
Identity Access – Avoiding “Hierarchy Hell”
With identity you have to worry about things like setting up roles and managing users and possibly configuring row and column based access. Setting up roles can get tricky – especially when you’re crossing domain identities and setting up hierarchies. Complexity is the enemy of good security and customers have to be careful about setting up complex hierarchies. Herzberg explains hierarchy hell:
Hierarchy hell, in the book says that you can use hierarchy, but you should avoid getting to a hierarchy hell. Basically, we’ve seen that with several Snowflake customers with the ability to set roles in a hierarchy model, setting a role that inherits privileges from another role, that inherits privileges from other roles and maybe, of course, used in a good way, but it also in some of the cases, it leads to complexities and to access not being deterministic, at least not obvious to the person who gives access, who is usually the data engineer. So, whenever you start having a complex authorization model, whenever I want to give Yoav access to a certain data set, and because things are complex, I also, by mistake, give him access to the salary information of the company, that’s when things become tricky. If your roles are messy and complex, then it may lead to data exposure within the organization or outside the organization.
Listen to Ben Herzberg explain hierarchy hell.
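The risk described above, where access granted through a chain of inherited roles is not obvious to the person granting it, can be illustrated with a small model. This sketch does not query Snowflake; the role names, privileges and helper function are entirely hypothetical. It simply walks a role graph to show what a role ends up with once inheritance is resolved.

```python
from typing import Dict, List, Set


def effective_privileges(role: str,
                         direct: Dict[str, Set[str]],
                         granted_roles: Dict[str, List[str]]) -> Set[str]:
    """Collect a role's own privileges plus everything inherited transitively.

    `granted_roles[r]` lists the roles that have been granted to role `r`,
    so `r` inherits their privileges (and, in turn, whatever they inherit).
    """
    seen: Set[str] = set()
    stack = [role]
    privileges: Set[str] = set()
    while stack:
        current = stack.pop()
        if current in seen:  # guards against cycles and repeated visits
            continue
        seen.add(current)
        privileges |= direct.get(current, set())
        stack.extend(granted_roles.get(current, []))
    return privileges


# Hypothetical setup: each grant looks harmless on its own.
direct = {
    "marketing_analyst": {"SELECT on marketing.campaigns"},
    "reporting": {"SELECT on sales.orders"},
    "finance_read": {"SELECT on hr.salaries"},
}
granted_roles = {
    "marketing_analyst": ["reporting"],  # the analyst needs the sales reports
    "reporting": ["finance_read"],       # added earlier for a finance dashboard
}

for privilege in sorted(effective_privileges("marketing_analyst", direct, granted_roles)):
    print(privilege)
```

In this made-up example the analyst role ends up with read access to salary data two hops away, which is the kind of non-obvious outcome a flatter hierarchy, or a check like this one before each grant, helps avoid.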
Encryption in Snowflake – Basic to Advanced Levels
For many companies, encryption in Snowflake is pretty straightforward and doesn’t require a lot of responsibility on the part of the customer. Snowflake encrypts everything in transit and at rest, and rotates keys every 30 days. So many companies really just have to monitor things and make sure they’re in compliance and have good log data. But it depends on the degree of sophistication required. As Herzberg explains:
This really depends. So, for the average company, I would say, yes. For some of the companies with higher security requirements or compliance requirements or both, sometimes there are issues like companies that do not want to have the data stored in clear text, in Snowflake, even encrypted as in the data warehouse encryption or the account encryption, even if someone accidentally gets access to the table, they want them not to be able to pull the data in clear text, and then it gets slightly more complicated. You have different ways of tackling this, but for the average company or companies who do not have such requirements, then everything in Snowflake is encrypted in transit and at rest, and of course, there are more advanced features for higher requirements.
Question #5. What are some of the more vulnerable aspects of Snowflake? If you were a hacker, where would you attack first?
In addition to phishing scams and other user vulnerabilities, we wanted to understand if there were any areas where customers should be extra cautious. Yoav Cohen’s answer essentially starts with following the data:
I would start with where data resides. And, if you look at the Snowflake architecture, there’s a separation between storage and compute, but that also means storage is accessible without going through the compute. That can create opportunities for hackers to go and try and find access where access shouldn’t be had. That’s where I would focus on.
Listen to Yoav Cohen explain where a hacker would likely focus on finding Snowflake vulnerabilities.
Question #6. Does the multi-tenant nature of the Snowflake Data Cloud increase security risks? Should customers use Virtual Private Snowflake (VPS) to reduce exposures?
Herzberg doesn’t believe that multi-tenancy inherently increases exposures and feels most companies don’t need VPS. He feels there are more optimal and cost effective approaches to mitigating risks.
Virtual Private Snowflake is Snowflake’s highest security level. It is designed for organizations with the most stringent requirements (e.g. regulated industries like healthcare and financial services). VPS isolates the Snowflake environment from all other Snowflake accounts and shares no resources outside of the VPS account.
Herzberg summarized his thoughts as follows:
To the best of my knowledge, Virtual Private Snowflake is used by a minority of the customers, a small minority of the customers. There are other more popular ways within Snowflake, like private link, for example, to enhance your security and your account segregation. But I wouldn’t say that simply because the platform is multi-tenant, it is vulnerable. Of course, in many cases, your security or compliance needs require you to eliminate even this risk; but I would say that there are a lot of other platforms in different areas that are multi-tenant and probably more secure than many on-premises environments.
Listen to Ben Herzberg and Yoav Cohen comment on multi-tenancy and its relative risk.
Question #7. Will new functionality like support for unstructured data or adding data science use cases create new attack vectors for hackers?
Snowflake rolls out new functions at a rapid pace. Its CEO has prioritized investments in engineering since his first days on the job and that is translating to rapid rollout of new functionality. We wanted to understand if the pace of new feature rollouts and TAM expansion moves create new opportunities for hackers…and how will customers deal with this?
According to Cohen, while new capabilities may create a greater threat surface, the techniques to address them will be similar. It’s more likely that as customers tap these new areas of development they will apply security features they haven’t previously deployed. The biggest trend to watch will be the democratization of data, which will require greater diligence and focus by organizations.
Cohen explained as follows:
I would say that Snowflake is moving fast with adding new functionality– fast, but not too fast. They’re releasing it in a controlled way. I would say that for new capabilities, of course, in some cases there are new attack vectors or new risks and obviously, securing different types of data may bring new challenges, but the basics, I think, remain the same. The basics of the network, identity authentication, authorization and auditing monitoring. I would say they will be the same and perhaps new features or capabilities will need to be used. And the largest issue, as data democratization is growing within organizations, and more and more people are using your data cloud, that also needs to be addressed.
Listen to Ben Herzberg talk about new attack vectors and how customers will address them.
Question #8. Snowflake is building what we call a supercloud, a layer that adds value above the hyperscale infrastructure and across clouds. How will that impact the way organizations will approach DataSecOps?
Let’s talk futures. In the book, Cohen and Herzberg discuss multi-cloud as a way to reduce reliance on a single vendor and that’s all good. But we’ve been using the term “Supercloud” as a reference to an abstraction layer that exists on top of multiple clouds and hides some of the underlying cloud complexity and we feel Snowflake is a good example of that– building value on top of all the hyperscale infrastructure and across clouds. We wanted to understand how this might affect the way companies think about DataSecOps.
Here’s what Yoav said:
Definitely, we also see the trend of companies adopting more and more types of cloud and cloud technologies. They’re in one cloud today. They want to move to a second one, almost every company that I talk to have, nowadays, a multi-cloud strategy. With respect to Snowflake, they basically have it figured out, because they are an overlay, like a supercloud, super data cloud, that is spread across any cloud, and you can basically pick and choose where you want to put your data for what use cases, and that’s really, really helpful, because then you don’t have to manage the complexity of multiple solutions for multiple areas of the business. We see this also in other areas where companies are saying, “Hey, I prefer to not use a specific cloud technology for that purpose, but use a vendor that can cover my needs across the clouds.” Definitely on the security side, where they want one throat to choke, so to speak, but they want to control things on a central place. As Ben mentioned before, complexity is the enemy of security and having those multi-cloud operations, from a security perspective, definitely adds complexity, which adds risks, so simplifying that is really, really helpful.
Thanks to Yoav Cohen and Ben Herzberg for participating in this Breaking Analysis. Here’s Yoav Cohen giving a quick explanation of what their company Satori does.
Keep in Touch
Remember we publish each week on Wikibon and SiliconANGLE. These episodes are all available as podcasts wherever you listen.
Email david.vellante@siliconangle.com | DM @dvellante on Twitter | Comment on our LinkedIn posts.
Also, check out this ETR Tutorial we created, which explains the spending methodology in more detail.
Watch the full video analysis:
Note: ETR is a separate company from Wikibon and SiliconANGLE. If you would like to cite or republish any of the company’s data, or inquire about its services, please contact ETR at legal@etr.ai.
All statements made regarding companies or securities are strictly beliefs, points of view and opinions held by SiliconANGLE media, Enterprise Technology Research, other guests on theCUBE and guest writers. Such statements are not recommendations by these individuals to buy, sell or hold any security. The content presented does not constitute investment advice and should not be used as the basis for any investment decision. You and only you are responsible for your investment decisions.