How to Secure Your Snowflake Credentials in Jupyter and Zeppelin Notebooks
April 16, 2020
Note: This blog post was originally written by Jonathan Sander, Security Field CTO for Snowflake, and published on Snowflake’s blog. We’re grateful that Jonathan “likes to call out when partners have done the right thing.”
An interesting aspect of being a security specialist in a cloud data platform company is that I often sit in meetings listening to conversations I don’t fully understand until it’s my turn to speak. A few months back, I sat confused as a whole conversation about notebooks occurred. I couldn’t imagine why all these deeply technical folks were writing down all this information in a notebook, and I was doubly confused when they turned and asked me how they should secure these notebooks. What came to mind was a lock like this:
That confusion was cleared up quickly, but it led to many real questions that were not as easily dismissed. How were our customers using notebooks to handle their Snowflake credentials? What was the best practice we could encourage them to use? How were our partners leveraging the best practices we’ve built into the security features of Snowflake Cloud Data Platform? These questions led me through a tour of how notebooks are being used, and they put me at Zepl’s doorstep once I realized the magnitude of the tasks I had taken on.
I like to call out when a partner does things right. So this is the story of how Zepl failed to solve my problem by doing everything correctly.
Notebooks are a Security Breach Waiting to Happen
When I want to know how things are used, the first place I turn is to our customers. After getting a good chuckle out of the fact that I had never heard of notebooks like Zeppelin and Jupyter, the customers I spoke with told me about how they were managing the credentials they use to authenticate their notebooks. The news was quite sobering. Many were saving clear-text passwords directly in the notebooks. Others were using complicated, error-prone processes that amounted to saving an encrypted file on some server, which then contained the password—and everyone had the key needed to decrypt it. Some told stories of notebooks posted to GitHub with passwords visible publicly. Most confirmed that the shared Jupyter system they were using didn’t even make them log in. All of them asked me: “What best practice would you suggest?”
The best practice for Snowflake authentication of people (as opposed to service accounts and unattended programmatic workloads) is to use personal credentials managed by an identity provider (IdP) such as Azure Active Directory, Okta, or another SAML- or OAuth-powered federation system. This ensures that when users access data in Snowflake, they do so in a context that grants them only the roles and entitlements of their personal user. It also means auditing will show that the activity was done by that specific user. Personal credentials and person-level auditing are, of course, security best practices I always encourage. So hearing that customers were forced to be so far from that set me on a task to see what I could do to help.
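For readers who run their own notebooks, here is a minimal sketch of what "no password in the notebook" can look like with the Snowflake Connector for Python. It assumes the connector's browser-based SSO option (`authenticator="externalbrowser"`), and the environment variable names `SNOWFLAKE_ACCOUNT` and `SNOWFLAKE_USER` and the helper name are hypothetical choices made for this illustration. Browser-based SSO generally needs the notebook kernel to run on the user's own machine, so it may not suit a shared notebook server.

```python
import os


def sso_connection_params(environ=os.environ):
    """Build connection keyword arguments for a browser-based SSO login.

    Nothing secret is stored here: the identity provider handles the login,
    so the notebook only carries an account name and a user name.
    The variable names below are assumptions made for this sketch.
    """
    required = ("SNOWFLAKE_ACCOUNT", "SNOWFLAKE_USER")
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "account": environ["SNOWFLAKE_ACCOUNT"],
        "user": environ["SNOWFLAKE_USER"],
        # Delegates authentication to the IdP through the local web browser.
        "authenticator": "externalbrowser",
    }


# Usage in a notebook cell (requires the snowflake-connector-python package):
#   import snowflake.connector
#   conn = snowflake.connector.connect(**sso_connection_params())
```

Because each person logs in as themselves, the session carries only that user's roles, and the audit trail points at a real person rather than a shared account.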
The first step in my process is always getting hands on. So I resolved to build out a lab with Zeppelin and Jupyter to reproduce the things Snowflake customers were seeing. The trouble with this is that I always have to squeeze these lab-building exercises into windows between customer sessions. As I tried to fit in the time to build this lab out, I kept hitting the typical roadblocks you do when you’re building servers, networks, systems, and platforms on your own. Each roadblock meant walking away, returning later, and restarting the whole thing with a new approach to get around the challenge. And this was all before I even had Zeppelin and Jupyter installed in a real-world way.
How Zepl Got Notebook Security Right
That’s when I found Zepl. They were involved with our Sales Kick Off in February, and I spoke to someone from Zepl after seeing that Zepl’s product is in the notebook space. I told that person what I was trying to do, and they said they would set me up with a way to do testing. The first good thing that happened was it turned out Zepl’s platform is a SaaS platform. That took the build challenges out of the picture, so I had my first notebook ever staring me in the face. Now it was time to connect it to Snowflake and experience the pain those customer conversations described. Except when I went to add a data source, I saw this:
Zepl has already integrated single sign-on (SSO) into their data sources. I could have picked the “User/Password” option (though you know I never would), but even that wouldn’t be the janky “password in a file” experience that customers described. Zepl manages these passwords as secured entities in their platform. Their SSO leverages the Snowflake OAuth feature, which I always advise customers is the most elegant way to securely connect to other platforms.
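Outside a platform that handles this for you, the same idea can be approximated by handing the connector an OAuth access token instead of a password. The sketch below is not Zepl's implementation. It assumes the Snowflake Connector for Python's `authenticator="oauth"` and `token` parameters, and it assumes a token has already been obtained from an OAuth flow and exposed in a hypothetical `SNOWFLAKE_OAUTH_TOKEN` environment variable.

```python
import os


def oauth_connection_params(environ=os.environ):
    """Build connection keyword arguments that use an OAuth access token.

    The token is read from the environment at run time, so it never lands
    in the notebook file. The variable names are assumptions for this sketch.
    """
    account = environ.get("SNOWFLAKE_ACCOUNT")
    token = environ.get("SNOWFLAKE_OAUTH_TOKEN")
    if not account or not token:
        raise KeyError("SNOWFLAKE_ACCOUNT and SNOWFLAKE_OAUTH_TOKEN must be set")
    return {
        "account": account,
        # Tells the connector to authenticate with the supplied token.
        "authenticator": "oauth",
        "token": token,
    }


# Usage in a notebook cell (requires the snowflake-connector-python package):
#   import snowflake.connector
#   conn = snowflake.connector.connect(**oauth_connection_params())
```

Access tokens expire and can be revoked, which limits the damage if one leaks. That is a large part of why a token is a better thing to hand a notebook than a long-lived password.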
This made me wonder how they lock down the front door. Customers had said that their current notebooks were wide open to anyone on their network who knew the URLs. When I checked out those options, I found Zepl had locked that down, too. And they had federated SSO options here as well. At the time, I had only set up the password, but I could now do better things to lock down this little lab of mine if I wished.
To ensure this all worked from an end-to-end point of view, I went ahead and played around with the notebooks Zepl includes for Snowflake. I was able to connect to my lab, hit my favorite table filled with prime numbers, and get results into a nice little table. I also had to google “pandas,” but that’s another story.
As the final kicker, I found that Zepl had linked to the Snowflake documentation for people to learn more about Snowflake’s OAuth features, and Zepl had also built a step-by-step guide in their documentation. Lack of clear documentation has caused more security gaps than any bad guy ever has. So it was good to see all this was in place to help people.
As stated in the beginning of this post, I like to call out when partners have done the right thing. Zepl is in a small club that gets full marks for that, and I’m hoping to see all our other partners follow suit. This leaves me high and dry—and safe—though. So if you want me, I’ll be running a ton of commands in some undocumented voodoo mess to get myself into the same bad state most customers seem to find themselves in. I still need to find something broken for my lab, because Zepl fixed all their stuff before I got there.