Snowflake is a cloud-based Software-as-a-Service (SaaS) data platform. Snowflake’s architecture consists of three key layers:
- Database Storage: When data is loaded into Snowflake, Snowflake reorganizes that data into its internal optimized, compressed, columnar format. Snowflake stores this optimized data in cloud storage.
- Query Processing: Query execution is performed in the processing layer. Snowflake processes queries using “virtual warehouses”. Each virtual warehouse is a massively parallel processing (MPP) compute cluster composed of multiple compute nodes allocated by Snowflake from a cloud provider.
- Cloud Services: A collection of services that coordinate activities across Snowflake. Services managed in this layer include:
  - Infrastructure management
  - Metadata management
  - Query parsing and optimization
  - Access control
Snowflake offers a 30-day free trial, which I’m going to use to test it.
Snowflake is provided as Software-as-a-Service (SaaS) and runs entirely on cloud infrastructure. It is available in multiple regions on AWS, Azure, and GCP. I’m going to work with GCP.
Once you register, Snowflake sends a confirmation email with an activation link.
The first time you connect, you need to change your password.
Once you log in to Snowsight, the dashboard is displayed (I switched to the Classic Console).
There are five main network traffic flows to consider:
- The connection between the Snowflake driver/connector and the Snowflake account URL (1)
- The connection between the Snowflake driver/connector and one or more OCSP providers (2)
- The connection between the Snowflake driver/connector and the Snowflake Internal Stage (3)
- The connection between the Snowflake service and the customer-owned cloud storage (4)
- The connection between the users’ browsers and the Snowflake Apps layer (5)
These flows are represented in the diagram below.
By default, Snowflake allows users to connect to the service from any computer or device IP address. An administrator can create a network policy to allow or deny access to a single IP address or a list of addresses.
- A network policy is not enabled until it is activated at the account or individual user level.
- Only a single network policy can be assigned to the account or a specific user at a time.
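To make these rules concrete, here is a small Python model of how a network policy decides whether a login IP is accepted. It is a sketch under stated assumptions, not Snowflake’s implementation: `NetworkPolicy` and `is_login_allowed` are hypothetical names, the `allowed_ip_list`/`blocked_ip_list` fields mirror the `ALLOWED_IP_LIST` and `BLOCKED_IP_LIST` parameters of `CREATE NETWORK POLICY`, and it assumes the documented behavior that the blocked list is applied first and that a user-level policy takes precedence over the account-level one.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NetworkPolicy:
    """Hypothetical in-memory model of a Snowflake network policy."""
    name: str
    allowed_ip_list: List[str] = field(default_factory=list)
    blocked_ip_list: List[str] = field(default_factory=list)

    @staticmethod
    def _matches(ip: str, cidrs: List[str]) -> bool:
        addr = ipaddress.ip_address(ip)
        # strict=False tolerates entries with host bits set (e.g. 192.168.1.5/24);
        # a bare address such as "192.168.1.99" is treated as a /32 network.
        return any(addr in ipaddress.ip_network(c, strict=False) for c in cidrs)

    def permits(self, ip: str) -> bool:
        # Assumed rule: the blocked list is applied first, so an address in both lists is denied.
        if self._matches(ip, self.blocked_ip_list):
            return False
        return self._matches(ip, self.allowed_ip_list)


def is_login_allowed(ip: str,
                     account_policy: Optional[NetworkPolicy] = None,
                     user_policy: Optional[NetworkPolicy] = None) -> bool:
    """Decide whether a login from `ip` is accepted (hypothetical helper)."""
    # Assumed rule: a user-level policy, when activated, overrides the account-level one.
    policy = user_policy if user_policy is not None else account_policy
    if policy is None:
        # No policy activated at either level: any IP address may connect.
        return True
    return policy.permits(ip)


# Usage: allow a corporate /24 but block one address inside it.
corp = NetworkPolicy("corp_only",
                     allowed_ip_list=["192.168.1.0/24"],
                     blocked_ip_list=["192.168.1.99"])
print(is_login_allowed("192.168.1.10", account_policy=corp))  # True
print(is_login_allowed("192.168.1.99", account_policy=corp))  # False
print(is_login_allowed("10.0.0.5", account_policy=corp))      # False
```

With no policy activated the helper accepts any address, matching the default behavior, and taking one optional policy per level reflects the rule that only a single policy can be assigned to the account or a user at a time.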
The diagram below shows the use of private connections for flows (1), (2), (3), and (4). Snowflake appears as a resource in the customer’s network, but traffic flows one way, from the customer VPC to the Snowflake VPC, over the cloud service provider (CSP) network backbone.
- This feature requires the Business Critical edition (or higher).
- Contact Snowflake Support and provide a list of your Google Cloud <project_id> values and the corresponding URLs that you use to access Snowflake, with a note to enable Google Cloud Private Service Connect.
- In Snowflake, run SYSTEM$GET_PRIVATELINK_CONFIG as ACCOUNTADMIN to retrieve the Private Service Connect configuration:
use role accountadmin;
select key, value from table(flatten(input=>parse_json(system$get_privatelink_config())));
- Create an NLB pointing to the private link.
- Update DNS settings: all requests to Snowflake need to be routed through the Private Service Connect endpoint.
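The last two steps are linked: the hostnames returned by SYSTEM$GET_PRIVATELINK_CONFIG are the ones DNS must resolve to the Private Service Connect endpoint. The Python sketch below parses that JSON and emits zone-file style A records. Everything in it is an assumption for illustration: the key names (`privatelink-account-url`, `privatelink-ocsp-url`, `privatelink-gcp-service-attachment`), all sample values, the endpoint IP `10.128.0.50`, the 300-second TTL, and the helper name `dns_records`. Compare against the real output in your account before creating any records.

```python
import json
from typing import Dict, List

# Placeholder output: key names and values are illustrative assumptions,
# not copied from a real SYSTEM$GET_PRIVATELINK_CONFIG call.
SAMPLE_CONFIG = json.dumps({
    "privatelink-account-url": "myaccount.us-central1.gcp.privatelink.snowflakecomputing.com",
    "privatelink-ocsp-url": "ocsp.myaccount.us-central1.gcp.privatelink.snowflakecomputing.com",
    "privatelink-gcp-service-attachment": "projects/example-project/regions/us-central1/serviceAttachments/example",
})


def dns_records(config: Dict[str, object], endpoint_ip: str, ttl: int = 300) -> List[str]:
    """Map every *-url hostname in the config to the PSC endpoint IP as an A record."""
    records = []
    for key, value in sorted(config.items()):
        # Only plain hostname values become records; the service attachment is
        # what the endpoint targets, not a name that clients resolve.
        if key.endswith("-url") and isinstance(value, str):
            records.append(f"{value}. {ttl} IN A {endpoint_ip}")
    return records


config = json.loads(SAMPLE_CONFIG)
for record in dns_records(config, "10.128.0.50"):
    print(record)
```

Filtering on keys ending in `-url` keeps the service attachment path out of DNS while still covering both the account URL and the OCSP URL, which correspond to flows (1) and (2).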
After testing Google Cloud Private Service Connect connectivity with Snowflake, you can block public access to Snowflake.
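Before blocking public access, it is worth confirming from a machine inside the VPC that the Snowflake hostnames really resolve to the private endpoint. The helper below is a hypothetical sanity check, not a Snowflake tool: it assumes that a private address (for example an RFC 1918 address) means traffic will go through Private Service Connect, and it takes an injectable `resolver` so it can be exercised without network access.

```python
import ipaddress
import socket
from typing import Callable


def resolves_privately(hostname: str,
                       resolver: Callable[[str], str] = socket.gethostbyname) -> bool:
    """Return True if `hostname` resolves to a private IP address (hypothetical check)."""
    try:
        ip = resolver(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
    # Assumption: a private address means traffic stays on the PSC endpoint.
    return ipaddress.ip_address(ip).is_private


# Offline usage with stub resolvers standing in for DNS.
print(resolves_privately("snowflake.example", resolver=lambda h: "10.128.0.50"))  # True
print(resolves_privately("snowflake.example", resolver=lambda h: "34.120.1.1"))   # False
```

In practice you would call it with the default resolver and the `privatelink-account-url` value from your own configuration; a False result means DNS still points at a public address, so public access should not be blocked yet.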