Snowflake — Cost Optimization Framework — Part 1 of 3: Understand

Madhivanan Anbalagan
8 min read · Jan 28, 2023

In this 3-part blog series, I've collated the key elements to understand and some of the highly recommended best practices to follow when dealing with cost optimization in Snowflake.

Introduction

As the customer base of the Snowflake Data Cloud grows rapidly, it becomes ever more essential to improve literacy in Snowflake best practices & recommendations among data teams.

We can't ignore the fact that a few of those organizations may be onboarding the Snowflake Data Cloud due to the "bandwagon" effect. It therefore becomes the responsibility of the data teams (Data Consultants, Solutions Architects, Data Engineers, etc.) to ensure that the customer gains maturity on the Snowflake Data Cloud and to help them realize the true business value of moving to Snowflake. Establishing that realization further strengthens their confidence in Snowflake, and they will eventually build more solutions on it, driving significant business value while making Snowflake their organization's single source of truth when it comes to data.

One of the key elements that tangibly showcases the value of Snowflake to the customer is "cost". Ultimately, the question from the customer would be "How much value does Snowflake help me derive for the 💲 I spend?"

Cost Management

My strong recommendation is to follow the 3-stage framework below for an effective cost management journey in Snowflake.

3 Stage Framework

This article details everything related to Stage 1 — “Understand”.

I. Snowflake Editions

Snowflake is offered in 4 Editions to choose from, according to the organization’s business needs and use-cases.

  • Cost considerations start with selecting the right "edition" based on workload needs, as each successive edition comes with additional features at an added premium over the previous one.
  • The second factor that impacts cost is selecting the right "region" based on the geographical requirements of the organization. There is no generically right or wrong region; it always boils down to the specific use-cases and requirements, such as regulatory compliance.
Sample pricing for AWS US East (Northern Virginia) | Source — Snowflake Pricing

The edition chosen in the beginning is not carved in stone. Changing the edition as the organization's usage & needs change is easy.

II. Cost Contributing Factors

To know the factors that contribute to cost in Snowflake, all we need to recall is the Snowflake Architecture itself. The 3 layers of the architecture are directly attributable as the primary factors of cost in Snowflake.

Snowflake Architecture

Below are the 5 factors that incur costs in Snowflake.

*Cloud Services & Serverless Compute Resources fall under the Compute layer costs, and we don't really have full control over how these resources are allocated. For this article's sake, let us treat them as individual cost drivers.

Let’s understand each of them in detail.

1. Storage

When setting up a new Snowflake Account, one of the first steps is to choose the appropriate Storage plan according to the needs — On-Demand or Capacity.

As the Capacity plan is offered at a significant discount (almost 50% as shown in the example below), it is obviously the preferable choice for any project. But choosing a capacity level close to the ground reality of your usage is key to actually realizing those cost savings.

AWS — US East (Northern Virginia) Storage Cost as of Jan 2023.
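As a back-of-the-envelope illustration (the rates are assumptions for illustration only, roughly in line with the Jan 2023 AWS US East list prices of about $40/TB/month On-Demand and $23/TB/month Capacity; substitute the current prices for your region), 100 TB stored for a month works out as below.

-- Illustrative only: 100 TB stored for one month at the assumed rates above
SELECT 100 * 40 AS ON_DEMAND_USD_PER_MONTH  -- 4,000
,100 * 23 AS CAPACITY_USD_PER_MONTH;        -- 2,300 (~42% cheaper)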

In the Storage layer, consumption can be categorized into the 3 buckets below.

Storage Categories
  • Internal Stages — where the files reside within the Snowflake environment in Named Internal Stages/User Stages/Table Stages.
-- Below view returns daily average STAGE storage information
-- field: AVERAGE_STAGE_BYTES
SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.STAGE_STORAGE_USAGE_HISTORY;

SELECT *
,ROUND(AVERAGE_STAGE_BYTES/1024/1024, 2) AS AVERAGE_STAGE_MB
,ROUND(AVERAGE_STAGE_BYTES/1024/1024/1024, 2) AS AVERAGE_STAGE_GB
FROM SNOWFLAKE.ACCOUNT_USAGE.STAGE_STORAGE_USAGE_HISTORY
ORDER BY USAGE_DATE DESC;
  • Database Storage — typically the biggest chunk of Storage, which includes both active data & Time Travel data.
-- Below view returns DATABASE level daily average storage information
-- field: AVERAGE_DATABASE_BYTES
SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.DATABASE_STORAGE_USAGE_HISTORY;

SELECT *
,ROUND(AVERAGE_DATABASE_BYTES/1024/1024, 2) AS AVERAGE_DATABASE_MB
,ROUND(AVERAGE_DATABASE_BYTES/1024/1024/1024, 2) AS AVERAGE_DATABASE_GB
,ROUND(AVERAGE_DATABASE_BYTES/1024/1024/1024/1024, 2) AS AVERAGE_DATABASE_TB
FROM SNOWFLAKE.ACCOUNT_USAGE.DATABASE_STORAGE_USAGE_HISTORY
ORDER BY USAGE_DATE DESC
,DATABASE_NAME;

-- Below view returns TABLE level daily average storage information
-- fields: ACTIVE_BYTES, TIME_TRAVEL_BYTES, FAILSAFE_BYTES & RETAINED_FOR_CLONE_BYTES
SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.TABLE_STORAGE_METRICS;
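For example, a sketch like the one below (using the byte columns of the TABLE_STORAGE_METRICS view) surfaces the tables with the largest footprint, split across active, Time Travel, Fail-safe and clone-retained bytes.

-- Top 20 tables by total storage footprint, split by byte category
SELECT TABLE_CATALOG
,TABLE_SCHEMA
,TABLE_NAME
,ROUND(ACTIVE_BYTES/POWER(1024, 3), 2) AS ACTIVE_GB
,ROUND(TIME_TRAVEL_BYTES/POWER(1024, 3), 2) AS TIME_TRAVEL_GB
,ROUND(FAILSAFE_BYTES/POWER(1024, 3), 2) AS FAILSAFE_GB
,ROUND(RETAINED_FOR_CLONE_BYTES/POWER(1024, 3), 2) AS RETAINED_FOR_CLONE_GB
FROM SNOWFLAKE.ACCOUNT_USAGE.TABLE_STORAGE_METRICS
ORDER BY ACTIVE_BYTES + TIME_TRAVEL_BYTES + FAILSAFE_BYTES + RETAINED_FOR_CLONE_BYTES DESC
LIMIT 20;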
  • Fail-safe Storage — a fixed 7 days of Fail-safe data retention for permanent tables, after the configured Time Travel period is exhausted.
-- field: AVERAGE_FAILSAFE_BYTES
SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.DATABASE_STORAGE_USAGE_HISTORY;

-- field: FAILSAFE_BYTES
SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.TABLE_STORAGE_METRICS;
/* ================
Other useful views
==================*/

-- Combines DATABASE_STORAGE_USAGE_HISTORY + STAGE_STORAGE_USAGE_HISTORY
SELECT * FROM SNOWFLAKE.ORGANIZATION_USAGE.STORAGE_DAILY_HISTORY;

-- To know the usage in terms of USD $
SELECT * FROM SNOWFLAKE.ORGANIZATION_USAGE.USAGE_IN_CURRENCY_DAILY;
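If the organization-level views are not accessible, a rough sketch like the one below can approximate the monthly storage bill from the account-level view (the $23/TB Capacity rate is an assumption; substitute your contracted rate, and note that stage storage is not included here).

-- Rough monthly storage bill estimate from database + Fail-safe bytes only
WITH DAILY AS (
  SELECT USAGE_DATE
  ,SUM(AVERAGE_DATABASE_BYTES + AVERAGE_FAILSAFE_BYTES) AS TOTAL_BYTES
  FROM SNOWFLAKE.ACCOUNT_USAGE.DATABASE_STORAGE_USAGE_HISTORY
  GROUP BY USAGE_DATE
)
SELECT DATE_TRUNC('month', USAGE_DATE) AS USAGE_MONTH
,ROUND(AVG(TOTAL_BYTES)/POWER(1024, 4), 2) AS AVG_TB
,ROUND(AVG(TOTAL_BYTES)/POWER(1024, 4) * 23.00, 2) AS EST_COST_USD  -- assumed $23/TB
FROM DAILY
GROUP BY USAGE_MONTH
ORDER BY USAGE_MONTH DESC;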

Do not overlook Storage during a cost optimization exercise just because it is cheaper than Compute. The smallest things can make the biggest difference.

2. Compute

  • Primarily, Compute layer cost is driven by the user-defined Virtual Warehouses. On top of this, Cloud Services & Serverless resource consumption is reported as separate line items as part of Compute costs.
Credits consumed per hour
  • Below is the formula to calculate the cost in $ at a query or workload level.

$ (Cost) = No. of clusters (in case of multi-cluster configuration)
x No. of nodes (based on the Warehouse Size)
x Warehouse Running Time (in hours)
x $ value per credit

*The $ value per credit is obtained from Snowflake Pricing based on the chosen Cloud Provider & Region combination.
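For instance, here is a minimal worked sketch of the formula (the warehouse size, running time and the $3.00 per credit rate are assumptions for illustration only).

-- Hypothetical example: single-cluster MEDIUM warehouse (4 nodes = 4 credits/hour)
-- running for 2 hours, at an assumed $3.00 per credit
SELECT 1      -- No. of clusters
* 4           -- No. of nodes (MEDIUM)
* 2           -- Warehouse running time in hours
* 3.00        -- $ value per credit (assumed)
AS COST_USD;  -- returns 24.00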

  • Be mindful of the fact that there is a ⏲️ 60-second (1 minute) minimum billing for all Virtual Warehouses, irrespective of size or type.
  • After the first minute, billing is calculated on a per-second basis.
  • Each time a Virtual Warehouse is resized to a larger size, it incurs the 1-minute minimum worth of credits, but only for the additional nodes added.

i.e., just running the below statement incurs 1 minute's worth of credits for the 2 additional nodes.

-- Consider WH_SMALL is a SMALL sized Virtual Warehouse
ALTER WAREHOUSE WH_SMALL
SET WAREHOUSE_SIZE = 'MEDIUM';
/* ===============================
Some useful views to get the stats
==================================*/

-- Below view returns daily overall credit consumption.
-- The view also returns the usage broken down by SERVICE_TYPE, along with the
-- daily Cloud Services adjustment (CREDITS_ADJUSTMENT_CLOUD_SERVICES).
SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.METERING_DAILY_HISTORY;

-- Below view returns hourly credit consumption at the Warehouse level.
SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY;
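Combining the two, a sketch like the one below (assuming a flat $3.00 per credit; substitute your contracted rate) turns the metered credits of the last 30 days into an approximate per-warehouse spend.

-- Approximate $ spend per Warehouse over the last 30 days
SELECT WAREHOUSE_NAME
,SUM(CREDITS_USED) AS TOTAL_CREDITS
,ROUND(SUM(CREDITS_USED) * 3.00, 2) AS ESTIMATED_COST_USD  -- assumed $3.00/credit
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE START_TIME >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY WAREHOUSE_NAME
ORDER BY TOTAL_CREDITS DESC;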

3. Cloud Services

  • The Cloud Services layer is basically the brain of Snowflake.
  • It handles the below activities, to name a few.
  1. Authentication
  2. Security & Governance
  3. Metadata Management
  4. Transaction Management
  5. Query Compilation & Optimization
  6. Query Caching
  7. SQL API
  • The compute resources used by the Cloud Services layer are provisioned by Snowflake automatically.
In the Usage screen, Cloud Services consumption falls under Compute type
  • Snowflake offers a 10% adjustment on Cloud Services usage: only the daily Cloud Services consumption that exceeds 10% of that day's Virtual Warehouse compute usage is billed. The adjustment is calculated on a daily basis in the UTC time zone.
10% adjustment calculation (Daily Basis)
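To see the adjustment in action for your own account, a sketch like the one below compares the daily Cloud Services consumption against the adjustment and the credits actually billed.

-- Daily Cloud Services usage vs. the adjustment (a negative value) and billed credits
SELECT USAGE_DATE
,SUM(CREDITS_USED_COMPUTE) AS COMPUTE_CREDITS
,SUM(CREDITS_USED_CLOUD_SERVICES) AS CLOUD_SERVICES_CREDITS
,SUM(CREDITS_ADJUSTMENT_CLOUD_SERVICES) AS CLOUD_SERVICES_ADJUSTMENT
,SUM(CREDITS_BILLED) AS CREDITS_BILLED
FROM SNOWFLAKE.ACCOUNT_USAGE.METERING_DAILY_HISTORY
GROUP BY USAGE_DATE
ORDER BY USAGE_DATE DESC;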

4. Data Transfer

  • Data ingress (bringing data into Snowflake) is always free.
  • Data egress/transfers within the same region of the same cloud provider are free.
  • Data egress (moving data out of Snowflake) to another region of the same cloud provider, or to another cloud provider, is charged on a per-byte basis.
-- Below views return daily average data transfer information
SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.DATA_TRANSFER_HISTORY;
SELECT * FROM SNOWFLAKE.ORGANIZATION_USAGE.DATA_TRANSFER_DAILY_HISTORY;

-- To know the usage in terms of USD $
SELECT * FROM SNOWFLAKE.ORGANIZATION_USAGE.USAGE_IN_CURRENCY_DAILY;

Below are the Snowflake features that incur Transfer costs.

  • Unloading Data — the process of exporting data from Snowflake Tables into files.
  • Replicating Data — Replication of Databases to a secondary instance.
  • External Functions.
In the Usage screen, Data Transfer is listed as a type
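To understand where those transfer bytes are actually going, a sketch like the one below breaks the usage down by target cloud & region.

-- Egress broken down by target cloud, target region & transfer type
SELECT TARGET_CLOUD
,TARGET_REGION
,TRANSFER_TYPE
,ROUND(SUM(BYTES_TRANSFERRED)/POWER(1024, 3), 2) AS GB_TRANSFERRED
FROM SNOWFLAKE.ACCOUNT_USAGE.DATA_TRANSFER_HISTORY
GROUP BY TARGET_CLOUD, TARGET_REGION, TRANSFER_TYPE
ORDER BY GB_TRANSFERRED DESC;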

5. Serverless Compute Resources

Below is the list of serverless features in Snowflake that incur costs; each is reported under its corresponding line item as part of Compute in billing.

  1. Automatic Clustering
  2. External Tables
  3. Materialized Views
  4. Query Acceleration Service (Public Preview)
  5. Replication (Account or Database)
  6. Search Optimization Service
  7. Snowpipe
  8. Tasks
  • These compute resources are Snowflake-managed, and users do not have control over their allocation.
Source — Snowflake Credit Consumption Table (PDF)
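To see how much each serverless feature contributes, a sketch like the one below groups the daily metering view by its SERVICE_TYPE column.

-- Total credits by service type: serverless features (e.g., AUTO_CLUSTERING,
-- MATERIALIZED_VIEW, PIPE, SEARCH_OPTIMIZATION) appear as their own line items
SELECT SERVICE_TYPE
,SUM(CREDITS_USED) AS TOTAL_CREDITS
FROM SNOWFLAKE.ACCOUNT_USAGE.METERING_DAILY_HISTORY
GROUP BY SERVICE_TYPE
ORDER BY TOTAL_CREDITS DESC;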

Stay tuned for the next parts of this blog series:

👉🏼 “Snowflake — Cost Optimization Framework — Part 2 of 3: Control” (publishing soon)

👉🏼 “Snowflake — Cost Optimization Framework — Part 3 of 3: Optimize” (publishing soon)

About Me

I am Madhivanan Anbalagan, currently working as an Associate Solution Architect at kipi.bi (an Elite Partner of Snowflake), located in Bangalore, India. I have 9+ years of experience in the data space and have been lucky to play various roles such as Data Engineer, ETL Developer, Oracle PL/SQL Developer, etc. In my current role, I primarily architect strategic solutions for clients with the Snowflake Data Cloud as the platform of choice, along with Modern Data Stack tools & technologies (such as dbt, Fivetran, Matillion, Airflow, etc.). I'm loving it! 💙💚

Along with my passion for data, my interests are photography and graphic design. I'm also sort of a perfectionist; a sample of that is the banner image at the top of this blog post. It's basically a gradient of 2 precise hex color codes (Snowflake Blue ↔ kipi.bi Green), which I'm planning to keep in my future blog posts as a color signature.

Please feel free to Follow or Connect with me on LinkedIn or other social media platforms. You can find me on pretty much all the platforms with the username @madhiceg.
