![]() |
Last updated on Nov 07, 2023
Snowflake data cloud is a powerful data warehouse management platform. It is not an addition to already existing data warehouse management platforms. Snowflake data cloud is built on top of various cloud services such as Amazon web services, Microsoft Azure, and GCP (Google cloud platform) infrastructure. One more interesting thing about Snowflake data cloud is that this platform doesn’t require any installation prerequisites like hardware or software components to select, configure, or manage. With the help of the Snowflake data cloud, it’s easy to move the data into the ETL (extract, transfer, and load) process.
Snowflake tool allows users to access the data in one place so that they can deliver valuable outcomes. All business data in the snowflake is stored in the “Snowflake database tables” which are structured in rows and columns.
Points to remember while storing the data in Snowflake data cloud:
The below image illustrates the data storage in the snowflake:
In the future section, we will be explaining more about micro partitions, and the auto clustering process.
Become a Snowflake Certified professional by learning this HKR Snowflake Training!
Here is an example of micro-partition;
Snowflake uses the pruning (trimming) method to reduce the amount of data read from the storage.
In a query language, we define them using SELECT SUM (x) FROM T1 WHERE y = 42 (number of columns).
Important points to remember:
For example date or geographic region format.
The below image illustrates how does data clustering looks like:
Top 30 frequently asked snowflake interview questions & answers for freshers & experienced professionals
The clustering key is just a subset of the Snowflake table where we store the data. These types of clustering keys are specifically used to co-locate the data in the snowflake table within the same micro partitions. Clustering keys are very useful while working with large data tables when ordering the column data is not the most favorable one or extensive DML (data manipulation language) on the table has caused the natural table clustering to upgrade.
The below example determines the clustering keys usage in the query;
Take a look at a few points of clustering partitioned tables;
The following image is an example of automatic reclustering;
The most common question that comes to your mind is, how does this automatically reclustering work? Here is the answer;
The following are a few examples that define when to use clustering:
Defining a clustering key for a table :
As we know that snowflake supports both the partitioned and non-partitioned clustering. The below are the few scenarios that will explain when we need to use clustering;
1st scenario: you have fields that are accessed frequently in WHERE clauses: for example:
Select *from orders
Where product = ‘kindle’
Select * from Orders
Where date
between ‘01-01-2020’ and ‘30-01-2020’
2nd scenario:
Snowflake clustered table syntax:
You can use the following syntax to create clustered tables in Snowflake;
Create table…. Cluster by ( , , ….
3. Scenario:
Defining a clustering key for a table:
The following example creates a table with a clustering key by appending a cluster by clause to create a table.
For example:
— cluster by base columns
Create or replace table t1 (c1 date, c2 string, c3 number) cluster by (c1, c2);
–cluster by expression
Create or replace table t2 (c1 timestamp, c2 string, c3 number) cluster by (to_date(c1) , substring(c2, 0, 10) );
At any point in time, you can change or add clustering to an existing table using the alter table clause:
For example:
-- cluster by base columns
alter table t1 cluster by (c1, c3);
-- cluster by expressions
alter table t2 cluster by (substring(c2, 5, 15), to_date(c1));
-- cluster by paths in variant columns
alter table t3 cluster by (v:"Data":name::string, v:"Data":id::number);
You can also drop the clustering keys for a table using the alter table clause.
alter table t1 drop clustering key;
Get ahead in your career with our Snowflake Tutorial!
Final take:
In this snowflake clustering article, we have defined many core topics like micro-partitions, data clustering, clustering key usage, maximizing the query performance, and auto clustering. I hope we are able to justify our reader’s expectations and help them to achieve great success in the respective technologies.
Related Articles:
A technical lead content writer in HKR Trainings with an expertise in delivering content on the market demanding technologies like Networking, Storage & Virtualization,Cyber Security & SIEM Tools, Server Administration, Operating System & Administration, IAM Tools, Cloud Computing, etc. She does a great job in creating wonderful content for the users and always keeps updated with the latest trends in the market. To know more information connect her on Linkedin, Twitter, and Facebook.
Batch starts on 12th Dec 2023 |
|
||
Batch starts on 16th Dec 2023 |
|
||
Batch starts on 20th Dec 2023 |
|