Snowflake Cloning

Snowflake cloning reproduces the data in a database, schema, or table without additional storage or a long wait. Organisations commonly need production data replicated into development or staging environments so that planned changes to an object can be tested against realistic data. Gone are the days of waiting a long time for an environment to be provisioned. Because a clone is created from metadata rather than by physically copying data, it usually completes within minutes, depending on the source objects. An object can be cloned any number of times.

What is Cloning in Snowflake?

Cloning in Snowflake is a feature that provides a quick and simple way to create a copy of a table, a schema, or even an entire database. Creating the clone incurs no additional storage cost and takes little time, because the copy shares its underlying storage with the original object.

The clone and the original object are independent of each other: a modification made to one does not affect the other. This makes cloning a useful way to create quick backups that cost nothing extra until the data is modified. Because no data is physically copied, cloning in Snowflake is typically faster than making a full copy, which makes it one of the preferred choices in the market.

How to Clone objects in Snowflake?

To clone objects in Snowflake, the user follows the few steps below:

  • Creating a zero-copy clone in Snowflake takes a single SQL statement. With it, the user can clone an existing object to create a new one.
  • The statement uses the CLONE keyword in the form CREATE <object_type> <object_name> CLONE <source_object_name>. For instance, suppose the source is Table 1. The clone is created with all the data accessible in Table 1 at that moment. The clone is simply a new set of metadata that refers to the same micro-partitions that store the production data.
  • The cloned table can be used like any other table. It is self-contained and supports Time Travel as well as DDL and DML operations.

[Diagram: CLONE]

  • Refer to the following diagram to understand this more clearly. Consider a scenario where an ETL process runs in a staging environment as part of integration testing and changes data in the Table 1 clone. As seen in the diagram, micro-partition MP-3 holds the data that is updated. Micro-partitions in Snowflake are immutable, so when data in the Table 1 clone is modified, Snowflake creates a new micro-partition and assigns it to the staging environment. The change is therefore recorded separately in the staging environment, and the clone's metadata then refers to the newly created micro-partition, as shown in the diagram below.

[Diagram: metadata]

  • Note that the clone is a new object. It has its own data-loading history and Time Travel history, while the parent object's data and metadata remain protected.
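
As a rough illustration of the single-statement idea, the Python sketch below only builds the text of a CREATE ... CLONE statement; it does not connect to Snowflake or run anything. The helper name build_clone_statement and the object names table1 and table1_clone are made up for this example, and the helper accepts only the three object types named at the start of this article.

```python
# Hypothetical helper: builds the text of a CREATE ... CLONE statement.
# It does not talk to Snowflake; names are inserted as-is, without quoting.
SUPPORTED_TYPES = {"DATABASE", "SCHEMA", "TABLE"}


def build_clone_statement(object_type: str, clone_name: str, source_name: str) -> str:
    """Return 'CREATE <type> <clone_name> CLONE <source_name>;'."""
    kind = object_type.strip().upper()
    if kind not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported object type for this sketch: {object_type!r}")
    return f"CREATE {kind} {clone_name} CLONE {source_name};"


# Illustrative names only.
print(build_clone_statement("table", "table1_clone", "table1"))
# prints: CREATE TABLE table1_clone CLONE table1;
```

The resulting string would be run through whichever Snowflake client or worksheet is already in use.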

How does Snowflake’s Zero Copy Cloning work?

The data in a Snowflake table is automatically divided into micro-partitions, which are small, contiguous units of storage. Each micro-partition holds between 50 MB and 500 MB of uncompressed data. Micro-partitioning is performed automatically on all Snowflake tables. When a database object is cloned, Snowflake creates new metadata that points to the micro-partitions of the source object instead of making copies of the existing micro-partitions. This is why it is called zero-copy cloning. The entire operation is carried out by Snowflake's cloud services layer, and the user need not intervene.

In this way, the user can create copies of database objects without copying the data.

For instance:

Consider a table called “Table A” that is cloned as “Table B”. The following diagram shows Snowflake's metadata layer. As seen in the image, the metadata of Table A points to micro-partitions in the storage layer. Table B, shown next to Table A, is the clone, and its metadata points to the same micro-partitions as Table A.

[Diagram: storage layer]

Please note: All micro-partitions in Snowflake are immutable, which means that once a micro-partition is created it cannot be changed and stays in that state. To change the data in a micro-partition, a brand new micro-partition is created and the metadata then points to the newly created micro-partition. The older micro-partition is retained for Time Travel and Fail-safe purposes.

The image below illustrates a case where the user has modified data held in micro-partition MP-3 through Table B. The modification is captured in a new micro-partition, MP-4, that is referenced by Table B only. Therefore additional storage cost is incurred only for the changed data and not for the entire clone, as seen in the diagram.

[Diagram: data]
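
The Table A / Table B walkthrough above can be mimicked with a small Python toy model. This is only a sketch of the idea (tables as lists of pointers to immutable micro-partitions, with a new micro-partition written on change); it is not how Snowflake is implemented, and the class names MicroPartition, Storage, and Table are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroPartition:
    """Immutable block of rows; never modified after it is written."""
    name: str
    rows: tuple


class Storage:
    """Toy storage layer: keeps every micro-partition ever written."""

    def __init__(self):
        self.partitions = {}
        self._next_id = 1

    def write(self, rows):
        mp = MicroPartition(f"MP-{self._next_id}", tuple(rows))
        self.partitions[mp.name] = mp
        self._next_id += 1
        return mp.name


class Table:
    """Toy metadata layer: a table is just a list of micro-partition names."""

    def __init__(self, name, storage, partition_names=()):
        self.name = name
        self.storage = storage
        self.partition_names = list(partition_names)

    def load(self, rows):
        self.partition_names.append(self.storage.write(rows))

    def clone(self, new_name):
        # Zero-copy: only the pointers are copied, never the data.
        return Table(new_name, self.storage, self.partition_names)

    def update_partition(self, old_name, new_rows):
        # The old micro-partition is left untouched; a new one is written
        # and only this table's metadata is repointed to it.
        new_name = self.storage.write(new_rows)
        self.partition_names[self.partition_names.index(old_name)] = new_name
        return new_name


storage = Storage()
table_a = Table("TABLE_A", storage)
for rows in (["r1"], ["r2"], ["r3"]):
    table_a.load(rows)

table_b = table_a.clone("TABLE_B")
partitions_after_clone = len(storage.partitions)  # cloning wrote nothing new

table_b.update_partition("MP-3", ["r3-modified"])

print(table_a.partition_names)   # ['MP-1', 'MP-2', 'MP-3']
print(table_b.partition_names)   # ['MP-1', 'MP-2', 'MP-4']
print(partitions_after_clone, len(storage.partitions))  # 3 4
```

In this toy, cloning leaves the number of stored micro-partitions unchanged, and the update through Table B adds exactly one new micro-partition while Table A still points at MP-3.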

Which objects are supported in Snowflake Zero Copy Cloning?

Before learning how to clone an object, it is important to know which objects support cloning. They are listed below:

Data Containment Objects
  1. Databases: A database is a logical grouping of schemas. Each database belongs to a single Snowflake account.
  2. Schemas: A schema is a logical grouping of database objects such as tables and views. Each schema belongs to a single database.
  3. Tables: All data in Snowflake is stored in tables. Views can be used to display selected rows and columns.
  4. Streams: A stream records the data manipulation (DML) changes made to a table, external table, or directory table. The object for which changes are recorded is called the source object.
Data Configuration and Transformation Objects
  1. Stages: A stage in Snowflake is a location where data files are stored for loading. If the data sits in external cloud storage such as Amazon S3, Google Cloud Storage, or Microsoft Azure, the stage is called an external stage. If the data is stored inside Snowflake, it is called an internal stage.
  2. File Formats: CSV, JSON, Avro, ORC, Parquet, and XML are among the file formats supported by Snowflake.
  3. Sequences: A sequence generates unique numbers across sessions and statements, including concurrent statements. Snowflake does not guarantee gap-free sequence numbers. The generated numbers consistently increase in value (or decrease, if the step size is negative) but are not necessarily contiguous.
  4. Tasks: A task is a job that can execute any one of the following types of SQL code:
  • A single SQL statement
  • A call to a stored procedure
  • Procedural logic written in Snowflake Scripting

A task can also be used independently to produce periodic reports by inserting or merging rows into a report table.
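
For quick reference, the two groups above can be written down as a small Python lookup. This is just a restatement of the list in this section, not an official or exhaustive catalogue, and the name clone_category is made up for the example.

```python
# The two groups of cloneable objects as listed in this article.
CLONEABLE_OBJECTS = {
    "Data Containment Objects": ["DATABASE", "SCHEMA", "TABLE", "STREAM"],
    "Data Configuration and Transformation Objects": [
        "STAGE", "FILE FORMAT", "SEQUENCE", "TASK",
    ],
}


def clone_category(object_type):
    """Return the group an object type belongs to, or None if it is not listed."""
    kind = " ".join(object_type.upper().split())  # normalise case and spacing
    for category, kinds in CLONEABLE_OBJECTS.items():
        if kind in kinds:
            return category
    return None


print(clone_category("stream"))        # Data Containment Objects
print(clone_category("file  format"))  # Data Configuration and Transformation Objects
print(clone_category("user"))          # None
```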

Advantages of Zero Copy Clone Snowflake

Saves you time: Traditionally, users had to wait hours, days, or even weeks to build a test or development environment from a copy of the production data warehouse. With cloning, the user not only saves this time but also avoids paying for extra storage to hold replicated data in the test and development environments.

Snowflake “Fast Clone”: Zero-copy cloning lets users quickly make as many copies of the data as they need, without the additional expense normally associated with data replication.

Saves money on storage: With zero-copy cloning, users can create a clone of an object without duplicating the underlying storage. A cloned table uses no additional data storage at the time of cloning, because it shares the existing micro-partitions of the parent table. Rows in a clone can be deleted, updated, and added independently of the original table. Each change to a clone creates new micro-partitions that are protected by Continuous Data Protection (CDP) and belong solely to the clone.
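
The storage argument can be made concrete with a little arithmetic. The Python sketch below treats the clone's extra storage as the total size of the micro-partitions that only the clone references. The sizes are invented for illustration, the function name clone_extra_storage_mb is hypothetical, and the model deliberately ignores Time Travel and Fail-safe retention, so it is not a billing calculator.

```python
def clone_extra_storage_mb(source_partitions, clone_partitions, sizes_mb):
    """Sum the sizes of micro-partitions referenced by the clone but not the source."""
    owned_by_clone_only = set(clone_partitions) - set(source_partitions)
    return sum(sizes_mb[name] for name in owned_by_clone_only)


# Made-up sizes, for illustration only.
sizes_mb = {"MP-1": 16, "MP-2": 16, "MP-3": 16, "MP-4": 16}

source = ["MP-1", "MP-2", "MP-3"]
clone_at_creation = list(source)             # shares every micro-partition
clone_after_update = ["MP-1", "MP-2", "MP-4"]  # MP-3 replaced by new MP-4

print(clone_extra_storage_mb(source, clone_at_creation, sizes_mb))   # 0
print(clone_extra_storage_mb(source, clone_after_update, sizes_mb))  # 16
```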

Easy to use: Zero-copy cloning creates copies of tables, schemas, and databases without duplicating the original data, using only the CLONE keyword. No additional administrative activities are required, so cloning is a simple process that needs no special expertise.

Key Features of Snowflake

The following are key features of Snowflake:

  1. Scalability: Snowflake's multi-cluster shared data architecture separates storage from compute. This lets users quickly scale resources up when a large amount of data must be loaded and scale them down once the operation completes, without interfering with other workloads.
  2. Support for Semi-Structured Data: Snowflake's architecture lets users store both structured and semi-structured data. Semi-structured data is held in columns of the VARIANT data type, in a form that can be read directly. A VARIANT column can store values of both kinds, structured and semi-structured.
  3. Security: Snowflake offers many security features, covering everything from how users access data to how data is stored in the system. Access to an account can be restricted through network policies, for example by whitelisting IP addresses.

Conclusion

Hope you have now understood the concept of zero-copy cloning in Snowflake. It is one of the best Snowflake features. Cloning databases is easy and simple, and it can save not only time but a lot of money too. However, developers dealing with a variety of data sources such as CRMs, streaming services, and databases may still find it difficult. If you are from a non-technical background or want to explore a career in data warehousing, HKR Trainings can be your training partner. To learn more, you can comment in the section below.

Gayathri
Research Analyst
As a senior Technical Content Writer for HKR Trainings, Gayathri has a good understanding of current technical innovations, including Business Intelligence and Analytics. She conveys advanced technical ideas precisely and vividly to the target audience so that the content is accessible to readers. She writes quality content in the fields of Data Warehousing & ETL, Big Data Analytics, and ERP Tools.

Users cannot be cloned in Snowflake. The two sets of objects that can be cloned are data containment objects and data configuration and transformation objects.

Cloning data with Snowflake's zero-copy clone reproduces a table, schema, or database. When the clone is created, it captures a snapshot of the source object as of that moment and makes it available in the copied object. After that, the clone and the source object are independent of each other.

A clone creates a copy of an existing object that supports cloning.

Data in one database can be made available in another database in Snowflake using secure views. This works only when both databases belong to the same account. Objects such as tables and schemas in one database can be referenced from another database through a secure view.

A transient table can be cloned in Snowflake, but only to another transient or temporary table, not to a permanent table. The table type cannot be modified after creation: the user cannot change a permanent table to a transient table or vice versa.