Snowflake Schema

A snowflake schema is a logical arrangement of tables in a multidimensional database whose entity-relationship diagram resembles a snowflake. It consists of centralized fact tables connected to multiple dimensions, and those dimensions are in turn split into further related tables. Normalizing the dimension tables of a star schema in this way is known as "snowflaking". This blog gives an overview of the Snowflake data platform, then covers an example snowflake schema along with its characteristics, advantages, and disadvantages.


Overview of Snowflake Schema

Snowflake's Data Cloud is built on a data platform delivered as Software-as-a-Service (SaaS). Snowflake provides data storage, processing, and analytic solutions that are faster, easier to use, and more flexible than traditional offerings. It is not built on an existing database technology or on "big data" software platforms such as Hadoop. Instead, Snowflake combines a new SQL query engine with an architecture designed natively for the cloud. To the user, Snowflake offers the features and capabilities of an enterprise analytic database.

Snowflake runs entirely on cloud infrastructure. All of its components (except for the optional command-line clients, drivers, and connectors) run on public cloud infrastructure. Snowflake's compute needs are met by virtual compute instances, and data is stored persistently via a storage service. Snowflake cannot be run on private cloud infrastructure (hosted or on-premises). It is not a software package that users install; Snowflake itself manages all software installation and updates.


The architecture of Snowflake is a hybrid of shared-disk and shared-nothing database architectures. Like shared-disk systems, Snowflake uses a central data repository for persisted data that is accessible from all compute nodes in the platform. Like shared-nothing systems, however, Snowflake processes queries using MPP (massively parallel processing) compute clusters, in which each node stores a portion of the full data set locally. This approach combines the simple data management of a shared-disk design with the performance and scale-out advantages of a shared-nothing architecture.

Snowflake allows you to connect to the service in multiple ways. A web-based user interface gives access to all aspects of administering and using Snowflake. Command-line clients (such as SnowSQL) also provide access to all aspects of Snowflake management and use. Other applications (like Tableau) can connect to Snowflake via ODBC and JDBC drivers. Native connectors (e.g., Spark, Python) can be used to build applications that connect to Snowflake. Third-party connectors can link Snowflake to programs such as ETL tools (e.g., Informatica) and BI tools (e.g., ThoughtSpot).

Example: 

Model of a Snowflake Schema in a Data Warehouse


The Employee dimension table contains the attributes EmployeeID, EmployeeName, DepartmentID, Region, and Territory. The DepartmentID attribute links the Employee table to the Department dimension table, which holds details about each department, such as its name and location. The Customer dimension table contains the attributes CustomerID, CustomerName, Address, and CityID. The CityID attribute links the Customer dimension table to the City dimension table, which holds each city's details, including CityName, Zip Code, State, and Country.
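The tables described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the model from the figure: the dimension tables and their columns follow the text, while the `Sales` fact table, its `Amount` column, the `DepartmentName` and `Location` column names, and all sample rows are assumptions added so that the schema can be queried.

```python
import sqlite3

# Dimension tables follow the description above; Sales is a hypothetical
# fact table added for illustration only.
SCHEMA = """
CREATE TABLE Department (
    DepartmentID   INTEGER PRIMARY KEY,
    DepartmentName TEXT,
    Location       TEXT
);
CREATE TABLE Employee (
    EmployeeID   INTEGER PRIMARY KEY,
    EmployeeName TEXT,
    DepartmentID INTEGER REFERENCES Department(DepartmentID),
    Region       TEXT,
    Territory    TEXT
);
CREATE TABLE City (
    CityID   INTEGER PRIMARY KEY,
    CityName TEXT,
    ZipCode  TEXT,
    State    TEXT,
    Country  TEXT
);
CREATE TABLE Customer (
    CustomerID   INTEGER PRIMARY KEY,
    CustomerName TEXT,
    Address      TEXT,
    CityID       INTEGER REFERENCES City(CityID)
);
CREATE TABLE Sales (
    SaleID     INTEGER PRIMARY KEY,
    EmployeeID INTEGER REFERENCES Employee(EmployeeID),
    CustomerID INTEGER REFERENCES Customer(CustomerID),
    Amount     REAL
);
"""


def build_example(conn):
    """Create the schema and load a few made-up rows."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Department VALUES (?, ?, ?)",
                     [(1, "Sales", "Chicago"), (2, "Support", "Austin")])
    conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
                     [(10, "Ana", 1, "Midwest", "T1"),
                      (11, "Ben", 2, "South", "T2")])
    conn.executemany("INSERT INTO City VALUES (?, ?, ?, ?, ?)",
                     [(100, "Toronto", "M5H", "Ontario", "Canada"),
                      (101, "Boston", "02108", "Massachusetts", "USA")])
    conn.executemany("INSERT INTO Customer VALUES (?, ?, ?, ?)",
                     [(1000, "Acme", "1 King St", 100),
                      (1001, "Globex", "9 Main St", 101)])
    conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?)",
                     [(1, 10, 1000, 100.0),
                      (2, 10, 1001, 250.0),
                      (3, 11, 1001, 75.0)])
    conn.commit()


def sales_by_country_and_department(conn):
    """Total sales per customer country and employee department.

    Reaching Country and DepartmentName takes four joins, because both
    attributes live in the outer (snowflaked) dimension tables.
    """
    query = """
        SELECT ci.Country, d.DepartmentName, SUM(s.Amount)
        FROM Sales s
        JOIN Employee   e  ON e.EmployeeID    = s.EmployeeID
        JOIN Department d  ON d.DepartmentID  = e.DepartmentID
        JOIN Customer   c  ON c.CustomerID    = s.CustomerID
        JOIN City       ci ON ci.CityID       = c.CityID
        GROUP BY ci.Country, d.DepartmentName
        ORDER BY ci.Country, d.DepartmentName
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
build_example(conn)
for row in sales_by_country_and_department(conn):
    print(row)
```

In a star schema the same report would need only two joins, since the department and city attributes would be stored directly in the Employee and Customer tables.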

The main distinction between star and snowflake schemas is that the snowflake schema keeps its dimension tables in normalized form to reduce redundancy. The benefit is that normalized tables are easier to maintain and use less storage. The cost is that queries need more joins to run, which can slow the system's query performance.
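The trade-off can be made concrete with a small sketch in plain Python. It takes a denormalized (star-style) customer dimension, in which every row repeats the city details, and splits it into a Customer table and a City table. The function names and the sample rows are hypothetical and exist only for this illustration.

```python
def snowflake_customer_dimension(star_rows):
    """Split a denormalized customer dimension into Customer and City tables.

    Each distinct city is stored once and referenced through CityID.
    """
    city_ids = {}   # (CityName, ZipCode, State, Country) -> CityID
    cities = []
    customers = []
    for row in star_rows:
        city_key = (row["CityName"], row["ZipCode"], row["State"], row["Country"])
        if city_key not in city_ids:
            city_ids[city_key] = len(city_ids) + 1
            cities.append({
                "CityID": city_ids[city_key],
                "CityName": row["CityName"],
                "ZipCode": row["ZipCode"],
                "State": row["State"],
                "Country": row["Country"],
            })
        customers.append({
            "CustomerID": row["CustomerID"],
            "CustomerName": row["CustomerName"],
            "Address": row["Address"],
            "CityID": city_ids[city_key],
        })
    return customers, cities


def country_of(customer_id, customers, cities):
    """Look up a customer's country; this needs a second table (the 'join')."""
    city_by_id = {city["CityID"]: city for city in cities}
    for customer in customers:
        if customer["CustomerID"] == customer_id:
            return city_by_id[customer["CityID"]]["Country"]
    return None


# Made-up star-schema rows: the Boston details are repeated for two customers.
star_customers = [
    {"CustomerID": 1, "CustomerName": "Acme", "Address": "9 Main St",
     "CityName": "Boston", "ZipCode": "02108", "State": "Massachusetts", "Country": "USA"},
    {"CustomerID": 2, "CustomerName": "Globex", "Address": "4 Elm St",
     "CityName": "Boston", "ZipCode": "02108", "State": "Massachusetts", "Country": "USA"},
    {"CustomerID": 3, "CustomerName": "Initech", "Address": "1 King St",
     "CityName": "Toronto", "ZipCode": "M5H", "State": "Ontario", "Country": "Canada"},
]

customers, cities = snowflake_customer_dimension(star_customers)
print(len(customers), "customers,", len(cities), "cities")
print(country_of(2, customers, cities))
```

After the split, a change to a city's details touches one City row instead of every matching customer row, but any question about a customer's country now has to go through both tables.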


Characteristics of snowflake schema:

  • The snowflake schema uses less disk space than an equivalent star schema.
  • Adding a dimension to the schema is simple to implement.
  • Because queries must join more tables, query performance decreases.
  • A dimension table consists of two or more sets of attributes that describe data at different grains.
  • The sets of attributes in the same dimension table are populated by different source systems.


Advantages of Snowflake Schema:

The following are the two key advantages of the snowflake schema:

  • It provides structured (normalized) data, which reduces data integrity problems.
  • It uses less disk space because the data is highly normalized.
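One way to read the data integrity point is that, once city details live in their own table, the database can reject a customer row that refers to a city that does not exist. The sketch below shows this with sqlite3 and a foreign key constraint. It is an illustration under assumptions: the City table is trimmed to three columns for brevity, the helper names and rows are hypothetical, and SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3


def open_normalized_db():
    """Create an in-memory database with Customer referencing City."""
    conn = sqlite3.connect(":memory:")
    # SQLite does not enforce foreign keys unless this is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE City (
            CityID   INTEGER PRIMARY KEY,
            CityName TEXT NOT NULL,
            Country  TEXT NOT NULL
        );
        CREATE TABLE Customer (
            CustomerID   INTEGER PRIMARY KEY,
            CustomerName TEXT NOT NULL,
            CityID       INTEGER NOT NULL REFERENCES City(CityID)
        );
    """)
    conn.execute("INSERT INTO City VALUES (1, 'Boston', 'USA')")
    conn.commit()
    return conn


def try_add_customer(conn, customer_id, name, city_id):
    """Insert a customer; return False if the database rejects the row."""
    try:
        conn.execute("INSERT INTO Customer VALUES (?, ?, ?)",
                     (customer_id, name, city_id))
        return True
    except sqlite3.IntegrityError:
        return False


conn = open_normalized_db()
print(try_add_customer(conn, 1, "Acme", 1))     # CityID 1 exists
print(try_add_customer(conn, 2, "Globex", 99))  # CityID 99 does not exist
```

In a denormalized dimension the city name would be free text in every customer row, so a misspelled or inconsistent value would be accepted without complaint.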


Disadvantages of Snowflake Schema:

  • Snowflaking reduces the space consumed by dimension tables, but the savings are usually small compared to the size of the overall data warehouse.
  • Snowflaking, or normalizing, a dimension table should only be done when it is genuinely necessary.
  • The hierarchies of a single dimension table should not be snowflaked into separate tables. Hierarchies should stay within the dimension table they belong to.
  • Multiple hierarchies can belong to the same dimension when that dimension is designed at the lowest possible level of detail.


Conclusion:

In this blog, we covered an overview of the Snowflake data platform, including its delivery as a cloud service, its architecture, and the ways of connecting to it. We also walked through an example of a snowflake schema, along with the characteristics, benefits, and drawbacks of the snowflake schema. We hope this blog has given you enough knowledge to understand the snowflake schema and its related concepts.

Saritha Reddy
Research Analyst
A technical lead content writer at HKR Trainings with expertise in delivering content on in-demand technologies such as Networking, Storage & Virtualization, Cyber Security & SIEM Tools, Server Administration, Operating System & Administration, IAM Tools, and Cloud Computing. She creates content for users and keeps up to date with the latest trends in the market.