A data warehouse schema describes how you structure tables and the relationships between them within a database or data warehouse. The two most common types are the star schema and the snowflake schema. Because the main purpose of a data warehouse (and other OLAP databases) is to provide a centralised view of enterprise data for analytics, the choice of schema directly affects how easily and how quickly that data can be analysed. What role do schemas play in analytics? What are the distinctions and trade-offs between the star and snowflake schemas? In this article, we evaluate these two schema variants and contrast their benefits and drawbacks.
Star schemas are the most basic structure for storing data in a data warehouse. The centre of a star schema is made up of one or more "fact tables" that reference a set of "dimension tables." To understand star schemas, as well as snowflake schemas, we first need to look closely at fact tables and dimension tables.
The purpose of a snowflake schema is to normalise the denormalized data in a star schema. This reduces the write slowdowns and other issues that are commonly associated with star schemas.
The snowflake schema is a "multi-dimensional" structure. At its centre are fact tables that link to the data held in dimension tables, which radiate outwards like a star. In the snowflake schema, however, the dimension tables are themselves divided into multiple tables. This results in the snowflake pattern.
The following are the key differences between the star schema and the snowflake schema across multiple factors:
The goal of a star schema is to separate numerical "fact" data about a business from descriptive, or "dimensional," data. Fact data includes price, weight, speed, and quantities, that is, data in a numerical format. Dimensional data includes colors, model names, geographical locations, employee names, salesperson names, and so on, and may also contain numerical information.
The factual data is organised into fact tables, while the dimensional data is organised into dimension tables. In the data warehouse, fact tables are the integration points at the centre of the star schema. They enable machine learning tools to analyse the data as a whole, and they allow other business systems to access the data as well. Dimension tables store the descriptive data (both numerical and nonnumerical) that gives context to the records in the fact tables.
From a technical point of view, fact tables record numeric data related to various events. They typically contain numeric values as well as foreign keys that map to additional (descriptive and nonnumerical) information in dimension tables. To support detailed analysis, fact tables are kept at a fine level of granularity (or "detail"), which means they record information at an atomic level. This can result in a large number of records being added to the fact table over time.
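The fact and dimension tables described above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration only: the retail scenario and every table and column name (`fact_sales`, `dim_product`, `dim_store`, and so on) are assumptions made for the example, not part of any particular warehouse.

```python
import sqlite3

# Hypothetical retail example; all table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables hold descriptive attributes.
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    model_name TEXT,
    color      TEXT
);
CREATE TABLE dim_store (
    store_id INTEGER PRIMARY KEY,
    city     TEXT,
    country  TEXT
);
-- Fact table: one row per sale (atomic grain), holding numeric
-- measures plus foreign keys that point at the dimension tables.
CREATE TABLE fact_sales (
    sale_id    INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    quantity   INTEGER,
    price      REAL
);
""")

conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "Model A", "red"), (2, "Model B", "blue")])
conn.executemany("INSERT INTO dim_store VALUES (?, ?, ?)",
                 [(1, "Lyon", "France"), (2, "Paris", "France")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, 1, 2, 10.0),
                  (2, 2, 1, 1, 25.0),
                  (3, 1, 2, 3, 10.0)])

# A typical analytical query: aggregate a measure from the fact table,
# grouped by a descriptive attribute from a dimension table.
revenue_by_color = dict(conn.execute("""
    SELECT p.color, SUM(f.quantity * f.price)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    GROUP BY p.color
""").fetchall())
```

With the sample rows above, `revenue_by_color` maps `"red"` to `50.0` and `"blue"` to `25.0`. The fact table grows by one row per sale, while the dimension tables stay comparatively small.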
The snowflake schema normalises the dimension tables it connects to. This "snowflaking" is done by (1) moving "low cardinality" attributes (values that appear many times in the parent table) into their own tables; and (2) continuing to split the dimension tables into multiple tables until they are completely normalised.
Like snowflake patterns in nature, a snowflake schema can become extremely complex. The schema can generate complex data relationships in which child tables have multiple parent tables.
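As a rough sketch of the snowflaking step, the function below splits a denormalized store dimension into three normalised tables. The function name `snowflake_dimension`, the row layout, and the sample data are all hypothetical and chosen only to show how repeated city and country values move into their own tables.

```python
def snowflake_dimension(rows):
    """Split denormalized (store_id, store_name, city, country) rows
    into three normalised tables linked by surrogate keys."""
    countries = {}  # country name -> country_id
    cities = {}     # (city name, country_id) -> city_id
    stores = []     # (store_id, store_name, city_id)
    for store_id, store_name, city, country in rows:
        # Each distinct country and city is stored once and reused afterwards.
        country_id = countries.setdefault(country, len(countries) + 1)
        city_id = cities.setdefault((city, country_id), len(cities) + 1)
        stores.append((store_id, store_name, city_id))
    return stores, cities, countries


# Hypothetical denormalized dimension rows, as a star schema would hold them.
rows = [
    (1, "North", "Lyon", "France"),
    (2, "South", "Lyon", "France"),
    (3, "Central", "Paris", "France"),
    (4, "Mitte", "Berlin", "Germany"),
]
stores, cities, countries = snowflake_dimension(rows)
```

In the input, "France" appears three times and "Lyon" twice. After the split, each country and each city is stored exactly once, and the store rows carry only a `city_id`.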
The snowflake schema is a fully normalised data structure. Dimensional hierarchies (such as city > country > region) are stored in separate dimension tables. Because this saves space, it is useful when a dimension table is relatively large.
Star schema dimensions, on the other hand, are denormalized, which means that the same values are repeated within a table. This design is suitable when the dimension table contains fewer rows.
The snowflake schema fully normalises dimension tables and avoids data redundancy, whereas the star schema stores redundant data in dimension tables. Because the snowflake schema has low data redundancy, it is cheaper to update and change.
A star schema, for example, would repeat the value in the customer address country field for each order from the same country. The star schema has a high level of data redundancy, which makes it harder to maintain and modify.
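To make the update cost concrete, here is a small sketch that renames a country in both designs and counts the rows each `UPDATE` has to touch. The table names, the 1,000-customer sample, and the country rename are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star style: the country is repeated on every customer row.
CREATE TABLE star_dim_customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT,
    country     TEXT
);
-- Snowflake style: the country is stored once and referenced by key.
CREATE TABLE snow_dim_country (
    country_id INTEGER PRIMARY KEY,
    country    TEXT
);
CREATE TABLE snow_dim_customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT,
    country_id  INTEGER REFERENCES snow_dim_country(country_id)
);
""")

# Hypothetical sample: 1,000 customers, all from the same country.
customers = [(i, f"customer {i}") for i in range(1, 1001)]
conn.executemany("INSERT INTO star_dim_customer VALUES (?, ?, 'Holland')", customers)
conn.execute("INSERT INTO snow_dim_country VALUES (1, 'Holland')")
conn.executemany("INSERT INTO snow_dim_customer VALUES (?, ?, 1)", customers)

# Rename the country in both designs and record how many rows changed.
star_rows_touched = conn.execute(
    "UPDATE star_dim_customer SET country = 'Netherlands' WHERE country = 'Holland'"
).rowcount
snow_rows_touched = conn.execute(
    "UPDATE snow_dim_country SET country = 'Netherlands' WHERE country = 'Holland'"
).rowcount
```

Here `star_rows_touched` is 1000 and `snow_rows_touched` is 1: the denormalized table stores the country name once per customer, while the normalised design stores it once in total.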
The choice between a denormalized and a normalised schema design therefore determines how much redundancy, or duplicated entries, the warehouse holds.
A simple star schema leads to simple queries. Analysts do not need to write multi-level joins because the fact table is joined to only one level of dimension tables. It is easy to understand and has low query complexity.
Snowflake schemas, on the other hand, require a more complex query design. More joins are needed to link the additional tables because of the complex relationships between the fact table and its dimension tables. This adds overhead when writing analytical queries.
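The difference in query complexity can be seen by asking the same question, total sales per country, against both designs. The sketch below uses `sqlite3` with hypothetical tables and sample rows; the names are assumptions for the example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: one denormalized store dimension.
CREATE TABLE star_dim_store (store_id INTEGER PRIMARY KEY, city TEXT, country TEXT);
-- Snowflake: the same dimension split into store, city and country tables.
CREATE TABLE snow_dim_country (country_id INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE snow_dim_city (city_id INTEGER PRIMARY KEY, city TEXT, country_id INTEGER);
CREATE TABLE snow_dim_store (store_id INTEGER PRIMARY KEY, city_id INTEGER);
-- Shared fact table.
CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, store_id INTEGER, amount REAL);

INSERT INTO star_dim_store VALUES (1, 'Lyon', 'France'), (2, 'Paris', 'France'), (3, 'Berlin', 'Germany');
INSERT INTO snow_dim_country VALUES (1, 'France'), (2, 'Germany');
INSERT INTO snow_dim_city VALUES (1, 'Lyon', 1), (2, 'Paris', 1), (3, 'Berlin', 2);
INSERT INTO snow_dim_store VALUES (1, 1), (2, 2), (3, 3);
INSERT INTO fact_sales VALUES (1, 1, 100.0), (2, 2, 50.0), (3, 3, 70.0), (4, 1, 30.0);
""")

# Star schema: a single join reaches the country attribute.
STAR_QUERY = """
SELECT d.country, SUM(f.amount)
FROM fact_sales f
JOIN star_dim_store d ON d.store_id = f.store_id
GROUP BY d.country
ORDER BY d.country
"""

# Snowflake schema: three joins are needed to walk store -> city -> country.
SNOWFLAKE_QUERY = """
SELECT co.country, SUM(f.amount)
FROM fact_sales f
JOIN snow_dim_store s    ON s.store_id = f.store_id
JOIN snow_dim_city ci    ON ci.city_id = s.city_id
JOIN snow_dim_country co ON co.country_id = ci.country_id
GROUP BY co.country
ORDER BY co.country
"""

star_result = conn.execute(STAR_QUERY).fetchall()
snowflake_result = conn.execute(SNOWFLAKE_QUERY).fetchall()
```

Both queries return the same totals, `[("France", 180.0), ("Germany", 70.0)]`, but the snowflake version needs two extra joins to get there.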
Star schemas have a faster query execution time. Because dimensional tables require a single join between a fact and its set of attributes, a star schema functions almost as a single table for query lookups.
Snowflake schemas, on the other hand, require complex joins between dimension tables and their own sub-dimension or parent tables. This slows query processing and may also affect other OLAP workloads such as cube processing.
Star schemas may run queries faster, but due to data redundancy, they require more storage space than snowflake schemas.
Star schemas put data integrity at greater risk than snowflake schemas. Because data is stored redundantly, multiple copies of the same data exist in the dimensional tables of the star schema. This means that new inserts, updates, or deletes can jeopardise data integrity.
The snowflake schema, on the other hand, is less vulnerable to data integrity issues because it fully normalises dimensional tables, storing dimension data only once in the appropriate table.
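The integrity risk can be sketched as an update anomaly. In the hypothetical denormalized table below, a correction is applied to only one of several redundant copies, and a simple check then finds a city that maps to two different countries. The table, rows, and the check query are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star-style dimension: city and country are repeated per customer.
CREATE TABLE star_dim_customer (
    customer_id INTEGER PRIMARY KEY,
    city        TEXT,
    country     TEXT
);
INSERT INTO star_dim_customer VALUES
    (1, 'Lyon', 'France'),
    (2, 'Lyon', 'France'),
    (3, 'Paris', 'France');
""")

# An update that reaches only one of the redundant copies.
conn.execute("UPDATE star_dim_customer SET country = 'FR' WHERE customer_id = 1")

# Integrity check: a city should belong to exactly one country.
inconsistent_cities = [
    row[0]
    for row in conn.execute("""
        SELECT city
        FROM star_dim_customer
        GROUP BY city
        HAVING COUNT(DISTINCT country) > 1
    """)
]
```

After the partial update, `inconsistent_cities` contains `"Lyon"`. In a snowflaked design the country would be stored once per city in a separate table, so there would be no second copy to fall out of step.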
The snowflake schema is a bottom-up model. Star schemas are simpler to develop and implement. Because they are built from straightforward relationships, creating a suitable star schema is simple for a database developer or data architect.
Star schemas, however, are harder to maintain than snowflake schemas. They become more difficult to maintain and to check for data integrity violations as new data is loaded into the data warehouse. The star schema is a top-down model.
The following advantages are provided by star schemas:
Snowflake schemas have the following advantages over standard star schemas:
There are three potential problems with snowflake schemas:
Working to improve read queries as well as analysis in a star schema may present the following challenges:
Which one of the two kinds of data warehouse schema will you be using?
Star schemas are easier to understand, run queries faster, and are simple to set up. Snowflake schemas, on the other hand, are much less vulnerable to data integrity issues, are cheaper to update, and take up less space.
Given the trade-offs discussed above, it is up to you to determine which advantages (or disadvantages) best fit your company's use cases.