Data modeling is the process of creating a model for the data to store in a database. It is a conceptual representation of data objects, the association between different data objects, and the rules.
In this article, you can go through the set of Data Modeling interview questions most frequently asked in the interview panel. This will help you crack the interview as the topmost industry experts curate these at HKR trainings.
Let us have a quick review of the Data Modeling interview questions.
Ans: A data model is a conceptual representation of business requirements (logical data model) or database objects (physical) required for a database and is very powerful in expressing and communicating the business requirements and database objects. The approach by which data models are created is called data modeling.
Logical Data Model:
Entity, Attributes, Super Type, Sub Type, Primary Key, Alternate Key, Inversion Key Entry, Rule, Relationship, Definition, business rule, etc
Physical Data Model:
Table, Column, Primary key Constraint, Unique Constraint or Unique Index, Non Unique Index, Check Constraint, Default Value, Foreign Key, comment etc.
Ans: There are mainly three different types of data models:
Ans: The types of relationship in a data model are stated as follows.
Logical data model:
Physical data model:
Become a Data Modeling Certified professional by learning this HKR Dimensional Data Modeling Training !
Ans: Characteristics of the logical data model are:
Ans: Characteristics of the physical data model are:
Ans: The advantages of using data modeling in data warehousing are:
Ans: A data mart is the most straightforward set of data warehousing and is used to focus on one functional area of any given business. Data marts are a subset of data warehouses oriented to a specific line of business or functional area of an organization such as marketing, finance and sales. Data enters data marts by an assortment of transactional systems, other data warehouses, or even external sources.
Ans: Data warehousing is a process for collecting and managing data from varied sources. It provides meaningful business enterprise insights. Data warehousing is typically used to connect and analyse data from heterogeneous sources. It is the core of the BI system, which is built for data analysis and reporting.
Ans: The two design schema is called Star schema and Snowflake schema.
In a star schema, you just enter your desired facts and all the primary keys of your dimensional tables in the Fact table. The fact tables primarily are the union of its all dimension table key. In a star schema, dimensional tables are usually not in BCNF form.
It’s almost similar to a star schema but in this, our dimension tables are in 3rd NF, so more dimensions tables. These dimension tables are linked by primary, foreign key relation.
Ans: The main relationship types are:
Ans: The purposes of normalization are:
Ans: OLTP acronym stands for ONLINE TRANSACTIONAL PROCESSING. The approach by which data models are constructed for transactions is called OLTP data modeling.
Example: all online transactions, bank transactions, trading transactions.
Ans: There are three types of fact tables:
Ans: Fact less tables simply contain keys which refer to the dimension tables. Hence, they don’t really have facts or any information but are more commonly used for tracking some information about an event.
Example: To find the number of leaves taken by an employee in a month.
Ans: A fact table is usually designed at a low level of Granularity. This means finding the lowest level of information that can store in a fact table.
Example: Employee performance is a very high level of granularity. Employee_performance_daily, employee_perfomance_weekly can be considered lower levels of granularity.
Ans: Two types of data modelling techniques are:
Entity-Relationship (E-R) Model:
ER model or entity-relationship model is a methodology of data modeling that normalizes the data by reducing redundancy.
UML (Unified Modelling Language):
It is a general-purpose, database development, modelling language in the field of software engineering. The main intention is to provide a generalized way to visualize system design.
Ans: Dimensional model consists of dimension and fact tables. Fact tables store different transactional measurements and the foreign keys from dimension tables that qualify the data. The goal of the Dimensional model is not to achieve a high degree of normalization but to facilitate easy and faster data retrieval.
Relational Data modeling:
Dimensional Data modeling:
Ans: Data warehouses commonly use a surrogate key t0 uniquely identify an entity. A surrogate is not generated by the user but by the system. A primary difference between a primary key and surrogate key in a few databases is that the primary key uniquely identifies a record while a surrogate key uniquely identifies an entity.
Example: An employee may be recruited before the year 2010 while another employee with the same name may be recruited after the year 2010. Here, the primary key will uniquely identify the record while the surrogate key will be generated by the system (say a serial number) since the surrogate key is not derived from the data.
Ans: Example of OLTP system are:
Ans: Enterprise data model comprises all entities required by an enterprise. The development of a common consistent view and understanding of data elements and their relationships across the enterprise is referred to as Enterprise Data Modeling. For better understanding purpose, these data models are split up into subject areas.
Ans. Data modeling is developing a model for data to store within a database securely. Here, the data model is a concept of presenting business needs or physical objects that are required for a database. Further, it helps to communicate and present various business needs and objects in the database The approach by which data models are created is called data modeling.
Ans. The following are the types of Data Models:-
Ans. The physical data model consists of all the tables, columns, database properties, and relationships for the physical execution of databases. Moreover, some important variables of a physical data model include database execution, indexing strategy, and physical storage.
The table is the major component here that includes rows and columns. Further, physical data modeling is a viewpoint by which these data models are built.
Ans. A table is a collection of rows & columns where data is stored. Each column in a table includes a datatype where the data exists in a tabular format.
Ans. A column has another name field where data is vertically arranged that includes related information.
Ans. The row has another name tuple or record where data is horizontally aligned.
A Logical Data Model is responsible for the actual application of data stored in a database and it can design the needs of a business logically. Further, the logical data model consists of an alternate key, primary key features, entity, business rule, relation, etc.
A Physical Data Model is useful to build a new database model from the existing and it provides data about the target database source. Table, unique key, columns, indexes, foreign key, etc. are included in this data model.
Ans. In a data model, the tables look like square or rectangular boxes that include various features, and these tables are connected by lines. The term ER diagram ERD is a visual representation of tables or entities and their relationships.
Ans. The Surrogate key is a numerical element that is applied as a primary key. This key is a substitute for natural keys or an artificial key.
The data modelers create a surrogate key when there is no availability of a primary key or composite primary keys. Hence, this key is useful for SQL queries, uniquely identifying a record and performing well.
Ans. A foreign key is useful to link a parent and a child table on the column which includes a group of elements. The value of a foreign key column within the child table always refers to primary key values available in the parent table.
Ans. The primary key constraint is forced on the column data that avoids null and duplicate values in the column. The equation of Primary Key = Unique + Not Null. The following are the primary key examples:- bank account number, bank routing number, etc.
Ans. When more than one table column is used as a primary key, then the same is called a composite primary key constraint.
Ans. In data modeling, when there is a group of columns exists in a foreign key, the same is called a composite foreign key constraint.
Ans. This is a numerical element of a relation between two tables or entities. There are many types of cardinalities such as one-to-one, one to many, and many to many. In a database, high cardinality refers to a column that can have more unique or possible values stored and vice versa.
Ans. The Enterprise data model includes all the tables or entities that an enterprise requires. Enterprise Data Modeling refers to the evolution of a common compatible view and understanding of data attributes and their associates throughout the organization.
Ans. Relational data modeling is a visual presentation of objects within a relational database.
Ans. There are three different types of relationships in a Data Model. Such as identifying relationships, non-identifying relationships, and self-recursive relationships. Let us know these relationships in detail.
Generally, in a data model, 2 types of tables are present: parent tables and child tables. Both of these tables are linked through a relationship line.
Ans. The term OLTP refers to ONLINE TRANSACTIONAL PROCESSING that uses traditional DBMS. The approach using which data models are built for transactions is OLTP data modeling. Following are examples of OLTP: online transactions, trading transactions, order entry, bank transactions, etc.
Ans. The conceptual data model consists of all the important entities or tables and their relationships. But it doesn’t include much detailed information about elements and is frequently used in the early planning stage. Further, the data modelers build this data model and forward it to the functional team for their review.
Ans. The constraints are the rules that are enforced on the data columns available in the table. There are various types of constraints in data modeling such as primary key, not null, foreign key, composite foreign key, unique, check constraint, etc.
Ans. To recover data much faster, an index is enforced on a column or set of columns.
Ans. In the data model, a unique constraint is enforced on the data column to keep away identical values. But here it will include NULL values.
Ans. To produce a unique number, the sequence is used which is a database object in a data model.
Ans. There is no limit here, you can insert as many null values in a unique constraint column. This is because one null value is not identical to another.
Ans. A Check constraint in a data model is useful to verify the range of values within a column.
Ans. The term ‘Normalization’ refers to a design of relational databases where the tables are arranged in a way that helps to remove data redundancy. It also helps to overcome normal data modeling problems and reduces data dependency. Further, it splits the larger tables into many smaller tables
Ans. The following are the different types of normalization available in Data Modeling- First normal form, Second normal form, Third normal form, Boyce-Codd fourth, and Fifth normal forms.
Ans. A data model repository is a storage that consists of data models & their applicable data. Such as entity definition, columns, data types, the definition of an attribute, etc. This data is accessible to data modelers and the whole team.
Ans. A De-Normalization in data modeling is a process where redundancy is added to the data and it is also useful to build a data warehouse. Also, it helps to recover data much faster from the database.
Ans. This is a process using which DDL (Data Definition Language) scripts are produced from the data model. Further, data modeling tools have choices to build DDL scripts by associating with other databases. However, we can develop databases using these scripts.
Ans. Metadata is the data about several objects within the data model. Further, the Data Modeling tools provide various options to build reports by verifying several points. It gives you an option to build Metadata which is either a logical data model or a physical model. Moreover, it presents the actual data that is stored within a database.
Ans. The process of Reverse Engineering is used to develop data models from databases/scripts. Further, the tools of data modeling have various choices to link to the database. These tools are helpful to perform reverse engineering to convert a database into a data model.
Ans. An entity in a data model can be divided into several sub-entities and collected based on some features. Here, each sub-entity will have attributes pertinent to that entity and share some common attributes that are different from other subgroups. Hence, these are the subtype entities in a data model.
A supertype entity is an entity that has a relationship with many subtype entities. In this entity, the attributes that are common to them are moved to a super or higher-level entity.
Ans. We can go with the following non-computer example. Here, we can think about a system where people use some types of resources and compete for those resources. Examples of preemptive scheduling may include that if there is an emergency or an ambulance traveling on the single-lane road then the traffic on that road gives a path to those vehicles that are in need. Popular examples of Non-preemptive scheduling include First Come First Serve, and the Shortest Job First (SJF).
Star Schema: In this schema, a user just enters his desired facts including all the primary keys of the dimensional tables in the Fact table. Thus, it includes both dimension and fact tables.
Snowflake Schema: This schema is the same as the star schema but this schema includes all three types of tables such as dimension, sub-dimension, and fact tables. Further, these extent tables are connected through the relations of the primary and foreign keys.
Ans. Data sparsity is used for how much data we have for a specific entity/dimension of the model. It affects aggregation based on the deep union of members of the sparse dimension.
In case the dimensions carry insufficient information, then the data may need more space for the aggregations and it results in a large-size database.
Ans. In the Datastage server, we use a hashed file to save data on the hash algorithms along with the hash key value. On the other hand, a sequential file doesn’t contain any key value for saving data.
A sequential file stage is useful to read or write data from one or more flat files. Hashed file stage is useful to extract data within a DataStage job. Further, each of the hashed file stages can include many inputs or outputs.
Ans. Data models are tools useful to narrate the data needs and speculations in the system from a top-down standpoint. They also make it possible for designing databases later on in the SDLC.
The following are three basic elements available in ER models:
Ans. If an entity is in the second normal form, then it is considered the entity is in the third normal form. Further, all of its elements are not transitively dependent on the primary key in the data model. The term transitive dependence refers to the descriptor key elements not only depending on the entire primary key. But they also depend on other descriptor key elements that successively depend on the primary key in a data model.
Ans. Recursive relationships are a more complex concept than the relationships we have seen earlier like one-to-one, one-to-many, etc. This type of relationship takes place when there is a relation between an entity and itself.
Due to their complex nature, we may need a moderately more complex system to map them to a schema and present them in a style sheet.
Ans. If we use a name as the primary key in a data model, it will breach the principle of stability. The social security number might be a valid option, but an employee working in a foreign might not have this number. In this condition, a derived or artificial primary key is best suitable.
An artificial key is considered a derived key that we build. Further, a natural key is that which already exists in the database.
Ans. In case all the attributes/elements depend only on the primary key, then an entity is considered to be in the 2nf. In declarative terms, each column within a table must be functionally dependent on the entire primary key of the same table. This dependence specifies that there is a connection between two different column values.
Ans. The term ‘Granularity’ means the volume of information that a table carries. It can be measured in types - High or Low. A low granularity contains only low-level or low-volume information whereas high granularity data include transaction-level data.
Ans. Usually, all enterprise databases are normalized to the Third Normal Form to pull out redundancy and efficient access. We can also develop a database without
Define Conformed Dimension.
normalization. Therefore, it's not necessary that all databases should be in the Third Normal Form.
Ans. A dimension is considered confirmed if the same is attached to two fact tables. Examples of conformed dimensions would include calculating profits, revenue, price, margin, cost, etc.
Ans. The dimensions in the data are a group of unique values useful for locating and classifying data from the data storage or warehouse.
Ans. A data mart is a kind of data warehouse that is designed & developed for a specific unit or a section within an enterprise such as Sales, HR, Marketing, Finance, etc.
Ans. There are different types of Metadata in a data model.
Ans. Lack of reason, Creating huge data models, Developing unwanted surrogate keys, an unnecessary denormalization process, etc.
Ans. Junk Dimension, Conformed Dimension, Role-playing Dimension, Degenerated Dimension, Slowly Changing Dimension, Swappable Dimensions, etc.
Ans. The term OLAP refers to On-Line Analytical Processing, a type of technology that allows managers, analysts, etc. to get data insights in a secure, consistent, and fast manner. Moreover, the OLAP technology is useful in many smart solutions such as budgeting, analysis, planning, simulation models, etc.
Ans. PDAP refers to Praedico Data Platform which is a data cube useful for storing data as a summary. The data in the PDAP platform is stored in a way that the users can easily report it.
Ans. The term ‘Discrete data’ is a kind of fixed or limited data such as gender, and telephone numbers. On the other hand, Continuous data changes in a constant and ordered manner. Such as Age.
Batch starts on 10th Oct 2022, Weekday batch
Batch starts on 14th Oct 2022, Fast Track batch
Batch starts on 18th Oct 2022, Weekday batch