Welcome to the Informatica Data Quality (IDQ) tutorial. Data cleansing is a vital step in application development for any organization. It improves data quality and increases overall productivity by removing errors, so that the organization works with the highest-quality information. Informatica Data Quality is a tool for data management and data cleansing. We have designed this IDQ tutorial for beginners, and it covers the core components, match algorithms, dictionaries, access roles, and MDM integration. Without further ado, let's get started.
Informatica Data Quality is an Informatica offering that helps manage the quality of data across the whole enterprise. It offers capabilities such as data analysis, data cleansing, data matching, reporting, and monitoring. It keeps data consistent across the enterprise so that it meets the business objectives.
IDQ uses the CLAIRE engine in the backend to make intelligent recommendations and assessments. It also uses AI-driven insights to streamline data discovery. It offers transformations like data standardization, validation, and de-duplication. IDQ is available on both the Microsoft Azure and AWS public clouds, so users can quickly spin up infrastructure on the cloud and start working with it.
Informatica Data Quality was awarded as the Data Quality Market Winner in 2018 by CRM Magazine.
Below are the advantages of the IDQ tool,
Core components of IDQ
IDQ has two core components.
Data Quality Workbench
It is like an IDE through which we can design, test, and deploy data quality plans. We can execute tests and plans through the workbench. It contains the Project Manager and File Manager on the left, and a workspace on the right where the plans are designed. Workbench offers 50 data components that we can use in our plans.
Data Quality Server
It is used to run plans in a networked environment. We cannot create or edit plans on the server. It communicates with the workbench through a TCP/IP connection. It also enables plan and file sharing across networks.
Both the workbench and the server are installed with a Data Quality engine and a Data Quality repository.
IDQ Workbench Match Algorithms
IDQ Workbench offers four algorithms that we can choose from to perform matching analysis.
Hamming Distance algorithm
The Hamming distance algorithm is useful when the positions of characters in a string are essential, for example, in dates, telephone numbers, and postal codes. The strings to be analyzed should be of the same length, because the algorithm compares them position by position and counts the positions at which the characters differ.
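The position-by-position comparison can be illustrated with a minimal Python sketch. This is not IDQ's internal implementation; the function names and the 0-to-1 score are assumptions made for illustration only.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(ch_a != ch_b for ch_a, ch_b in zip(a, b))


def hamming_score(a: str, b: str) -> float:
    """Illustrative similarity in [0, 1]: the share of matching positions."""
    if not a and not b:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - hamming_distance(a, b) / len(a)


# Two postal codes that differ in a single position.
print(hamming_distance("560034", "560084"))
print(hamming_score("560034", "560084"))
```

Because every position carries equal weight, a single inserted or dropped character shifts all later positions, which is why this measure suits fixed-format fields rather than free text.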
Jaro-Winkler algorithm
It is useful when the prefix of the string is essential. It measures the percentage of characters that match between two strings and counts the transpositions required to change one string into the other. Strings that share a common prefix receive a higher score.
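A prefix-weighted measure of this kind is commonly known as Jaro-Winkler similarity. The sketch below follows the textbook definition, on the assumption that this is the algorithm being described. The function names, the prefix scale of 0.1, and the four-character prefix cap are the conventional defaults, not values confirmed for IDQ.

```python
def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching if they are within this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Walk both strings' matched characters in order; mismatches are
    # half-transpositions.
    out_of_order = 0
    j = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[j]:
                j += 1
            if s1[i] != s2[j]:
                out_of_order += 1
            j += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1,
                 max_prefix: int = 4) -> float:
    """Boost the Jaro score for strings that share a common prefix."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


print(round(jaro("MARTHA", "MARHTA"), 4))
print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
```

The prefix boost is what makes the measure favor strings that agree at the start, such as surnames with a trailing typo.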
Edit Distance algorithm
It is useful for matching short strings like names or short address fields. This algorithm is an implementation of the Levenshtein distance algorithm, which calculates the minimum number of operations needed to transform one string into another. The operations are insertion, deletion, and substitution of characters.
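A minimal sketch of the Levenshtein calculation, using the standard dynamic-programming approach. The function name is illustrative, and IDQ's own implementation may differ in detail, for example in how it converts the distance into a match score.

```python
def edit_distance(source: str, target: str) -> int:
    """Minimum number of insertions, deletions, and substitutions
    needed to turn source into target (Levenshtein distance)."""
    # previous[j] holds the distance between the processed prefix of
    # source and the first j characters of target.
    previous = list(range(len(target) + 1))
    for i, s_ch in enumerate(source, start=1):
        current = [i]
        for j, t_ch in enumerate(target, start=1):
            cost = 0 if s_ch == t_ch else 1
            current.append(min(
                previous[j] + 1,         # delete s_ch
                current[j - 1] + 1,      # insert t_ch
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


print(edit_distance("Jon", "John"))
print(edit_distance("kitten", "sitting"))
```

Keeping only two rows of the table is enough here, because each cell depends only on the current and the previous row.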
Bigram or Bigram frequency algorithm
It is useful for searching through long text strings like free-format address lines. It creates pairs of consecutive characters (bigrams) from both data strings and compares them to find the pairs they have in common. The match score is based on the number of identical pairs shared by the two strings.
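The pair-based comparison can be sketched as follows. The exact scoring formula IDQ uses is not stated here, so this example assumes the Dice coefficient (twice the number of shared pairs divided by the total number of pairs), which is one common choice. The function names and the lowercasing step are likewise assumptions.

```python
from collections import Counter


def bigrams(text: str) -> Counter:
    """Count each pair of consecutive characters, ignoring case."""
    normalized = text.lower()
    return Counter(normalized[i:i + 2] for i in range(len(normalized) - 1))


def bigram_score(a: str, b: str) -> float:
    """Dice coefficient over bigrams: 2 * shared pairs / total pairs."""
    pairs_a, pairs_b = bigrams(a), bigrams(b)
    total = sum(pairs_a.values()) + sum(pairs_b.values())
    if total == 0:
        return 0.0  # strings shorter than two characters have no pairs
    # Counter intersection keeps the minimum count of each shared pair.
    shared = sum((pairs_a & pairs_b).values())
    return 2 * shared / total


print(bigram_score("night", "nacht"))
print(bigram_score("12 Baker Street, London", "12 Baker St, London"))
```

Because the score depends on shared pairs rather than on positions, it tolerates inserted or abbreviated words better than a position-based measure does.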
A dictionary in IDQ refers to a data set that we can use to evaluate data in sources and mappings. When we apply a dictionary to a mapping, it compares each input field in the mapping against the dictionary and performs the specified actions. There are two types of dictionaries available in Informatica.
Relational Dictionary
We can add a table in a database as a reference dictionary by using the relational dictionary. To connect to the table, we need to provide an ODBC data source, username, password, etc.
Flat File Dictionary
We can add a file from our local computer as a reference dictionary by using the flat file dictionary. To read the data from the file, we need to give a name and description and upload the file from the local computer.
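To make the idea of comparing input fields against a dictionary concrete, here is a small Python sketch of a flat-file lookup. It does not use any IDQ API. The file layout (the standard value first, followed by its variants), the sample entries, and the function names are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical dictionary content: standard value first, then variants.
DICTIONARY_CSV = """\
Street,St,St.,Str
Avenue,Ave,Ave.,Av
Road,Rd,Rd.
"""


def load_dictionary(handle) -> dict:
    """Map every term in the file (lowercased) to its standard value."""
    lookup = {}
    for row in csv.reader(handle):
        if not row:
            continue
        standard = row[0].strip()
        for term in row:
            lookup[term.strip().lower()] = standard
    return lookup


def standardize(value: str, lookup: dict) -> str:
    """Replace each whitespace-separated token found in the dictionary."""
    return " ".join(lookup.get(token.lower(), token) for token in value.split())


# An in-memory file stands in for a file uploaded from the local computer.
lookup = load_dictionary(io.StringIO(DICTIONARY_CSV))
print(standardize("12 Baker St.", lookup))
```

A relational dictionary would follow the same pattern, with the rows read from a database table instead of a file.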
Access level controls in IDQ
An organization implements role-based control to give access to individual users for specific data. Here are some of the types of roles that you want to define in your data quality project.
The platform administrator installs software, performs version upgrades and emergency bug fixes. This person is responsible for maintaining subscription content.
An effort administrator is a front-line manager (like a project lead) for the project. This person can either grant access or approve access to project resources.
A developer builds mappings and workflows in the IDQ workbench by taking advantage of the effort administrator's service connections. The developer also uses the full-featured model repository.
An operator is the front-line reviewer of results. This person manages the platform's effort to run data quality artifacts in the published and internal project folders.
An analyst manages specifications, reference tables, and scorecard notifications. This person is responsible for the identification of all data quality issues. The analyst role also includes all the capabilities of a basic analyst.
A reports developer creates and modifies reports using the developer tool and iReportsDesigner. The generated reports point to the dashboards and reports template star schema.
Integrating IDQ with MDM projects
Data cleansing is a value-added feature for a Master Data Management (MDM) project. We can integrate IDQ with MDM in three ways.
Informatica Platform Staging
Informatica introduced this feature in version 10.x. Using platform staging, we can integrate MDM with IDQ through a setup that requires configuring the MDM hub, the platform components, and the connections to the data sources. Once the integration is complete, the tables will be available in the developer tool.
IDQ Cleanse Library
We can create functions in IDQ as operation mappings and deploy them as web services. These web services can be imported into the Informatica MDM hub as a cleanse library. Features like delta detection, hard delete detection, and audit trails are available in this process.
Informatica MDM as target
We can use Informatica MDM as a target and load the data into its landing tables. This way, we need to create only one connection instead of several. Features like delta detection, hard delete detection, and audit trails are available in this process.
Informatica PowerCenter and Informatica Data Quality each have their own features and serve different purposes.
Informatica PowerCenter is an ETL tool that extracts, transforms, and loads data. Informatica Data Quality ensures the highest quality of data.
We can create re-usable rules and validations in Data Quality and integrate them into PowerCenter.
Most of the transformations available in PowerCenter are also available in Data Quality. In addition to them, Data Quality has some more transformations.
The way we use passive transformation in PowerCenter is different from IDQ.
Using IDQ ensures that only consistent data is in use across the organization. The customer holds complete control of the transformations, validations, and rules applied through mappings. We can even identify distinct patterns available within the data. IDQ is the best possible way to achieve the highest quality of data. It generates profiling reports and Data Quality reports. We can validate duplication, conformity, and integrity of data with this tool.