Sharpen your software testing skills with HKR's newly designed Ab Initio tutorial. This tutorial covers topics such as an introduction to Ab Initio, development testing, data extraction, the tester's role, the testing process, and data accuracy techniques. Ab Initio is also known as an ETL tool, where ETL stands for extract, transform, and load, and it works with heterogeneous data sources. With the help of this tutorial, you will learn fundamental ETL techniques. It is designed for those who want to learn basic ETL testing techniques, which are crucial for all software testing professionals. Let's begin this tutorial.
Ab Initio Software is an American multinational enterprise headquartered in Lexington, Massachusetts. Ab Initio is also known as an ETL tool and comprises six fundamental components: the Co>Operating System, the Component Library, the Graphical Development Environment, the Enterprise Meta>Environment, the Data Profiler, and the Conduct>It environment. It is a powerful GUI-based parallel-processing tool built for ETL data management and analysis. An ETL tool is mainly used to load heterogeneous data sources into data warehouse applications. Ab Initio performs the following three operations:
1. Extracts data from transactional systems, such as Oracle, IBM, and Microsoft databases.
2. Transforms the data for the data warehouse system by performing data cleansing operations.
3. Finally, loads the data into the target data warehouse (OLAP) system.
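The three steps above can be sketched as a minimal pipeline. This is an illustrative sketch, not Ab Initio code: the table and column names are hypothetical, and an in-memory SQLite database stands in for both the transactional system and the warehouse.

```python
# Minimal extract-transform-load sketch. All names are hypothetical.
import sqlite3

def extract(conn):
    # Extract: read raw rows from a transactional table.
    return conn.execute("SELECT id, name, amount FROM orders_raw").fetchall()

def transform(rows):
    # Transform (cleansing): drop rows with missing amounts and
    # normalise names to a consistent upper-case form.
    return [(i, n.strip().upper(), a) for (i, n, a) in rows if a is not None]

def load(conn, rows):
    # Load: write the cleansed rows into the warehouse table.
    conn.executemany("INSERT INTO orders_dw VALUES (?, ?, ?)", rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders_raw (id INT, name TEXT, amount REAL)")
conn.execute("CREATE TABLE orders_dw  (id INT, name TEXT, amount REAL)")
conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)",
                 [(1, " alice ", 10.0), (2, "bob", None), (3, "carol", 7.5)])

load(conn, transform(extract(conn)))
print(conn.execute("SELECT COUNT(*) FROM orders_dw").fetchone()[0])  # prints 2
```

The row with a NULL amount is removed during the transform step, so only the two clean rows reach the warehouse table.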
Ab Initio is a suite of data applications containing various data warehouse components. When people say "Ab Initio", they generally mean the Ab Initio Co>Operating System. It is also known as a graphical ETL tool: it gives you the ability to drag and drop various components and connect them. Ab Initio is a parallel-processing ETL tool used to handle large data volumes.
The components included are:
1. Co>Operating System
2. Enterprise Meta Environment (EME)
3. Additional data tools
4. Data profiler application
5. Conduct>It (plans)
Ab initio architecture explains the work nature and overall structure of the ETL tool:
The below diagram explains the Ab initio architecture in detail:
Ab initio means "from the beginning". The software follows a client-server model. The client is called the Graphical Development Environment (GDE) and resides on your desktop. The server, or back end, is the Co>Operating System, which runs on a mainframe or a remote UNIX machine. Ab Initio code is known as a graph, which carries a .mp extension. From the GDE, a graph is deployed as a .ksh (Korn shell) script, and the Co>Operating System runs these .ksh files to perform the required tasks. As mentioned earlier, Ab Initio is an ETL tool vendor that has gradually emerged as one of the strongest players in application data integration, with mission-critical components. They are:
1. ETL or data warehousing system
2. Real-time data analytical tool
3. CRM or customer relationship management
4. EAI or enterprise application integration
Ab Initio offers a robust architecture that provides simple, fast, and highly secure data integration. The tool also integrates diverse, continuous, and complex data streams ranging from gigabytes to terabytes.
1. ETL tool is mainly used to extract the data from various data sources, transform the extracted data, and load them into the data warehouse system.
2. Business intelligence tools are used to create interactive and ad hoc data reports for end users, dashboards, data visualizations, and senior-level data management.
3. The most common ETL tools include SAP BusinessObjects Data Services (BODS), Informatica PowerCenter, Microsoft SSIS, Oracle Data Integrator (ODI), and the open-source CloverETL.
4. Some popular BI tools include SAP BusinessObjects, IBM Cognos, SAP Lumira, Jaspersoft, the Microsoft BI platform, Oracle Business Intelligence, and many more.
In this section, we explain the application integration process using Ab Initio. The below diagram shows the design architecture and overall structure used to integrate data from disparate sources into the data warehouse and load it into a CRM application.
The major challenges included are:
1. Multiple sources: data arrives from different sources, such as mainframes or Oracle tables, using different techniques, data formats, and load frequencies.
2. Complex business logic: matching the data formats of target systems, applying data cleansing techniques, and handling generic entities.
3. Redundancy: multiple sources of truth arise because of data duplication.
With the help of Ab Initio, a cost-effective solution can offer batch or real-time execution. A scalable solution extracts data from distributed systems, transforms multiple data formats into a common format, creates the data warehouse and operational data stores, aggregates or derives business intelligence, and loads the data into target systems.
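As one small illustration of transforming multiple data formats into a common format, here is a sketch that normalises dates arriving in several assumed source formats into a single ISO 8601 form. The set of source formats is hypothetical, chosen only for the example.

```python
# Sketch: normalise dates from assumed source formats into one common format.
from datetime import datetime

# Hypothetical formats that different source systems might use.
SOURCE_FORMATS = ["%d/%m/%Y", "%Y-%m-%d", "%m-%d-%Y"]

def to_common(date_str):
    """Try each known source format; emit ISO 8601 (YYYY-MM-DD)."""
    for fmt in SOURCE_FORMATS:
        try:
            return datetime.strptime(date_str, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {date_str}")

print(to_common("25/12/2021"))  # dd/mm/yyyy source -> 2021-12-25
print(to_common("12-25-2021"))  # mm-dd-yyyy source -> 2021-12-25
```

A real integration layer would carry many more such rules, but the shape is the same: every source format is mapped onto one agreed warehouse format before loading.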
The following are the major key functions of the Ab Initio system tool:
1. Loads the data into operational data stores.
2. Use the metadata-driven rules engine to generate various codes.
3. Provides an API facility to perform query operations.
4. Interfaces graphically with databases for data extraction and loading.
5. Offers delta and before-after images of data.
6. Feeds target system and reporting tools or message queues.
Let us know a little more about the ETL process:
Procedures are as follows:
The below diagram explains the ETL process:
1. Extraction: In this step, data is extracted from multiple heterogeneous sources. The extraction process varies according to the organization's requirements and is typically done by running job schedules during off-business hours, for example at night or over the weekend.
2. Transformation: In this step, data is transformed into a suitable format so that it can be easily loaded into the data warehouse. Transformations include applying calculations, performing joins, and specifying primary and foreign keys. They also involve correcting data, removing incorrect or incomplete data, and fixing errors, as well as integrating and reformatting incompatible data before it is loaded into the warehouse.
3. Loading: In this step, data is loaded into the data warehouse system to produce analytical reports and information. The target system can hold flat files as well as data warehouse tables.
ETL tools use a three-layer architecture, consisting of a staging area, a data integration layer, and an access layer, to perform extract, transform, and load operations.
1. Staging layer: the staging layer or staging database is used to store the data which is extracted from different data source systems.
2. Data integration layer: the data integration layer transforms data from the staging layer and transfers it to a database, where the data is arranged into hierarchical groups known as dimensions and into facts or aggregate facts. The combination of fact and dimension tables in a data warehouse is known as a schema.
3. Access layer: this access layer is mainly used by end-user to retrieve the data for data analytical reporting and information storage.
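The dimensions and facts produced by the integration layer can be illustrated with a tiny star schema. This is only a sketch; the table and column names are hypothetical, and SQLite stands in for the warehouse database.

```python
# Sketch of a star schema: one dimension table plus one fact table.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    quantity     INTEGER,
    amount       REAL
);
INSERT INTO dim_customer VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO fact_sales VALUES (1, 2, 20.0), (1, 1, 10.0), (2, 5, 50.0);
""")

# An aggregate fact: total sales amount per customer, joined through the dimension.
totals = db.execute("""
    SELECT d.name, SUM(f.amount)
    FROM fact_sales f JOIN dim_customer d USING (customer_key)
    GROUP BY d.name ORDER BY d.name""").fetchall()
print(totals)  # [('Alice', 30.0), ('Bob', 50.0)]
```

The access layer would then run exactly this kind of join-and-aggregate query on behalf of end users.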
The below diagram explains the functions of the ETL tool:
ETL testing is done before the data is moved into the production data warehouse, a process known as table balancing or production reconciliation. ETL testing differs from other database testing in its scope and in the steps involved.
In ETL testing important tasks to be performed:
1. Understanding the data to be used for analytical and reporting purposes.
2. Reviewing the data model architecture.
3. Source-to-target mapping.
4. Checking the data in the various source systems.
5. Schema and package validation.
6. Performing data verification in the target system.
7. Verification of data transformation, calculation, and aggregation rules.
8. Sample data comparison between the source and target systems.
9. Data integration and data quality checks in the target system.
10. Performing various data tests.
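One of the most basic of these tasks, comparing row counts between a source and its target after a load, can be sketched like this. The table names are hypothetical and an in-memory SQLite database stands in for both systems.

```python
# Sketch of a source-to-target count check. All names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_orders (id INT)")
conn.execute("CREATE TABLE tgt_orders (id INT)")
conn.executemany("INSERT INTO src_orders VALUES (?)",
                 [(i,) for i in range(100)])
# Simulate the ETL load from source to target.
conn.execute("INSERT INTO tgt_orders SELECT * FROM src_orders")

src_count = conn.execute("SELECT COUNT(*) FROM src_orders").fetchone()[0]
tgt_count = conn.execute("SELECT COUNT(*) FROM tgt_orders").fetchone()[0]
assert src_count == tgt_count, f"count mismatch: {src_count} vs {tgt_count}"
print("counts match:", src_count)
```

A count check like this catches dropped or duplicated rows cheaply, but it says nothing about the data values themselves; that is what the value-level checks below are for.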
Now it’s time to know the difference between ETL testing and database testing:
Both ETL testing and database testing involve data validation, but the two techniques are not the same. ETL testing is performed on the data in a data warehouse system, whereas database testing is performed on transactional systems, where data arrives from various applications into the transactional database.
ETL testing involves the below operations:
1. Validating data movement from the source system to the target system.
2. Verifying the data counts in the source and target systems.
3. Verifying data extraction and transformation as per the requirements.
4. Verifying that table relations, such as joins and keys, are preserved during data transformation.
Database testing performs the following operations:
1. Verifying and maintaining primary and foreign keys.
2. Verifying that the columns in a database table contain valid data values.
3. Verifying data accuracy in columns. For instance, a months column should not hold a value greater than 12.
4. Checking for missing values in a column, i.e., whether any NULL values are present.
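The range and NULL checks above can be sketched as queries against the table under test. The payments table and its columns are hypothetical, and SQLite stands in for the database.

```python
# Sketch of two database-testing checks: a range check (month must be 1..12)
# and a NULL check. Table and column names are hypothetical.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE payments (id INT PRIMARY KEY, month INT, amount REAL)")
db.executemany("INSERT INTO payments VALUES (?, ?, ?)",
               [(1, 3, 10.0), (2, 14, 5.0), (3, None, 7.0)])

# Range check: month values outside 1..12 are invalid.
bad_range = [r[0] for r in db.execute(
    "SELECT id FROM payments WHERE month NOT BETWEEN 1 AND 12")]
# NULL check: rows with a missing month value.
nulls = [r[0] for r in db.execute(
    "SELECT id FROM payments WHERE month IS NULL")]

print("out-of-range ids:", bad_range)  # [2]
print("null ids:", nulls)              # [3]
```

Note that in SQL a NULL month does not match the NOT BETWEEN predicate, which is why the NULL check has to be a separate query.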
Ab Initio ETL testing can be categorized on the basis of the testing objectives and data reporting. ETL testing falls into the following categories.
1. Source to target count testing:
This type of testing category involves matching the count of data records in both the source and target systems.
2. Source to target data testing:
This type of testing category involves data validation between the source and target systems. It helps perform data integration and threshold value checks, and eliminates duplicate data values in the target system.
3. Data mapping or data transformation testing:
This type of testing category confirms the data mapping of any object in both the source and target systems. This also involves checking the functionality of data in the target system.
4. End-user testing:
This involves generating data reports for end users to verify that the data in the reports meets the requirements. With this testing you can also find deviations in a report and cross-check the data in the target system to validate the report.
5. Retesting:
This involves fixing the bugs and defects in the data in the target system and running the data reports again to perform data validation.
6. System integration testing:
This involves testing all the individual source systems and then combines the results to find if there are any deviations. There are three main approaches available here:
1. Top-down approach
2. Bottom-up approach
3. Hybrid approach
ETL testing can also be divided into the following categories:
1. New data warehouse system testing:
In this type of ETL testing, a new data warehouse system is built and the data in it is verified. Input data is taken from end users or customers as well as from different data sources, and then the new data warehouse is created.
2. Migration testing:
In this type of testing, customers already have a data warehouse and an ETL tool, but they are looking for a new ETL tool to improve efficiency. This testing involves migrating data from the existing system using the new ETL tool.
3. Change testing:
In change testing, new data from different data sources is added to an existing system. Customers can also change the existing ETL rules or add new ones.
4. Report testing:
In this type of testing, the user can create reports to perform data validations. Here the reports are the final output of any data warehouse system. Report testing can be done on the basis of layouts, data reports, and calculated values.
Ab Initio ETL testing techniques play a vital role, and they need to be applied before the actual testing process. The testing team must be aware of these techniques and apply them. The various testing techniques are:
This testing technique validates the data that has been moved to the production system, which is then used for analytical reporting and analysis. Production validation testing is considered a crucial step, as it validates the production data and compares it with the source system.
This type of testing is performed when the test team has limited time for testing. It checks the data counts in the source and target systems. One point to note is that it does not involve checking the data values themselves or their order.
In this type of testing, the test team validates the data values from the source system against the target system, checking that the corresponding values are present in the target. This technique is time-consuming and is mainly used in banking projects.
In this type of testing, the test team validates data ranges. All the threshold values in the target system are checked to ensure they produce valid output. This technique also lets you verify data integration in the target system, where data from multiple source systems lands once transformation and loading are finished.
Application migration testing is normally performed when you move from an old application to a new application system. This type of testing saves a lot of time and helps with data extraction from legacy systems into the new application system.
This testing includes various checks such as data type, index, and data length checks. The test engineer verifies the following scenarios: primary key, foreign key, NULL, UNIQUE, and NOT NULL constraints.
This testing technique involves checking for duplicate data in the target system. When a huge amount of data resides in the target system, it is also possible that duplicate data exists in the production system.
The following SQL statement can be used to perform this check (the Customers table name is an example; substitute your own target table):
SELECT Customer_ID, Customer_NAME, Quantity, COUNT(*)
FROM Customers
GROUP BY Customer_ID, Customer_NAME, Quantity
HAVING COUNT(*) > 1;
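A runnable sketch of this duplicate check, using an in-memory SQLite database; the Customers table and its contents are hypothetical.

```python
# Sketch: find duplicate (Customer_ID, Customer_NAME, Quantity) rows
# in a target table. All names and data are hypothetical.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE Customers (Customer_ID INT, Customer_NAME TEXT, Quantity INT)")
db.executemany("INSERT INTO Customers VALUES (?, ?, ?)",
               [(1, "Alice", 2), (1, "Alice", 2), (2, "Bob", 1)])

dupes = db.execute("""
    SELECT Customer_ID, Customer_NAME, Quantity, COUNT(*)
    FROM Customers
    GROUP BY Customer_ID, Customer_NAME, Quantity
    HAVING COUNT(*) > 1""").fetchall()
print(dupes)  # [(1, 'Alice', 2, 2)]
```

Each returned row is one duplicated combination, with the final column giving how many times it occurs.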
Duplicate data appears in the target system because of these reasons:
1. If no primary key is specified, duplicate values may appear.
2. Duplicates can also arise from incorrect mapping or environmental data issues.
3. Manual errors can occur while transferring data from the source system to the target system.
Data transformation testing cannot be performed with a single SQL statement. It is time-consuming and involves running multiple SQL queries to check the transformation rules. The testing team runs the queries and then compares the output.
This type of testing includes checks such as number checks, date checks, null checks, and precision checks. The testing team also performs syntax tests to catch invalid characters and incorrect upper- or lower-case order, and reference tests to check that the data conforms to the data model.
Incremental testing verifies that insert and update SQL statements execute and produce the expected results. It is performed step by step with old and new data values.
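An incremental check of this kind can be sketched as follows. The incremental load is simulated with a SQLite UPSERT, and the inserted and updated rows are then verified; all table and column names are hypothetical.

```python
# Sketch of incremental (delta) testing: after an incremental load,
# verify that new rows were inserted and changed rows were updated.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE target (id INT PRIMARY KEY, status TEXT)")
db.execute("INSERT INTO target VALUES (1, 'old'), (2, 'old')")

# Incremental load: the delta updates id 2 and inserts id 3.
delta = [(2, "new"), (3, "new")]
db.executemany(
    "INSERT INTO target VALUES (?, ?) "
    "ON CONFLICT(id) DO UPDATE SET status = excluded.status", delta)

# Verification: old untouched row, updated row, and newly inserted row.
rows = dict(db.execute("SELECT id, status FROM target"))
assert rows == {1: "old", 2: "new", 3: "new"}
print(rows)
```

The same pattern scales to real delta loads: run the incremental job, then assert on the expected mix of unchanged, updated, and newly inserted rows.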
When you change data transformation or data aggregation rules to add new functionality, testing that the change has not introduced new errors is called regression testing. A bug found during regression testing is known as a regression.
When the test team runs the tests again after fixing any bugs, it is called retesting.
System integration testing covers testing the individual system components and then integrating the modules. There are three approaches to system integration: top-down, bottom-up, and hybrid.
Navigation testing is also called front-end system testing. It is performed from the end user's point of view by checking all aspects such as the various fields, aggregations, and calculations.
This Ab Initio testing follows the ETL life cycle and also helps build a better understanding of the business requirements.
Below are the common steps included in the ETL life cycle:
1. Understanding the business requirements.
2. Validating the business requirements.
3. Test estimation: providing the estimated time to run the test cases and to complete the test summary report.
4. Test planning: choosing the testing techniques based on the data inputs and the business requirements.
5. Creating test scenarios and test cases.
6. Once the test cases are ready and approved, performing pre-execution checks.
7. Executing all the test cases.
8. Creating the final test summary report and filing the test closure.
Ab Initio ETL testing can be done with SQL scripts and by gathering data in spreadsheets, but this approach makes ETL testing slow, time-consuming, and error-prone. The most commonly used ETL testing tools are QuerySurge and Informatica Data Validation.
QuerySurge is a data testing solution designed for big data testing, data warehouse testing, and the ETL process. It can automate the entire test process and fits nicely into a DevOps strategy.
The key features of QuerySurge are as follows:
1. It includes query wizards to create test query pairs quickly and easily, without requiring you to write SQL statements.
2. It has a design library with reusable query snippets, and you can also create custom query pairs.
3. It can compare data from source files with the data stored in the target data warehouse or big data store.
4. It can compare millions of rows and columns of data in minutes.
5. It lets users schedule tests to run immediately or at any date and time.
6. It can produce informative data reports, show data updates, and automatically email results to your team.
To automate the process, the ETL tool should kick off QuerySurge through its command API once the ETL software completes the load process. QuerySurge then runs automatically and unattended, executing all the tests.
Informatica Data Validation is an ETL testing tool that helps the testing team accelerate and automate ETL testing in both development and production environments. It delivers complete, repeatable, and auditable test coverage in less time and does not require any programming skills.
In this Ab Initio tutorial, we have explained the definition, architecture overview, integration, automation process, ETL tool functions, how ETL works, and the testing techniques. Learning the Ab Initio testing tool will help you become proficient in automation and ETL tools. Ab Initio helps you correct invalid data fields and apply calculations. This tutorial is specially designed for those who want to begin a career in ETL testing, and it is also useful for software testing professionals who perform data analysis and extraction. Our technical team provides 24/7 online support for course-related doubts.