Ab Initio Interview Questions

Understanding data plays a key role in an organization because it drives decision making. To uncover the insights hidden in data, we have to process it. Processing turns raw data into organized information that is easy to understand, which makes the results more accurate and reliable.

1. What are the components of Ab Initio architecture?

Ans: The following are the components of Ab Initio architecture.

  • Co>Operating system (Co>Op)
  • Component Library
  • Graphical Development Environment (GDE)
  • Enterprise Meta>Environment (EME)
  • Data Profiler
  • Conduct>IT

2. Tell me about the Co>Operating system in Ab Initio.

Ans: The Co>Operating System runs on top of the native operating system and serves as the base for all Ab Initio processes. It can run on operating systems such as Windows, Linux, Solaris, AIX, HP-UX, and z/OS. It provides Ab Initio extensions through which ETL processes can be controlled, manages metadata by interacting with the EME, and manages and runs Ab Initio graphs.

3. Explain the relation between the EME, the GDE, and the Co>Operating System.

Ans: The Co>Operating System is installed on the native operating system. The Enterprise Meta>Environment (EME) is a repository for storing and managing metadata. The Graphical Development Environment (GDE) is the graphical application for designing and running Ab Initio graphs. Users can access the EME metadata through the GDE, a web browser, or the Co>Operating System command line.

4. What do you mean by dependency analysis in Ab Initio?

Ans: The dependency analysis is a process through which the EME analyzes the project for the dependencies within and between the graphs. It examines the entire project and tracks how data is being transformed and transferred from component to component and field by field. The steps involved in dependency analysis are translation and analysis.

5. How is Ab Initio EME segregated?

Ans: The Ab Initio EME is logically segregated into a data integration portion and a user interface for accessing metadata information.

6. What do you know about overflow errors?

Ans: Overflow errors are raised while processing bulky sets of data, when a calculation produces a value that does not fit in the memory allocated for it. For example, storing a character that needs more than 8 bits in an 8-bit location raises an overflow error.

7. How to connect EME to Ab Initio Server?

Ans: EME can be connected to the Ab Initio server in several ways. The following are the ways to connect to EME.

8. What are the file extensions used in Ab Initio?

Ans: Below are the file extensions used in Ab Initio.

  • .mp - graph or graph component
  • .mpc - custom component or program
  • .dbc - database configuration files
  • .dat - data files
  • .mdc - dataset template files or custom dataset components
  • .ksh - shell scripting file
  • .xfr - transform function files
  • .dml - record format files

9. What are the kinds of layouts that Ab Initio supports?

Ans: A layout defines which component should run where. Ab Initio has two kinds of layouts. 

  • Serial layout - the level of parallelism is 1.
  • Parallel layout - the level of parallelism depends on the data partition.

10. How to add default rules in the transformer?

Ans: Go to component properties, navigate to the parameter tab page, and double click on the transform parameter. The transform editor page will open. Click on the edit menu and select the 'Add Default Rules' option from the drop-down. You can choose from Match names and Wildcard options.

11. What is a local lookup?

Ans: The local lookup function will be used before the lookup function call when the lookup file is a multifile and partitioned/sorted on a particular key. It will be local to a partition, depending on the key. The data records in the lookup file can be loaded into memory. This way, the transform function retrieves records faster than retrieving from disk.

12. What is the difference between a lookup file and lookup?

Ans: A lookup file represents one or more serial files, also known as Flat files. It is a physical file where lookup data is stored, which is small enough to be held in memory. Lookup is a component of the Ab Initio graph where data resides along with a key parameter. The data can be retrieved using this key parameter.

13. What information does a .dbc file extension provide to connect to the database?

Ans: The .dbc file provides the below information to connect to a database.

  • Name and version number of the database
  • Name of the system on which the database server or instance is running
  • Name of the server or database instance

14. How can you execute a graph infinitely in Ab Initio?

Ans: Calling the graph's .ksh file from its end script runs the graph in an endless loop. If the graph name is xyz.mp, then the end script of the graph should make a call to xyz.ksh.

15. Explain the different types of parallelism used in Ab Initio.

Ans: Ab Initio provides three types of parallelism.

  • Component parallelism - A graph has multiple processes executing simultaneously on separate data.
  • Data parallelism - A graph divides the data into segments and operates on each segment simultaneously.
  • Pipeline parallelism - A graph has multiple components executing simultaneously on the same data.
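
As a rough illustration of data parallelism outside Ab Initio, the Python sketch below splits the data into segments and applies the same transform to every segment at once. The names `transform` and `run_data_parallel` are hypothetical, and a thread pool only stands in for the separate processes an Ab Initio graph would use.

```python
from concurrent.futures import ThreadPoolExecutor


def transform(segment):
    """Hypothetical per-segment work: double every value."""
    return [value * 2 for value in segment]


def run_data_parallel(segments):
    """Apply the same transform to each data segment concurrently.

    One worker per segment mirrors a parallel layout in which the level
    of parallelism equals the number of data partitions.
    """
    workers = max(1, len(segments))  # ThreadPoolExecutor rejects 0 workers
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input segments.
        return list(pool.map(transform, segments))


if __name__ == "__main__":
    print(run_data_parallel([[1, 2], [3], [4, 5, 6]]))
```

Each segment is processed independently, so adding partitions raises the level of parallelism without changing the transform itself.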

16. What do you mean by Sort Component in Ab Initio?

Ans: The Sort Component in Ab Initio is used for reordering data. It contains two parameters.

  • Key - It represents the collation order.
  • Max-core - It sets the maximum amount of memory the sort component may use, and so determines how often it dumps data from memory to disk.
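
The effect of a memory cap on sorting can be sketched in Python as an external merge sort. This is an illustrative analogue only, not how the Sort component is implemented: the hypothetical `max_records_in_memory` parameter counts records rather than bytes, and the sorted runs are kept in lists instead of being written to disk.

```python
import heapq


def sort_with_max_core(records, key, max_records_in_memory):
    """Sort records while buffering at most max_records_in_memory at once.

    Returns (sorted_records, number_of_runs). Each run stands in for one
    spill to disk; this sketch keeps the runs in memory for simplicity.
    """
    if max_records_in_memory < 1:
        raise ValueError("max_records_in_memory must be at least 1")
    runs, buffer = [], []
    for rec in records:
        buffer.append(rec)
        if len(buffer) >= max_records_in_memory:
            runs.append(sorted(buffer, key=key))  # "spill" a sorted run
            buffer = []
    if buffer:
        runs.append(sorted(buffer, key=key))
    # Merge the already-sorted runs into one fully sorted stream.
    return list(heapq.merge(*runs, key=key)), len(runs)


if __name__ == "__main__":
    result, runs = sort_with_max_core([5, 3, 8, 1, 9, 2, 7], key=lambda x: x,
                                      max_records_in_memory=3)
    print(result, runs)
```

A smaller cap produces more runs, which is the sketch's counterpart of more frequent dumps to disk.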

17. What do the dedup and replicate components do?

Ans: 

  • Dedup component - It is used to remove duplicate records from the flow based on a specified key.
  • Replicate component - It is used to combine input records from multiple sources into one flow and write a copy of that flow to output ports.
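
To make key-based deduplication concrete, here is a minimal Python sketch. It assumes the input is already sorted on the key and that the first record of each key group is the one to keep; both the function name `dedup_sorted` and the keep-first choice are assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter


def dedup_sorted(records, key):
    """Keep the first record of each run of equal keys.

    Assumes records are already sorted (or at least grouped) on `key`,
    because groupby only merges adjacent records with equal keys.
    """
    return [next(group) for _, group in groupby(records, key=itemgetter(key))]


if __name__ == "__main__":
    rows = [
        {"id": 1, "v": "a"},
        {"id": 1, "v": "b"},
        {"id": 2, "v": "c"},
    ]
    print(dedup_sorted(rows, "id"))
```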

18. What are the types of partition components in Ab Initio?

Ans: Ab Initio has the following partition components.

  • Partition by Round-Robin - It distributes the data evenly in block size chunks.
  • Partition by Range - Based on a set of partitioning ranges and key, it divides data evenly among nodes.
  • Partition by Key - It partitions data based on a key.
  • Partition by Percentage - It distributes data in such a way that the output is proportional to fractions of 100.
  • Partition by Expression - It divides data based on a DML expression.
  • Partition by Load balance - It distributes data based on dynamic load balancing.
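
Two of these strategies, round-robin and key-based partitioning, can be sketched in plain Python. The function names, the `block_size` parameter, and the use of a CRC32 hash to pick a partition are all assumptions made for illustration; they are not Ab Initio's actual implementation.

```python
from zlib import crc32


def partition_round_robin(records, n, block_size=1):
    """Deal records out to n partitions in blocks of block_size records."""
    parts = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        parts[(i // block_size) % n].append(rec)
    return parts


def partition_by_key(records, n, key):
    """Send every record with the same key value to the same partition."""
    parts = [[] for _ in range(n)]
    for rec in records:
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        bucket = crc32(str(rec[key]).encode("utf-8")) % n
        parts[bucket].append(rec)
    return parts


if __name__ == "__main__":
    rows = [{"cust": c} for c in ["a", "b", "a", "c", "b", "a", "d"]]
    print([len(p) for p in partition_round_robin(rows, 3)])
    print(partition_by_key(rows, 3, "cust"))
```

Round-robin gives evenly sized partitions regardless of content, while key-based partitioning keeps related records together, which is what key-dependent components such as rollup or join need.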

19. What is a surrogate key?

Ans: A system-generated, unique, sequential number is called a surrogate key. It acts as a primary key.
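
A minimal Python sketch of the idea follows. The field name `sk` and the starting value of 1 are assumptions chosen for illustration.

```python
from itertools import count


def assign_surrogate_keys(records, start=1):
    """Return copies of the records with a unique sequential key added."""
    sequence = count(start)
    return [{**rec, "sk": next(sequence)} for rec in records]


if __name__ == "__main__":
    print(assign_surrogate_keys([{"name": "a"}, {"name": "b"}]))
```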

20. Explain about the sandbox.

Ans: A sandbox is like a directory that contains a collection of Ab Initio graphs and related files. It is local to any user and is a replica of the Project in the EME. It will be helpful for version control, migration, and navigation.

21. What is the relation between limit and ramp?

Ans: Limit and ramp are used to set the reject tolerance of a graph. Limit is an absolute number of rejects, and ramp is the rate of rejection per record processed. The formula for the reject tolerance is:

limit + (ramp*no_of_records_processed)
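
The formula can be worked through in a few lines of Python. The function names are hypothetical, and treating "more rejects than the tolerance" as the abort condition is an assumption of this sketch.

```python
def reject_tolerance(limit, ramp, records_processed):
    """Rejects tolerated so far: limit + (ramp * records processed)."""
    return limit + ramp * records_processed


def should_abort(rejects, limit, ramp, records_processed):
    """True once the reject count exceeds the tolerance (assumed rule)."""
    return rejects > reject_tolerance(limit, ramp, records_processed)


if __name__ == "__main__":
    # limit = 50, ramp = 0.01, 10,000 records processed
    print(reject_tolerance(50, 0.01, 10_000))
    print(should_abort(151, 50, 0.01, 10_000))
```

With a limit of 50 and a ramp of 0.01, the tolerance after 10,000 records is 50 + 100 = 150 rejects, so the tolerance grows as more records are processed.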

22. How will you handle it if DML is changing dynamically?

Ans: There are several ways to handle dynamically changing DML. Some of the methods are:

  • Use a conditional DML
  • Call the vector functionality while calling the DMLs
  • Use the MULTI REFORMAT component

23. What are the air commands used in Ab Initio?

Ans: Here are some of the air commands in Ab Initio.

  • air object ls <EME Path for the object – /Projects/edf/.. >
  • air object rm <EME Path for the object – /Projects/edf/.. >
  • air project modify <EME Path for the project – /Projects/edf/.. >
  • air object versions -verbose <EME Path for the object – /Projects/edf/.. >
  • air lock show -user <UNIX User ID>
  • air sandbox status <file name with the relative path>

24. What are the types of output formats that we can get after processing data?

Ans: The output can be of the following formats.

  • Charts
  • Tables
  • Vectors
  • Plain Text files
  • Maps
  • Raw files
  • Image files

25. Explain about the rollup component.

Ans: A rollup component is used to group records based on certain field values. It uses a multi-stage transform that contains functions such as initialize, rollup, and finalize.
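
The three stages can be sketched in Python as follows. This is an analogue under stated assumptions, not DML: the record fields `customer` and `amount`, the aggregated fields, and the driver `run_rollup` are all hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def initialize():
    """Create the temporary accumulator for a new key group."""
    return {"count": 0, "total": 0}


def rollup(temp, rec):
    """Fold one input record into the accumulator."""
    temp["count"] += 1
    temp["total"] += rec["amount"]
    return temp


def finalize(temp, key_value):
    """Build the single output record for the group."""
    return {"customer": key_value, "count": temp["count"], "total": temp["total"]}


def run_rollup(records, key):
    """Group records on `key` and run initialize/rollup/finalize per group."""
    output = []
    ordered = sorted(records, key=itemgetter(key))  # groupby needs grouped input
    for key_value, group in groupby(ordered, key=itemgetter(key)):
        temp = initialize()
        for rec in group:
            temp = rollup(temp, rec)
        output.append(finalize(temp, key_value))
    return output


if __name__ == "__main__":
    rows = [
        {"customer": "b", "amount": 10},
        {"customer": "a", "amount": 5},
        {"customer": "b", "amount": 7},
    ]
    print(run_rollup(rows, "customer"))
```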

26. What are is_valid and is_defined used for?

Ans:

  • is_valid - It is used to test whether a value is valid. If the expression is a valid data item, the result is 1. If the expression is not a valid data item, the result is 0.

  • is_defined - It is used to test whether an expression is not NULL. If the expression is non-NULL, the result is 1. Otherwise, the result is 0.

27. How does the force_error function work?

Ans: The force_error function raises an error when a specified condition is not met. It is useful when you want to stop the execution of a graph that doesn't meet the set condition. It sends the record to the reject port and the error message to the error port.
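
The routing behavior described above can be imitated in Python. Everything here is a hypothetical analogue: `ForcedError`, the `transform` rule, and `run_component` are invented for illustration, with plain lists standing in for the out, reject, and error ports.

```python
class ForcedError(Exception):
    """Raised by force_error to signal a deliberately forced failure."""


def force_error(message):
    """Python stand-in for force_error: always raises with the message."""
    raise ForcedError(message)


def transform(rec):
    """Hypothetical rule: negative amounts are not allowed."""
    if rec["amount"] < 0:
        force_error(f"negative amount for id {rec['id']}")
    return {**rec, "amount_cents": rec["amount"] * 100}


def run_component(records):
    """Route good records to out, failed records to reject, messages to error."""
    out, reject, error = [], [], []
    for rec in records:
        try:
            out.append(transform(rec))
        except ForcedError as exc:
            reject.append(rec)
            error.append(str(exc))
    return out, reject, error


if __name__ == "__main__":
    print(run_component([{"id": 1, "amount": 5}, {"id": 2, "amount": -1}]))
```

Whether a forced error stops the whole run or only diverts the record depends on the reject tolerance configured for the component; this sketch only shows the diversion.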

28. How to improve the performance of a graph?

Ans: We can improve the performance of a graph through the following methods.

  • Use lookup instead of join and merge components.
  • When we have to join two files and don't want duplicates, use a union function instead of a duplicate remover.
  • Minimize the use of sort components.
  • Reduce the use of regular expression functions in the transform functions.
  • Don’t use broadcast as partitioner for large datasets.
  • Use only the fields that are required in sort, reformat, and join components.

29. List the commonly used components in an Ab Initio graph.

Ans: The commonly used components in an Ab Initio graph are:

  • Input file/output file
  • Lookup file
  • Input table/output table
  • Join
  • Sort
  • Partition
  • Partition by key
  • Gather
  • Reformat
  • Concatenate

30. Explain the difference between conventional loading and direct loading.

Ans: 

  • Conventional load - All the table constraints are checked against the data before loading it.
  • Direct load - All the table constraints will be disabled, and the data is loaded directly. After the data load is done, the table constraints will be checked against the data.

Conclusion

Major companies such as American Express, Citi Bank, JP Morgan Chase, Time Warner Cable, Home Depot, and Premier use Ab Initio for their data processing and integration needs. By industry, its customers include Computer Software (20%), Information Technology and Services (10%), Higher Education (9%), and Education Management (9%), among others. It has a market share of 5.12%. Ab Initio developer and admin roles are in high demand, so prepare well on the basics of Ab Initio to improve your chances of cracking the interview.

Mudassir
DevOps ERP and IAM tools
Mudassir is a programming developer for HKR Trainings. He is well versed in today's technology and has loved technology his entire life. He has also been fortunate to work as a programmer across science and technology, and he thanks everyone who follows him on LinkedIn and Twitter.
