Azure Data Factory

Azure Data Factory (ADF) is a cloud data integration service from Microsoft. In this blog, we explore the features of Azure Data Factory, compare it with SSIS (SQL Server Integration Services), and see how it helps solve real-life data integration problems. ADF offers integration services that connect to many different data sources, and lets you build hybrid Extract Transform Load (ETL), Extract Load Transform (ELT) and other data integration pipelines. With ADF you can:

  • Copy data from various on-premises and cloud sources.
  • Transform the data.
  • Publish the copied and transformed data to destination storage.
  • Monitor data flows.

What is Azure Data Factory?

Azure Data Factory (ADF) is a cloud integration system. It moves data between on-premises and cloud systems, and schedules and orchestrates data flows. ADF is more of an Extract-and-Load and Transform-and-Load platform than a traditional Extract-Transform-and-Load (ETL) platform. The following approaches are used to achieve this.

  • Data transfer between file systems and database systems, whether located on-premises or in the cloud, is configured with simple built-in ADF features. ADF can connect to data stores such as SQL Server, Oracle, MySQL, DB2, Azure SQL Database, Azure Data Lake, blob storage, the local file system and HDFS.
  • ADF can launch SSIS packages, which can implement more sophisticated data movement and transformation tasks.

Purpose of using Azure Data Factory

Moving data to or from the cloud raises several challenges. The points below explain how ADF addresses them.

Scheduling Jobs and Orchestration

Services such as SQL Server Agent, Azure Scheduler and Azure Automation can trigger the data integration tasks that move data. ADF includes job scheduling of its own, and it also supports event-based data flows and dependencies between tasks.

Scalability

ADF is designed to handle large data volumes. Features such as built-in parallelism and time slicing let it transfer gigabytes of data into the cloud within a few hours.
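To make time slicing and parallelism a little more concrete, here is a minimal sketch in plain Python. It is not ADF code: `time_slices` and `copy_slice` are hypothetical helpers, and the thread pool only stands in for the parallel copy that ADF would perform. The idea is that a large time range is cut into fixed-width windows that can be processed independently.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta


def time_slices(start, end, width):
    """Yield (slice_start, slice_end) pairs that cover [start, end)."""
    if width <= timedelta(0):
        raise ValueError("width must be positive")
    cursor = start
    while cursor < end:
        upper = min(cursor + width, end)  # the last slice may be shorter
        yield cursor, upper
        cursor = upper


def copy_slice(window):
    """Hypothetical stand-in for copying the rows of one time window."""
    lo, hi = window
    return f"copied {lo:%H:%M}-{hi:%H:%M}"


# One day of data, cut into 6-hour slices and processed in parallel.
slices = list(time_slices(datetime(2024, 1, 1), datetime(2024, 1, 2), timedelta(hours=6)))
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(copy_slice, slices))

for line in results:
    print(line)
```

Because each slice covers a disjoint window, a failed slice can be retried on its own without repeating the whole transfer.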

Security

ADF protects data by encrypting it in transit between on-premises and cloud sources.

Less Coding 

ADF v2 provides an interactive interface in the Azure portal, so components can be developed with little coding. Behind the interface, the components are defined in JSON files.

Integration and delivery

ADF integrates with GitHub, so builds can be developed and deployed to Azure automatically. The entire configuration can be downloaded as an Azure ARM template and used to deploy ADF in other environments. Developers skilled in PowerShell can also use it to create and deploy all the components of ADF.

Overview of components in Azure Data Factory

To understand how ADF works, it is important to know its components, which are described below. The picture below represents the ADF component resources, comprising a pipeline, two datasets and one data flow.

1.Connectors 

Connectors, also called linked services, hold the configuration settings needed to access a particular data source. The settings include server/database name, file folder, credentials, etc. Each data flow may use one or more linked services, depending on the nature of the job.
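As a rough illustration, a linked service can be pictured as a small JSON document. The sketch below builds one as a Python dictionary and prints it. The name `ExampleBlobStorage` and the connection string are placeholders invented for this example, and the layout follows the ADF JSON format as I understand it, so treat it as a sketch and not a deployable definition.

```python
import json

# Hypothetical linked service: the name and the connection string are
# placeholders, not values from a real environment.
linked_service = {
    "name": "ExampleBlobStorage",
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {
            "connectionString": "<connection-string-placeholder>",
        },
    },
}

print(json.dumps(linked_service, indent=2))
```

In practice, credentials would normally come from a secret store and not be written inline like this.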

2.Datasets

The configuration settings for a dataset include table name, file name, structure, etc. Each dataset refers to a linked service, which in turn determines the list of possible dataset properties.
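The relationship between a dataset and its linked service can be sketched as follows. `make_delimited_text_dataset` is a hypothetical helper written for this post, and all names passed to it are made up. The dictionary layout is my reading of the ADF JSON format for a delimited text file in blob storage.

```python
import json


def make_delimited_text_dataset(name, linked_service_name, container, file_name):
    """Build a dataset definition that points at a CSV file via a linked service."""
    return {
        "name": name,
        "properties": {
            "type": "DelimitedText",
            # The dataset does not store credentials; it only references
            # the linked service by name.
            "linkedServiceName": {
                "referenceName": linked_service_name,
                "type": "LinkedServiceReference",
            },
            "typeProperties": {
                "location": {
                    "type": "AzureBlobStorageLocation",
                    "container": container,
                    "fileName": file_name,
                },
                "columnDelimiter": ",",
                "firstRowAsHeader": True,
            },
        },
    }


dataset = make_delimited_text_dataset(
    "ExampleOrdersCsv", "ExampleBlobStorage", "example-container", "orders.csv"
)
print(json.dumps(dataset, indent=2))
```

The file-related properties (container, file name, delimiter) only make sense because the referenced linked service is a file store; a dataset on a database linked service would carry a table name instead.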

3.Activities

Activities are the actions performed on the data, such as data movement and transformation. The configuration settings for an activity include database query, stored procedure name, parameters, script location, etc.
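For example, an activity that calls a stored procedure carries the procedure name and its parameters in its settings. The sketch below shows the idea as a Python dictionary. The activity, linked service, procedure and parameter names are all hypothetical, and the property names reflect my understanding of the ADF JSON format.

```python
import json

# Hypothetical stored procedure activity; every name here is illustrative.
activity = {
    "name": "RefreshDailySummary",
    "type": "SqlServerStoredProcedure",
    "linkedServiceName": {
        "referenceName": "ExampleSqlDatabase",
        "type": "LinkedServiceReference",
    },
    "typeProperties": {
        "storedProcedureName": "usp_refresh_daily_summary",
        "storedProcedureParameters": {
            "run_date": {"value": "2024-01-01", "type": "String"},
        },
    },
}

print(json.dumps(activity, indent=2))
```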

4.Data flows

Data flows allow data engineers to develop transformation logic visually, without writing code. They are executed as activities within ADF pipelines on scaled-out Spark clusters. ADF handles the code translation and the execution of the data flow, which lets it process large amounts of data.

5.Pipelines

A pipeline is a logical group of activities. A data factory can have one or more pipelines, and each pipeline may have one or more activities. Pipelines make it easier to schedule and monitor multiple logically related activities.
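A minimal sketch of a pipeline with a single copy activity might look like this. The pipeline, activity and dataset names are placeholders, and `referenced_datasets` is a small hypothetical helper that shows how a pipeline ties activities to datasets by name.

```python
import json

# Hypothetical pipeline with one copy activity; all names are illustrative.
pipeline = {
    "name": "ExampleCopyPipeline",
    "properties": {
        "activities": [
            {
                "name": "CopyOrdersToSql",
                "type": "Copy",
                "inputs": [{"referenceName": "ExampleOrdersCsv", "type": "DatasetReference"}],
                "outputs": [{"referenceName": "ExampleOrdersTable", "type": "DatasetReference"}],
                "typeProperties": {
                    "source": {"type": "DelimitedTextSource"},
                    "sink": {"type": "AzureSqlSink"},
                },
            }
        ]
    },
}


def referenced_datasets(pipeline_definition):
    """Collect the dataset names used as inputs or outputs by any activity."""
    names = set()
    for activity in pipeline_definition["properties"]["activities"]:
        for ref in activity.get("inputs", []) + activity.get("outputs", []):
            names.add(ref["referenceName"])
    return sorted(names)


print(json.dumps(pipeline, indent=2))
print(referenced_datasets(pipeline))
```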

6.Triggers

Triggers hold the scheduling configuration for pipelines. The settings include start/end date, execution frequency, etc.
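The settings mentioned above map onto a trigger definition roughly as shown below. This is a sketch of an hourly schedule trigger as a Python dictionary. The trigger name, the pipeline name and the dates are made up, and the layout follows the ADF JSON format as I understand it.

```python
import json

# Hypothetical schedule trigger: runs a pipeline every hour during January 2024.
trigger = {
    "name": "ExampleHourlyTrigger",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Hour",                  # execution frequency
                "interval": 1,                        # every 1 hour
                "startTime": "2024-01-01T00:00:00Z",  # start date
                "endTime": "2024-01-31T00:00:00Z",    # end date
                "timeZone": "UTC",
            }
        },
        # The pipelines this trigger starts, referenced by name.
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "ExampleCopyPipeline",
                    "type": "PipelineReference",
                },
                "parameters": {},
            }
        ],
    },
}

print(json.dumps(trigger, indent=2))
```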


7.Integration runtime(IR)

The integration runtime is the compute infrastructure that ADF uses to provide data movement and compute capabilities across different network environments. It comes in the following main runtime types.

  • Azure IR: Provides fully managed, serverless compute in Azure, which handles the data movement activities in the cloud.
  • Self-hosted IR: Manages copy activities between a cloud data store and a data store in a private network, along with transformation activities.
  • Azure SSIS IR: Executes SSIS packages.

The picture below gives an overview of ADF and the relationships between its entities: dataset, activity, pipeline and linked service.

IMAGE

What makes Azure Data Factory different from other ETL Tools?

The following features distinguish ADF from other ETL tools:

  • Running SSIS packages.
  • Fully managed PaaS product that auto-scales with the workload.
  • Gateway to bridge on-premises systems and the Azure cloud.
  • Handling large data volumes.
  • Connecting and working together with other compute services such as Azure Batch and HDInsight.

Working with Azure Data Factory

Working with Azure Data Factory is straightforward. ADF provides a GUI for creating and managing activities and pipelines, which reduces the coding effort. Only complex transformations require coding skills.

Features in Azure Data Factory

  • ADF ships with built-in connectors for many data sources, including MySQL, SQL Server and Oracle databases.
  • Branching lets the outcome of one activity determine which activity starts next.
  • Tumbling window triggers support processing data in partitioned time windows, and event triggers start a transformation automatically in response to an event.
  • Parameters can be passed dynamically between datasets, pipelines and triggers.
  • Monitoring and alerting features track the execution of different pipelines and can raise alerts on failures.
  • ADF works well with Azure Databricks for scheduling machine learning algorithms.
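The branching feature in the list above can be illustrated with a small simulation. The code below is a simplified model written for this post, not ADF's actual scheduler: `ready_activities` is a hypothetical function, and the activity names are made up. Each activity lists the activities it depends on together with the outcomes that allow it to start, in the style of the `dependsOn` section of a pipeline definition.

```python
def _satisfied(dependency, outcomes):
    """Check one dependency against the outcomes recorded so far."""
    outcome = outcomes.get(dependency["activity"])
    if outcome is None:
        return False  # the upstream activity has not finished yet
    conditions = dependency["dependencyConditions"]
    # "Completed" is modelled here as "finished, whatever the result".
    return outcome in conditions or "Completed" in conditions


def ready_activities(activities, outcomes):
    """Return the names of unfinished activities whose dependencies are all met."""
    ready = []
    for activity in activities:
        if activity["name"] in outcomes:
            continue  # already finished
        dependencies = activity.get("dependsOn", [])
        if all(_satisfied(dep, outcomes) for dep in dependencies):
            ready.append(activity["name"])
    return ready


# Hypothetical pipeline: transform on success, notify on failure.
activities = [
    {"name": "CopyData"},
    {
        "name": "TransformData",
        "dependsOn": [{"activity": "CopyData", "dependencyConditions": ["Succeeded"]}],
    },
    {
        "name": "NotifyFailure",
        "dependsOn": [{"activity": "CopyData", "dependencyConditions": ["Failed"]}],
    },
]

print(ready_activities(activities, {}))
print(ready_activities(activities, {"CopyData": "Succeeded"}))
print(ready_activities(activities, {"CopyData": "Failed"}))
```

The three calls show the branch being chosen: nothing downstream can start until `CopyData` has an outcome, and that outcome selects either the transformation or the failure notification.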

Working with other Azure resources

ADF integrates with other Azure compute and storage resources through linked services, which define the connection to the external resource. You can define two kinds of linked services.

Data Store service
This linked service defines the connection to a data store, such as Azure SQL Database, a Data Lake, Azure SQL Data Warehouse, a file system, an on-premises database, a NoSQL database, etc.

Compute service
This linked service defines the connection to a compute resource used to transform and enrich the data, such as Azure HDInsight, Azure Machine Learning, a stored procedure on any SQL database, a U-SQL activity on Data Lake Analytics, Azure Databricks and/or Azure Batch.

IMAGE

Conclusion
Azure Data Factory is a service for transforming and enriching complex data, and it makes it easy to integrate cloud data with on-premises data. It delivers scalable, low-cost integration services that can serve as data flow building blocks for any data platform or machine learning project.

John
Cloud Technologies & Cyber Security
John is a postgraduate in Computer Science from Andhra University, currently working as an IT developer at hkr trainings.com, with experience in both IT development and operational roles.
