© 2018 YASH Technologies | www.yash.com | Confidential
Azure Data Factory
- Mahesh Pandit
Agenda
- Why Azure Data Factory
- Introduction
- Steps involved in ADF
- ADF Components
- ADF Activities
- Linked Services
- Integration Runtime and its types
- How Azure Data Factory works
- Azure Data Factory V1 vs. V2
- System Variables
- Functions in ADF
- Expressions in ADF
- Questions & Answers
Why Azure Data Factory
- Modern DW for BI
- Modern DW for SaaS Apps
- Lift & Shift existing SSIS packages to Cloud
Why Azure Data Factory
[Diagram: Azure Data Factory moving data into Azure SQL DW and Azure Data Lake]
Modern DW for Business Intelligence
[Diagram: Ingest → Store → Prep & Train → Model & Serve → Intelligence. Data Factory ingests logs, files & media (unstructured) and business/custom apps (structured) from on-premises and cloud into Azure Storage; Azure Databricks (Spark) preps and trains; Azure SQL Data Warehouse and Azure Analysis Services model and serve analytical dashboards in Power BI.]
Azure Data Factory orchestrates data pipeline activity workflow & scheduling.
Modern DW for SaaS Apps
[Diagram: the same Ingest → Store → Prep & Train → Model & Serve flow, with the served results consumed by a SaaS app through app storage and delivered to browsers/devices.]
Azure Data Factory orchestrates data pipeline activity workflow & scheduling.
Lift & Shift existing SSIS packages to Cloud
[Diagram: Azure Data Factory orchestrates SSIS package execution in a VNET in the cloud, connecting on-premises data sources (SQL Server) with cloud data sources (SQL DB Managed Instance).]
Azure Data Factory orchestrates data pipeline activity workflow & scheduling.
Introduction
- Azure Data Factory is a cloud-based data integration service that allows you to create data-driven workflows in the cloud for orchestrating and automating data movement and data transformation.
- Workflows are data-driven and can be scheduled.
- Sources and destinations can be either on-premises or in the cloud.
- Transformation can be done using Azure HDInsight Hadoop, Spark, Azure Data Lake Analytics, and Machine Learning.
How does it work?
- The pipelines (data-driven workflows) in Azure Data Factory typically perform the following four steps:
Steps involved in ADF
- Connect and collect
- Transform and enrich
- Publish
- Monitor
Connect and collect
- The first step in building an information production system is to connect to all the required sources of data and processing, such as software-as-a-service (SaaS) services, databases, file shares, and FTP web services.
- With Data Factory, you can use the Copy Activity in a data pipeline to move data from both on-premises and cloud source data stores to a centralized data store in the cloud for further analysis.
- For example, you can collect data in Azure Data Lake as well as in Azure Blob storage.
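As a concrete sketch of this step, a minimal pipeline with one Copy Activity can be written down following the general shape of ADF pipeline JSON. All names here (the pipeline and dataset references) are illustrative placeholders, not from this deck:

```python
import json

# Hypothetical minimal pipeline definition with a single Copy Activity,
# following the general shape of ADF pipeline JSON. All names are
# illustrative placeholders.
pipeline = {
    "name": "CopyFromBlobToSql",
    "properties": {
        "activities": [
            {
                "name": "CopyActivity",
                "type": "Copy",
                "inputs": [{"referenceName": "BlobInputDataset", "type": "DatasetReference"}],
                "outputs": [{"referenceName": "SqlOutputDataset", "type": "DatasetReference"}],
            }
        ]
    },
}

print(json.dumps(pipeline, indent=2))
```

The input and output dataset references point at the source and the centralized sink store described above.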
Transform and enrich
- After data is present in a centralized data store in the cloud, process or transform the collected data by using compute services such as:
  - HDInsight Hadoop
  - Spark
  - Data Lake Analytics
  - Machine Learning
Publish
- After the raw data has been refined into a business-ready consumable form, load the data into Azure SQL Data Warehouse, Azure SQL Database, Azure Cosmos DB, or another store, as the user needs.
Monitor
- After you have successfully built and deployed your data integration pipeline, providing business value from refined data, monitor the scheduled activities and pipelines for success and failure rates.
- Azure Data Factory has built-in support for pipeline monitoring via Azure Monitor, API, PowerShell, Log Analytics, and health panels on the Azure portal.
ADF Components
- Pipeline
- Activity
- Datasets
- Linked Services
ADF Components
[Diagram: a PIPELINE (schedule, monitor) is a logical group of ACTIVITIES (hive, copy); an activity consumes and produces DATA SETS (table, file) and runs on a LINKED SERVICE (SQL Server, Hadoop cluster); a data set represents a data item stored in a linked service.]
- An Azure subscription might have one or more Azure Data Factory instances (or data factories).
- Azure Data Factory is composed of four key components.
- These components work together to provide the platform on which you can compose data-driven workflows with steps to move and transform data.
Pipeline
- A data factory might have one or more pipelines.
- A pipeline is a logical grouping of activities that performs a unit of work.
- Together, the activities in a pipeline perform a task.
- The pipeline allows you to manage the activities as a set instead of managing each one individually.
- The activities in a pipeline can be chained together to operate sequentially, or they can operate independently in parallel.
- To create a data factory pipeline, we can use any one of the following methods: Data Factory UI, Copy Data Tool, Azure PowerShell, REST API, Resource Manager template, .NET, or Python.
Pipeline Execution
Triggers
- Triggers represent the unit of processing that determines when a pipeline execution needs to be kicked off.
- There are different types of triggers for different types of events.
Pipeline Runs
- A pipeline run is an instance of the pipeline execution.
- Pipeline runs are typically instantiated by passing the arguments to the parameters that are defined in pipelines.
- The arguments can be passed manually or within the trigger definition.
Parameters
- Parameters are key-value pairs of read-only configuration.
- Parameters are defined in the pipeline.
- The arguments for the defined parameters are passed during execution from the run context that was created by a trigger or a pipeline that was executed manually.
- Activities within the pipeline consume the parameter values.
Control Flow
- Control flow is an orchestration of pipeline activities that includes chaining activities in a sequence, branching, defining parameters at the pipeline level, and passing arguments while invoking the pipeline on demand or from a trigger.
- It also includes custom-state passing and looping containers, that is, For-Each iterators.
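The relationship between parameters (declared on the pipeline) and arguments (supplied per run by a trigger or a manual invocation) can be sketched as follows; `start_run` is a hypothetical helper for illustration, not an ADF API:

```python
# Hypothetical sketch: a pipeline declares read-only parameters with
# optional defaults; each run binds arguments to them, and the result
# is the run context consumed by the activities.
def start_run(declared_params, arguments):
    unknown = set(arguments) - set(declared_params)
    if unknown:
        raise ValueError(f"undeclared parameters: {unknown}")
    # Arguments override declared defaults.
    return {**declared_params, **arguments}

run_context = start_run({"inputPath": "raw/", "outputPath": "curated/"},
                        {"inputPath": "raw/2018-06-01/"})
print(run_context)  # {'inputPath': 'raw/2018-06-01/', 'outputPath': 'curated/'}
```

The same binding happens whether the arguments come from a trigger definition or a manual call.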
ADF Activities
- Data Movement Activities
- Data Transformation Activities
- Control Activities
Activity
- Activities represent a processing step in a pipeline.
- For example, you might use a copy activity to copy data from one data store to another data store.
- Data Factory supports three types of activities:
1. Data movement activities
2. Data transformation activities
3. Control activities
[Diagram: a Copy Activity lands data in Azure Blob, a Transformation Activity processes it, and another Copy Activity writes the output data to Azure SQL Data Warehouse for a BI tool.]
Data Movement Activities
- Copy Activity in Data Factory copies data from a source data store to a sink data store.
- Data from any source can be written to any sink.
Data Transformation Activities
- Azure Data Factory supports the following transformation activities that can be added to pipelines either individually or chained with another activity.

Data Transformation Activity | Compute Environment
Hive | HDInsight
Pig | HDInsight
MapReduce | HDInsight
Hadoop Streaming | HDInsight
Spark | HDInsight
Machine Learning activities | Azure VM
Stored Procedure | Azure SQL, Azure SQL DW, or SQL Server
U-SQL | Azure Data Lake Analytics
Custom Activity | Azure Batch
Databricks Notebook | Azure Databricks
Control Activities
- The following control flow activities are supported:
- Execute Pipeline Activity: allows a Data Factory pipeline to invoke another pipeline.
- ForEach Activity: defines a repeating control flow in your pipeline.
- Web Activity: can be used to call a custom REST endpoint from a Data Factory pipeline.
- Lookup Activity: can be used to read or look up a record, table name, or value from any external source.
- Get Metadata Activity: can be used to retrieve metadata of any data in Azure Data Factory.
- Until Activity: implements a Do-Until loop similar to the Do-Until looping structure in programming languages. It executes a set of activities in a loop until the condition associated with the activity evaluates to true.
- If Condition Activity: can be used to branch based on a condition that evaluates to true or false.
- Wait Activity: the pipeline waits for the specified period of time before continuing with execution of subsequent activities.
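The Until activity's semantics (run the body, then re-check the condition, as in a Do-Until loop) can be sketched in Python; `until` here is an illustrative helper, not ADF code:

```python
# Hypothetical sketch of Until-activity semantics: execute the inner
# activities, then evaluate the condition; stop once it is true.
def until(condition, activities):
    passes = 0
    while True:
        for activity in activities:
            activity()
        passes += 1
        if condition():
            return passes

counter = {"n": 0}
def increment():
    counter["n"] += 1

# Loop until the counter reaches 3; the body runs before each check.
print(until(lambda: counter["n"] >= 3, [increment]))  # 3
```

Because the condition is checked after the body, the activities always run at least once.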
Linked services
- Linked services are much like connection strings, which define the connection information that's needed for Data Factory to connect to external resources.
- A linked service defines the connection to the data source.
- For example, an Azure Storage linked service specifies a connection string to connect to the Azure Storage account.
- Linked services are used for two purposes in Data Factory:
  - To represent a data store, including data stores located on-premises and in the cloud, e.g. tables, files, folders, or documents.
  - To represent a compute resource that can host the execution of an activity. For example, the HDInsight Hive activity runs on an HDInsight Hadoop cluster.
[Diagram: data stores (tables, files, ...) and compute resources (HDInsight, Apache Spark, ...).]
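A linked service for a data store can be sketched as follows, following the general shape of ADF linked-service JSON; the name and the connection-string placeholders are illustrative, not from this deck:

```python
import json

# Hypothetical Azure Storage linked-service definition. The name and
# the <account>/<key> placeholders are illustrative.
linked_service = {
    "name": "MyStorageLinkedService",
    "properties": {
        "type": "AzureStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"
        },
    },
}

print(json.dumps(linked_service, indent=2))
```

Datasets then reference this linked service by name, so the connection details live in one place.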
Integration Runtime
- Think of it as a bridge between two networks.
- It is the compute infrastructure that provides capabilities across different network environments:
- Data Movement: copy data across data stores in a public network and data stores in a private network (on-premises or virtual private network). It provides support for built-in connectors, format conversion, column mapping, and scalable data transfer.
- Activity Dispatch: these capabilities are used when compute services such as Azure HDInsight, Azure Machine Learning, Azure SQL Database, SQL Server, and more are used for transformation activities.
- SSIS Package Execution: these capabilities are used when SSIS packages need to be executed in a managed Azure compute environment.
Integration runtime types
- These three types are:

IR type | Public network | Private network
Azure | Data movement, Activity dispatch | —
Self-hosted | Data movement, Activity dispatch | Data movement, Activity dispatch
Azure-SSIS | SSIS package execution | SSIS package execution
How Azure Data Factory Works
[Diagram: a pipeline of activities, each activity bound to a dataset and executed on an integration runtime, connecting to an on-premises SQL Server database.]
Data Factory V1 vs. V2
Data Factory V1:
- Datasets
- Linked Services
- Pipelines
- On-Premises Gateway
- Schedule on dataset availability and pipeline start/end time
Data Factory V2:
- Datasets
- Linked Services
- Pipelines
- Self-hosted Integration Runtime
- Schedule triggers (time or tumbling window)
- Host and execute SSIS packages
- Parameters
- New control flow activities
System Variables
- Pipeline scope
- Schedule Trigger scope
- Tumbling Window Trigger scope
Pipeline Scope
These system variables can be referenced anywhere in the pipeline JSON.

@pipeline().DataFactory | Name of the data factory the pipeline run is running within
@pipeline().Pipeline | Name of the pipeline
@pipeline().RunId | ID of the specific pipeline run
@pipeline().TriggerType | Type of the trigger that invoked the pipeline (Manual, Scheduler)
@pipeline().TriggerId | ID of the trigger that invoked the pipeline
@pipeline().TriggerName | Name of the trigger that invoked the pipeline
@pipeline().TriggerTime | Time when the trigger invoked the pipeline. The trigger time is the actual fired time, not the scheduled time.
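A tiny sketch of how such system variables might be substituted into a string expression at run time; the resolver and the run values below are illustrative, not ADF internals:

```python
# Hypothetical resolver: replace pipeline-scope system variables in an
# expression string with values from the current run.
def resolve_system_vars(expression, run):
    for name, value in run.items():
        expression = expression.replace(f"@pipeline().{name}", value)
    return expression

run = {"DataFactory": "MyFactory", "Pipeline": "CopyPipeline", "RunId": "0001"}
print(resolve_system_vars("run @pipeline().RunId of @pipeline().Pipeline", run))
# run 0001 of CopyPipeline
```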
Schedule Trigger Scope
- These system variables can be referenced anywhere in the trigger JSON if the trigger is of type "ScheduleTrigger".

@trigger().scheduledTime | Time when the trigger was scheduled to invoke the pipeline run. For example, for a trigger that fires every 5 min, this variable would return 2017-06-01T22:20:00Z, 2017-06-01T22:25:00Z, 2017-06-01T22:30:00Z respectively.
@trigger().startTime | Time when the trigger actually fired to invoke the pipeline run. For example, for a trigger that fires every 5 min, this variable might return something like 2017-06-01T22:20:00.4061448Z, 2017-06-01T22:25:00.7958577Z, 2017-06-01T22:30:00.9935483Z respectively.
Tumbling Window Trigger Scope
- These system variables can be referenced anywhere in the trigger JSON if the trigger is of type "TumblingWindowTrigger".

@trigger().outputs.windowStartTime | Start of the window when the trigger was scheduled to invoke the pipeline run. If the tumbling window trigger has a frequency of "hourly", this would be the time at the beginning of the hour.
@trigger().outputs.windowEndTime | End of the window when the trigger was scheduled to invoke the pipeline run. If the tumbling window trigger has a frequency of "hourly", this would be the time at the end of the hour.
Functions in Azure Data Factory
- String Functions
- Collection Functions
- Logical Functions
- Conversion Functions
- Math Functions
- Date Functions
String Functions

Function | Description | Example
concat | Combines any number of strings together. | concat('Hi ', 'team') : Hi team
substring | Returns a subset of characters from a string. | substring('somevalue', 1, 3) : ome
replace | Replaces a string with a given string. | replace('Hi team', 'Hi', 'Hey') : Hey team
guid | Generates a globally unique string. | guid() : c2ecc88d-88c8-4096-912c-d6
toLower | Converts a string to lowercase. | toLower('Two') : two
toUpper | Converts a string to uppercase. | toUpper('Two') : TWO
indexof | Finds the index of a value within a string, case-insensitively. | indexof('Hi team', 'Hi') : 0
endswith | Checks if the string ends with a value, case-insensitively. | endswith('Hi team', 'team') : true
startswith | Checks if the string starts with a value, case-insensitively. | startswith('Hi team', 'team') : false
split | Splits the string using a separator. | split('Hi;team', ';') : ["Hi", "team"]
lastindexof | Finds the last index of a value within a string, case-insensitively. | lastindexof('foofoo', 'foo') : 3
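To make the semantics of a few of these concrete, here are Python equivalents (a sketch, not the ADF runtime; note that substring is zero-based with an explicit length, and indexof/endswith are case-insensitive):

```python
# Sketch: Python equivalents of some ADF string functions listed above.
def concat(*parts):
    return "".join(parts)

def substring(s, start, length):
    # ADF substring is zero-based and takes a length, not an end index.
    return s[start:start + length]

def index_of(s, value):
    # ADF indexof matches case-insensitively.
    return s.lower().find(value.lower())

def ends_with(s, value):
    return s.lower().endswith(value.lower())

print(concat("Hi ", "team"))          # Hi team
print(substring("somevalue", 1, 3))   # ome
print(index_of("Hi team", "Hi"))      # 0
print(ends_with("Hi team", "team"))   # True
```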
Collection Functions

Function | Description | Example
contains | Returns true if a dictionary contains a key, a list contains a value, or a string contains a substring. | contains('abacaba', 'aca') : true
length | Returns the number of elements in an array or string. | length('abc') : 3
empty | Returns true if the object, array, or string is empty. | empty('') : true
intersection | Returns a single array or object with the common elements between the arrays or objects passed to it. | intersection([1, 2, 3], [101, 2, 1, 10], [6, 8, 1, 2]) : [1, 2]
union | Returns a single array or object with all of the elements that are in either array or object passed to it. | union([1, 2, 3], [101, 2, 1, 10]) : [1, 2, 3, 10, 101]
first | Returns the first element in the array or string passed in. | first([0, 2, 3]) : 0
last | Returns the last element in the array or string passed in. | last('0123') : 3
take | Returns the first Count elements from the array or string passed in. | take([1, 2, 3, 4], 2) : [1, 2]
skip | Returns the elements in the array starting at index Count. | skip([1, 2, 3, 4], 2) : [3, 4]
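Python equivalents of intersection, take, and skip make their semantics concrete (a sketch, not the ADF runtime):

```python
# Sketch: Python equivalents of a few collection functions above.
def intersection(*arrays):
    # Keep only the elements present in every array, in first-array order.
    result = list(arrays[0])
    for other in arrays[1:]:
        result = [item for item in result if item in other]
    return result

def take(seq, count):
    return seq[:count]

def skip(seq, count):
    return seq[count:]

print(intersection([1, 2, 3], [101, 2, 1, 10], [6, 8, 1, 2]))  # [1, 2]
print(take([1, 2, 3, 4], 2))  # [1, 2]
print(skip([1, 2, 3, 4], 2))  # [3, 4]
```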
Conversion Functions

Function | Description | Example
int | Converts the parameter to an integer. | int('100') : 100
string | Converts the parameter to a string. | string(10) : '10'
json | Converts the parameter to a JSON-type value. | json('[1,2,3]') : [1,2,3]; json('{"bar" : "baz"}') : { "bar" : "baz" }
float | Converts the parameter argument to a floating-point number. | float('10.333') : 10.333
bool | Converts the parameter to a Boolean. | bool(0) : false
coalesce | Returns the first non-null object in the arguments passed in. Note: an empty string is not null. | coalesce(pipeline().parameters.parameter1, pipeline().parameters.parameter2, 'fallback') : fallback
array | Converts the parameter to an array. | array('abc') : ["abc"]
createArray | Creates an array from the parameters. | createArray('a', 'c') : ["a", "c"]
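Python equivalents of coalesce and createArray illustrate the note that an empty string is not null (a sketch, not the ADF runtime):

```python
# Sketch: coalesce returns the first non-null argument; an empty
# string counts as non-null, matching the note in the table above.
def coalesce(*values):
    for value in values:
        if value is not None:
            return value
    return None

def create_array(*items):
    return list(items)

print(coalesce(None, None, "fallback"))        # fallback
print(repr(coalesce(None, "", "fallback")))    # '' (empty string is not null)
print(create_array("a", "c"))                  # ['a', 'c']
```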
Math Functions

Function | Description | Example
add | Returns the result of the addition of the two numbers. | add(10, 10.333) : 20.333
sub | Returns the result of the subtraction of the two numbers. | sub(10, 10.333) : -0.333
mul | Returns the result of the multiplication of the two numbers. | mul(10, 10.333) : 103.33
div | Returns the result of the division of the two numbers. | div(10.333, 10) : 1.0333
mod | Returns the remainder after the division of the two numbers (modulo). | mod(10, 4) : 2
min | Returns the minimum value; there are two different patterns for calling this function. Note: all values must be numbers. | min([0, 1, 2]) : 0; min(0, 1, 2) : 0
max | Returns the maximum value; there are two different patterns for calling this function. Note: all values must be numbers. | max([0, 1, 2]) : 2; max(0, 1, 2) : 2
range | Generates an array of integers starting from a certain number; you define the length of the returned array. | range(3, 4) : [3, 4, 5, 6]
rand | Generates a random integer within the specified range. | rand(-1000, 1000) : 42
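The two calling patterns for min/max (a single array, or the numbers as separate arguments) and the length-based range can be sketched in Python (illustrative, not the ADF runtime):

```python
# Sketch: min/max accept either one array or several numbers; range
# takes a start value and the LENGTH of the result, not an end value.
def adf_min(*args):
    values = args[0] if len(args) == 1 and isinstance(args[0], list) else args
    return min(values)

def adf_max(*args):
    values = args[0] if len(args) == 1 and isinstance(args[0], list) else args
    return max(values)

def adf_range(start, count):
    return list(range(start, start + count))

print(adf_min([0, 1, 2]), adf_min(0, 1, 2))  # 0 0
print(adf_max([0, 1, 2]), adf_max(0, 1, 2))  # 2 2
print(adf_range(3, 4))  # [3, 4, 5, 6]
```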
Date Functions

Function | Description | Example
utcnow | Returns the current timestamp as a string. | utcnow() : 2019-02-21T13:27:36Z
addseconds | Adds an integer number of seconds to a string timestamp passed in. The number of seconds can be positive or negative. | addseconds('2015-03-15T13:27:36Z', -36) : 2015-03-15T13:27:00Z
addminutes | Adds an integer number of minutes to a string timestamp passed in. The number of minutes can be positive or negative. | addminutes('2015-03-15T13:27:36Z', 33) : 2015-03-15T14:00:36Z
addhours | Adds an integer number of hours to a string timestamp passed in. The number of hours can be positive or negative. | addhours('2015-03-15T13:27:36Z', 12) : 2015-03-16T01:27:36Z
adddays | Adds an integer number of days to a string timestamp passed in. The number of days can be positive or negative. | adddays('2015-03-15T13:27:36Z', -20) : 2015-02-23T13:27:36Z
formatDateTime | Returns a string in the specified date format. | formatDateTime('2015-03-15T13:27:36Z', 'o') : 2015-03-15T13:27:36.0000000Z
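The add* functions operate on string timestamps; Python equivalents of two of them (a sketch, not the ADF runtime):

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%dT%H:%M:%SZ"

# Sketch: addseconds/adddays parse the string timestamp, shift it,
# and return a string again, as the ADF functions do.
def add_seconds(ts, seconds):
    return (datetime.strptime(ts, FMT) + timedelta(seconds=seconds)).strftime(FMT)

def add_days(ts, days):
    return (datetime.strptime(ts, FMT) + timedelta(days=days)).strftime(FMT)

print(add_seconds("2015-03-15T13:27:36Z", -36))  # 2015-03-15T13:27:00Z
print(add_days("2015-03-15T13:27:36Z", -20))     # 2015-02-23T13:27:36Z
```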
Expressions in Azure Data Factory
- JSON values in the definition can be literals or expressions that are evaluated at runtime.
E.g. "name": "value" OR "name": "@pipeline().parameters.password"
- Expressions can appear anywhere in a JSON string value and always result in another JSON value.
- If a JSON value is an expression, the body of the expression is extracted by removing the at-sign (@).

JSON value | Result
"parameters" | The characters 'parameters' are returned.
"parameters[1]" | The characters 'parameters[1]' are returned.
"@@" | A 1-character string that contains '@' is returned.
" @" | A 2-character string that contains ' @' is returned.
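These escaping rules can be sketched in a few lines of Python (a simplified illustration of the table above, not the ADF parser):

```python
# Sketch: classify a JSON string value as a literal or an expression.
# Only a leading at-sign matters; "@@" escapes a literal "@".
def classify(json_value):
    if json_value.startswith("@@"):
        return "literal", "@" + json_value[2:]
    if json_value.startswith("@"):
        return "expression", json_value[1:]  # body evaluated at runtime
    return "literal", json_value

print(classify("parameters"))  # ('literal', 'parameters')
print(classify("@@"))          # ('literal', '@')
print(classify(" @"))          # ('literal', ' @')
print(classify("@pipeline().parameters.password"))
# ('expression', 'pipeline().parameters.password')
```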
A dataset with a parameter
- Suppose the BlobDataset takes a parameter named path.
- Its value is used to set a value for the folderPath property by using the following expression:
"folderPath": "@dataset().path"
A pipeline with a parameter
- In the following example, the pipeline takes inputPath and outputPath parameters.
- The path for the parameterized blob dataset is set by using the values of these parameters.
- The syntax used here is:
"path": "@pipeline().parameters.inputPath"
Questions & Answers
Feel free to write to me at:
mahesh.pandit@yash.com
in case of any queries / clarifications.

More Related Content

What's hot

Microsoft Data Integration Pipelines: Azure Data Factory and SSIS
Microsoft Data Integration Pipelines: Azure Data Factory and SSISMicrosoft Data Integration Pipelines: Azure Data Factory and SSIS
Microsoft Data Integration Pipelines: Azure Data Factory and SSISMark Kromer
 
Azure Data Factory v2
Azure Data Factory v2Azure Data Factory v2
Azure Data Factory v2inovex GmbH
 
Azure Data Factory for Azure Data Week
Azure Data Factory for Azure Data WeekAzure Data Factory for Azure Data Week
Azure Data Factory for Azure Data WeekMark Kromer
 
Pipelines and Packages: Introduction to Azure Data Factory (DATA:Scotland 2019)
Pipelines and Packages: Introduction to Azure Data Factory (DATA:Scotland 2019)Pipelines and Packages: Introduction to Azure Data Factory (DATA:Scotland 2019)
Pipelines and Packages: Introduction to Azure Data Factory (DATA:Scotland 2019)Cathrine Wilhelmsen
 
Microsoft Azure Data Factory Hands-On Lab Overview Slides
Microsoft Azure Data Factory Hands-On Lab Overview SlidesMicrosoft Azure Data Factory Hands-On Lab Overview Slides
Microsoft Azure Data Factory Hands-On Lab Overview SlidesMark Kromer
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overviewJames Serra
 
Azure data factory
Azure data factoryAzure data factory
Azure data factoryBizTalk360
 
Azure data bricks by Eugene Polonichko
Azure data bricks by Eugene PolonichkoAzure data bricks by Eugene Polonichko
Azure data bricks by Eugene PolonichkoAlex Tumanoff
 
Azure Data Factory ETL Patterns in the Cloud
Azure Data Factory ETL Patterns in the CloudAzure Data Factory ETL Patterns in the Cloud
Azure Data Factory ETL Patterns in the CloudMark Kromer
 
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Cathrine Wilhelmsen
 
Introduction to PolyBase
Introduction to PolyBaseIntroduction to PolyBase
Introduction to PolyBaseJames Serra
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks FundamentalsDalibor Wijas
 
Azure Data Factory | Moving On-Premise Data to Azure Cloud | Microsoft Azure ...
Azure Data Factory | Moving On-Premise Data to Azure Cloud | Microsoft Azure ...Azure Data Factory | Moving On-Premise Data to Azure Cloud | Microsoft Azure ...
Azure Data Factory | Moving On-Premise Data to Azure Cloud | Microsoft Azure ...Edureka!
 
Azure DataBricks for Data Engineering by Eugene Polonichko
Azure DataBricks for Data Engineering by Eugene PolonichkoAzure DataBricks for Data Engineering by Eugene Polonichko
Azure DataBricks for Data Engineering by Eugene PolonichkoDimko Zhluktenko
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure DatabricksJames Serra
 
Azure Data Factory Data Flow
Azure Data Factory Data FlowAzure Data Factory Data Flow
Azure Data Factory Data FlowMark Kromer
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshJeffrey T. Pollock
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)James Serra
 

What's hot (20)

Microsoft Data Integration Pipelines: Azure Data Factory and SSIS
Microsoft Data Integration Pipelines: Azure Data Factory and SSISMicrosoft Data Integration Pipelines: Azure Data Factory and SSIS
Microsoft Data Integration Pipelines: Azure Data Factory and SSIS
 
Azure Data Factory v2
Azure Data Factory v2Azure Data Factory v2
Azure Data Factory v2
 
Azure Data Factory for Azure Data Week
Azure Data Factory for Azure Data WeekAzure Data Factory for Azure Data Week
Azure Data Factory for Azure Data Week
 
Pipelines and Packages: Introduction to Azure Data Factory (DATA:Scotland 2019)
Pipelines and Packages: Introduction to Azure Data Factory (DATA:Scotland 2019)Pipelines and Packages: Introduction to Azure Data Factory (DATA:Scotland 2019)
Pipelines and Packages: Introduction to Azure Data Factory (DATA:Scotland 2019)
 
Microsoft Azure Data Factory Hands-On Lab Overview Slides
Microsoft Azure Data Factory Hands-On Lab Overview SlidesMicrosoft Azure Data Factory Hands-On Lab Overview Slides
Microsoft Azure Data Factory Hands-On Lab Overview Slides
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
 
Azure Data Factory v2
Azure Data Factory v2Azure Data Factory v2
Azure Data Factory v2
 
Azure data factory
Azure data factoryAzure data factory
Azure data factory
 
Azure data bricks by Eugene Polonichko
Azure data bricks by Eugene PolonichkoAzure data bricks by Eugene Polonichko
Azure data bricks by Eugene Polonichko
 
Adf presentation
Adf presentationAdf presentation
Adf presentation
 
Azure Data Factory ETL Patterns in the Cloud
Azure Data Factory ETL Patterns in the CloudAzure Data Factory ETL Patterns in the Cloud
Azure Data Factory ETL Patterns in the Cloud
 
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
 
Introduction to PolyBase
Introduction to PolyBaseIntroduction to PolyBase
Introduction to PolyBase
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks Fundamentals
 
Azure Data Factory | Moving On-Premise Data to Azure Cloud | Microsoft Azure ...
Azure Data Factory | Moving On-Premise Data to Azure Cloud | Microsoft Azure ...Azure Data Factory | Moving On-Premise Data to Azure Cloud | Microsoft Azure ...
Azure Data Factory | Moving On-Premise Data to Azure Cloud | Microsoft Azure ...
 
Azure DataBricks for Data Engineering by Eugene Polonichko
Azure DataBricks for Data Engineering by Eugene PolonichkoAzure DataBricks for Data Engineering by Eugene Polonichko
Azure DataBricks for Data Engineering by Eugene Polonichko
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure Databricks
 
Azure Data Factory Data Flow
Azure Data Factory Data FlowAzure Data Factory Data Flow
Azure Data Factory Data Flow
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
 

Similar to Azure Data Factory Introduction.pdf

Transform your data with Azure Data factory
Transform your data with Azure Data factoryTransform your data with Azure Data factory
Transform your data with Azure Data factoryPrometix Pty Ltd
 
Core Concepts in azure data factory
Core Concepts in azure data factoryCore Concepts in azure data factory
Core Concepts in azure data factoryBRIJESH KUMAR
 
Azure Data Factory usage at Aucfanlab
Azure Data Factory usage at AucfanlabAzure Data Factory usage at Aucfanlab
Azure Data Factory usage at AucfanlabAucfan
 
Aucfanlab Datalake - Big Data Management Platform -
Aucfanlab Datalake - Big Data Management Platform -Aucfanlab Datalake - Big Data Management Platform -
Aucfanlab Datalake - Big Data Management Platform -Aucfan
 
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...Lace Lofranco
 
Azure Data.pptx
Azure Data.pptxAzure Data.pptx
Azure Data.pptxFedoRam1
 
Azure Data Engineer Course | Azure Data Engineer Training Hyderabad.pptx
Azure Data Engineer Course | Azure Data Engineer Training Hyderabad.pptxAzure Data Engineer Course | Azure Data Engineer Training Hyderabad.pptx
Azure Data Engineer Course | Azure Data Engineer Training Hyderabad.pptxsivavisualpath
 
Migration to Databricks - On-prem HDFS.pptx
Migration to Databricks - On-prem HDFS.pptxMigration to Databricks - On-prem HDFS.pptx
Migration to Databricks - On-prem HDFS.pptxKshitija(KJ) Gupte
 
Exploring Microsoft Azure Infrastructures
Exploring Microsoft Azure InfrastructuresExploring Microsoft Azure Infrastructures
Exploring Microsoft Azure InfrastructuresCCG
 
ADF Demo_ppt.pptx
ADF Demo_ppt.pptxADF Demo_ppt.pptx
ADF Demo_ppt.pptxvamsytaurus
 
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...AboutYouGmbH
 
Data Modernization_Harinath Susairaj.pptx
Data Modernization_Harinath Susairaj.pptxData Modernization_Harinath Susairaj.pptx
Data Modernization_Harinath Susairaj.pptxArunPandiyan890855
 
Alluxio Data Orchestration Platform for the Cloud
Alluxio Data Orchestration Platform for the CloudAlluxio Data Orchestration Platform for the Cloud
Alluxio Data Orchestration Platform for the CloudShubham Tagra
 
Enterprise Cloud Data Platforms - with Microsoft Azure
Enterprise Cloud Data Platforms - with Microsoft AzureEnterprise Cloud Data Platforms - with Microsoft Azure
Enterprise Cloud Data Platforms - with Microsoft AzureKhalid Salama
 
Hd insight overview
Hd insight overviewHd insight overview
Hd insight overviewvhrocca
 
Comprehensive Guide for Microsoft Fabric to Master Data Analytics
Comprehensive Guide for Microsoft Fabric to Master Data AnalyticsComprehensive Guide for Microsoft Fabric to Master Data Analytics
Comprehensive Guide for Microsoft Fabric to Master Data AnalyticsSparity1
 
Azure Data Factory for Redmond SQL PASS UG Sept 2018
Azure Data Factory for Redmond SQL PASS UG Sept 2018Azure Data Factory for Redmond SQL PASS UG Sept 2018
Azure Data Factory for Redmond SQL PASS UG Sept 2018Mark Kromer
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptxAlex Ivy
 
ACDKOCHI19 - Next Generation Data Analytics Platform on AWS
ACDKOCHI19 - Next Generation Data Analytics Platform on AWSACDKOCHI19 - Next Generation Data Analytics Platform on AWS
ACDKOCHI19 - Next Generation Data Analytics Platform on AWSAWS User Group Kochi
 

Similar to Azure Data Factory Introduction.pdf (20)

adf.docx
adf.docxadf.docx
adf.docx
 
Azure Data Factory Introduction.pdf

  • 8. 8 Introduction  Azure Data Factory is a cloud-based integration service that allows you to create data-driven workflows in the cloud for orchestrating and automating data movement and data transformation.  Workflows are data-driven and can be scheduled.  Sources and destinations can be on-premises or in the cloud.  Transformation can be done using Azure HDInsight Hadoop, Spark, Azure Data Lake Analytics, and Machine Learning.
  • 9. 9 How does it work?  The pipelines (data-driven workflows) in Azure Data Factory typically perform the following four steps:
  • 10. 10 Steps involved in ADF  Connect and collect  Transform and enrich  Publish  Monitor
  • 11. 11 Connect and collect  The first step in building an information production system is to connect to all the required sources of data and processing, such as software-as-a-service (SaaS) services, databases, file shares, and FTP web services.  With Data Factory, you can use the Copy Activity in a data pipeline to move data from both on-premises and cloud source data stores to a centralized data store in the cloud for further analysis.  For example, you can collect data in Azure Data Lake as well as in Azure Blob storage.
  • 12. 12 Transform and enrich  After data is present in a centralized data store in the cloud, process or transform the collected data by using compute services such as  HDInsight Hadoop  Spark  Data Lake Analytics  Machine Learning.
  • 13. 13 Publish  After the raw data has been refined into a business-ready consumable form, load the data into Azure SQL Data Warehouse, Azure SQL Database, Azure Cosmos DB, or whichever store fits the user's need.
  • 14. 14 Monitor  After you have successfully built and deployed your data integration pipeline, providing business value from refined data, monitor the scheduled activities and pipelines for success and failure rates.  Azure Data Factory has built-in support for pipeline monitoring via Azure Monitor, API, PowerShell, Log Analytics, and health panels on the Azure portal.
  • 15. 15 ADF Components  Pipeline  Activity  Datasets  Linked Services
  • 16. 16 ADF Components [Diagram: a pipeline is a logical group of activities; activities consume and produce datasets (tables, files); activities run on linked services (e.g. SQL Server, Hadoop cluster); a dataset represents a data item stored in a linked service.]  An Azure subscription might have one or more Azure Data Factory instances (or data factories).  Azure Data Factory is composed of four key components.  These components work together to provide the platform on which you can compose data-driven workflows with steps to move and transform data.
  • 17. 17 Pipeline  A data factory might have one or more pipelines.  A pipeline is a logical grouping of activities that performs a unit of work.  Together, the activities in a pipeline perform a task.  The pipeline allows you to manage the activities as a set instead of managing each one individually.  The activities in a pipeline can be chained together to operate sequentially, or they can operate independently in parallel.  To create a Data Factory pipeline, we can use any one of the following methods: the Data Factory UI, the Copy Data Tool, Azure PowerShell, the REST API, a Resource Manager template, .NET, or Python.
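As an illustrative sketch, a minimal pipeline definition with a single Copy activity might look like the JSON below (the pipeline name and the two dataset reference names are hypothetical, not from the slides):

```json
{
  "name": "CopyBlobToSqlPipeline",
  "properties": {
    "activities": [
      {
        "name": "CopyFromBlobToSql",
        "type": "Copy",
        "inputs": [ { "referenceName": "InputBlobDataset", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "OutputSqlDataset", "type": "DatasetReference" } ],
        "typeProperties": {
          "source": { "type": "BlobSource" },
          "sink": { "type": "SqlSink" }
        }
      }
    ]
  }
}
```

The "activities" array is what makes the pipeline a logical grouping: more activities can be appended and chained via dependencies, or left unchained to run in parallel.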
  • 18. 18 Pipeline Execution Triggers  Triggers represent the unit of processing that determines when a pipeline execution needs to be kicked off.  There are different types of triggers for different types of events. Pipeline Runs  A pipeline run is an instance of the pipeline execution.  Pipeline runs are typically instantiated by passing the arguments to the parameters that are defined in pipelines.  The arguments can be passed manually or within the trigger definition. Parameters  Parameters are key-value pairs of read-only configuration.  Parameters are defined in the pipeline.  The arguments for the defined parameters are passed during execution from the run context that was created by a trigger or a pipeline that was executed manually.  Activities within the pipeline consume the parameter values. Control Flow  Control flow is an orchestration of pipeline activities that includes chaining activities in a sequence, branching, defining parameters at the pipeline level, and passing arguments while invoking the pipeline on-demand or from a trigger.  It also includes custom-state passing and looping containers, that is, For-each iterators.
  • 19. 19 ADF Activities  Data Movement Activities  Data Transformation Activities  Control Activities
  • 20. 20 Activity  Activities represent a processing step in a pipeline.  For example, you might use a copy activity to copy data from one data store to another data store.  Data Factory supports three types of activities: 1. Data movement activities 2. Data transformation activities 3. Control activities. [Diagram: a copy activity moves data from Azure Blob storage through a transformation activity, and another copy activity loads the output data into Azure SQL Data Warehouse for a BI tool.]
  • 21. 21 Data Movement Activities  Copy Activity in Data Factory copies data from a source data store to a sink data store.  Data from any source can be written to any sink.
  • 22. 22 Data Transformation Activities  Azure Data Factory supports the following transformation activities, which can be added to pipelines either individually or chained with another activity. [Table: data transformation activities mapped to compute environments, including activities on HDInsight (e.g. Hive, Pig, MapReduce, Hadoop Streaming, Spark), Stored Procedure on Azure SQL, Azure SQL DW, or SQL Server, Custom Activity on Azure Batch or an Azure VM, U-SQL on Azure Data Lake Analytics, and Notebook activities on Azure Databricks.]
  • 23. 23 Control Activities  The following control flow activities are supported: Execute Pipeline Activity: allows a Data Factory pipeline to invoke another pipeline. For Each Activity: defines a repeating control flow in your pipeline. Web Activity: can be used to call a custom REST endpoint from a Data Factory pipeline. Lookup Activity: can be used to read or look up a record, table name, or value from any external source. Get Metadata Activity: can be used to retrieve metadata of any data in Azure Data Factory. Until Activity: implements a Do-Until loop similar to the Do-Until looping structure in programming languages; it executes a set of activities in a loop until the condition associated with the activity evaluates to true. If Condition Activity: can be used to branch based on a condition that evaluates to true or false. Wait Activity: when you use a Wait activity in a pipeline, the pipeline waits for the specified period of time before continuing with execution of subsequent activities.
  • 24. 24 Linked services  Linked services are much like connection strings, which define the connection information that's needed for Data Factory to connect to external resources.  A linked service defines the connection to the data source.  For example, an Azure Storage linked service specifies a connection string to connect to the Azure Storage account.  Linked services are used for two purposes in Data Factory:  To represent a data store, including data stores located on-premises and in the cloud, e.g. tables, files, folders, or documents.  To represent a compute resource that can host the execution of an activity; for example, the HDInsightHive activity runs on an HDInsight Hadoop cluster.
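A hedged sketch of the Azure Storage linked service mentioned above (the service name is hypothetical, and the <account>/<key> placeholders must stay placeholders; never embed real keys in a definition):

```json
{
  "name": "AzureStorageLinkedService",
  "properties": {
    "type": "AzureStorage",
    "typeProperties": {
      "connectionString": "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"
    }
  }
}
```

Datasets then point at this linked service by name, which is what makes a linked service behave like a reusable connection string.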
  • 25. 25 Integration Runtime  Think of it as a bridge between two networks.  It is the compute infrastructure that provides the following capabilities across different network environments: data movement, activity dispatch, and SSIS package execution.  Data movement: copies data across data stores in a public network and data stores in a private network (on-premises or virtual private network), with support for built-in connectors, format conversion, column mapping, and scalable data transfer.  Activity dispatch: used when compute services such as Azure HDInsight, Azure Machine Learning, Azure SQL Database, SQL Server, and more are used for transformation activities.  SSIS package execution: used when SSIS packages need to be executed in a managed Azure compute environment.
  • 26. 26 Integration runtime types  There are three types:  Azure IR: data movement and activity dispatch in a public network.  Self-hosted IR: data movement and activity dispatch in both public and private networks.  Azure-SSIS IR: SSIS package execution in both public and private networks.
  • 27. 27 How Azure Data Factory Works [Diagram: a pipeline chains activities that consume and produce datasets; integration runtimes connect those activities to data stores such as an on-premises SQL Server database.]
  • 28. 28 Data Factory V1 vs. V2
  • 29. 29 Data Factory V1 vs. V2 Data Factory V1:  Datasets  Linked Services  Pipelines  On-Premises Gateway  Schedule on dataset availability and pipeline start/end time Data Factory V2:  Datasets  Linked Services  Pipelines  Self-hosted Integration Runtime  Schedule triggers (time or tumbling window)  Host and execute SSIS packages  Parameters  New control flow activities
  • 30. 30 System Variables  Pipeline scope  Schedule Trigger scope  Tumbling Window Trigger scope
  • 31. 31 Pipeline Scope These system variables can be referenced anywhere in the pipeline JSON. @pipeline().DataFactory: name of the data factory the pipeline run is running within. @pipeline().Pipeline: name of the pipeline. @pipeline().RunId: ID of the specific pipeline run. @pipeline().TriggerType: type of the trigger that invoked the pipeline (Manual, Scheduler). @pipeline().TriggerId: ID of the trigger that invoked the pipeline. @pipeline().TriggerName: name of the trigger that invoked the pipeline. @pipeline().TriggerTime: time when the trigger invoked the pipeline. The trigger time is the actual fired time, not the scheduled time.
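As a sketch of how these variables are consumed, a Stored Procedure activity could log the pipeline name and run ID on each execution (the activity name, the procedure usp_LogPipelineRun, and the linked service reference are all hypothetical):

```json
{
  "name": "LogPipelineRun",
  "type": "SqlServerStoredProcedure",
  "linkedServiceName": { "referenceName": "AzureSqlLinkedService", "type": "LinkedServiceReference" },
  "typeProperties": {
    "storedProcedureName": "usp_LogPipelineRun",
    "storedProcedureParameters": {
      "PipelineName": { "value": "@pipeline().Pipeline", "type": "String" },
      "RunId": { "value": "@pipeline().RunId", "type": "String" }
    }
  }
}
```

At run time each @pipeline() reference is resolved from the run context before the parameter values are passed to the procedure.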
  • 32. 32 Schedule Trigger Scope  These system variables can be referenced anywhere in the trigger JSON if the trigger is of type "ScheduleTrigger". @trigger().scheduledTime: time when the trigger was scheduled to invoke the pipeline run. For example, for a trigger that fires every 5 minutes, this variable would return 2017-06-01T22:20:00Z, 2017-06-01T22:25:00Z, 2017-06-01T22:29:00Z respectively. @trigger().startTime: time when the trigger actually fired to invoke the pipeline run. For example, for a trigger that fires every 5 minutes, this variable might return something like 2017-06-01T22:20:00.4061448Z, 2017-06-01T22:25:00.7958577Z, 2017-06-01T22:29:00.9935483Z respectively.
  • 33. 33 Tumbling window Trigger Scope  These system variables can be referenced anywhere in the trigger JSON if the trigger is of type "TumblingWindowTrigger". @trigger().outputs.windowStartTime: start of the window when the trigger was scheduled to invoke the pipeline run. If the tumbling window trigger has a frequency of "hourly", this is the time at the beginning of the hour. @trigger().outputs.windowEndTime: end of the window when the trigger was scheduled to invoke the pipeline run. If the tumbling window trigger has a frequency of "hourly", this is the time at the end of the hour.
  • 34. 34 Functions in ADF  String Functions  Collection Functions  Logical Functions  Conversion Functions  Math Functions  Date Functions
  • 35. 35 String Functions: concat: combines any number of strings together, e.g. concat('Hi ', 'team') : 'Hi team'. substring: returns a subset of characters from a string, e.g. substring('somevalue', 1, 3) : 'ome'. replace: replaces a string with a given string, e.g. replace('Hi team', 'Hi', 'Hey') : 'Hey team'. guid: generates a globally unique string, e.g. guid() : c2ecc88d-88c8-4096-912c-d6. toLower: converts a string to lowercase, e.g. toLower('Two') : 'two'. toUpper: converts a string to uppercase, e.g. toUpper('Two') : 'TWO'. indexof: finds the index of a value within a string, case-insensitively, e.g. indexof('Hi team', 'Hi') : 0. endswith: checks if the string ends with a value, case-insensitively, e.g. endswith('Hi team', 'team') : true. startswith: checks if the string starts with a value, case-insensitively, e.g. startswith('Hi team', 'team') : false. split: splits the string using a separator, e.g. split('Hi;team', ';') : ['Hi', 'team']. lastindexof: finds the last index of a value within a string, case-insensitively, e.g. lastindexof('foofoo', 'foo') : 3.
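To make the semantics of the examples above concrete, here is a rough Python sketch of how a few of these string functions behave. This is illustrative only; ADF evaluates these in its own expression runtime, not in Python.

```python
# Illustrative Python models of selected ADF string functions.

def concat(*parts: str) -> str:
    """concat('Hi ', 'team') -> 'Hi team': joins all arguments."""
    return "".join(parts)

def substring(s: str, start: int, length: int) -> str:
    """substring('somevalue', 1, 3) -> 'ome': start index, then length."""
    return s[start:start + length]

def indexof(s: str, value: str) -> int:
    """Case-insensitive search: indexof('Hi team', 'hi') -> 0."""
    return s.lower().find(value.lower())

def split(s: str, sep: str) -> list:
    """split('Hi;team', ';') -> ['Hi', 'team']."""
    return s.split(sep)

print(concat("Hi ", "team"))         # Hi team
print(substring("somevalue", 1, 3))  # ome
print(indexof("Hi team", "hi"))      # 0
print(split("Hi;team", ";"))         # ['Hi', 'team']
```

Note in particular that substring takes a start index and a length (not an end index), which is the detail most easily misread from the table.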
  • 36. 36 Collection Functions: contains: returns true if a dictionary contains a key, a list contains a value, or a string contains a substring, e.g. contains('abacaba', 'aca') : true. length: returns the number of elements in an array or string, e.g. length('abc') : 3. empty: returns true if the object, array, or string is empty, e.g. empty('') : true. intersection: returns a single array or object with the common elements between the arrays or objects passed to it, e.g. intersection([1, 2, 3], [101, 2, 1, 10], [6, 8, 1, 2]) : [1, 2]. union: returns a single array or object with all of the elements that are in either array or object passed to it, e.g. union([1, 2, 3], [101, 2, 1, 10]) : [1, 2, 3, 10, 101]. first: returns the first element in the array or string passed in, e.g. first([0, 2, 3]) : 0. last: returns the last element in the array or string passed in, e.g. last('0123') : 3. take: returns the first count elements from the array or string passed in, e.g. take([1, 2, 3, 4], 2) : [1, 2]. skip: returns the elements in the array starting at index count, e.g. skip([1, 2, 3, 4], 2) : [3, 4].
  • 37. 37 Conversion Functions: int: converts the parameter to an integer, e.g. int('100') : 100. string: converts the parameter to a string, e.g. string(10) : '10'. json: converts the parameter to a JSON-type value, e.g. json('[1,2,3]') : [1,2,3] and json('{"bar" : "baz"}') : { "bar" : "baz" }. float: converts the parameter to a floating-point number, e.g. float('10.333') : 10.333. bool: converts the parameter to a Boolean, e.g. bool(0) : false. coalesce: returns the first non-null object in the arguments passed in (note: an empty string is not null), e.g. coalesce(pipeline().parameters.parameter1, pipeline().parameters.parameter2, 'fallback') : 'fallback' when both parameters are null. array: converts the parameter to an array, e.g. array('abc') : ["abc"]. createArray: creates an array from the parameters, e.g. createArray('a', 'c') : ["a", "c"].
  • 38. 38 Math Functions: add: returns the result of the addition of the two numbers, e.g. add(10, 10.333) : 20.333. sub: returns the result of the subtraction of the two numbers, e.g. sub(10, 10.333) : -0.333. mul: returns the result of the multiplication of the two numbers, e.g. mul(10, 10.333) : 103.33. div: returns the result of the division of the two numbers, e.g. div(10.333, 10) : 1.0333. mod: returns the remainder after the division of the two numbers (modulo), e.g. mod(10, 4) : 2. min: there are two different calling patterns, and all values must be numbers, e.g. min([0, 1, 2]) : 0 and min(0, 1, 2) : 0. max: there are two different calling patterns, and all values must be numbers, e.g. max([0, 1, 2]) : 2 and max(0, 1, 2) : 2. range: generates an array of integers starting from a given number, with the length of the returned array defined by the second argument, e.g. range(3, 4) : [3, 4, 5, 6]. rand: generates a random integer within the specified range, e.g. rand(-1000, 1000) might return 42.
  • 39. 39 Date Functions: utcnow: returns the current timestamp as a string, e.g. utcnow() : 2019-02-21T13:27:36Z. addseconds: adds an integer number of seconds (positive or negative) to a string timestamp, e.g. addseconds('2015-03-15T13:27:36Z', -36) : 2015-03-15T13:27:00Z. addminutes: adds an integer number of minutes (positive or negative) to a string timestamp, e.g. addminutes('2015-03-15T13:27:36Z', 33) : 2015-03-15T14:00:36Z. addhours: adds an integer number of hours (positive or negative) to a string timestamp, e.g. addhours('2015-03-15T13:27:36Z', 12) : 2015-03-16T01:27:36Z. adddays: adds an integer number of days (positive or negative) to a string timestamp, e.g. adddays('2015-03-15T13:27:36Z', -20) : 2015-02-23T13:27:36Z. formatDateTime: returns a string in the specified date format, e.g. formatDateTime('2015-03-15T13:27:36Z', 'o') : 2015-03-15T13:27:36.0000000Z.
  • 40. 40 Expressions in Azure Data Factory  JSON values in a definition can be literals or expressions that are evaluated at runtime, e.g. "name": "value" or "name": "@pipeline().parameters.password".  Expressions can appear anywhere in a JSON string value and always result in another JSON value.  If a JSON value is an expression, the body of the expression is extracted by removing the at-sign (@). Examples: the JSON value "parameters" returns the characters 'parameters'; "parameters[1]" returns the characters 'parameters[1]'; "@@" returns a one-character string containing '@'; " @" returns a two-character string containing ' @'.
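The escaping rules above can be sketched in Python. This is an assumption-laden simplification of ADF's JSON-value evaluation (a model for the four examples, not the real parser): a value starting with '@' is treated as an expression, a leading '@@' escapes to a literal '@', and anything else stays literal.

```python
# Simplified model of how ADF classifies a JSON string value.

def classify_json_value(value: str) -> str:
    if value.startswith("@@"):
        return "literal:" + value[1:]      # leading '@@' escapes to a single '@'
    if value.startswith("@"):
        return "expression:" + value[1:]   # expression body with the at-sign removed
    return "literal:" + value              # plain literal, '@' elsewhere is untouched

print(classify_json_value("parameters"))                  # literal:parameters
print(classify_json_value("@@"))                          # literal:@
print(classify_json_value(" @"))                          # literal: @
print(classify_json_value("@pipeline().parameters.path")) # expression:pipeline().parameters.path
```

The key point the model captures is that only a leading at-sign makes a value an expression; '@' later in a string, or after a space, stays literal.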
  • 41. 41 A dataset with a parameter  Suppose the BlobDataset takes a parameter named path.  Its value is used to set a value for the folderPath property by using the following expression: "folderPath": "@dataset().path" A pipeline with a parameter  In the following example, the pipeline takes inputPath and outputPath parameters.  The path for the parameterized blob dataset is set by using the values of these parameters.  The syntax used here is: "path": "@pipeline().parameters.inputPath"
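Putting the two expressions above together, a hedged sketch of the parameterized BlobDataset could look like this (the linked service reference name is hypothetical):

```json
{
  "name": "BlobDataset",
  "properties": {
    "type": "AzureBlob",
    "parameters": {
      "path": { "type": "String" }
    },
    "typeProperties": {
      "folderPath": "@dataset().path"
    },
    "linkedServiceName": {
      "referenceName": "AzureStorageLinkedService",
      "type": "LinkedServiceReference"
    }
  }
}
```

A pipeline activity referencing this dataset would then supply the path argument, e.g. with "@pipeline().parameters.inputPath", so the same dataset definition serves both the input and output sides.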
  • 43. Feel free to write to me at: mahesh.pandit@yash.com in case of any queries / clarifications.