Next Generation of Data Integration with Azure Data Factory by Tom Kerkhove
September 18, 2018
In this presentation, you'll learn the basics of Azure Data Factory. Tom Kerkhove shows how to take the data stored in your on-premises and cloud-based systems and create, manage and operate your own data pipelines.
Learn more about Azure Data Factory, the easiest cloud-based data integration service at scale.
Hi, I’m Tom Kerkhove
Agenda
| Basics of Azure Data Factory
| Demo
| How is it different from Logic Apps?
| Q&A
September 2018 Next Generation of Data Integration with Azure Data Factory 3
Azure Serverless
| Azure Logic Apps
| Azure Functions
| Azure Event Grid
| Azure Data Factory
What is Azure Data Factory?
| Managed data orchestration service
| Allows you to run pipelines
| Support for hybrid scenarios
| Support for executing SSIS packages
| Data movement-as-a-service with 70+ connectors
| Visual tooling & programmability
| .NET, Python, REST, ARM
Basics of Azure Data Factory
Anatomy of a data pipeline
[Diagram: Trigger(s) feeding a pipeline made up of several activities]
Anatomy of a data pipeline
| A pipeline represents a business process that contains one or more “steps”, which are called activities.
| Triggers start a specific pipeline, which can take parameters
| An activity represents a step in a business process that performs a specific action.
| Whether an activity runs depends on the outcome of the previous step: on success, failure, skipped or completion
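As a rough illustration of how activities chain on the outcome of a previous step, the sketch below builds a simplified pipeline definition as a Python dictionary. The `dependsOn` / `dependencyConditions` layout follows the shape of Data Factory pipeline JSON, but the pipeline name, activity names and the `activity` helper are hypothetical, and the per-activity `typeProperties` a real definition needs are left out.

```python
import json

# Outcomes of a preceding step, as spelled in pipeline JSON.
CONDITIONS = {"Succeeded", "Failed", "Skipped", "Completed"}


def activity(name, activity_type, depends_on=None):
    """Build a simplified activity definition (hypothetical helper).

    depends_on maps an upstream activity name to the outcomes that allow
    this activity to run, e.g. {"CopyData": ["Succeeded"]}.
    """
    depends_on = depends_on or {}
    for upstream, outcomes in depends_on.items():
        unknown = set(outcomes) - CONDITIONS
        if unknown:
            raise ValueError(f"unknown conditions for {upstream}: {sorted(unknown)}")
    return {
        "name": name,
        "type": activity_type,
        "dependsOn": [
            {"activity": upstream, "dependencyConditions": list(outcomes)}
            for upstream, outcomes in depends_on.items()
        ],
    }


# Hypothetical pipeline: one copy step, then a notification per outcome.
pipeline = {
    "name": "ExamplePipeline",
    "properties": {
        "parameters": {"userId": {"type": "String"}},
        "activities": [
            activity("CopyData", "Copy"),
            activity("NotifySuccess", "WebActivity", {"CopyData": ["Succeeded"]}),
            activity("NotifyFailure", "WebActivity", {"CopyData": ["Failed"]}),
        ],
    },
}

print(json.dumps(pipeline, indent=2))
```

Only one of the two notification activities would run for a given outcome of `CopyData`; using `Completed` instead would run a step regardless of success or failure.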
Integration Runtime (IR)
| Compute infrastructure used by Data Factory
| Azure, Azure-SSIS or Self-Hosted (Any cloud or on-prem)
| Core capabilities
| Data movement
| Pipeline activity execution
| SSIS package execution
| Pipelines issue command & control; the integration runtime executes
| Data movement is from IR to IR
| All execution happens in the sources & sinks
High-Level Overview
| Azure Data Factory
| Scheduling of pipelines
| Orchestrating the activities across Integration Runtimes
| Monitoring the progress
| Integration Runtime (IR)
| Execution engine
| Core capabilities:
| Data movement
| Pipeline activity execution
| SSIS package execution
Triggers
| Different types of triggers
| On-Demand
| Triggered via REST API, .NET, etc.
| Azure API Management can make this easier (http://bit.ly/api-management-adf)
| Scheduled / Wall-clock
| Tumbling Windows (aka “data slicing”)
| Event-based (e.g. a new file is added to blob storage)
| Support for passing parameters
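To make the tumbling-window idea ("data slicing") concrete, here is a minimal sketch in plain Python. It does not call Data Factory; it only shows how a time range splits into fixed-size, contiguous, non-overlapping windows, each of which would correspond to one pipeline run. The function name and the dates are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def tumbling_windows(start, end, size):
    """Yield contiguous, non-overlapping (window_start, window_end) pairs.

    Only complete windows that fit inside [start, end) are produced.
    """
    if size <= timedelta(0):
        raise ValueError("size must be positive")
    window_start = start
    while window_start + size <= end:
        yield window_start, window_start + size
        window_start += size


# One day sliced into 6-hour windows.
windows = list(
    tumbling_windows(datetime(2018, 9, 1), datetime(2018, 9, 2), timedelta(hours=6))
)
for window_start, window_end in windows:
    print(window_start.isoformat(), "->", window_end.isoformat())
```

Each window's start and end would typically be passed to the pipeline as parameters, so that a run only processes the slice of data belonging to its window.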
Activities, data sets & linked services
| An activity can produce or consume a data set. A data set is a representation of a data structure in a data store that can be used as a source or sink.
| Linked Services define how an activity can connect to an external system. This external system can be a data store or a compute resource.
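The relationship between the three concepts can be sketched with plain Python dictionaries. These are deliberately simplified stand-ins, not complete Data Factory definitions; every name (`UserDatabase`, `UserTable`, `CopyUserInfo`, the `resolve` helper) is hypothetical.

```python
# Linked services: how to connect to an external system (connection details omitted).
linked_services = {
    "UserDatabase": {"type": "AzureSqlDatabase"},
    "ExportStorage": {"type": "AzureBlobStorage"},
}

# Data sets: a data structure inside a store, pointing at its linked service.
datasets = {
    "UserTable": {"linkedServiceName": "UserDatabase"},
    "UserExportFile": {"linkedServiceName": "ExportStorage"},
}

# Activity: consumes one data set (source) and produces another (sink).
copy_activity = {
    "name": "CopyUserInfo",
    "type": "Copy",
    "inputs": ["UserTable"],
    "outputs": ["UserExportFile"],
}


def resolve(activity):
    """Return (source store types, sink store types) for an activity.

    Follows each data set reference to the linked service it is stored in.
    """
    def store_types(dataset_names):
        return [
            linked_services[datasets[name]["linkedServiceName"]]["type"]
            for name in dataset_names
        ]

    return store_types(activity["inputs"]), store_types(activity["outputs"])


sources, sinks = resolve(copy_activity)
print("reads from:", sources)
print("writes to:", sinks)
```

Keeping connection details in the linked service means the same data set and activity definitions can be pointed at a different store by changing a single definition.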
[Diagram: an Activity produces or consumes a Data Set, which represents data stored in a Linked Service]
Activities
| Data Movement
| Azure, Databases, NoSQL, File, SaaS, Web, etc
| Data Transformation
| Pig, Hive, Stored Procedure, U-SQL, ML, Spark, MapReduce, etc.
| Control Flow
| Web call, Lookup, Get Metadata, If, Wait, ForEach, Execute Pipeline, etc
| Custom
| Run commands on an Azure Batch cluster
| Run R scripts on an HDInsight cluster
Data Transformation
| Support for running activities against other Azure services
| Provides the capability to perform schema and column mappings (with UI support)
| Visual Data Flow Authoring (private preview)
| Serverless scale-out transformation execution engine
| No knowledge of Spark, Scala, Python or Java is required
| Walkthrough on YouTube (http://bit.ly/adf-data-flow-preview)
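Conceptually, a column mapping renames source columns to sink columns while data is copied. The sketch below shows that idea in plain Python; it is not how Data Factory implements it, and the function name, column names and sample row are assumptions for illustration.

```python
def map_columns(rows, mapping):
    """Rename source columns to sink columns.

    mapping is {source_column: sink_column}; source columns that are not
    listed in the mapping are dropped from the output.
    """
    return [
        {sink: row[source] for source, sink in mapping.items()}
        for row in rows
    ]


# Hypothetical source row and mapping.
source_rows = [
    {"Id": 1, "DisplayName": "Tom", "Reputation": 101},
]
mapping = {"Id": "user_id", "DisplayName": "name"}

print(map_columns(source_rows, mapping))
```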
Running SSIS packages in Azure
| Stores SSISDB in Azure SQL DB or Managed Instance
| Azure-SSIS integration runtime as compute-layer
| Compute part for running SSIS
| Managed cluster of Azure VMs
| Can be linked to VNET for hybrid scenarios
| Lift & shift packages to the cloud
Security
| Native support for Managed Service Identity (MSI)
| Native integration with Azure Key Vault
| Encrypted-in-transit via HTTPS
| Supports encryption-at-rest with data stores
Monitoring
| Visual monitoring in the portal
| Monitoring per pipeline run
| Detailed information per activity
| Azure Monitor integration
| Diagnostic Logs
| Metrics
| Alerts
Demo - Using Azure Serverless to become GDPR compliant
Using Azure Serverless to become GDPR compliant
| Every user should be capable of requesting their data
[Diagram labels: User Profile information; StackExchange Data Set; Kerkhove.tom@gmail.com]
[Diagram labels: Send “Consolidation Started”; Copy User Info from DB; Consolidate User Info; Copy Consolidated Data; Send Summary]
How is this different from Logic Apps?
| Serverless orchestration
| Pay for what you use
| Data-centric vs Application-centric workflows
| Work together seamlessly
Conclusion
| Azure Data Factory is a great way to orchestrate data processes and build data-integration pipelines
| Allows you to get to market very quickly with the built-in connectors
| Very powerful for data-centric workloads
| A perfect match with Azure Logic Apps
| Unsung hero in the serverless space
Any question(s)?
Read more about my demo and other Azure adventures on codit.eu/blog!
Thank you for your attention!