Your expectations and what we will cover
Azure Data Factory V2; The Data Flows
• The Abstract - “Mapping Data Flows is the fantastic new feature of Azure Data Factory, which
combines a visual designer with the full power of Databricks to deliver a robust and hugely
scalable data flow pipeline, like a constantly evolving integration services cloud service. In this
hands-on session we'll design and build a data transformation in the data flow designer using
the new Data Factory flow user interface, and talk about the underlying architectural
components.”
• We will start with a high level introduction to Azure Data
Factory
• Then we’ll discuss Data Flows
• Then let us build something!
About me
• Thomas Sykes MCT, Azure Certified, MCSE
• Senior Consultant for Quorum based in Edinburgh, Scotland
• Working with SQL Server since version 7.0
• Now working ‘in the cloud’
• On Twitter as @sqltomato, with my notepad at sqltomato.com
About you
• How many of you have used SQL Server Integration Services (SSIS)?
• How many of you have used Azure?
• Have you looked into or used Azure Data Factory?
What is Azure Data Factory?
• Azure Data Factory is a cloud-based data integration
service that allows you to create data-driven workflows
in the cloud for orchestrating and automating data
movement and data transformation
https://docs.microsoft.com/en-gb/azure/data-factory/introduction
What is the Data Flow?
• It is a visual designer native to Microsoft Azure that provides a
robust and scalable data flow pipeline, like a constantly
evolving integration services cloud service
• It transforms Azure Data Factory from a Data Movement
tool to a full Extract, Transform and Load tool with a
graphical interface
Azure Data Factory Data Flows ADFDF
• When typing this ADFDF acronym
• Looks like I’ve fallen asleep on the keyboard
• It’s almost meant for our standard Dvorak QWERTY
keyboard
QWERTY keyboard - https://en.wikipedia.org/wiki/Dvorak_Simplified_Keyboard#Original_Dvorak_layout
Pipelines
• “… pipeline is a logical grouping of activities that performs a
unit of work …”
• Within the pipelines we have essentially control flow and
data flows
Linked Services
• “… Linked services are much like connection strings …”
• They can represent data stores (such as Azure Blob Storage
or Azure SQL Database) or a compute resource (such as the
HDInsightHive activity)
https://docs.microsoft.com/en-gb/azure/data-factory/introduction
Data Flows
• “… Data Flows allow data engineers to develop graphical
data transformation logic without writing code. The resulting
data flows are executed as activities within Azure Data
Factory Pipelines using scaled-out Azure Databricks clusters
…”
• Similar to SQL Server Integration Services, the native Azure
Data Flows boast a graphical ‘no code’ interface with a rich
array of connectors
• Being actively developed
https://docs.microsoft.com/en-us/azure/data-factory/data-flow-expression-functions
Building a simple data flow – the problem
• A simple real world problem
• A courier required the post town to match the postcode;
otherwise its system would reject the packages
• Each postcode (e.g. EH11 4EP) has a post district (EH11), which has an
associated post town (EDINBURGH)
• Some of the data now has more than one entry; to keep this
simple I’ve used the first entry and the EH and G districts only
Post district data - https://en.wikipedia.org/wiki/List_of_postcode_districts_in_the_United_Kingdom
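The postcode rule above can be sketched in a few lines of Python. This is only an illustration of the matching logic, not how Data Flows implements it: the function names and the two-entry POST_TOWNS lookup are hypothetical, and the G1 postcode used below is made up.

```python
# Hypothetical two-row lookup; the real table holds all EH and G districts.
POST_TOWNS = {"EH11": "EDINBURGH", "G1": "GLASGOW"}


def post_district(postcode: str) -> str:
    """Return the post district (outward code) of a UK postcode."""
    cleaned = postcode.strip().upper()
    if " " in cleaned:
        # "EH11 4EP" -> "EH11"
        return cleaned.split()[0]
    # Without a space, the inward code is always the last three characters.
    return cleaned[:-3]


def town_matches(postcode: str, post_town: str) -> bool:
    """True when the supplied post town is the one expected for the postcode."""
    expected = POST_TOWNS.get(post_district(postcode))
    return expected is not None and expected == post_town.strip().upper()
```

With this sketch, town_matches("EH11 4EP", "Edinburgh") is True, while town_matches("G1 1AA", "EDINBURGH") is False, which is the kind of row the courier's system would reject.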
Building a simple data flow - prerequisites
• Azure SQL Server and Database
• Azure Data Factory
• Azure Storage Blob
• Microsoft Azure Storage Explorer
• Post District Data – EH and G data loaded into database
• ‘Customer’ Mailing Data – Some fictitious customer data
Building a simple data flow – concepts
• Flat File data source stored in Azure Blob Storage
• Azure SQL Database – Azure PaaS Database
• Simple expression
• INNER JOIN
• Output to a ‘sink’
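For readers who think in code, the concepts above can be approximated in plain Python before opening the designer. This is a rough sketch under stated assumptions, not the Data Flows runtime: the CSV text stands in for the flat file in Blob Storage, the POST_DISTRICTS list stands in for the Azure SQL Database table, the returned list stands in for the sink, and all column names and sample rows are invented for illustration.

```python
import csv
import io

# Stand-in for the flat file in Azure Blob Storage (fictitious rows).
CUSTOMER_CSV = """name,post_town,postcode
A. Example,EDINBURGH,EH11 4EP
B. Sample,EDINBURGH,G1 1AA
C. Unknown,ABERDEEN,AB10 1AA
"""

# Stand-in for the post district table in Azure SQL Database.
POST_DISTRICTS = [
    {"post_district": "EH11", "post_town": "EDINBURGH"},
    {"post_district": "G1", "post_town": "GLASGOW"},
]


def run_flow(customer_csv, post_districts):
    """Source -> simple expression -> INNER JOIN -> rows for the sink."""
    lookup = {row["post_district"]: row["post_town"] for row in post_districts}
    sink_rows = []
    for row in csv.DictReader(io.StringIO(customer_csv)):
        # Simple expression: derive the post district from the postcode.
        district = row["postcode"].split()[0].upper()
        # INNER JOIN: customers with no matching district are dropped.
        if district in lookup:
            sink_rows.append(
                {
                    "name": row["name"],
                    "postcode": row["postcode"],
                    # Take the post town from the lookup, not the customer file.
                    "post_town": lookup[district],
                }
            )
    return sink_rows
```

Calling run_flow(CUSTOMER_CSV, POST_DISTRICTS) keeps the EH11 and G1 customers, replaces the second customer's post town with GLASGOW, and drops the AB10 row because the join finds no match for it.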
“This Page left Intentionally Blank” – Demo!
Want to build this yourself?
What will you need?
• Azure Subscription
• Microsoft Azure Storage Explorer
• Post District Data
• ‘Customer’ File
• All details are in the Data Flows blog post at sqltomato.com
https://azure.microsoft.com/en-gb/free/
https://azure.microsoft.com/en-gb/features/storage-explorer/
https://en.wikipedia.org/wiki/List_of_postcode_districts_in_the_United_Kingdom