
Azure Data Factory V2; The Data Flows

16 Sep 2019

  1. Azure Data Factory V2 The Data Flows
  2. Your expectations and what we will cover; Azure Data Factory V2; The Data Flows • The Abstract - “Mapping Data Flows is the fantastic new feature of Azure Data Factory, which combines a visual designer with the full power of Databricks to deliver a robust and hugely scalable data flow pipeline, like a constantly evolving integration services cloud service. In this hands-on session we'll design and build a data transformation in the data flow designer using the new Data Factory flow user interface, and talk about the underlying architectural components.” • We will start with a high-level introduction to Azure Data Factory • Then we’ll discuss Data Flows • Then let us build something!
  3. About me • Thomas Sykes MCT, Azure Certified, MCSE • Senior Consultant for Quorum based in Edinburgh, Scotland • Working with SQL Server since version 7.0 • Now working ‘in the cloud’ • On twitter @sqltomato and use my notepad at sqltomato.com
  4. About you How many of you have used SQL Server Integration Services (SSIS)? How many of you have used Azure? Have you looked into or used Azure Data Factory?
  5. What is Azure Data Factory? • Azure Data Factory is a cloud-based data integration service that allows you to create data-driven workflows in the cloud for orchestrating and automating data movement and data transformation https://docs.microsoft.com/en-gb/azure/data-factory/introduction
  6. What is the Data Flow? • It is a visual designer native to Microsoft Azure that provides a robust and scalable data flow pipeline, like a constantly evolving integration services cloud service • It transforms Azure Data Factory from a Data Movement tool into a full Extract, Transform and Load tool with a graphical interface
  7. Azure Data Factory Data Flows ADFDF • When typing this ADFDF acronym, it looks like I’ve fallen asleep on the keyboard • It’s almost meant for our standard Dvorak QWERTY keyboard • QWERTY keyboard - https://en.wikipedia.org/wiki/Dvorak_Simplified_Keyboard#Original_Dvorak_layout
  8. Pipelines Linked Services Data Flows (Preview) Azure Data Factory Data Flows ADFDF
  9. Pipelines • “… pipeline is a logical grouping of activities that performs a unit of work …” • Within the pipelines we have essentially control flow and data flows
  10. Linked Services • “… Linked services are much like connection strings …” • They can represent data stores (such as Azure Blob Storage or Azure SQL Database) or a compute resource (such as the HDInsightHive activity) https://docs.microsoft.com/en-gb/azure/data-factory/introduction
  11. Data Flows • “… Data Flows allow data engineers to develop graphical data transformation logic without writing code. The resulting data flows are executed as activities within Azure Data Factory Pipelines using scaled-out Azure Databricks clusters …” • Similar to SQL Server Integration Services, the native Azure Data Flows boast a graphical ‘no code’ interface with a rich array of connectors • Being actively developed https://docs.microsoft.com/en-us/azure/data-factory/data-flow-expression-functions
  12. Building a simple data flow – the problem • A simple real-world problem • A courier required the post town to match the postcode; otherwise its system would reject the packages • Each postcode (EH11 4EP) has a post district (EH11), which has an associated post town (EDINBURGH) • Some of the data now has more than one entry; to keep this simple I’ve used the first entry and used EH and G only • Post district data - https://en.wikipedia.org/wiki/List_of_postcode_districts_in_the_United_Kingdom
  13. Building a simple data flow - prerequisites • Azure SQL Server and Database • Azure Data Factory • Azure Storage Blob • Microsoft Azure Storage Explorer • Post District Data – EH and G data loaded into database • ‘Customer’ Mailing Data – Some fictitious customer data
  14. Building a simple data flow – concepts • Flat File data source stored in Azure Blob Storage • Azure SQL Database – Azure PaaS Database • Simple expression • INNER JOIN • Output to a ‘sink’
  15. “This Page left Intentionally Blank” – Demo!
  16. Building a simple data flow – How it looks
  17. Database, Debug, Runtime and Costs
  18. Want to build this yourself? What will you need? • Azure Subscription • Microsoft Azure Storage Explorer • Post District Data • ‘Customer’ File • All details at sqltomato.com blog post Data Flows https://azure.microsoft.com/en-gb/free/ https://azure.microsoft.com/en-gb/features/storage-explorer/ https://en.wikipedia.org/wiki/List_of_postcode_districts_in_the_United_Kingdom
  19. Thank you for attending, please complete feedback and visit our sponsors!
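
The demo itself is built in the visual designer and is not captured in this transcript. As a rough illustration of the logic described in slides 12 to 14 (flat file source, simple expression, INNER JOIN, sink), here is a minimal Python sketch. It is not the Data Flow definition used in the session: the column names, the sample rows, and the helper functions `post_district`, `correct_post_towns` and `write_sink` are all hypothetical, and the lookup dictionary stands in for the post district table held in Azure SQL Database.

```python
import csv
import io

# Hypothetical stand-in for the post district table (EH and G data only in the talk).
POST_DISTRICTS = {
    "EH11": "EDINBURGH",
    "G1": "GLASGOW",
}

# Hypothetical customer mailing file, standing in for the flat file in Blob Storage.
CUSTOMER_CSV = """name,post_town,postcode
A Customer,Edinburgh,EH11 4EP
B Customer,Glasgow,G1 1AA
C Customer,London,SW1A 1AA
"""


def post_district(postcode: str) -> str:
    """Simple expression: derive the post district from a postcode.

    The last three characters of a UK postcode are the inward code,
    so everything before them is the post district.
    """
    compact = "".join(postcode.split()).upper()
    return compact[:-3]


def correct_post_towns(customer_csv: str, districts: dict) -> list:
    """Inner join customers to post districts and take the post town from the lookup."""
    rows = []
    for row in csv.DictReader(io.StringIO(customer_csv)):
        district = post_district(row["postcode"])
        # Inner join: rows with no matching post district are dropped.
        if district in districts:
            rows.append({**row, "post_town": districts[district]})
    return rows


def write_sink(rows: list) -> str:
    """Sink: write the joined rows back out as CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["name", "post_town", "postcode"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    print(write_sink(correct_post_towns(CUSTOMER_CSV, POST_DISTRICTS)))
```

Because the sketch uses an inner join, the SW1A row is dropped: the lookup only holds EH and G districts, which mirrors the simplification mentioned in slide 12.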