ETL for the Masses
Régis Baccaro – IBM
@regbac
Our Sponsors
Introduction
Régis Baccaro
@regbac
http://Theblobfarm.wordpress.com
http://Thelovefarm.wordpress.com
regis@baccaro.com
• Founder and lead organizer of SQL Saturday Denmark
• PASS Regional Mentor
• Works for IBM
• Passionate about the community
• .Net developer, BI dude, SharePoint fellow and accidental DBA
Agenda
• Power Query and the M language
• E and T and L with Power Query
• Data refresh techniques with PQ
• Next step
Introduction
• Power Query
  • Get data experience
  • Filter and combine
  • Embedded M for repeatable mashup
• Power Query Formula Language (aka M)
  • Mostly pure
  • Higher-order
  • Dynamically typed
  • Partially lazy
  • Functional programming language
Elements of language
• Expressions – central construct
  • Evaluated to a single value
• Values
  • Primitives
  • List – ordered sequence
  • Record – set of fields
  • Table
  • Function
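As a minimal sketch of the value kinds listed above, the following M expression builds one of each. The company names and numbers are made up for illustration and are not from the talk.

```powerquery
let
    // Primitive values
    Count = 42,
    Label = "CVR",
    IsClean = true,
    // List: an ordered sequence of values
    Numbers = {1, 2, 3},
    // Record: a set of named fields
    Company = [Name = "Contoso", Employees = 250],
    // Table: built here with the #table constructor (column names, then rows)
    Companies = #table({"Name", "Employees"}, {{"Contoso", 250}, {"Fabrikam", 100}}),
    // Function: a value like any other
    Double = (x) => x * 2,
    Result = [
        SecondNumber = Numbers{1},             // lists are zero-based, so this is 2
        CompanyName = Company[Name],           // field access on a record
        RowCount = Table.RowCount(Companies),
        Doubled = Double(Count)
    ]
in
    Result
```

Every step is itself an expression that evaluates to a single value, including the final record.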
Evaluation
• Excel-like (surprise!)
• Nested records
  • In Records
  • In Lists
• Lazy evaluation
  • Lists and Records (and let)
• Eager evaluation
  • Everything else
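One way to see the lazy cases is to plant errors in places that are never read. This is a small sketch with made-up names, not code from the talk: the let variable, the record field and the list item that raise errors are never evaluated, so the query still returns a result.

```powerquery
let
    // A let variable that is never referenced is never evaluated
    Unused = error "this variable is never referenced",
    // Record fields are evaluated only when they are accessed
    Rec = [Good = 1 + 1, Broken = error "this field is never read"],
    // List items are evaluated only when they are accessed
    Items = {10, error "this item is never read", 30},
    Result = [FromRecord = Rec[Good], FromList = Items{0}]
in
    Result
```

Reading Rec[Broken] or Items{1} instead would surface the corresponding error.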
Functions and Standard Library
• Mapping from a set of values to a single value
• (named parameters) => function body

• Standard library: a common set of definitions
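The function syntax and its higher-order use can be sketched as follows. AddVat, IsLarge and the amounts are hypothetical names and values chosen for illustration; List.Select and List.Transform come from the standard library.

```powerquery
let
    // A function value: parameters on the left of =>, body on the right
    AddVat = (amount as number, rate as number) as number => amount * (1 + rate),
    // "each" is shorthand for a one-parameter function whose parameter is named _
    IsLarge = each _ > 100,
    Amounts = {80, 120, 200},
    // Standard library functions accept other functions as arguments
    Large = List.Select(Amounts, IsLarge),
    WithVat = List.Transform(Large, (a) => AddVat(a, 0.25))
in
    WithVat
```

The type annotations (as number) are optional; M stays dynamically typed and checks them at run time.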
Operators
• Meaning varies depending on kind of value

• & concatenates text, concatenates lists, and merges records
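A short sketch of the three meanings of &, using made-up values:

```powerquery
let
    // Text concatenation
    Greeting = "Power" & " " & "Query",
    // List concatenation
    Combined = {1, 2} & {3},
    // Record merge: when both sides have a field, the right-hand value wins
    Merged = [Name = "Contoso", City = "Aarhus"] & [City = "Copenhagen"]
in
    [Greeting = Greeting, Combined = Combined, Merged = Merged]
```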
Metadata
• Information about a value that is associated with a value
• A record
• Exists for every value
• Unobtrusive way to add information
• Accessed with Value.Metadata
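As an illustration, metadata is attached with the meta operator and read back with Value.Metadata. The field names Source and Unit are arbitrary choices for this sketch.

```powerquery
let
    // Attach a metadata record to a value
    Hits = 250 meta [Source = "DQS cleansing", Unit = "rows"],
    // The value itself behaves exactly as before
    Doubled = Hits * 2,
    // Read the metadata record back
    Info = Value.Metadata(Hits)
in
    [Doubled = Doubled, Source = Info[Source]]
```

A value with no metadata attached returns an empty record from Value.Metadata.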
Let ... in expression
• So far only literal values
• Let allows a set of values to be:
  • Computed
  • Named
  • Used in the expression that follows the in

let
    Source = Web.Page(Web.Contents("http://www.cvr.dk/Site/Forms/CompanySearch/CompanySearch.aspx?......")),
    RowCount = Table.RowCount(Source)
in
    RowCount
IF expression
• Selects between two expressions based on a logical condition
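A minimal example, with a made-up threshold and ratio:

```powerquery
let
    HitRatio = 0.8,
    // if ... then ... else is an expression and always yields a value,
    // so the else branch is required
    Verdict = if HitRatio >= 0.7 then "good enough" else "clean more"
in
    Verdict
```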
Error expression
• When an expression evaluation cannot yield a value
• Raised with error
• Handled with try
• Produces an Error record
• try ... otherwise supplies a default value
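The three pieces (raising, handling, defaulting) can be sketched like this. The Checked function and the input text are invented for the example.

```powerquery
let
    // try returns a record: HasError plus either Value or Error
    Attempt = try Number.FromText("not a number"),
    Message = if Attempt[HasError] then Attempt[Error][Message] else "ok",
    // try ... otherwise replaces the error with a default value
    Safe = try Number.FromText("not a number") otherwise 0,
    // Raising an error explicitly
    Checked = (n) => if n < 0 then error "negative input" else n
in
    [Message = Message, Safe = Safe, Ok = Checked(5)]
```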
Keywords and Operators
• Keywords: and as each else error false if in is let meta not otherwise or section shared then true try type #binary #date #datetime #datetimezone #duration #infinity #nan #sections #shared #table #time
• Operators and punctuators: , ; = < <= > >= <> + - * / & ( ) [ ] { } @ ! ? => .. ...
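A few of the less obvious tokens in use, as a sketch with arbitrary values: the # constructors, the .. range, the ? optional access and the @ self-reference.

```powerquery
let
    // #date and #duration build values that have no plain literal form
    Day = #date(2014, 3, 29),
    NextDay = Day + #duration(1, 0, 0, 0),
    // .. builds a range inside a list
    Range = {1..5},
    // ? turns a missing field into null instead of an error
    Rec = [A = 1],
    Missing = Rec[B]?,
    // @ lets a function refer to itself for recursion
    Factorial = (n) => if n <= 1 then 1 else n * @Factorial(n - 1)
in
    [NextDay = NextDay, Range = Range, Missing = Missing, Fact = Factorial(5)]
```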
The "E" - Why is Power Query great for Extracting data
• Multiple data sources

Hey wait! Where is PDW?
Query folding - A step toward declarative ETL approach
• Declarative vs Imperative
• Query folding similar to predicate pushdown
• Does Power Query have a Query Optimizer?
• Demo
Query folding - the unofficial list:
Sources
• SQL Databases
• OData and OData based sources, such as the Windows Azure Marketplace and SharePoint Lists
• Active Directory
• HDFS.Files, Folder.Files, and Folder.Contents (for basic operations on paths)
Operations
• Column removal
• Renaming
• Joins
• Type conversions
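A query of the following shape is a candidate for folding, since it uses a SQL source and only operations from the list above. The server, database, table and column names are hypothetical, and whether each step actually folds depends on the source and the Power Query version, so treat this as a sketch rather than a guarantee.

```powerquery
let
    // Hypothetical server and database names
    Source = Sql.Database("myserver", "SalesDb"),
    // Navigate to a (hypothetical) table
    Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
    // Column removal
    Trimmed = Table.RemoveColumns(Orders, {"InternalNote"}),
    // Renaming
    Renamed = Table.RenameColumns(Trimmed, {{"CustName", "Customer"}}),
    // Type conversion
    Typed = Table.TransformColumnTypes(Renamed, {{"Amount", type number}})
in
    Typed
```

When the steps fold, the source receives a single query that does the work, rather than Power Query pulling every row and transforming it locally.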
Real life scenario – ETL for the masses
• Seen a lot of demos
• Built a lot of demos
• They are always so clean!
Real life scenario
Transform
• M is how the magic happens!
• Data manipulation
  • Records
  • Lists
  • Tables
• Merging
• Function calls
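Merging can be sketched with two small in-memory tables. The people, companies and cities are invented; in the scenario of the talk the inputs would come from the real sources instead.

```powerquery
let
    // Illustrative tables built with #table
    People = #table({"Person", "Company"},
        {{"Anna", "Contoso"}, {"Bo", "Fabrikam"}, {"Cy", "Unknown Ltd"}}),
    Companies = #table({"Company", "City"},
        {{"Contoso", "Aarhus"}, {"Fabrikam", "Odense"}}),
    // Nested merge: each row gets a nested table holding its matching rows
    Merged = Table.NestedJoin(People, {"Company"}, Companies, {"Company"},
        "Match", JoinKind.LeftOuter),
    // Expand the nested table into a regular column
    Expanded = Table.ExpandTableColumn(Merged, "Match", {"City"}),
    // Function call with a row predicate: count the rows that found a match
    Hits = Table.RowCount(Table.SelectRows(Expanded, each [City] <> null))
in
    [Result = Expanded, Hits = Hits, Total = Table.RowCount(People)]
```

With a left outer join, rows without a match are kept and get null in the expanded column, which makes it easy to compare hits against the total.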
What about our scenario?
• Where should I get my data from?
  • Pure Excel
  • Excel and MDS/DQS/SSIS/SQL
  • Web, SQL, XML, ?
• Let me show you! Input
  • (cvr web)
Let’s go to homegrown data?
• Bad web service
• Bad HTML structure
• Let’s go with local data that we can control
  • SQL Server
  • Excel
• Let’s Query!
[Diagram labels: Isolated DB, Local storage]
Clean up before you merge!
• DQS
  Knowledge base with CVR
  + Cleansing project with LinkedIn input
  = Demo2.1_AndreasStrandbyClean
• Hit ratio increased...
[Chart: Hit and Total (counts, 0 to 250) and hit ratio (0% to 100%) for Clean join and Nested Merge join]
Smarter Power Query
• Expression.Evaluate()
• Examples
  • Load query text from file
  • Load function from file
  • Passing parameters (as constants)
• Demo
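A minimal sketch of the Expression.Evaluate ideas above. The function text is held in a string here so the example is self-contained; the file path in the comment is hypothetical.

```powerquery
let
    // Function text held in a string; it could also be read from a file, e.g.
    // Text.FromBinary(File.Contents("C:\queries\Shout.m"))  (hypothetical path)
    FunctionText = "(x) => Text.Upper(x)",
    // #shared passes the standard library as the evaluation environment,
    // so the evaluated text can see Text.Upper
    Shout = Expression.Evaluate(FunctionText, #shared),
    // Parameters passed as constants through a custom environment record
    Sum = Expression.Evaluate("a + b", [a = 1, b = 2])
in
    [Shouted = Shout("cvr"), Sum = Sum]
```

The second argument decides which names the evaluated text can see: without #shared or an explicit record, identifiers such as Text.Upper are not in scope.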
Refreshing Power Query data
• Different solutions
• All with flaws!
Refreshing Power Query data – with VB6!
• Back from 2006
• From Kim GreenLee
Plus
• Can be scheduled
• More robust than the non-technical solution
Minus
• VB6 – are you kidding?
Refreshing Power Query data – with PowerShell
Plus
• Robust
Minus
• Hard to troubleshoot
• Cannot run as a Windows Task Scheduler task unless the task is set to run only when the user is logged on
Refreshing Power Query data – The non-technical way
• Let me show you!
Plus
• Very easy
Minus
• Not very corporate!
• The spreadsheet needs to be open
• Excel file not saved
• Locked out when it refreshes
Refreshing Power Query data – The non-technical way part 2
• Let me show you!
Plus
• Very easy
• Uses the technique from the previous solution
Minus
• Not very corporate!
• The spreadsheet needs to be open
Refreshing Power Query data – with SSIS
Plus
• Robust
Minus
• Requires a SQL Server (wait, it’s a plus!)
• Needs an SSIS / C# developer
Refreshing Power Query data – with SSIS
• Using DQS for cleansing input

• Let me show you!
How is Power Query going to be used?
• Data store accumulating interesting data points
• Hook into read-only data for reporting purposes or data marts
• One file to accumulate (Produce)
• Multiple files or programs to report (Consume)
• I don’t believe in “Data Steward”
• I believe someone (such as IT or DBAs) will be in charge of procuring and monitoring data stores of disparate data.
Conclusion
• A step toward declarative ETL approach
• Still much work to do!
We have
• A declarative data integration language
• Only surfaced in Power Query
• Can push data to an Excel spreadsheet
Imagine.....
• Connection to heterogeneous data sources
THANK YOU!
@REGBAC
HTTP://THEBLOBFARM.WORDPRESS.COM
REGIS@BACCARO.COM

ETL for the masses with Power Query and M