SlideShare una empresa de Scribd logo
1 de 20
Scott Weinstein Principal Lab49 weblogs.asp.net/sweinstein   / @ScottWeinstein Using PowerShell to simplify your ETL
GOLD SPONSORS
SILVER SPONSOR
BRONZE SPONSORS
OTHER SPONSORS
SSIS Considered Harmful  Visual programming is a terrible interface for ETL work History of UI visual designers is instructive here No way to copy/share “code samples” Low information density Even worse It’s not even a good designer Poor defaults, poor layout Nested menus of property grids are hard to use No real VCS ability Sure it’s XML, but… No diff No merge Demo ?
But what about all the features? Some are trivial: Foreach Loop Container Execute SQL Task Send Mail Task OMG! Many are useful, but not hard to recreate Demos soon Others may be worth consideration Fuzzy matching
But what about performance The advantages for data flow are real, but not that large A 5M row test on my home machine gave a 12% performance advantage to SSIS over PowerShell + SqlBulkCopy
The Answer PowerShell A general purpose scripting and automation language Wide adoption MS: Windows Server, Windows 7, SQL Server 2008, VM Manager, IIS, Deployment Toolkit Others: VMWare, Quest, IBM Websphere
PowerShell Features Takes the best of Unix, VMS, Perl Adds in integration with .Net, COM, WMI Adds in  REPL environment Full scripting/programming Flexible typing model Pipes Regular expressions XML Navigation Providers Built in command parser Standardized syntax for commands
Introducing PSIS http://psis.codeplex.com/ Yes, the name is derivative Right now, just a playground for some demos Bulk Data Transfer Star-Schema Populator Execute SQL
PSIS demo Just the ETL No logging, error reporting, etc Some typical ETL tasks Clear staging tables Bulk transfer data from one DB to another Populate a star schema Fact table and dimension tables Extract data to a .csv file
Clear staging tables Use Invoke-MSSqlCommand Use Truncate <tablename>
Bulk transfer At core, we’ll use .Net’sSqlBulkCopy under the covers uses TDS’s BCP protocol For  concurrency we’ll use the Task Parallel Library We’ll simplify things by making the destination tables map directly to the source tables Select statements could provide needed mapping
Populate a Star-Schema Destination
Populate a Star-Schema Source a view on the staging tables, creating a flat de-normalized row of all dimension and fact value Use convention to indicate which columns are fact or dimension tables
Populate a Star-Schema Code generate a populator Create functions, given a source row which return/create the corresponding dimension key Run it with the TPL for performance
Populate a Star-Schema Areas of improvement Add support for slowly changing dimensions Add support for lower memory requirements  query DB for dimension values on demand Measure and tune performance at the micro-benchmark level Concurrent dictionaries Partial loads
Extract to file Invoke-MSSqlCommand Export-Csv
Others?

Más contenido relacionado

La actualidad más candente

Managing V Sphere With The Vesi
Managing V Sphere With The VesiManaging V Sphere With The Vesi
Managing V Sphere With The VesiEric Sloof
 
8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style
8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style
8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY StyleAthens Big Data
 
Reactive programming
Reactive programmingReactive programming
Reactive programmingNick Hodge
 
Apache Airflow (incubating) NL HUG Meetup 2016-07-19
Apache Airflow (incubating) NL HUG Meetup 2016-07-19Apache Airflow (incubating) NL HUG Meetup 2016-07-19
Apache Airflow (incubating) NL HUG Meetup 2016-07-19Bolke de Bruin
 
Grafana optimization for Prometheus
Grafana optimization for PrometheusGrafana optimization for Prometheus
Grafana optimization for PrometheusMitsuhiro Tanda
 
Spring.Net, Feb 2008, PostSharp: A Technical Introduction
Spring.Net, Feb 2008, PostSharp:  A Technical IntroductionSpring.Net, Feb 2008, PostSharp:  A Technical Introduction
Spring.Net, Feb 2008, PostSharp: A Technical Introductiongfraiteur
 
Using Amazon EC2 to Scale Your Web Application
Using Amazon EC2 to Scale Your Web ApplicationUsing Amazon EC2 to Scale Your Web Application
Using Amazon EC2 to Scale Your Web Applicationravipratapm
 
Threading Made Easy! A Busy Developer’s Guide to Kotlin Coroutines
Threading Made Easy! A Busy Developer’s Guide to Kotlin CoroutinesThreading Made Easy! A Busy Developer’s Guide to Kotlin Coroutines
Threading Made Easy! A Busy Developer’s Guide to Kotlin CoroutinesLauren Yew
 
Reactive programming and RxJS
Reactive programming and RxJSReactive programming and RxJS
Reactive programming and RxJSRavi Mone
 
Continuous deployment of Rails apps on AWS OpsWorks
Continuous deployment of Rails apps on AWS OpsWorksContinuous deployment of Rails apps on AWS OpsWorks
Continuous deployment of Rails apps on AWS OpsWorksTomaž Zaman
 
Analytics Beyond RAM Capacity using R
Analytics Beyond RAM Capacity using RAnalytics Beyond RAM Capacity using R
Analytics Beyond RAM Capacity using RAlex Palamides
 
Building Robust Pipelines with Airflow
Building Robust Pipelines with AirflowBuilding Robust Pipelines with Airflow
Building Robust Pipelines with AirflowErin Shellman
 
Serverless in azure
Serverless in azureServerless in azure
Serverless in azureVeresh Jain
 
Design Matters: Why In-Place Copy Data Management is the Right Choice
Design Matters: Why In-Place Copy Data Management is the Right Choice Design Matters: Why In-Place Copy Data Management is the Right Choice
Design Matters: Why In-Place Copy Data Management is the Right Choice Catalogic Software
 
An Introduction to the Heatmap / Histogram Plugin
An Introduction to the Heatmap / Histogram PluginAn Introduction to the Heatmap / Histogram Plugin
An Introduction to the Heatmap / Histogram PluginMitsuhiro Tanda
 
Reactive programming with rx java
Reactive programming with rx javaReactive programming with rx java
Reactive programming with rx javaCongTrung Vnit
 
Diminuendo! Tactics in Support of FaaS Migrations Slides
Diminuendo! Tactics in Support of FaaS Migrations SlidesDiminuendo! Tactics in Support of FaaS Migrations Slides
Diminuendo! Tactics in Support of FaaS Migrations SlidesSebastian Werner
 
Quarterly Technology Briefing, Manchester, UK September 2013
Quarterly Technology Briefing, Manchester, UK September 2013Quarterly Technology Briefing, Manchester, UK September 2013
Quarterly Technology Briefing, Manchester, UK September 2013Thoughtworks
 

La actualidad más candente (19)

Managing V Sphere With The Vesi
Managing V Sphere With The VesiManaging V Sphere With The Vesi
Managing V Sphere With The Vesi
 
8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style
8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style
8th Athens Big Data Meetup - 1st Talk - Riding The Streaming Wave DIY Style
 
Reactive programming
Reactive programmingReactive programming
Reactive programming
 
Apache Airflow (incubating) NL HUG Meetup 2016-07-19
Apache Airflow (incubating) NL HUG Meetup 2016-07-19Apache Airflow (incubating) NL HUG Meetup 2016-07-19
Apache Airflow (incubating) NL HUG Meetup 2016-07-19
 
Grafana optimization for Prometheus
Grafana optimization for PrometheusGrafana optimization for Prometheus
Grafana optimization for Prometheus
 
Spring.Net, Feb 2008, PostSharp: A Technical Introduction
Spring.Net, Feb 2008, PostSharp:  A Technical IntroductionSpring.Net, Feb 2008, PostSharp:  A Technical Introduction
Spring.Net, Feb 2008, PostSharp: A Technical Introduction
 
Using Amazon EC2 to Scale Your Web Application
Using Amazon EC2 to Scale Your Web ApplicationUsing Amazon EC2 to Scale Your Web Application
Using Amazon EC2 to Scale Your Web Application
 
Threading Made Easy! A Busy Developer’s Guide to Kotlin Coroutines
Threading Made Easy! A Busy Developer’s Guide to Kotlin CoroutinesThreading Made Easy! A Busy Developer’s Guide to Kotlin Coroutines
Threading Made Easy! A Busy Developer’s Guide to Kotlin Coroutines
 
Riding the Streaming Wave DIY style
Riding the Streaming Wave  DIY styleRiding the Streaming Wave  DIY style
Riding the Streaming Wave DIY style
 
Reactive programming and RxJS
Reactive programming and RxJSReactive programming and RxJS
Reactive programming and RxJS
 
Continuous deployment of Rails apps on AWS OpsWorks
Continuous deployment of Rails apps on AWS OpsWorksContinuous deployment of Rails apps on AWS OpsWorks
Continuous deployment of Rails apps on AWS OpsWorks
 
Analytics Beyond RAM Capacity using R
Analytics Beyond RAM Capacity using RAnalytics Beyond RAM Capacity using R
Analytics Beyond RAM Capacity using R
 
Building Robust Pipelines with Airflow
Building Robust Pipelines with AirflowBuilding Robust Pipelines with Airflow
Building Robust Pipelines with Airflow
 
Serverless in azure
Serverless in azureServerless in azure
Serverless in azure
 
Design Matters: Why In-Place Copy Data Management is the Right Choice
Design Matters: Why In-Place Copy Data Management is the Right Choice Design Matters: Why In-Place Copy Data Management is the Right Choice
Design Matters: Why In-Place Copy Data Management is the Right Choice
 
An Introduction to the Heatmap / Histogram Plugin
An Introduction to the Heatmap / Histogram PluginAn Introduction to the Heatmap / Histogram Plugin
An Introduction to the Heatmap / Histogram Plugin
 
Reactive programming with rx java
Reactive programming with rx javaReactive programming with rx java
Reactive programming with rx java
 
Diminuendo! Tactics in Support of FaaS Migrations Slides
Diminuendo! Tactics in Support of FaaS Migrations SlidesDiminuendo! Tactics in Support of FaaS Migrations Slides
Diminuendo! Tactics in Support of FaaS Migrations Slides
 
Quarterly Technology Briefing, Manchester, UK September 2013
Quarterly Technology Briefing, Manchester, UK September 2013Quarterly Technology Briefing, Manchester, UK September 2013
Quarterly Technology Briefing, Manchester, UK September 2013
 

Similar a Using PowerShell to Simplify your ETL

SQL Server 2008 Integration Services
SQL Server 2008 Integration ServicesSQL Server 2008 Integration Services
SQL Server 2008 Integration ServicesEduardo Castro
 
Intro to Talend Open Studio for Data Integration
Intro to Talend Open Studio for Data IntegrationIntro to Talend Open Studio for Data Integration
Intro to Talend Open Studio for Data IntegrationPhilip Yurchuk
 
SQLSaturday#290_Kiev_AdHocMaintenancePlansForBeginners
SQLSaturday#290_Kiev_AdHocMaintenancePlansForBeginnersSQLSaturday#290_Kiev_AdHocMaintenancePlansForBeginners
SQLSaturday#290_Kiev_AdHocMaintenancePlansForBeginnersTobias Koprowski
 
Experiences using CouchDB inside Microsoft's Azure team
Experiences using CouchDB inside Microsoft's Azure teamExperiences using CouchDB inside Microsoft's Azure team
Experiences using CouchDB inside Microsoft's Azure teamBrian Benz
 
Linq 1224887336792847 9
Linq 1224887336792847 9Linq 1224887336792847 9
Linq 1224887336792847 9google
 
Silicon Valley Code Camp 2011: Play! as you REST
Silicon Valley Code Camp 2011: Play! as you RESTSilicon Valley Code Camp 2011: Play! as you REST
Silicon Valley Code Camp 2011: Play! as you RESTManish Pandit
 
Linq To The Enterprise
Linq To The EnterpriseLinq To The Enterprise
Linq To The EnterpriseDaniel Egan
 
Sparkflows - Build E2E Data Analytics Use Cases in less than 30 mins
Sparkflows - Build E2E Data Analytics Use Cases in less than 30 minsSparkflows - Build E2E Data Analytics Use Cases in less than 30 mins
Sparkflows - Build E2E Data Analytics Use Cases in less than 30 minssparkflows
 
Building a Testable Data Access Layer
Building a Testable Data Access LayerBuilding a Testable Data Access Layer
Building a Testable Data Access LayerTodd Anglin
 
Azure DevOps for Developers
Azure DevOps for DevelopersAzure DevOps for Developers
Azure DevOps for DevelopersSarah Dutkiewicz
 
Making your RDBMS fast!
Making your RDBMS fast! Making your RDBMS fast!
Making your RDBMS fast! VictorSzoltysek
 
Introduction to Threading in .Net
Introduction to Threading in .NetIntroduction to Threading in .Net
Introduction to Threading in .Netwebhostingguy
 
Datastage Online Training @ Adithya Elearning
Datastage Online Training @ Adithya ElearningDatastage Online Training @ Adithya Elearning
Datastage Online Training @ Adithya Elearningshanmukha rao dondapati
 
MSDN Presents: Visual Studio 2010, .NET 4, SharePoint 2010 for Developers
MSDN Presents: Visual Studio 2010, .NET 4, SharePoint 2010 for DevelopersMSDN Presents: Visual Studio 2010, .NET 4, SharePoint 2010 for Developers
MSDN Presents: Visual Studio 2010, .NET 4, SharePoint 2010 for DevelopersDave Bost
 
Introduction to Codenvy / JugSummerCamp 2014
Introduction to Codenvy / JugSummerCamp 2014Introduction to Codenvy / JugSummerCamp 2014
Introduction to Codenvy / JugSummerCamp 2014Florent BENOIT
 
Moving from SQL Server to MongoDB
Moving from SQL Server to MongoDBMoving from SQL Server to MongoDB
Moving from SQL Server to MongoDBNick Court
 
Introduction to PowerShell - Be a PowerShell Hero - SPFest workshop
Introduction to PowerShell - Be a PowerShell Hero - SPFest workshopIntroduction to PowerShell - Be a PowerShell Hero - SPFest workshop
Introduction to PowerShell - Be a PowerShell Hero - SPFest workshopMichael Blumenthal (Microsoft MVP)
 

Similar a Using PowerShell to Simplify your ETL (20)

SQL Server 2008 Integration Services
SQL Server 2008 Integration ServicesSQL Server 2008 Integration Services
SQL Server 2008 Integration Services
 
Intro to Talend Open Studio for Data Integration
Intro to Talend Open Studio for Data IntegrationIntro to Talend Open Studio for Data Integration
Intro to Talend Open Studio for Data Integration
 
SQLSaturday#290_Kiev_AdHocMaintenancePlansForBeginners
SQLSaturday#290_Kiev_AdHocMaintenancePlansForBeginnersSQLSaturday#290_Kiev_AdHocMaintenancePlansForBeginners
SQLSaturday#290_Kiev_AdHocMaintenancePlansForBeginners
 
Experiences using CouchDB inside Microsoft's Azure team
Experiences using CouchDB inside Microsoft's Azure teamExperiences using CouchDB inside Microsoft's Azure team
Experiences using CouchDB inside Microsoft's Azure team
 
Linq 1224887336792847 9
Linq 1224887336792847 9Linq 1224887336792847 9
Linq 1224887336792847 9
 
Silicon Valley Code Camp 2011: Play! as you REST
Silicon Valley Code Camp 2011: Play! as you RESTSilicon Valley Code Camp 2011: Play! as you REST
Silicon Valley Code Camp 2011: Play! as you REST
 
It ready dw_day3_rev00
It ready dw_day3_rev00It ready dw_day3_rev00
It ready dw_day3_rev00
 
Linq To The Enterprise
Linq To The EnterpriseLinq To The Enterprise
Linq To The Enterprise
 
Sparkflows - Build E2E Data Analytics Use Cases in less than 30 mins
Sparkflows - Build E2E Data Analytics Use Cases in less than 30 minsSparkflows - Build E2E Data Analytics Use Cases in less than 30 mins
Sparkflows - Build E2E Data Analytics Use Cases in less than 30 mins
 
Building a Testable Data Access Layer
Building a Testable Data Access LayerBuilding a Testable Data Access Layer
Building a Testable Data Access Layer
 
Azure DevOps for Developers
Azure DevOps for DevelopersAzure DevOps for Developers
Azure DevOps for Developers
 
Making your RDBMS fast!
Making your RDBMS fast! Making your RDBMS fast!
Making your RDBMS fast!
 
Introduction to Threading in .Net
Introduction to Threading in .NetIntroduction to Threading in .Net
Introduction to Threading in .Net
 
Datastage Online Training @ Adithya Elearning
Datastage Online Training @ Adithya ElearningDatastage Online Training @ Adithya Elearning
Datastage Online Training @ Adithya Elearning
 
MSDN Presents: Visual Studio 2010, .NET 4, SharePoint 2010 for Developers
MSDN Presents: Visual Studio 2010, .NET 4, SharePoint 2010 for DevelopersMSDN Presents: Visual Studio 2010, .NET 4, SharePoint 2010 for Developers
MSDN Presents: Visual Studio 2010, .NET 4, SharePoint 2010 for Developers
 
Introduction to Codenvy / JugSummerCamp 2014
Introduction to Codenvy / JugSummerCamp 2014Introduction to Codenvy / JugSummerCamp 2014
Introduction to Codenvy / JugSummerCamp 2014
 
Sparkflows Use Cases
Sparkflows Use CasesSparkflows Use Cases
Sparkflows Use Cases
 
SparkFlow
SparkFlow SparkFlow
SparkFlow
 
Moving from SQL Server to MongoDB
Moving from SQL Server to MongoDBMoving from SQL Server to MongoDB
Moving from SQL Server to MongoDB
 
Introduction to PowerShell - Be a PowerShell Hero - SPFest workshop
Introduction to PowerShell - Be a PowerShell Hero - SPFest workshopIntroduction to PowerShell - Be a PowerShell Hero - SPFest workshop
Introduction to PowerShell - Be a PowerShell Hero - SPFest workshop
 

Último

Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 

Último (20)

Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 

Using PowerShell to Simplify your ETL

  • 1. Scott Weinstein Principal Lab49 weblogs.asp.net/sweinstein / @ScottWeinstein Using PowerShell to simplify your ETL
  • 6. SSIS Considered Harmful Visual programming is a terrible interface for ETL work History of UI visual designers is instructive here No way to copy/share “code samples” Low information density Even worse It’s not even a good designer Poor defaults, poor layout Nested menus of property grids are hard to use No real VCS ability Sure it’s XML, but… No diff No merge Demo ?
  • 7. But what about all the features? Some are trivial: Foreach Loop Container Execute SQL Task Send Mail Task OMG! Many are useful, but not hard to recreate Demos soon Others may be worth consideration Fuzzy matching
  • 8. But what about performance The advantages for data flow are real, but not that large A 5M row test on my home machine gave a 12% performance advantage to SSIS over PowerShell + SqlBulkCopy
  • 9. The Answer PowerShell A general purpose scripting and automation language Wide adoption MS: Windows Server, Windows 7, SQL Server 2008, VM Manager, IIS, Deployment Toolkit Others: VMWare, Quest, IBM Websphere
  • 10. PowerShell Features Takes the best of Unix, VMS, Perl Adds in integration with .Net, COM, WMI Adds in REPL environment Full scripting/programming Flexible typing model Pipes Regular expressions XML Navigation Providers Built in command parser Standardized syntax for commands
  • 11. Introducing PSIS http://psis.codeplex.com/ Yes, the name is derivative Right now, just a playground for some demos Bulk Data Transfer Star-Schema Populator Execute SQL
  • 12. PSIS demo Just the ETL No logging, error reporting, etc Some typical ETL tasks Clear staging tables Bulk transfer data from one DB to another Populate a star schema Fact table and dimension tables Extract data to a .csv file
  • 13. Clear staging tables Use Invoke-MSSqlCommand Use Truncate <tablename>
  • 14. Bulk transfer At core, we’ll use .Net’sSqlBulkCopy under the covers uses TDS’s BCP protocol For concurrency we’ll use the Task Parallel Library We’ll simplify things by making the destination tables map directly to the source tables Select statements could provide needed mapping
  • 15. Populate a Star-Schema Destination
  • 16. Populate a Star-Schema Source a view on the staging tables, creating a flat de-normalized row of all dimension and fact value Use convention to indicate which columns are fact or dimension tables
  • 17. Populate a Star-Schema Code generate a populator Create functions, given a source row which return/create the corresponding dimension key Run it with the TPL for performance
  • 18. Populate a Star-Schema Areas of improvement Add support for slowly changing dimensions Add support for lower memory requirements query DB for dimension values on demand Measure and tune performance at the micro-benchmark level Concurrent dictionaries Partial loads
  • 19. Extract to file Invoke-MSSqlCommand Export-Csv