SSIS Design Pattern - Incremental Loads
Introduction
Loading data from a data source to SQL Server is a common task. It's used in Data
Warehousing, but increasingly data is being staged in SQL Server for non-Business-
Intelligence purposes.
Maintaining data integrity is key when loading data into any database. A common way of
accomplishing this is to truncate the destination and reload from the source. While this
method ensures data integrity, it also loads a lot of data that was just deleted.
Incremental loads are faster and use fewer server resources. Only new or updated data is
touched in an incremental load.
When To Use Incremental Loads
Use incremental loads whenever you need to load data from a data source to SQL Server.
Incremental loads are the same regardless of which database platform or ETL tool you use.
You need to detect new and updated rows - and separate these from the unchanged rows.
Incremental Loads in Transact-SQL
I will start by demonstrating this with T-SQL:
0. (Optional, but recommended) Create two databases: a source and destination database for
this demonstration:
CREATE DATABASE [SSISIncrementalLoad_Source]
CREATE DATABASE [SSISIncrementalLoad_Dest]
1. Create a source table named tblSource with the columns ColID, ColA, ColB, and ColC; make
ColID the primary key:
USE SSISIncrementalLoad_Source
GO
CREATE TABLE dbo.tblSource
(ColID int NOT NULL
,ColA varchar(10) NULL
,ColB datetime NULL constraint df_ColB default (getDate())
,ColC int NULL
,constraint PK_tblSource primary key clustered (ColID))
2. Create a Destination table named tblDest with the columns ColID, ColA, ColB, ColC:
USE SSISIncrementalLoad_Dest
GO
CREATE TABLE dbo.tblDest
(ColID int NOT NULL
,ColA varchar(10) NULL
,ColB datetime NULL
,ColC int NULL)
3. Let's load some test data into both tables for demonstration purposes:
USE SSISIncrementalLoad_Source
GO
-- insert an "unchanged" row
INSERT INTO dbo.tblSource
(ColID,ColA,ColB,ColC)
VALUES(0, 'A', '1/1/2007 12:01 AM', -1)
-- insert a "changed" row
INSERT INTO dbo.tblSource
(ColID,ColA,ColB,ColC)
VALUES(1, 'B', '1/1/2007 12:02 AM', -2)
-- insert a "new" row
INSERT INTO dbo.tblSource
(ColID,ColA,ColB,ColC)
VALUES(2, 'N', '1/1/2007 12:03 AM', -3)
USE SSISIncrementalLoad_Dest
GO
-- insert an "unchanged" row
INSERT INTO dbo.tblDest
(ColID,ColA,ColB,ColC)
VALUES(0, 'A', '1/1/2007 12:01 AM', -1)
-- insert a "changed" row
INSERT INTO dbo.tblDest
(ColID,ColA,ColB,ColC)
VALUES(1, 'C', '1/1/2007 12:02 AM', -2)
4. You can view new rows with the following query:
SELECT s.ColID, s.ColA, s.ColB, s.ColC
FROM SSISIncrementalLoad_Source.dbo.tblSource s
LEFT JOIN SSISIncrementalLoad_Dest.dbo.tblDest d ON d.ColID = s.ColID
WHERE d.ColID IS NULL
This should return the "new" row - the one loaded earlier with ColID = 2 and ColA = 'N'.
Why? The LEFT JOIN and WHERE clauses are the key. Left Joins return all rows on the left
side of the join clause (SSISIncrementalLoad_Source.dbo.tblSource in this case) whether there's
a match on the right side of the join clause (SSISIncrementalLoad_Dest.dbo.tblDest in this
case) or not. If there is no match on the right side, NULLs are returned. This is why the
WHERE clause works: it goes after rows where the destination ColID is NULL. These rows
have no match in the LEFT JOIN, therefore they must be new.
This is only an example. You occasionally find database schemas that are this easy to load.
Occasionally. Most of the time you have to include several columns in the JOIN ON clause to
isolate truly new rows. Sometimes you have to add conditions in the WHERE clause to refine
the definition of truly new rows.
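For illustration only, here is a hedged sketch of what that can look like against a hypothetical
source (tblOrderDetail and its columns OrderID, LineNumber, Amount, and IsDeleted are made-up
names, not part of this article's demo schema):
-- Detect new rows using a composite business key plus an extra WHERE condition
SELECT s.OrderID, s.LineNumber, s.Amount
FROM SSISIncrementalLoad_Source.dbo.tblOrderDetail s
LEFT JOIN SSISIncrementalLoad_Dest.dbo.tblOrderDetail d
ON d.OrderID = s.OrderID
AND d.LineNumber = s.LineNumber
WHERE d.OrderID IS NULL -- no match on the full business key means the row is new
AND s.IsDeleted = 0 -- extra condition refining the definition of "truly new"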
Incrementally load the row ("rows" in practice) with the following T-SQL statement:
INSERT INTO SSISIncrementalLoad_Dest.dbo.tblDest
(ColID, ColA, ColB, ColC)
SELECT s.ColID, s.ColA, s.ColB, s.ColC
FROM SSISIncrementalLoad_Source.dbo.tblSource s
LEFT JOIN SSISIncrementalLoad_Dest.dbo.tblDest d ON d.ColID = s.ColID
WHERE d.ColID IS NULL
5. There are many ways people try to isolate changed rows. The only sure-fire way to
accomplish it is to compare each field. View changed rows with the following T-SQL
statement:
SELECT d.ColID, d.ColA, d.ColB, d.ColC
FROM SSISIncrementalLoad_Dest.dbo.tblDest d
INNER JOIN SSISIncrementalLoad_Source.dbo.tblSource s ON s.ColID = d.ColID
WHERE (
(d.ColA != s.ColA)
OR (d.ColB != s.ColB)
OR (d.ColC != s.ColC)
)
This should return the "changed" row we loaded earlier with ColID = 1 and ColA = 'C'. Why? The INNER JOIN
and WHERE clauses are to blame - again. The INNER JOIN goes after rows with matching ColIDs because of
the JOIN ON clause. The WHERE clause refines the resultset, returning only rows where the ColIDs match
but ColA, ColB, or ColC differ. This is important. If there's a difference in any, some, or all of the
columns (except ColID), we want to update the row.
Extract-Transform-Load (ETL) theory has a lot to say about when and how to update changed data. You will
want to pick up a good book on the topic to learn more about the variations.
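One variation worth calling out here: the inequality operators above never evaluate to true when
either side is NULL, so a change from a value to NULL (or from NULL to a value) in ColA, ColB, or
ColC would be missed. A NULL-safe sketch of the same changed-row query, using only the columns
defined in this demo:
SELECT d.ColID, d.ColA, d.ColB, d.ColC
FROM SSISIncrementalLoad_Dest.dbo.tblDest d
INNER JOIN SSISIncrementalLoad_Source.dbo.tblSource s ON s.ColID = d.ColID
WHERE (
-- treat "one side NULL, the other not" as a change
(d.ColA != s.ColA OR (d.ColA IS NULL AND s.ColA IS NOT NULL) OR (d.ColA IS NOT NULL AND s.ColA IS NULL))
OR (d.ColB != s.ColB OR (d.ColB IS NULL AND s.ColB IS NOT NULL) OR (d.ColB IS NOT NULL AND s.ColB IS NULL))
OR (d.ColC != s.ColC OR (d.ColC IS NULL AND s.ColC IS NOT NULL) OR (d.ColC IS NOT NULL AND s.ColC IS NULL))
)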
To update the data in our destination, use the following T-SQL:
UPDATE d
SET
d.ColA = s.ColA
,d.ColB = s.ColB
,d.ColC = s.ColC
FROM SSISIncrementalLoad_Dest.dbo.tblDest d
INNER JOIN SSISIncrementalLoad_Source.dbo.tblSource s ON s.ColID = d.ColID
WHERE (
(d.ColA != s.ColA)
OR (d.ColB != s.ColB)
OR (d.ColC != s.ColC)
)
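As a side note, if the destination is SQL Server 2008 or later, the new-row INSERT and the
changed-row UPDATE can be combined into a single MERGE statement. The sketch below expresses the
same logic; nothing else in this article depends on it:
MERGE SSISIncrementalLoad_Dest.dbo.tblDest AS d
USING SSISIncrementalLoad_Source.dbo.tblSource AS s
ON d.ColID = s.ColID
WHEN MATCHED AND (d.ColA != s.ColA OR d.ColB != s.ColB OR d.ColC != s.ColC)
THEN UPDATE SET ColA = s.ColA, ColB = s.ColB, ColC = s.ColC
WHEN NOT MATCHED BY TARGET
THEN INSERT (ColID, ColA, ColB, ColC) VALUES (s.ColID, s.ColA, s.ColB, s.ColC);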
Incremental Loads in SSIS
Let's take a look at how you can accomplish this in SSIS, using the Lookup transformation
(for the join functionality) combined with the Conditional Split transformation (for the
WHERE clause conditions).
Before we begin, let's reset our database tables to their original state using the following
query:
USE SSISIncrementalLoad_Source
GO
TRUNCATE TABLE dbo.tblSource
-- insert an "unchanged" row
INSERT INTO dbo.tblSource
(ColID,ColA,ColB,ColC)
VALUES(0, 'A', '1/1/2007 12:01 AM', -1)
-- insert a "changed" row
INSERT INTO dbo.tblSource
(ColID,ColA,ColB,ColC)
VALUES(1, 'B', '1/1/2007 12:02 AM', -2)
-- insert a "new" row
INSERT INTO dbo.tblSource
(ColID,ColA,ColB,ColC)
VALUES(2, 'N', '1/1/2007 12:03 AM', -3)
USE SSISIncrementalLoad_Dest
GO
TRUNCATE TABLE dbo.tblDest
-- insert an "unchanged" row
INSERT INTO dbo.tblDest
(ColID,ColA,ColB,ColC)
VALUES(0, 'A', '1/1/2007 12:01 AM', -1)
-- insert a "changed" row
INSERT INTO dbo.tblDest
(ColID,ColA,ColB,ColC)
VALUES(1, 'C', '1/1/2007 12:02 AM', -2)
Next, create a new project using Business Intelligence Development Studio (BIDS). Name
the project SSISIncrementalLoad:
Once the project loads, open Solution Explorer and rename Package1.dtsx to
SSISIncrementalLoad.dtsx:
When prompted to rename the package object, click the Yes button. From the toolbox, drag a
Data Flow onto the Control Flow canvas:
Double-click the Data Flow task to edit it. From the toolbox, drag and drop an OLE DB
Source onto the Data Flow canvas:
Double-click the OLE DB Source connection adapter to edit it:
Click the New button beside the OLE DB Connection Manager dropdown:
Click the New button here to create a new Data Connection:
Enter or select your server name. Connect to the SSISIncrementalLoad_Source database you
created earlier. Click the OK button to return to the Connection Manager configuration
dialog. Click the OK button to accept your newly created Data Connection as the Connection
Manager you wish to define. Select "dbo.tblSource" from the Table dropdown:
Click the OK button to complete defining the OLE DB Source Adapter.
Drag and drop a Lookup Transformation from the toolbox onto the Data Flow canvas.
Connect the OLE DB connection adapter to the Lookup transformation by clicking on the
OLE DB Source and dragging the green arrow over the Lookup and dropping it. Right-click
the Lookup transformation and click Edit (or double-click the Lookup transformation) to edit:
When the editor opens, click the New button beside the OLE DB Connection Manager
dropdown (as you did earlier for the OLE DB Source Adapter). Define a new Data
Connection - this time to the SSISIncrementalLoad_Dest database. After setting up the new
Data Connection and Connection Manager, configure the Lookup transformation to connect
to "dbo.tblDest":
Click the Columns tab. On the left side are the columns currently in the SSIS data flow
pipeline (from SSISIncrementalLoad_Source.dbo.tblSource). On the right side are columns
available from the Lookup destination you just configured (from
SSISIncrementalLoad_Dest.dbo.tblDest). Follow these steps:
1. We'll need all the columns returned from the destination table, so check the checkboxes
beside each column in the destination. We need these columns for our WHERE clauses and for our
JOIN ON clauses.
2. We do not want to map all the columns between the source and destination - we only want to
map the columns named ColID between the database tables. The Mappings drawn between
the Available Input Columns and Available Lookup Columns define the JOIN ON clause.
Multi-select the Mappings between ColA, ColB, and ColC by clicking on them while holding
the Ctrl key. Right-click any of them and click "Delete Selected Mappings" to delete these
columns from our JOIN ON clause.
3. Add the text "Dest_" to each column's Output Alias. These columns are being appended to the
data flow pipeline. This is so we can distinguish between Source and Destination columns farther
down the pipeline:
Next we need to modify our Lookup transformation behavior. By default, the Lookup
operates as an INNER JOIN - but we need a LEFT (OUTER) JOIN. Click the "Configure
Error Output" button to open the "Configure Error Output" screen. On the "Lookup Output"
row, change the Error column from "Fail component" to "Ignore failure". This tells the
Lookup transformation "If you don't find an INNER JOIN match in the destination table for
the Source table's ColID value, don't fail." - which also effectively tells the Lookup "Don't act
like an INNER JOIN, behave like a LEFT JOIN":
Click OK to complete the Lookup transformation configuration.
From the toolbox, drag and drop a Conditional Split Transformation onto the Data Flow
canvas. Connect the Lookup to the Conditional Split as shown. Right-click the Conditional
Split and click Edit to open the Conditional Split Editor:
Expand the NULL Functions folder in the upper right of the Conditional Split Transformation
Editor. Expand the Columns folder in the upper left side of the Conditional Split
Transformation Editor. Click in the "Output Name" column and enter "New Rows" as the
name of the first output. From the NULL Functions folder, drag and drop the "ISNULL(
<<expression>> )" function to the Condition column of the New Rows condition:
Next, drag Dest_ColID from the columns folder and drop it onto the "<<expression>>" text
in the Condition column. "New Rows" should now be defined by the condition "ISNULL(
[Dest_ColID] )". This defines the WHERE clause for new rows - setting it to "WHERE
Dest_ColID Is NULL".
Type "Changed Rows" into a second Output Name column. Add the expression "(ColA !=
Dest_ColA) || (ColB != Dest_ColB) || (ColC != Dest_ColC)" to the Condition column for the
Changed Rows output. This defines our WHERE clause for detecting changed rows - setting
it to "WHERE ((Dest_ColA != ColA) OR (Dest_ColB != ColB) OR (Dest_ColC != ColC))".
Note "||" is used to convey "OR" in SSIS Expressions:
Change the "Default output name" from "Conditional Split Default Output" to "Unchanged
Rows":
Click the OK button to complete configuration of the Conditional Split transformation.
Drag and drop an OLE DB Destination connection adapter and an OLE DB Command
transformation onto the Data Flow canvas. Click on the Conditional Split and connect it to
the OLE DB Destination. A dialog will display prompting you to select a Conditional Split
Output (those outputs you defined in the last step). Select the New Rows output:
Next connect the OLE DB Command transformation to the Conditional Split's "Changed
Rows" output:
Your Data Flow canvas should appear similar to the following:
Configure the OLE DB Destination by aiming at the SSISIncrementalLoad_Dest.dbo.tblDest
table:
Click the Mappings item in the list to the left. Make sure the ColID, ColA, ColB, and ColC
source columns are mapped to their matching destination columns (aren't you glad we
prepended "Dest_" to the destination columns?):
Click the OK button to complete configuring the OLE DB Destination connection adapter.
Double-click the OLE DB Command to open the "Advanced Editor for OLE DB Command"
dialog. Set the Connection Manager column to your SSISIncrementalLoad_Dest connection
manager:
Click on the "Component Properties" tab. Click the elipsis (button with "...") beside the
SQLCommand property:
The String Value Editor displays. Enter the following parameterized T-SQL statement into
the String Value textbox:
UPDATE dbo.tblDest
SET
ColA = ?
,ColB = ?
,ColC = ?
WHERE ColID = ?
The question marks in the previous parameterized T-SQL statement map by ordinal to
columns named "Param_0" through "Param_3". Map them as shown below - effectively
altering the UPDATE statement for each row to read:
UPDATE SSISIncrementalLoad_Dest.dbo.tblDest
SET
ColA = SSISIncrementalLoad_Source.dbo.tblSource.ColA
,ColB = SSISIncrementalLoad_Source.dbo.tblSource.ColB
,ColC = SSISIncrementalLoad_Source.dbo.tblSource.ColC
WHERE ColID = SSISIncrementalLoad_Source.dbo.tblSource.ColID
Note the query is executed on a row-by-row basis. For performance with large amounts of
data, you will want to employ set-based updates instead.
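One common set-based pattern, sketched here under the assumption that you add a staging table and
an Execute SQL Task yourself (neither is part of the package built in this article): send the
"Changed Rows" output to an OLE DB Destination that loads a staging table instead of the OLE DB
Command, then run a single joined UPDATE after the Data Flow task completes:
-- Hypothetical staging table in SSISIncrementalLoad_Dest (not created earlier in this article)
CREATE TABLE dbo.tblDest_Stage
(ColID int NOT NULL
,ColA varchar(10) NULL
,ColB datetime NULL
,ColC int NULL)
-- Run once per load from an Execute SQL Task, after the Data Flow task finishes
UPDATE d
SET d.ColA = s.ColA
,d.ColB = s.ColB
,d.ColC = s.ColC
FROM dbo.tblDest d
INNER JOIN dbo.tblDest_Stage s ON s.ColID = d.ColID
-- Clear the staging table for the next run
TRUNCATE TABLE dbo.tblDest_Stage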
Click the OK button when mapping is completed.
Your Data Flow canvas should look like that pictured below:
If you execute the package with debugging (press F5), the package should succeed and
appear as shown here:
Note one row takes the "New Rows" output from the Conditional Split, and one row takes the
"Changed Rows" output from the Conditional Split transformation. Although not visible, our
third source row doesn't change, and would be sent to the "Unchanged Rows" output - which
is simply the default Conditional Split output renamed. Any row that doesn't meet any of the
predefined conditions in the Conditional Split is sent to the default output.
