2. Overview
The project is for a fictitious construction company called AllWorks. We designed and built a SQL Server 2005 database to track employee and customer information, timesheet and labor rate data, job order information, job materials, and customer invoices. This information was previously stored in Excel spreadsheets, XML files, and .CSV files.
The objective of the SSIS project was to write ETL packages that populate a SQL Server database named AllWorksDBStudent from these Excel spreadsheets, XML files, and .CSV files. The following packages were prepared:
Employee Master Package (the starting project contains a blank version of this one)
Employee Rates Package
Client Masters Package
Client Grouping Package
Division Master Package
Client Groupings to Client Xref Master Package
Project (“Job”) Master Package
Job TimeSheets Package
A child package was also created to email the results of every package to biproject@setfocus.com, including the number of rows inserted and any errors generated (for example, when a data load contains child timesheet records with an invalid parent record).
A separate package was created to handle nightly database backups, and another package was created to re-index all files and shrink the database.
Finally, a Master Package was created to call all of the above packages. The packages were scheduled to run nightly at midnight; this job was created in SQL Server Management Studio.
4. The EmailResultPackage.dtsx Package
This is a common package that is called from the first seven packages.
Cool Feature: It gets its variables (how many records were affected, PackageName, Success/Failure, etc.) from the parent package using SSIS -> Package Configurations -> Parent Package Variable.
Cool Feature: The email is sent from PackageAdmin@setfocus.com.
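A minimal sketch of the message this child package composes, written in Python rather than SSIS. The function name, its parameters, and the body layout are assumptions standing in for the Parent Package Variable values; only the From and To addresses come from this document. The sketch builds the message but does not send it.

```python
from email.message import EmailMessage


def build_result_email(package_name: str, succeeded: bool,
                       rows_inserted: int, rows_updated: int,
                       rows_rejected: int) -> EmailMessage:
    """Compose (but do not send) the results email for one package run.

    The parameters are hypothetical stand-ins for the values the parent
    package passes down as Parent Package Variables.
    """
    status = "Success" if succeeded else "Failure"
    msg = EmailMessage()
    msg["From"] = "PackageAdmin@setfocus.com"
    msg["To"] = "biproject@setfocus.com"
    msg["Subject"] = f"{package_name}: {status}"
    msg.set_content(
        f"Package: {package_name}\n"
        f"Status: {status}\n"
        f"Rows inserted: {rows_inserted}\n"
        f"Rows updated: {rows_updated}\n"
        f"Rows rejected (logged): {rows_rejected}\n"
    )
    return msg


if __name__ == "__main__":
    # Hypothetical run of some package.
    message = build_result_email("ExamplePackage", True, 120, 5, 2)
    print(message)
```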
5. The EmployeeMasterPackage.dtsx Package
This package inserts / updates data from C:SetFocusBISourceDataEmployees.xls, Employees sheet.
It also has a child package called EmailResultPackage.dtsx to email the results of running the package.
7. The Employee Rates Package
This package inserts / updates data from C:SetFocusBISourceDataEmployees.xls, Employee Rates sheet.
Each EmployeePK in the Employee Rates sheet must exist in the Employee Master; otherwise, the record is not inserted into the table and is instead sent to the log file C:SetFocusBIProjectsStudentVersionSSISStudentProjectLogsEmployeeRatesLog.txt. The log files are always appended to, not overwritten.
It also has a child package called EmailResultPackage.dtsx to email the results of running the package.
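The lookup-and-redirect logic can be sketched in Python as below, assuming the Employee Master keys are already in memory as a set. The helper names, the HourlyRate column, and the CSV layout of the log are hypothetical; opening the log in append mode ("a") mirrors the append-not-overwrite behavior described above.

```python
import csv
import os
import tempfile


def split_by_lookup(rows, valid_keys, key="EmployeePK"):
    """Split rows into (matched, unmatched) on whether row[key] exists in
    valid_keys, like a lookup whose no-match rows are redirected."""
    matched, unmatched = [], []
    for row in rows:
        (matched if row[key] in valid_keys else unmatched).append(row)
    return matched, unmatched


def append_to_log(path, rows, fieldnames):
    """Append rejected rows as CSV lines; mode "a" keeps earlier runs' entries."""
    with open(path, "a", newline="") as fh:
        csv.DictWriter(fh, fieldnames=fieldnames).writerows(rows)


if __name__ == "__main__":
    employee_master = {1, 2, 3}  # hypothetical EmployeePKs already loaded
    rates = [{"EmployeePK": 1, "HourlyRate": 25.0},
             {"EmployeePK": 42, "HourlyRate": 30.0}]
    good, bad = split_by_lookup(rates, employee_master)
    log_path = os.path.join(tempfile.gettempdir(), "EmployeeRatesLog.txt")
    append_to_log(log_path, bad, ["EmployeePK", "HourlyRate"])
    print(len(good), "loaded;", len(bad), "logged to", log_path)
```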
8. The ClientMasterPackage.dtsx Package
This package inserts / updates data from C:SetFocusBISourceDataClientGeographies.xls, Client Listings sheet.
Not all columns from the XLS sheet are loaded into the DB.
It also has a child package called EmailResultPackage.dtsx to email the results of running the package.
10. The Client Grouping Package
I took the liberty of designing this package slightly differently, to showcase a different way of doing this task and to learn something new. The next package, which has similar logic, uses a different technique.
Cool Feature: In this package, a SQL command is run against the source to return only the distinct rows that are needed, so far less data is brought in right from the beginning. This is a good optimization technique.
This package inserts / updates data from C:SetFocusBISourceDataClientGeographies.xls, Special Groupings sheet.
It picks only the distinct ClientGroupingPK values and their corresponding Names from that sheet.
It also has a child package called EmailResultPackage.dtsx to email the results of running the package.
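To illustrate filtering at the source, here is a sketch that uses Python's built-in sqlite3 module as a stand-in for the Excel source. The SpecialGroupings table, its AccountKey column, and the sample rows are hypothetical; the point is that SELECT DISTINCT runs before any rows reach the rest of the pipeline.

```python
import sqlite3


def make_sample_source():
    """Hypothetical stand-in for the Special Groupings sheet."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE SpecialGroupings "
                 "(ClientGroupingPK INTEGER, Name TEXT, AccountKey INTEGER)")
    conn.executemany("INSERT INTO SpecialGroupings VALUES (?, ?, ?)",
                     [(1, "Retail", 10), (1, "Retail", 11), (2, "Government", 12)])
    return conn


def distinct_groupings(conn):
    # De-duplication happens in the source query, so only one row per
    # (ClientGroupingPK, Name) pair is ever read.
    return conn.execute(
        "SELECT DISTINCT ClientGroupingPK, Name FROM SpecialGroupings "
        "ORDER BY ClientGroupingPK").fetchall()


if __name__ == "__main__":
    print(distinct_groupings(make_sample_source()))  # [(1, 'Retail'), (2, 'Government')]
```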
11. The DivisionMasterPackage.dtsx Package
This package inserts / updates data from C:SetFocusBISourceDataClientGeographies.xls, Division Definitions sheet.
It picks only the distinct DivisionPK values and their corresponding DivisionNames from that sheet.
It also has a child package called EmailResultPackage.dtsx to email the results of running the package.
Cool Feature: In this package, the Aggregate data flow transformation (grouping by both columns) is used to get the distinct DivisionPK values and their corresponding DivisionNames.
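For comparison with the previous package, this sketch shows what an Aggregate transformation configured only with Group By operations does: it emits one row per distinct key combination. The function name and the sample columns (including ClientPK) are assumptions, and the sketch works on in-memory rows rather than an SSIS data flow.

```python
def distinct_by_group(rows, keys=("DivisionPK", "DivisionName")):
    """Emulate an Aggregate transformation whose only operations are
    Group By on `keys`: one output row per distinct key combination,
    in first-seen order."""
    groups = {}
    for row in rows:
        group = tuple(row[k] for k in keys)
        if group not in groups:
            groups[group] = dict(zip(keys, group))
    return list(groups.values())


if __name__ == "__main__":
    sheet = [  # hypothetical Division Definitions rows
        {"DivisionPK": 1, "DivisionName": "North", "ClientPK": 10},
        {"DivisionPK": 1, "DivisionName": "North", "ClientPK": 11},
        {"DivisionPK": 2, "DivisionName": "South", "ClientPK": 12},
    ]
    print(distinct_by_group(sheet))
```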
14. The Client Groupings to Client Xref Master Package
This package inserts / updates data from C:SetFocusBISourceDataClientGeographies.xls, Special Groupings sheet.
It represents the one-to-many relationship between client groupings and clients. The Account Key in the XLS is the ClientPK in the Clients Master table.
Before insertion, the package makes sure that the ClientPK and ClientGroupingPK are valid by checking that each exists in its master table. Non-matching rows are sent to C:SetFocusBIProjectsStudentVersionSSISStudentProjectLogsClientGroupingsXClientLog.txt. The log files are always appended to, not overwritten.
It also has a child package called EmailResultPackage.dtsx to email the results of running the package.
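A sketch of the two-key check in Python, assuming the ClientPK and ClientGroupingPK master keys are loaded into sets. The AccountKey field name and the reject-reason strings are hypothetical; rejected rows are returned with a reason so they could be appended to the log file.

```python
def validate_grouping_rows(rows, client_pks, grouping_pks):
    """Return (valid, rejected); rejected holds (record, reason) pairs."""
    valid, rejected = [], []
    for row in rows:
        # The sheet's Account Key is the ClientPK of the Clients master.
        record = {"ClientGroupingPK": row["ClientGroupingPK"],
                  "ClientPK": row["AccountKey"]}
        reasons = []
        if record["ClientPK"] not in client_pks:
            reasons.append("unknown ClientPK")
        if record["ClientGroupingPK"] not in grouping_pks:
            reasons.append("unknown ClientGroupingPK")
        if reasons:
            rejected.append((record, "; ".join(reasons)))
        else:
            valid.append(record)
    return valid, rejected


if __name__ == "__main__":
    sheet = [{"ClientGroupingPK": 1, "AccountKey": 10},   # hypothetical rows
             {"ClientGroupingPK": 3, "AccountKey": 10}]
    ok, bad = validate_grouping_rows(sheet, client_pks={10, 11}, grouping_pks={1, 2})
    print(ok)
    print(bad)
```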
15. The ProjectMasterPackage.dtsx Package
This package inserts / updates data from C:SetFocusBISourceDataProjectMaster.xls, Project Master sheet, into the Job Master table. Project and Job are synonyms: ProjectID and JobMasterPK mean one and the same.
Before insertion, the package makes sure that the ClientPK is valid by checking its existence in the Clients master table. Non-matching rows are sent to C:SetFocusBIProjectsStudentVersionSSISStudentProjectLogsProjectMasterLog.txt. The log files are always appended to, not overwritten.
It also has a child package called EmailResultPackage.dtsx to email the results of running the package.
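The insert / update step can be sketched with Python's sqlite3 module standing in for SQL Server: try an UPDATE on the key first, and INSERT only if no row was changed. The JobMaster table layout and its Description column are hypothetical; in the package itself, the ProjectID from the sheet would supply JobMasterPK.

```python
import sqlite3


def create_job_master(conn):
    # Hypothetical, simplified Job Master layout.
    conn.execute("CREATE TABLE JobMaster (JobMasterPK INTEGER PRIMARY KEY, "
                 "ClientPK INTEGER, Description TEXT)")


def upsert_job(conn, job_master_pk, client_pk, description):
    """Update the job if JobMasterPK exists, otherwise insert it.
    Returns "updated" or "inserted"."""
    cur = conn.execute(
        "UPDATE JobMaster SET ClientPK = ?, Description = ? WHERE JobMasterPK = ?",
        (client_pk, description, job_master_pk))
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO JobMaster (JobMasterPK, ClientPK, Description) VALUES (?, ?, ?)",
            (job_master_pk, client_pk, description))
        return "inserted"
    return "updated"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_job_master(conn)
    print(upsert_job(conn, 1, 10, "Roof repair"))       # inserted
    print(upsert_job(conn, 1, 10, "Roof replacement"))  # updated
    conn.commit()
```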
18. The Job TimeSheets Package
This package reads data from all .CSV files in the C:SetFocusBISourceDatatime directory and processes them. It expects those files to contain timesheet data and inserts the timesheet records into the JobTimeSheets table.
Before insertion, the package makes sure that the EmployeePK is valid by checking its existence in the Employees master table, and that the JobMasterPK is valid by checking its existence in the JobsMaster table. Rows that fail either check are sent to C:SetFocusBIProjectsStudentVersionSSISStudentProjectLogsProjectTimesheetLog.txt. The log files are always appended to, not overwritten.
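A sketch of the folder loop and both key checks, in Python. It assumes each .CSV file has a header row with EmployeePK and JobMasterPK columns (hypothetical names) and that the master keys are preloaded as sets of integers. The extension is compared case-insensitively so both .csv and .CSV files are picked up.

```python
import csv
import tempfile
from pathlib import Path


def process_timesheet_folder(folder, employee_pks, job_pks):
    """Read every .csv/.CSV file in `folder` (like a Foreach File loop) and
    split rows into accepted rows and rejected (file_name, row) pairs."""
    accepted, rejected = [], []
    files = sorted(p for p in Path(folder).iterdir()
                   if p.is_file() and p.suffix.lower() == ".csv")
    for path in files:
        with path.open(newline="") as fh:
            for row in csv.DictReader(fh):
                emp_ok = int(row["EmployeePK"]) in employee_pks
                job_ok = int(row["JobMasterPK"]) in job_pks
                if emp_ok and job_ok:
                    accepted.append(row)
                else:
                    rejected.append((path.name, row))  # would go to the log
    return accepted, rejected


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        Path(folder, "week1.csv").write_text(
            "EmployeePK,JobMasterPK,WorkDate,Hours\n"
            "1,100,2005-01-03,8\n"
            "7,100,2005-01-03,8\n")
        ok, bad = process_timesheet_folder(folder, {1, 2}, {100})
        print(len(ok), "accepted,", len(bad), "rejected")
```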
A record qualifies for a new insert only if its combination of EmployeePK, JobMasterPK and Workdate is not already in the table; otherwise, the existing record is updated.
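The insert-or-update decision on the composite key can be sketched as follows, assuming the keys already in JobTimeSheets are available as a set of (EmployeePK, JobMasterPK, WorkDate) tuples. The field names are assumptions. As a design choice of this sketch, a key repeated within the same batch is also routed to updates, so the batch itself never produces two inserts with the same key.

```python
def plan_timesheet_writes(rows, existing_keys):
    """Split rows into (inserts, updates) on the composite key
    (EmployeePK, JobMasterPK, WorkDate)."""
    known = set(existing_keys)
    inserts, updates = [], []
    for row in rows:
        key = (row["EmployeePK"], row["JobMasterPK"], row["WorkDate"])
        if key in known:
            updates.append(row)
        else:
            inserts.append(row)
            known.add(key)  # a later row with this key in the batch is an update
    return inserts, updates


if __name__ == "__main__":
    existing = {(1, 100, "2005-01-03")}  # hypothetical keys already stored
    batch = [{"EmployeePK": 1, "JobMasterPK": 100, "WorkDate": "2005-01-03", "Hours": 8},
             {"EmployeePK": 1, "JobMasterPK": 100, "WorkDate": "2005-01-04", "Hours": 4}]
    ins, upd = plan_timesheet_writes(batch, existing)
    print(len(ins), "to insert,", len(upd), "to update")
```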
Cool Feature: It also sends an email notification about the results of running the package. To do this, it uses a user variable to store the intermediate status of each file processed and appends each new result to it. After all files are processed in the Foreach Loop, the email is sent with the complete message held in the user variable: the status of processing all files, the names of the files processed, the number of rows inserted / updated, etc.
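A sketch of the accumulate-then-send pattern, with a plain string playing the role of the SSIS user variable. The input tuple layout and the message format are hypothetical; the resulting text would become the body of the single email sent after the loop finishes.

```python
def build_timesheet_summary(file_results):
    """Accumulate one status line per processed file, then return the
    complete email body. `file_results` holds hypothetical
    (file_name, rows_inserted, rows_updated, error_or_None) tuples."""
    status_log = ""  # plays the role of the SSIS user variable
    for name, inserted, updated, error in file_results:
        status = f"FAILED ({error})" if error else "OK"
        status_log += f"{name}: {status}, {inserted} inserted, {updated} updated\n"
    return f"Processed {len(file_results)} file(s)\n" + status_log


if __name__ == "__main__":
    print(build_timesheet_summary([("week1.csv", 40, 2, None),
                                   ("week2.csv", 0, 0, "unreadable file")]))
```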
Cool Feature: The email is sent from PackageAdmin@setfocus.com.
20. The MiscTasksPackage.dtsx
This was a quick, no-frills package written just to get data into the County Master.
23. The MasterControllerPackage.dtsx
This is the Master Controller package that calls 10 other packages, which do various tasks such as loading the Master tables and the Timesheets table, backing up the database, and re-indexing and shrinking the DB.
The precedence constraints are set to 'On Completion' (blue) because we want each step to execute whether the previously called package succeeded or failed: each called package is self-contained and does its own error handling.
Note that the sequence in which some of the packages are called is important. For example, the Employee Master must be loaded before the Employee Rates package runs, because that package rejects any row whose EmployeePK is not already in the Employee Master.
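The 'On Completion' behavior can be sketched in Python as a runner that records each step's outcome and always moves on to the next step. The step names and callables are hypothetical stand-ins for the called packages; the list order is what carries the sequencing requirement.

```python
from typing import Callable, List, Tuple


def run_on_completion(steps: List[Tuple[str, Callable[[], None]]]) -> List[Tuple[str, str]]:
    """Run each step in list order. A failure is recorded but never stops
    the sequence, like an 'On Completion' precedence constraint."""
    outcomes = []
    for name, step in steps:
        try:
            step()
            outcomes.append((name, "Success"))
        except Exception as exc:  # each child package handles its own errors
            outcomes.append((name, f"Failure: {exc}"))
    return outcomes


if __name__ == "__main__":
    def failing_load():
        raise RuntimeError("source file missing")

    # Hypothetical steps; master loads come before dependent loads.
    plan = [
        ("EmployeeMaster", lambda: None),
        ("EmployeeRates", failing_load),
        ("DatabaseBackup", lambda: None),
    ]
    for name, outcome in run_on_completion(plan):
        print(name, "->", outcome)
```

Because failures are only recorded, ordering is the one thing this runner enforces, so master-table loads belong ahead of the packages that look them up.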