SSIS Best Practices, Israel BI User Group, Itay Braun
1. Integration Services Best Practices
Itay Braun
BI and SQL Server Consultant
Email: itay@twingo.co.il
Blog: http://blogs.microsoft.co.il/blogs/itaybraun/
3. If it moves – Log it!
Establishing performance baseline
Package Configuration
Lookup Optimization
Data Profiling
Other tips and tricks
5. Logging captures run-time information about a package
Helps to audit and troubleshoot a package every time it runs
Integration Services includes the following log providers:
The Text File log provider (CSV)
The SQL Server Profiler log provider
The SQL Server log provider (sysssislog table)
The Windows Event Log provider
The XML File log provider
6. All tasks share the same basic events
Each task also has unique events
7. Custom logging: build the log table and the logged events manually
Gives better control over the collected data
For example:
Row counts
Completion of an important step
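A minimal sketch of such a hand-built log table, using Python's built-in sqlite3 as a stand-in for SQL Server so that it runs anywhere. The table name etl_custom_log, its columns, and the log_event helper are hypothetical and not from the slides; in a real package an Execute SQL Task would typically perform the insert.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical custom log table; the name and columns are illustrative only.
DDL = """
CREATE TABLE etl_custom_log (
    log_id       INTEGER PRIMARY KEY AUTOINCREMENT,
    package_name TEXT NOT NULL,
    step_name    TEXT NOT NULL,
    row_count    INTEGER,
    logged_at    TEXT NOT NULL
)
"""

def log_event(conn, package_name, step_name, row_count=None):
    """Record that a step finished, optionally with the rows it processed."""
    conn.execute(
        "INSERT INTO etl_custom_log "
        "(package_name, step_name, row_count, logged_at) VALUES (?, ?, ?, ?)",
        (package_name, step_name, row_count,
         datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for the logging database
conn.execute(DDL)
log_event(conn, "LoadSales", "Staging load finished", row_count=125000)
log_event(conn, "LoadSales", "Dimension lookup finished")

rows = conn.execute(
    "SELECT step_name, row_count FROM etl_custom_log ORDER BY log_id"
).fetchall()
print(rows)
```

Keeping row counts in their own column makes it easy to chart volumes per run later.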
8. Event handlers: a simple SSIS package within the package
Mostly used to respond to OnError events
Typical actions: logging and sending email
9. SQL 2008 – sysssislog table
http://technet.microsoft.com/en-us/library/ms186984.aspx
SQL 2005 – sysdtslog90
http://msdn.microsoft.com/en-us/library/ms186984(SQL.90).aspx
Analyze:
Total execution time
SSAS partition processing time
Errors and Warnings
Time elapsed between PackageStart and
PackageEnd
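The elapsed-time analysis can be sketched as follows. This assumes the log rows have already been fetched into dictionaries keyed by the sysssislog column names event, executionid, starttime and endtime; the package_durations helper and the sample rows are made up for illustration.

```python
from datetime import datetime

def package_durations(log_rows):
    """Return seconds between PackageStart and PackageEnd per executionid."""
    starts, ends = {}, {}
    for row in log_rows:
        if row["event"] == "PackageStart":
            starts[row["executionid"]] = row["starttime"]
        elif row["event"] == "PackageEnd":
            ends[row["executionid"]] = row["endtime"]
    return {
        exec_id: (ends[exec_id] - starts[exec_id]).total_seconds()
        for exec_id in starts
        if exec_id in ends  # skip runs that never logged PackageEnd
    }

# Made-up sample rows; a real query would read them from the log table.
t0 = datetime(2009, 1, 5, 2, 0, 0)
t_err = datetime(2009, 1, 5, 2, 3, 10)
t1 = datetime(2009, 1, 5, 2, 12, 34)
sample = [
    {"event": "PackageStart", "executionid": "A1", "starttime": t0, "endtime": t0},
    {"event": "OnError", "executionid": "A1", "starttime": t_err, "endtime": t_err},
    {"event": "PackageEnd", "executionid": "A1", "starttime": t1, "endtime": t1},
    {"event": "PackageStart", "executionid": "B2", "starttime": t1, "endtime": t1},
]

durations = package_durations(sample)
print(durations)
```

Run B2 is left out of the result because it has no PackageEnd row, which is itself worth reporting as a failed or still-running execution.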
10. Don’t forget to monitor the execution of the
ETL jobs.
Use Reporting Services to write simple
reports about the ETL execution process.
13. Processor time
Process / % Processor Time (Total)
sqlservr.exe and dtexec.exe
Do the tasks run in parallel?
14. Process / Private Bytes (DTEXEC.exe) – The amount of memory currently in use by Integration Services.
Process / Working Set (DTEXEC.exe) – The total amount of memory allocated by Integration Services.
SQL Server: Memory Manager / Total Server Memory: The total amount of memory allocated by SQL Server.
Memory / Page Reads / sec – Represents the total memory pressure on the system.
If this consistently goes above 500, the system is under memory pressure.
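A small sketch of the Page Reads / sec check against a series of counter samples. The slide does not define "consistently", so the min_fraction parameter (90% of samples above the threshold) is an assumption, as are the function name and the sample values.

```python
def is_under_memory_pressure(page_reads_per_sec, threshold=500, min_fraction=0.9):
    """True when at least min_fraction of the samples exceed the threshold.

    threshold=500 comes from the slide; min_fraction is an assumed reading
    of "consistently" and should be tuned to your own baseline.
    """
    if not page_reads_per_sec:
        return False
    above = sum(1 for value in page_reads_per_sec if value > threshold)
    return above / len(page_reads_per_sec) >= min_fraction

# Made-up samples: one short spike versus a sustained high rate.
spike = [120, 90, 900, 110, 95, 130, 100, 85, 140, 105]
sustained = [640, 710, 580, 690, 720, 655, 610, 700, 675, 590]

print(is_under_memory_pressure(spike))
print(is_under_memory_pressure(sustained))
```

Using a fraction of samples rather than a single reading keeps one short spike from raising a false alarm.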
15. SSIS Pipeline / Buffers in use - the number of pipeline buffers in use throughout the pipeline.
SSIS Pipeline / Buffers spooled - The number of buffers spooled to disk. Buffers spooled has an initial value of 0. When it goes above 0, it indicates that the engine has started swapping buffers to disk.
Rows Read - The number of rows read from
all data sources in total.
Rows Written - The number of rows written
to all data destinations in total.
16. Integration Services should write to disk as little as possible: SSIS should only hit the disk when it reads from the source and writes to the target.
For SAN / NAS storage, use the vendor's monitoring applications
17. SSIS moves data as fast as the network is
able to handle it.
Network Interface / Current Bandwidth: This
counter provides an estimate of current
bandwidth.
Network Interface / Bytes Total / sec: The
rate at which bytes are sent and received over
each network adapter.
Network Interface / Transfers/sec: Tells how
many network transfers per second are
occurring.
If it is approaching 40,000 IOPS, then add another NIC and use teaming between the NICs.
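The network counters above can be combined into two quick checks, sketched here. Current Bandwidth is reported in bits per second and Bytes Total / sec in bytes, hence the factor of 8. The function names, the 0.9 margin used to interpret "approaching", and the sample values are assumptions.

```python
def network_utilization(bytes_total_per_sec, current_bandwidth_bits_per_sec):
    """Fraction of the adapter's estimated bandwidth currently in use."""
    if current_bandwidth_bits_per_sec <= 0:
        raise ValueError("bandwidth must be positive")
    return bytes_total_per_sec * 8 / current_bandwidth_bits_per_sec

def needs_second_nic(transfers_per_sec, limit=40_000, margin=0.9):
    """True when transfers/sec is within the assumed margin of the limit.

    limit=40,000 comes from the slide; margin=0.9 is an assumption.
    """
    return transfers_per_sec >= limit * margin

# Made-up readings for a 1 Gbit/s adapter.
utilization = network_utilization(50_000_000, 1_000_000_000)
print(f"utilization: {utilization:.0%}")
print(needs_second_nic(37_000))
print(needs_second_nic(12_000))
```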
19. Package configuration: the package needs to know where it is moving data from and where it is moving data to
Integration Services packages are typically built in a different environment from the one where they run in production.
20. Objects that can be configured:
Tasks
Containers
Variables
Connection Managers
Data Flow Components
21. XML Configuration File
Most popular configuration type
Easy deployment
Disadvantage - the path to the .dtsConfig file must be hard-coded within the package
Environment Variable Configuration
Takes the value for a property from whatever is stored in a named environment variable
Stores the property path inside the package and the value outside the package
22. Parent Package Configuration
Fetches a value from a variable in the calling package
Stores the property path inside the package and the value outside the package
Registry Configuration
The value to be applied to a package property is stored in a registry entry
Stores the property path inside the package and the value outside the package
23. SQL Server Configuration
Configuration values are stored in a SQL Server table.
The table can have any name, and can be in any database on any server.
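A rough sketch of what such a configuration table holds, again using sqlite3 as a stand-in for SQL Server. The column layout follows the table that the configuration wizard generates by default as far as I recall, so treat it as an assumption; the filter name, server name, and load_configuration helper are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the configuration database
conn.execute("""
CREATE TABLE "SSIS Configurations" (
    ConfigurationFilter TEXT NOT NULL,  -- groups the rows a package asks for
    ConfiguredValue     TEXT,           -- value applied at load time
    PackagePath         TEXT NOT NULL,  -- property path inside the package
    ConfiguredValueType TEXT NOT NULL
)
""")
conn.execute(
    'INSERT INTO "SSIS Configurations" VALUES (?, ?, ?, ?)',
    (
        "StagingDB",
        "Data Source=DEVSQL01;Initial Catalog=Staging;Integrated Security=SSPI;",
        r"\Package.Connections[Staging].Properties[ConnectionString]",
        "String",
    ),
)

def load_configuration(conn, configuration_filter):
    """Return {property path: value} for one configuration filter."""
    rows = conn.execute(
        'SELECT PackagePath, ConfiguredValue FROM "SSIS Configurations" '
        "WHERE ConfigurationFilter = ?",
        (configuration_filter,),
    ).fetchall()
    return dict(rows)

settings = load_configuration(conn, "StagingDB")
print(settings)
```

Moving a package between environments then means changing the row's value, not the package.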
24. Consider command-line options as an alternative to configurations
The /SET option applies a value to a property in the package that is being run
The /CONFIGFILE option tells the package to use an XML configuration file, even if one has not been defined in the package
Configure only the ConnectionString property for connection managers
Instead of ServerName, InitialCatalog, UserName, Password
Don’t save the password in XML files
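A sketch of assembling a dtexec command line with the /SET and /CONFIGFILE options mentioned above. The build_dtexec_args helper, the package file, the config file, and the User::BatchDate variable are all hypothetical; the list is only built and printed here, not executed.

```python
def build_dtexec_args(package_path, overrides=None, config_file=None):
    """Build an argument list for dtexec.

    overrides maps a property path to the value passed with /SET,
    in the form "propertyPath;value".
    """
    args = ["dtexec", "/FILE", package_path]
    if config_file:
        args += ["/CONFIGFILE", config_file]
    for property_path, value in (overrides or {}).items():
        args += ["/SET", f"{property_path};{value}"]
    return args

# Hypothetical package, config file, and variable names.
args = build_dtexec_args(
    "LoadSales.dtsx",
    overrides={r"\Package.Variables[User::BatchDate].Properties[Value]": "2009-01-05"},
    config_file="prod.dtsConfig",
)
print(" ".join(args))
```

Values that contain spaces or semicolons, such as connection strings, need extra quoting on the command line, which this sketch does not handle.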
26. Use the NOLOCK or TABLOCK hints to
remove locking overhead
To optimize memory usage, SELECT only
the columns you actually need
If possible, perform datetime conversions at
the source or target databases, as it is more
expensive to perform within Integration
Services.
SQL Server 2008 Integration Services adds a new feature: the shared lookup cache.
27. Commit size 0 is fastest on heap bulk targets
because only one transaction is committed
If commit size = 0 is not possible, use the highest possible commit size
to reduce the overhead of multiple-batch writing
Commit size = 0 is a bad idea when inserting into a B-tree
because all incoming rows must be sorted at once into the target B-tree
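The arithmetic behind these commit-size rules can be sketched as below: a commit size of 0 means the whole load is one transaction, and any other value splits it into batches. The transaction_count helper and the 10,000,000-row load are assumptions for illustration.

```python
def transaction_count(total_rows, commit_size):
    """Number of transactions committed for a load.

    commit_size == 0 means a single commit for the entire load.
    """
    if total_rows < 0 or commit_size < 0:
        raise ValueError("row count and commit size must be non-negative")
    if total_rows == 0:
        return 0
    if commit_size == 0:
        return 1
    # Ceiling division: a partial last batch still needs its own commit.
    return (total_rows + commit_size - 1) // commit_size

# Made-up load of 10,000,000 rows at three commit sizes.
total = 10_000_000
for size in (0, 100_000, 5_000):
    print(size, transaction_count(total, size))
```

Fewer, larger batches mean less commit overhead, which is why the slide recommends the highest workable commit size when 0 is not an option.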
28. Batch size = 0 is ideal for inserting into a heap.
For an indexed destination, I recommend testing batch sizes between 100,000 and 1,000,000.
Use a commit size of <5000 to avoid lock escalation when inserting
Use partitions and the partition SWITCH command
More info here: Getting Optimal Performance with Integration Services Lookups.
30. Data Profiling task: new feature in SSIS 2008
Used to profile the data:
Null values
Value distribution
Column length
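To make the three profiles concrete, here is a minimal sketch that computes them for one column in plain Python. It is not the Data Profiling task itself, only an illustration of what each profile measures; the profile_column helper and the sample values are made up.

```python
from collections import Counter

def profile_column(values):
    """Compute null ratio, value distribution, and length range for a column."""
    non_null = [v for v in values if v is not None]
    lengths = [len(str(v)) for v in non_null]
    return {
        "null_ratio": (len(values) - len(non_null)) / len(values) if values else 0.0,
        "distribution": Counter(non_null),
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
    }

# Made-up sample column with two NULLs.
cities = ["Haifa", "Tel Aviv", None, "Haifa", "Eilat", None, "Haifa", "Tel Aviv"]
profile = profile_column(cities)
print(profile)
```

A maximum length well below the declared column width is a hint that the data type can be narrowed.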
32. Make data types as narrow as possible so
you will allocate less memory for your
transformation
Watch precision issues when using the
money, float, and decimal types.
money is faster than decimal, and money has
fewer precision considerations than float
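A back-of-the-envelope sketch of why narrow types matter: it sums approximate per-column storage widths for two versions of the same row. The widths are the nominal sizes of the SSIS data flow types (for example DT_CY for money, DT_R8 for float, two bytes per character for DT_WSTR); the column layouts and helper names are made up, and per-row overhead is ignored.

```python
# Nominal storage width in bytes for a few fixed-width SSIS data flow types.
FIXED_WIDTHS = {"DT_I2": 2, "DT_I4": 4, "DT_I8": 8, "DT_CY": 8, "DT_R8": 8}

def column_width(ssis_type, length=0):
    """Approximate bytes per value; strings depend on their declared length."""
    if ssis_type == "DT_STR":    # single-byte string
        return length
    if ssis_type == "DT_WSTR":   # Unicode string, 2 bytes per character
        return 2 * length
    return FIXED_WIDTHS[ssis_type]

def row_width(columns):
    """Sum the widths of (type,) or (type, length) column definitions."""
    return sum(column_width(*column) for column in columns)

# Made-up layouts for the same logical row.
wide = [("DT_I8",), ("DT_WSTR", 255), ("DT_R8",)]
narrow = [("DT_I4",), ("DT_STR", 30), ("DT_CY",)]

rows = 1_000_000
print(row_width(wide), row_width(wide) * rows)
print(row_width(narrow), row_width(narrow) * rows)
```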
33. Do not sort within Integration Services
unless it is absolutely necessary.
In order to perform a sort, Integration Services
allocates the memory space of the entire data
set that needs to be transformed
There are times where using Transact-SQL
will be faster than processing the data in
SSIS.
As a general rule, any and all set-based
operations will perform faster in Transact-SQL.
34. To perform delta detection, you can use a
change detection mechanism such as the
new SQL Server 2008 Change Data
Capture (CDC) functionality
35. Custom logging using event handlers:
http://blogs.conchango.com/jamiethomson/archive/2005/06/11/SSIS_3A00_-Custom-Logging-Using-Event-Handlers.aspx
Best Practices for Integration Services Configurations -
http://msdn.microsoft.com/en-us/library/cc671628.aspx
Other best practices -
http://bi-polar23.blogspot.com/2007/11/ssis-best-practices-part-1.html