Home
Articles
Forums
Tips
Training
FAQ's
Blogs
Software
Books
About Us
RSS Feeds
Sign in
|
Join
Article Topics
All Articles
Performance Tuning
Audit
Business Intelligence
Clustering
Reporting Services
SQL Azure
Developer
General DBA
ASP.NET / ADO.NET
SQL Azure
USEFUL SITES :
ASP.NET Tutorials
Windows and SQL Azure Tutorials
Cloud Hosting Magazine
SharePoint Tutorials
Windows Server Help
Write for Us
Share your SQL Server knowledge with others and raise your profile in the community
More...
Latest Articles
A High Level Comparison Between Oracle and SQL Server - Part ...
A High Level Comparison Between Oracle and SQL Server - Part ...
A High Level Comparison Between Oracle and SQL Server - Part ...
A High Level Comparison Between Oracle and SQL Server
More
Latest FAQ's
Add Node to A SQL Server failover Cluster failed with invalid ...
SQL Server Destination remote server error
Setting Up Data And Log Files For SQL Server
Will Check Constraints Improve Database Performance?
More
Latest Software Reviews
dbForge Review
Spotlight on ApexSQL Diff - Server-based database comparison tool ...
Spotlight on ApexSQL Data Diff - Server-based database comparison tool ...
Spotlight on ApexSQL Doc 2008
More
articles
>>
business intelligence
>>
SSIS New Features in SQL Server 2008 ...
SSIS New Features in SQL Server 2008 - Part 4
By :
Arshad Ali
Feb 16, 2009
Behind the scenes
Backpressure (where source is fast and destination is slow) is an internal memory limiter inside SSIS of SQL Server 2008. If a source or an asynchronous component is too fast (compared to the transformations or destination down the path), the source is suspended when its execution tree gets too many buffers (currently this is fixed at 5 buffers). If the source is slow (i.e. transforms and destinations can process data faster than sources generate it), the back-pressure mechanism does not get involved and sources can run with full speed.
Note:
Asynchronous component/transformation and SSIS buffers is discussed in the last article viz. “SQL Server Integration Services - An Inside View”. Unfortunately, in SQL 2005 there were no diagnostics that would tell a user what is the slow part of the flow - the source or the transforms/destination. In SQL Server 2008 if the back-pressure kicks in during package execution, at the end of package execution it reports the total time that the source had to wait because of this mechanism.
Using this information, if the source is the slowest part of the data flow, then the focus is on optimizing the source. E.g. remove unused columns from the query, simplify SQL statements, create indexes, etc and if the source is fast enough and then concentrate on performance of transforms and destinations. If a source has been suspended for several minutes because of slower transformation/destination down the path, then it’s better to stage the data temporarily in mid (somewhere in raw file, which has a proprietary binary format) instead of keeping a database connection open for longer time.
Pipeline Performance - Multicast Transform and Threading in SQL Server 2008
A previous article described execution tree, scope and its association with buffer (for more details refer to the last article viz. “SQL Server Integration Services - An Inside View”). An execution tree is executed by only one thread either by source thread or by worker thread (though one thread may execute multiple execution trees), if a user has a simple package with one or two execution trees, there are only one or two processors (threads) being used, and the package might not benefit from a multiprocessor machine with more than a few processors. Even if users logically split the data flow by using multicast, all output paths of a multicast belong to the same execution tree, and they are executed serially by the SQL Server 2005 SSIS data flow task.
In SQL Server 2008 SSIS, the data flow task has been redesigned to do dynamic scheduling and can now execute multiple components in parallel, even if they belong to the same execution tree. Several threads can work together to do the work that a single thread is forced to do by itself in SQL Server 2005 SSIS. This can give a noticeable speed-up in ETL performance. The below example will outline this:
In the SSIS 2005 data flow task image below, notice two execution trees being created, execution tree 1 starts from the source and ends at the first asynchronous transformation that is “Union All” whereas execution tree 2 starts from Union All and ends at destination. Now the conclusion is only two threads will be assigned to this data flow task during execution even though there are two branches in the first execution tree (and these two branches can be executed in parallel by two parallel threads) after the Lookup transformation.
In the SSIS 2008 data flow task image below, notice the “Execution Tree” has got the new name “Path”. Similar to execution trees of SSIS 2005, here two Paths will be created, Path 1 starts from the source and ends at first asynchronous transformation that is “Union All” whereas Path 2 starts from Union All and ends at destination. Furthermore the Path 1 has been divided in to Sub Path 0 and Sub Path 1. During execution these two sub-paths will get two threads and will execute in parallel.
Note:
To achieve a high level of parallelism on a multiprocessor machine:
SQL Server 2005
– add an asynchronous transform to create a new tree (last tree might be executed by one thread and later one by another). For example, insert "Union All" asynchronous transform with one input, though it does not change the outputs, but splits the execution tree into two new trees - each one can be executed on its own processor. The drawback of this approach is, though there are two execution trees to be executed by two threads in parallel, the inclusion of asynchronous transform will require additional memory to copy the data into the new buffers created after the transformation, hence additional overhead.
SQL Server 2008
– data flow pipeline engine can divide a Path (which was called Execution Tree in SSIS 2005) into several Sub Paths - each one can be executed on its own processor/thread. This threading optimization promises to have many more enhancements on performance in other parts of SSIS such as packages with deep parallel execution paths. With increasing performance on most systems, data flow pipeline engine reduces the need for manual configuration of SSIS packages to increase parallelism, and therefore increased developer productivity.
Multicast transformation
The Multicast transformation in SSIS 2005 uses one input dataset and generates output datasets that are an exact copy of the input dataset which is extremely useful at times and have been used on a number of projects. The downside is that those outputs are synchronous outputs and therefore on the exact same thread as the input dataset. In SSIS 2008 the threading model around SSIS has changed to be more performing and take advantage of today's multi core, multi-processor boxes. The Multicast transform is an excellent example of this optimization. Each output on the transform is broken down into parallel execution paths and this allows those outputs to go out and get their own engine threads.
In this example (see the image below), data is being retrieved from the source and using Multicast transform to output data to 4 different destinations. Analyse how SSIS executes this package on SSIS 2005 and on SSIS 2008:
Notice in the below table:
Within SSIS 2005 only one execution tree is created to execute the whole data flow operations. As one thread will be allocated to an execution tree, this only thread will pull data from source and upload it onto all four destinations. It’s a waste of hardware resources with a SMP (Symmetric Multi Processor) computer as the single thread is using only one processor. Though, as discussed above, a trick can be applied to break the execution tree in two by inserting an asynchronous transformation in the way but it has overhead of creating new buffers and copying the data to new buffers.
In SSIS 2008, for the data flow task the pipeline divides the single Path into four different sub paths as seen in the table below. The pipeline allocates four threads to work on these four sub paths in parallel and utilizes the hardware resources and improves the performance.
Note:
Normally a synchronous output is good as no memory has to swap buffers and it is exceptionally quick. A problem arises if one or more of those outputs to a blocking Asynchronous transform for example Aggregate Transformation. When this occurs processing of the other outputs is halted until the Asynchronous transform completes. This can damage performance. Until SQL Server 2005, a new buffer had to be forced on the outputs of the multicast by adding a UNION ALL transform with only one input. This had the effect of generating a new execution tree and therefore engine thread.
The final artilce in the series will explore VSTA support for the Script Task and Script Component, Import And Export Wizard Enhancements and how SSIS leverages SQL Server 2008 features.
C# Help and Tutorials
|
PHP MySQL Tutorial
|
Sharepoint Tutorial
|
Azure Tutorial
|
Cloud Hosting Magazine
|
ASP.NET Tutorials
|
Windows Server Help
|
Windows Phone Pro
|
Silverlight Ace
|
Visual Studio Tutorials
|
Home
|
Peformance Articles
|
Audit Articles
|
Business Intelligence Articles
|
Clustering Articles
|
Developer Articles
|
Reporting Services Articles
|
DBA Articles
|
ASP.NET / ADO.NET Articles
|
SQL Server Training Videos
|
DBA FAQ's
|
Developer Peformance FAQ's
|
DBA Peformance FAQ's
|
Developer FAQ's
|
Clustering FAQ's
|
Error Messages
|
Audit Tool Reviews
|
Backup Tool Reviews
|
Coding Tool Reviews
|
Compare Tool Reviews
|
Documentation Tool Reviews
|
Design Tool Reviews
|
Monitoring Tool Reviews
|
Log Tool Reviews
|
Reporting Tool Reviews
|
Clustering Tool Reviews
|
Security Tool Reviews
|
Change Management Tool Reviews
|
Remote Access Tool Reviews
|
Book Reviews
|
Security Tool Reviews
|
ADO.NET / ASP.NET
|
Administration
|
Analysis/OLAP Services
|
Application Development
|
Configuration
|
Components
|
ETL
|
Hardware
|
High Availability
|
Hints
|
Index
|
Misc
|
Operating Systems
|
Performance Tuning
|
Replication
|
T-SQL
|
Views
© 2010 Jude O'Kelly. All rights reserved