SSIS New Features in SQL Server 2008 – Part 4
Behind the scenes
Backpressure (where source is fast and destination is slow) is an internal memory limiter inside SSIS of SQL Server 2008. If a source or an asynchronous component is too fast (compared to the transformations or destination down the path), the source is suspended when its execution tree gets too many buffers (currently this is fixed at 5 buffers). If the source is slow (i.e. transforms and destinations can process data faster than sources generate it), the back-pressure mechanism does not get involved and sources can run with full speed.
Asynchronous component/transformation and SSIS buffers is discussed in the last article viz. “SQL Server Integration Services – An Inside View”. Unfortunately, in SQL 2005 there were no diagnostics that would tell a user what is the slow part of the flow – the source or the transforms/destination. In SQL Server 2008 if the back-pressure kicks in during package execution, at the end of package execution it reports the total time that the source had to wait because of this mechanism.
Using this information, if the source is the slowest part of the data flow, then the focus is on optimizing the source. E.g. remove unused columns from the query, simplify SQL statements, create indexes, etc and if the source is fast enough and then concentrate on performance of transforms and destinations. If a source has been suspended for several minutes because of slower transformation/destination down the path, then it’s better to stage the data temporarily in mid (somewhere in raw file, which has a proprietary binary format) instead of keeping a database connection open for longer time.
Pipeline Performance – Multicast Transform and Threading in SQL Server 2008
A previous article described execution tree, scope and its association with buffer (for more details refer to the last article viz. “SQL Server Integration Services – An Inside View”). An execution tree is executed by only one thread either by source thread or by worker thread (though one thread may execute multiple execution trees), if a user has a simple package with one or two execution trees, there are only one or two processors (threads) being used, and the package might not benefit from a multiprocessor machine with more than a few processors. Even if users logically split the data flow by using multicast, all output paths of a multicast belong to the same execution tree, and they are executed serially by the SQL Server 2005 SSIS data flow task.
In SQL Server 2008 SSIS, the data flow task has been redesigned to do dynamic scheduling and can now execute multiple components in parallel, even if they belong to the same execution tree. Several threads can work together to do the work that a single thread is forced to do by itself in SQL Server 2005 SSIS. This can give a noticeable speed-up in ETL performance. The below example will outline this:
In the SSIS 2005 data flow task image below, notice two execution trees being created, execution tree 1 starts from the source and ends at the first asynchronous transformation that is “Union All” whereas execution tree 2 starts from Union All and ends at destination. Now the conclusion is only two threads will be assigned to this data flow task during execution even though there are two branches in the first execution tree (and these two branches can be executed in parallel by two parallel threads) after the Lookup transformation.
In the SSIS 2008 data flow task image below, notice the “Execution Tree” has got the new name “Path”. Similar to execution trees of SSIS 2005, here two Paths will be created, Path 1 starts from the source and ends at first asynchronous transformation that is “Union All” whereas Path 2 starts from Union All and ends at destination. Furthermore the Path 1 has been divided in to Sub Path 0 and Sub Path 1. During execution these two sub-paths will get two threads and will execute in parallel.
To achieve a high level of parallelism on a multiprocessor machine:
SQL Server 2005 – add an asynchronous transform to create a new tree (last tree might be executed by one thread and later one by another). For example, insert “Union All” asynchronous transform with one input, though it does not change the outputs, but splits the execution tree into two new trees – each one can be executed on its own processor. The drawback of this approach is, though there are two execution trees to be executed by two threads in parallel, the inclusion of asynchronous transform will require additional memory to copy the data into the new buffers created after the transformation, hence additional overhead.
SQL Server 2008 – data flow pipeline engine can divide a Path (which was called Execution Tree in SSIS 2005) into several Sub Paths – each one can be executed on its own processor/thread. This threading optimization promises to have many more enhancements on performance in other parts of SSIS such as packages with deep parallel execution paths. With increasing performance on most systems, data flow pipeline engine reduces the need for manual configuration of SSIS packages to increase parallelism, and therefore increased developer productivity.
The Multicast transformation in SSIS 2005 uses one input dataset and generates output datasets that are an exact copy of the input dataset which is extremely useful at times and have been used on a number of projects. The downside is that those outputs are synchronous outputs and therefore on the exact same thread as the input dataset. In SSIS 2008 the threading model around SSIS has changed to be more performing and take advantage of today’s multi core, multi-processor boxes. The Multicast transform is an excellent example of this optimization. Each output on the transform is broken down into parallel execution paths and this allows those outputs to go out and get their own engine threads.
In this example (see the image below), data is being retrieved from the source and using Multicast transform to output data to 4 different destinations. Analyse how SSIS executes this package on SSIS 2005 and on SSIS 2008:
Notice in the below table:
Within SSIS 2005 only one execution tree is created to execute the whole data flow operations. As one thread will be allocated to an execution tree, this only thread will pull data from source and upload it onto all four destinations. It’s a waste of hardware resources with a SMP (Symmetric Multi Processor) computer as the single thread is using only one processor. Though, as discussed above, a trick can be applied to break the execution tree in two by inserting an asynchronous transformation in the way but it has overhead of creating new buffers and copying the data to new buffers.
In SSIS 2008, for the data flow task the pipeline divides the single Path into four different sub paths as seen in the table below. The pipeline allocates four threads to work on these four sub paths in parallel and utilizes the hardware resources and improves the performance.
Normally a synchronous output is good as no memory has to swap buffers and it is exceptionally quick. A problem arises if one or more of those outputs to a blocking Asynchronous transform for example Aggregate Transformation. When this occurs processing of the other outputs is halted until the Asynchronous transform completes. This can damage performance. Until SQL Server 2005, a new buffer had to be forced on the outputs of the multicast by adding a UNION ALL transform with only one input. This had the effect of generating a new execution tree and therefore engine thread.
The final artilce in the series will explore VSTA support for the Script Task and Script Component, Import And Export Wizard Enhancements and how SSIS leverages SQL Server 2008 features.