SSIS – An Inside View Part 1
This article is part 1 of a 4 part series that explores the internals of SQL Server Integration Services (SSIS).
The last article discussed advanced features and properties of SSIS; namely Event Logging, Event Handlers, Transaction Support, Checkpoint Restart-ability and SSIS validation process. This article, will discuss SSIS architecture, SSIS buffer management, types of transformation, execution tree and parallel processing. Comprension of these things will enhance understanding of the internals of SSI. For troubleshooting SSIS issues, especially performance issues, exceptions, etc; it is important to understand some high level concepts with respect to SSIS architecture. Note:
This article, where reference is made to “SSIS 2008” it indicates the SSIS version with SQL Server 2008 whereas “SSIS 2005” indicates the SSIS version with SQL Server 2005.
SSIS is a component of SQL Server 2005/2008 and is the successor of DTS (Data Transformation Services) which formed part of SQL Server 7.0/2000. From an end-user perspective DTS and SSIS appear similar, however they are quite different. SSIS has been completely written from the scratch (it is a new enterprise ETL product) and overcomes several limitations of DTS. Though the list of differences between DTS and SSIS is quite large, one thing to note is the internal architecture of SSIS is completely different from DTS. It has segregated the Data Flow Engine from the Control Flow Engine or SSIS Runtime Engine; designed to achieve a high degree of parallelism and improve the overall performance (see the architecture image below).
The SSIS architecture consists of two main components as given below: SSIS Runtime Engine – The SSIS runtime engine handles the control flow of a package. It saves the layout of packages, runs packages and provides support for logging, breakpoints, configuration, connections and transactions. The run-time engine is a parallel control flow engine that coordinates the execution of tasks or units of work within SSIS and manages the engine threads that carry out those tasks. The SSIS runtime engine executes the tasks inside a package in an orderly fashion. When the runtime engine encounters a data flow task in a package during execution it creates a data flow pipeline and lets that data flow task run in the pipeline. Note:
The Integration Services service (a windows service) is not the same as the SSISruntime engine/service. It is not required if only the design and execute Integration Services packages are wanted. This windows service can be started to manage SSIS packages, for example to connect to multiple SSIS servers, start/stop package remotely/locally, manage the package store, import/export packages etc. SSIS Data Flow Engine/Pipeline – SSIS Data Flow Engine or Data Flow Pipeline or Transformation pipeline engine manages the flow of data from data sources, through transformations, and on to destination targets. When the Data Flow task executes, the SSIS data flow engine extracts data from one or more data sources, performs any necessary transformations on the extracted data and then delivers the data to one or more destinations. The Data flow engine is buffer oriented architecture (more details will be discussed in a later section), it pulls data from the source and stores it in a buffer (memory structure) and does the transformation in buffer/memory itself instead of processing on a row-by-row basis. The benefit of this in-memory processing is that processing is much faster as there is no need to physically copy/stage the data at each step of the data integration; the data flow engine manipulates data as it is transferred from source to destination. Note:
Both of the above discussed components are heavily influenced by conditions external to SSIS, such as network bandwidth and interaction with external systems such as database servers, FTP servers, or email servers.
Discussions about other components of SSIS architecture viz. SSIS Designer, Import/Export Wizard, Command line utilities, SSIS API, Event Handlers, and Connection Managers are available in the two articles called SQL Server Integration Services – An Introduction and SQL Server Integration Services – Features and Properties, refer to them for more details. The next article in this series on SSIS Internals will explore the SSIS Transformation and Execution Tree.]]>