SSIS New Features in SQL Server 2008 – Part 5
VSTA support for Script Task and Script Component
Unlike SQL Server 2005, SQL Server 2008 provides VSTA (Visual Studio Tools for Applications) environment for writing Script Tasks and Script Components instead of VSA (Visual Studio for Applications) environment. VSTA includes all the standard features of the Visual Studio environment, such as the color-coded Visual Studio editor, IntelliSense, and Object Browser and debugging features like breakpoints, watch/auto/locals windows. The script that the Script component uses is stored in the package definition. When designing the package, the script code is temporarily written to a project file that includes an auto-generated class, named ScriptMain, which represents the component metadata, once built and IDE closed, the content (compiled) is saved as part of the package definition and the temporary files will be removed from the disk.
VSTA is a richer environment for extending applications; taking reference to a DLL from any location was not possible with VSA, and the Add Reference dialog box has all the tabs of VS IDE – .NET, COM, Projects, Browse, and Recent. The Add Web Reference allows references to a web service right inside the SSIS pipeline instead of creating a proxy class, registering the class on the machine, and then referencing that class in the script in VSA.
It also has support for C#, providing more options than writing code in just Visual Basic .NET.
Auto migration – Existing script components in SQL Server 2005 will automatically invoke the new VSTA environment, which makes the migration of old script components seamless.
In SSIS 2005, if the Script Task is right clicked, a Script Task Editor will appear as shown below, here notice even though there is a combo-box to select scripting language; it has only one option, “Microsoft Visual Basic .Net”.
Clicking on “Design Script” button, the old time VSA (Visual Studio for Applications) environment will appear to write the script in Microsoft Visual Basic .Net.
In SSIS 2008, right clicking on the Script Task, a Script Task Editor will appear as shown below, here notice there is a combo-box which allows selection of scripting language for the task; the possible scripting language options are “Microsoft Visual Basic 2008” and “Microsoft Visual C# 2008”. Notice the Script tab has got priority over General tab and became the first tab.
Click on “Design Script” button, the new VSTA (Visual Studio Tool for Applications) environment will appear to write the script in selected scripting language that is either in “Microsoft Visual Basic 2008” or “Microsoft Visual C# 2008”. In the example below the environment for writing script task is “Microsoft Visual C# 2008” language.
When upgrading the packages from SSIS 2005, most of the changes are done automatically the package is opened in VSTA IDE; however, in script task or component references need to be changed manually from IDTSxxx90 to IDTSxxx100.
Import And Export Wizard Enhancements
Import And Export Wizard is smarter, improved with the following features:
Data Type Conversion – New page added in the wizard which pops up if there is any mismatch in column data types between source and destination (as shown below), double clicking on row will bring up another pop up providing details of source and destination data type mismatch and XML mapping file being used. Providing a choice to include a Data Conversion Transformation or not in the created package:
As a Data Conversion Transformation is selected for includsion, below is the created package design:
Better scaling – Large numbers of tables will be split across multiple data flow tasks (up to five tables per data flow task). For example if taking seven tables, two data flow tasks will be created in the package, one data flow task will contain five data flow for five tables and second data flow task will contain two data flow for last two tables.
ADO .Net Support – ADO .Net support is not available in Import and Export wizard, however it is available SSIS designer. SSIS 2008 now has an ADO.Net source (replacing the DataReader source) and destination adapters. The function is similar to the OLEDB ones, supporting a custom UI, drop down list of tables/views, and query builder. Its properties may contain expressions.
SSIS leverages SQL Server 2008 features
Change Data Capture
Change Data Capture (CDC) is designed to capture insert, update and delete activity on a SQL table and place the information into a separate relational table. It uses an asynchronous capture mechanism that reads the SQL transaction logs and populates the CDC table with the row’s data which change. The CDC table mirrors the column structure of the tracked table, together with metadata regarding the type of changes (Insert, Update and Delete) etc. This feature is entrenched in SQL transaction log architecture and thus a lot of the metadata in CDC is related around the concept of a Log Sequence Number (LSN). Every record in the SQL transaction log is uniquely identified by a LSN. LSNs are ordered such that if LSN2 is greater than LSN1, the change described by the log record referred to by LSN2 occurred after the change described by the log record LSN1.
CDC is a useful and effective method to get incremental data from source to data warehouse, especially when there is no primary key and delta column (UpdateDate) on source table, without taking any alternate approaches that include timestamp columns, triggers, or complex queries often hurt performance and increase complexity.
SQLServerAgent must be running for CDC to work. To enable CDC, we need to enable CDC for a database and then enable CDC for a given table. Once enabled, several system tables, SPs, DMVs, UDFs will be created to get the details of changes.
Does a CDC table continue growing? It doesn’t, there is an automatic cleanup process that occurs every three days (which is configurable). For more intense environments the manual method can be leveraged using the system stored procedure: sys.sp_cdc_cleanup_change_table. When this system procedure is executed, the low LSN is specificied and any change records occurring before this point are removed and the start_lsn is set to the low LSN we specified.
MERGE SQL statement
The word UPSERT is a fusion of the words UPDATE and INSERT. The UPSERT command inserts rows that don’t exist and updates the rows that do exist. In SQL Server 2008, Microsoft introduces the UPSERT functionality through the MERGE SQL command. With the introduction of the MERGE SQL command, developers can more effectively handle common data warehousing scenarios, such as checking whether a row exists, and then executing an insert or update or delete.
The MERGE statement merges data from a source result set to a target table based on a condition specified if the data from the source already exists in the target or not. The new SQL command combines the sequence of conditional INSERT and UPDATE commands in a single atomic statement, depending on the existence of a record. The merging operation can be any of insert, update or delete based on a match or a non match of rows between the source and target.
THEN <merge_matched> ]
[WHEN [TARGET] NOT MATCHED
THEN <merge_not_matched> ]
[WHEN SOURCE NOT MATCHED
THEN <merge_ matched> ];
The MERGE statement works as separate insert, update, and delete statements all within the same statement. Specify a “Source” record set and a “Target” table, and the join between the two. Then specify the type of data modification to occur when the records between the two data are matched or are not matched. MERGE is useful, especially when loading data warehouse tables, that can be very large and require specific actions to be taken when rows are or are not present.
MERGE statement takes same kind of locks minus one Intent Shared (IS) Lock that was due to the select statement in the ‘IF EXISTS’ as in previous version of SQL Server.
Microsoft’s vision to make SSIS an Enterprise ETL tool, seems to have eventuated as SSIS 2008 has set an ETL world record of loading 1 TB of data in around 30 minutes. Thanks to the SSIS team for making a great enterprise ETL tool.
This article series talked of SQL Server Integration Services (SSIS), a platform for building high performance data integration and workflow solutions. How it achieves high performance by segregating data flow engine from Runtime engine and by doing multithreading (allowing multiple executables/data flow to run in parallel).
Lookup Transformation http://msdn.microsoft.com/en-us/library/ms141821.aspx
Data Profiling Task http://msdn.microsoft.com/en-us/library/bb895263(SQL.100).aspx
Pipeline Performance http://technet.microsoft.com/en-us/library/cc278097(SQL.100).aspxVSTA
VSTA Support for Script Task and Component http://technet.microsoft.com/en-us/library/bb522527(SQL.100).aspx
MERGE Statement http://msdn2.microsoft.com/en-us/library/bb510625(SQL.100).aspx
Change Data Capture http://technet.microsoft.com/en-us/library/bb522489(SQL.100).aspx
SQL Server Books Online (BOL) / MSDN