This article is part 2 of a 5 part series that explores the SSIS features introduced in SQL Server 2008. Focussing on Lookup Transformations in SSIS 2008.
Lookup transformations are one of the most widely used transformations in the SSIS; Microsoft have recognised this and thus have improved it better performance. For example:
- In SQL Server 2005, the Lookup component could only get data from specific OLEDB connections, the cache could be populated only by using a SQL query (now it can be from a cache file). In SQL Server 2008, source data can be used from almost anywhere, such as from text files, raw files, XML files, Excel files, proprietary-format binary files, web services, a record-set, or anything that can be accessed using an ADO.Net provider.
- SQL Server 2005 reloads the cache every time it is used. In SQL Server 2008, a cache can be used in multiple lookups thus solving the problem where each iteration in a loop needs to repopulate the cache. The cache can be saved to virtual memory or permanent file storage. Therefore within the same package, multiple Lookup components can share the same cache. The cache can also be saved to a file and shared with other packages. If more than one package needs to access the same reference data, it can be loaded in a persistent cache and shared by all the packages, rather than be loaded and unloaded for each package. The cache file format is optimized for speed and access to it can be orders of magnitude faster than reloading the reference dataset from the original relational source. Also SQL Statements can be built dynamically within an OLE DB Command that can now change at execution-time where the data is stored in the cache. For this purpose, use Cache Transform transformation which writes data from a connected data source in the data flow to a Cache connection manager (only unique rows). Configure the Cache connection manager to save the data to a cache file (.caw) which can be referenced / shared by lookup transformations of other data flow tasks or by other packages as well.
- In SQL Server 2005, the lookup transformation has only two outputs, one regular output through which matching records pass and another one is error output through which all non-matching records and records with error pass. SQL Server 2008 made a distinction between these two. It now has three outputs, one regular output for “Matching” records; second one for “No Matching” records and third one for records with error.
If the Lookup transformation is configured to treat the rows without matching entries as errors, the rows are redirected to the error output instead of No Matching output.
- Aside from the above, other improvements are: miss-cache feature, which saves time by optionally loading into cache the key values that have no matching entries in the reference dataset and can contribute up to a 40% performance improvement in some scenarios, a more intuitive user interface that simplifies the configuration of the Lookup component, in particular the caching options, Optimized I/O routines leading to faster cache loading and lookup operations etc.
- If the reference database is remote, or under heavy load, consider using the Cache Connection Manager instead of an OLEDB connection.
- Once a cache is used (or created) in an SSIS package, it will be kept in memory until the package has finished executing. The cache can be reused across multiple data flows, and shared between multiple lookups in the same data flow. It can also be persisted to disk, and reused across package executions.
- When the cache is saved to a file, the system loads the cache faster. This improves the performance of the Lookup transformation and the package. Remember, when using a cache file, the data being worked on is not as current as the data in the database.
In summary, with SQL Server 2008 the Lookup transformation has been improved to allow explicit control over the lookup data, a new breed of connection manager viz. Cache Connection Manager has been introduced to store cache to file and share cached lookup data among different components and packages and finally the more intuitive UI for designing Lookup transformation.
The below example uses the same package as discussed in the previous section, however this time using SSIS 2008 with all three outputs of the lookup transformation. Right click on the Lookup transformation and click on Edit, a new more intuitive lookup transformation editor will come up as shown below, it has now five tabs than three tabs in previous version. On General, specify the caching mode, the connection type to use for reference table and how to handle non matching records. On Advance tab, can specify the size of the cache for 32-bit and 64-bit machines (Default value is 25 MB).
So the final result of running the package, 542 rows are flowing from source to the lookup as input, reference table matches with 124 rows which passes through “Match Output”, another 418 rows which could not be matched are passed through “No Match Output” path. Since there are no erroneous records, there are no records passing through “Error Output”.