USEFUL SITES :
Write for Us
Note: In this article where reference is made to “SSIS 2008” it refers to the SSIS version with SQL Server 2008 whereas “SSIS 2005” refers to the SSIS version with SQL Server 2005. Lookup Transformation Enhancement Before covering further details on enhancement of Lookup transformation, below is a brief explantion of what Lookup transformation is and how it was until SQL Server 2005. If Lookup transformation in SSIS 2005 is already known, then jump to the next article on Lookup Transformation in SSIS 2008. Lookup Transformation in SSIS 2005 The Lookup transformation performs lookups by joining (using equi-join, a process where each row in the transformation input must match at least one row from the reference dataset) data in input columns with columns in a reference dataset/table. The reference dataset can be an existing table or view, a new table or the result of an SQL statement. If there is no matching entry in the reference dataset, no join occurs and no values are returned from the reference dataset. This is an error, and the transformation fails, unless it is configured to ignore errors or redirect error rows to the error output. If there are multiple matches in the reference table, the lookup returns only the first match based on the lookup query. Note:
The Lookup transformation uses an OLE DB connection manager to connect to the database that contains the data that is the source of the reference dataset. The Lookup transformation has the following inputs and outputs:
Note: By default, the Lookup transformation treats rows without matching entries as errors. To optimize performance, the lookup transformation may cache the reference table in memory depending on the caching mode being used (part or all of the reference dataset can be specified to be cached to improve performance, by using a caching SQL statement) as discussed below: Full Cache/Pre-Cache – By default, the lookup uses Full cache mode. In this mode, the database is queried once during the pre-execute phase of the data flow. The entire reference set is pulled into memory. Afterward, SQL is not used anymore to process this data. This approach uses a lot of memory, while not needing SQL indexes anymore, having enough RAM to hold all the reference data is crucial. This adds additional startup time for the data flow, as the entire caching takes place before any rows are read from the data flow source(s). The trade off is that the lookup operations will be very fast during execution. Note that the lookup will not swap memory out to disk, so the data flow will fail if memory runs out. When to use this cache mode:
Keys to using this cache mode:
Partial Cache – In this mode, the lookup cache starts off empty at the beginning of the data flow. When a new row comes in, the lookup transform checks its cache for the matching values. If no match is found, it queries the database. If the match is found at the database, the values are cached to be used the next time a matching row comes in. Since no caching is done during the pre-execute phase, the startup time using a partial cache mode is less than it would be for a full cache. However, lookup operations will be slower, as the database is queried more often and a SQL index on the reference table is required for this to perform well. When to use this cache mode:
No Cache – As the name implies, in this mode the lookup transform doesn't maintain a lookup cache (with an exception to the last match around, as the memory has already been allocated). In most situations, this means the database will be hit for every row; and like Partial-Cache mode a SQL index on the reference table is needed for this to perform well. When to use this cache mode:
Keys to using this cache mode
Note:
Example: There are several scenarios where lookup transformation is required, consider a scenario of a sales table containing sales transactions of all the customers belonging from all over the world. If wanting sales records for those customers belonging from a specific region, US, EMEA, ASIA PACIFIC etc; the sales table can be used as input to the lookup transformation and region specific list of customers as the reference table. The output will contain, as expected, sales transactions only for the customer of that specific region. The below example demonstrates this in action. Create a simple package with a data flow task that pulls order records from the source for the selected product only, the package will look something like this; here the input to lookup transformation is order table whereas the product table is used as reference table: To configure the lookup transformation, right click on the Lookup transformation and click on Edit, a lookup transformation editor appear as shown below, it has three tabs, the first viz. “Reference Table” is a place to specify the source of reference table and whether the data will come from table/view or as a result of an SQL query. The middle tab viz. “Columns”, specify the mapping of column(s) between the source table and the reference table, used in matching the records. Here output column(s) are selected from the reference table. The image below shows the column “Name” is selected to be output as a new column viz. “NewName” from the lookup transformation. The last tab viz. “Advanced”, specify whether to use the Partial-Cache or No-Cache feature (By Default “Enable memory restriction” is not checked and hence Full-Caching will be used), also specify what size the cache will be when it is running on 32-bit machine and 64-bit machine (Default value is 5 MB). The final result of running the package, 542 rows are flowing from source to lookup as input, reference table matches with 124 rows, another 418 rows which could not be matched are outputted to Lookup Error Output path. The next article in this series will examine the changes to the Lookup Transformation in SSIS 2008.