Recently, I was called upon to convert a file from an external client that contained 5 million records at 497 bytes per record. Unfortunately, the “out of the box” version of Microsoft SQL Server 2005 Business Intelligence Development Studio (BIDS) does not support a conversion routine to properly transform between two different codepages e.g. EBCDIC (37) and ASCII (1252). After some persistent searching I was able to find a sample custom component on Microsoft’s website called “CodePageConvert”. Once this msi is properly compiled and added to the menu of data flow transformations, you should be able to transform data between two codepages.
To give you a little background, the machine that is currently executing this process is running Windows Server 2003 with SP2 and a Dual 2.66 GHz Intel Xeon X5355 with 3840MB of memory. The package calls an external file server for the raw data file and writes data to an OLE DB connection (SQL Server database) running Microsoft SQL Server 2000 on another external file server.
Ideally, I wanted to be able to process this data more efficiently with less strain on hardware, execution time and table size. With no performance tuning in place the process takes approximately 9 minutes to complete the entire data conversion. The raw file is close to 2.5GB and the resultant table size is 3.3GB. I knew it had to be feasible to do better than that, and if at all possible I wanted to significantly reduce both execution time and table size.
Excluding unnecessary data– This limits the number of bytes being processed by the SSIS Engine and overall footprint on SQL Server.
I saw that we only needed just a few of the fields that have been sent from the external client. After a thorough analysis of our data needs, I decided to remove 10 columns from the package with a total savings of 254 bytes per record. This leaves us with a total of 243 useable bytes per record (497 [initial] – 254 [excluded] = 243 [useable]). Obviously, this modification represents a significant reduction in process time and database footprint.
Increasing the insert commit size –This allows a user to set a specific buffer commit size for loading into SQL Server. This setting is available on the OLE DB Destination Editor when using the SQL Server OLE DB driver. By default, SSIS specifies a commit size of 0; meaning that SSIS will attempt to load all rows in a single batch which has a negative effect for especially large files. By increasing the insert commit size to… say… 1 million records, it breaks the data up into 5 smaller commits which does not strain the processor as much.
Buffer Tuning– Increasing the DefaultMaxBufferRows for the Data Flow task allows a user decrease the number of buffers moving through the data flow.
SSIS multiplies the estimated row size (in my case it would be 243 bytes) by the DefaultMaxBufferRows to gauge the size of the dataset for every 10,000 records. The DefaultMaxBufferRows should not be increased too significantly because this will cause the execution engine to swap out buffers to disk; therefore defeating the purpose all together.
Raw data location –Moving the raw client data to the server hosting SSIS should improve the overall data access time. Now SSIS will not have to reach across the network to grab data from the raw file. Also this decreases the chance for error, such as network connection issues.
SQL server location –Moving the destination SQL server to the same server hosting SSIS improves the write time to SQL Server, also decreasing the chance for error.
OLE DB to SQL server destination –For those of you already running a full fledged 2005 data warehouse, you are probably already utilizing the performance advantages of a 2005 SQL server destination. Unlike the existing solution, which was writing to a SQL 2000 server, using the SQL server destination will significantly improve the write time for data through SSIS into SQL server.
Significant performance gains were achieved by implementing the above procedures. The greatest improvements showed to be excluding unnecessary data and changing the OLE DB Destination to a SQL Server Destination for SQL Server 2005. As I stated before, the existing process took a total of almost 9 minutes to initialize, execute and load 5 million records to SQL Server. After implementing data exclusion, buffer tuning, placing the raw data on the application server, loading data to the application server and changing the OLE DB destination to a SQL Server Destination for SQL Server 2005, I was able to obtain a performance savings of 4 minutes and 21 seconds of execution time and a storage savings of 543,204 KB.
No Enhancements 8:54.141 3,
– add Data Exclusion 6:32.828 2,857,168 KB
– add Increase Insert Commit 6:30.750 2,857,168 KB
– add Buffer tuning 6:20.078 2,857,168 KB
– add Raw data move 6:09.063 2,857,168 KB
– add SQL Server move 6:07.718 2,857,168 KB
– add Destination type change 4:33.344 2,790,188 KB Summary: If you are looking for ways to enhance your performance for data conversion using Microsoft’s SQL Server 2005 Integration Services there are a number of methodologies that can be applied. When converting large EBCDIC files, a few performance enhancing techniques should be employed in the interest of significantly reducing execution time all while using less storage. For the example above, the highest performance gains were achieved by changing the existing OLE DB destination (a 2000 SQL server) to a SQL Server Destination (a 2005 SQL server), and by excluding unwanted data from the data flow.
No two scenarios will likely perform exactly the same; but trying all or a combination of these practices should render great results.
Best of luck!
 CodePageConvert is an SSIS dataflow component that translates from and to any code page or Unicode character representations. Downloadable at:http://www.microsoft.com/ SQL Server 2005 SSIS: Tuning the Dataflow Task by Kirk Haseldenhttp://www.simple-talk.com/
SQL Server 2005 SSIS: Tuning the Dataflow Task by Kirk Haseldenhttp://www.simple-talk.com/