If you were asked what are the data sources that you can use as a data source for SQL Server Integration Services (SSIS), you may say list the out of the box data sources such as, SQL Server Oracle, DB2 etc and flat files such as Text files and CSV files or Excel. If that is your answer it is correct but only half correct, because you can use data source by using the script component in the SSIS data flow task. This article shows few real world examples of how you can use the script component as a Data Source.
Knowledge-wise you need to have a basic understanding of SSIS packages. If you have created an SSIS package to import traditional text files that will be more than enough. As we are going to do some .NET scripting, you also need to have some understanding of .NET coding.
Resource-wise you need to have SQL Server Business Intelligence Development Studio (BIDS) installed.
If you want to store your file system structure in a table, what would you do? In this example the data source is nothing but your file system. Let’s say you want to save the file name, location, file size, created date and modified date of set of files in a folder using SSIS.
The first step is to create a SSIS project from BIDS and create a new package. Drag and drop a Data Flow task to the control flow pane. Then Double click the data flow task.Then drag and drop a Script component from the tool box to the Data Flow task. After dragging it, you will see the following screen asking you to select the type of the script component.
As we are using script as a source, select the first the option which is Source.
The next step is to add the columns that we want to have for the output.
After double clicking the script component you will be taken to the Script Transformation Editor.In this add columns you want and specify with the correct data type. The following are the column names and their data types that have been added for this example:
|Column Name||Data Type & Length|
|FileSize||Eight-byte signed integer|
The next step is to write the. Net script. By selecting the script option and clicking the Design Script button, you will be taken to the .Net editor. Then you need to add Imports System.IO at the top to import file access functions. Then you need to write following script.
Public Overrides Sub CreateNewOutputRows()
Dim Root As New DirectoryInfo(“C:Sample Data”) ‘ Directory you want to get data
Dim Files As FileInfo() = Root.GetFiles(“*.*”, SearchOption.AllDirectories)
Dim Filename As FileInfo
For Each Filename In Files
.FileName = Filename.Name
.FileLocation = Filename.DirectoryName
.Extension = Filename.Extension
If Filename.IsReadOnly Then
.ReadOnly = “Y”
.ReadOnly = “N”
.CreatedTime = Filename.CreationTime
.ModifiedTime = Filename.LastWriteTime
.FileSize = CLng(Filename.Length / 1024) ‘divied by 1024 to get the size in KB
If you wish to retirve file informations of the sub folders of the main folder, you can modify above script accordinly.
For this example I will add a row count which will enable us to see the output we are getting from the script component. As most of you are aware, we can view the output by adding a data viewer to the data flow path editor.
Now this data is the same getting data from a SQL Server table or any other typical source. You can have conditional splits, multicast etc, depending on your requirement to do modify above data set to your need.
For, example if you want to have files which have a file size greater than 50KB, you can include a conditional split with FileSize >= 50 condition.
Apart from the above example, another place where you can use this method is to generate data for a date dimension. As you are aware, date dimensions have attributes like date, month year etc. Rather than entering thse dates from a script you can use .Net code to do it for you. We can start the package as we did for the previous exampale. The Only change will be the output columns and the script. Columns name, type and length will be as following table.
|Column Name||Data Type & Length|
|Month||Four byte signed integer|
|Year||Four byte signed integer|
|DayofWeek||Four byte signed integer|
|DayofYear||Four byte signed integer|
|Day||Four byte signed integer|
Next is to write the script. I have written very simple script just to get the basic attributes required. You can modify this script get more attributes depending on your requirement.
Dim i As Integer
Dim dt As Date
dt = Now()
While i < 10000
DateBuffer.FullDate = dt.Date()
DateBuffer.Month = dt.Month
DateBuffer.Year = dt.Year
DateBuffer.DayofWeek = dt.DayOfWeek
DateBuffer.DayofYear = dt.DayOfYear
DateBuffer.Day = dt.Day
dt = DateAdd(DateInterval.Day, 1, dt)
i = i + 1
As we did for the previous example, let us view the output of this example.
I am sure that now if you are asked what the data source that can be used in SSIS are, you will have a very different answer. Simply it can be anything.