Script Component as Data Source

Introduction
If you were asked what are the data sources that you can use as a data source for SQL Server Integration Services (SSIS), you may say list the out of the box data sources such as, SQL Server Oracle, DB2 etc and flat files such as Text files and CSV files or Excel. If that is your answer it is correct but only half correct, because you can use data source by using the script component in the SSIS data flow task. This article shows few real world examples of how you can use the script component as a Data Source.

Pre-Requisites
Knowledge-wise you need to have a basic understanding of SSIS packages. If you have created an SSIS package to import traditional text files that will be more than enough. As we are going to do some .NET scripting, you also need to have some understanding of .NET coding.

Resource-wise you need to have SQL Server Business Intelligence Development Studio (BIDS) installed.

File System
If you want to store your file system structure in a table, what would you do? In this example the data source is nothing but your file system. Let’s say you want to save the file name, location, file size, created date and modified date of set of files in a folder using SSIS.

The first step is to create a SSIS project from BIDS and create a new package. Drag and drop a Data Flow task to the control flow pane. Then Double click the data flow task.

Then drag and drop a Script component from the tool box to the Data Flow task. After dragging it, you will see the following screen asking you to select the type of the script component.  

As we are using script as a source, select the first the option which is Source.

The next step is to add the columns that we want to have for the output.

After double clicking the script component you will be taken to the Script Transformation Editor.

In this add columns you want and specify with the correct data type. The following are the column names and their data types that have been added for this example: 
Column Name  Data Type & Length 
 FileName  String 50
 Extension  String 10
 ReadOnly  String 1
 FileLocation  String 150
 CreatedTime  Database timestamp
 ModifiedTime  Database timestamp
 FileSize  Eight-byte signed integer

The next step is to write the. Net script. By selecting the script option and clicking the Design Script button, you will be taken to the .Net editor.  Then you need to add Imports System.IO at the top to import file access functions.  Then you need to write following script.

Public Overrides Sub CreateNewOutputRows()          Dim Root As New DirectoryInfo(“C:Sample Data”) ‘ Directory you want to get data
        Dim Files As FileInfo() = Root.GetFiles(“*.*”, SearchOption.AllDirectories)
        Dim Filename As FileInfo
        For Each Filename In Files
            With FileListBuffer

                .AddRow()
                .FileName = Filename.Name
                .FileLocation = Filename.DirectoryName
                .Extension = Filename.Extension
                If Filename.IsReadOnly Then
                    .ReadOnly = “Y”
                Else
                    .ReadOnly = “N”
                End If

                .CreatedTime = Filename.CreationTime
                .ModifiedTime = Filename.LastWriteTime
                .FileSize = CLng(Filename.Length / 1024) ‘divied by 1024 to get the size in KB

            End With
        Next

    End Sub

If you wish to retirve file informations of the sub folders of the main folder, you can modify above script accordinly. 

For this example I will add a row count which will enable us to see the output we are getting from the script component.  As most of you are aware, we can view the output by adding a data viewer to the data flow path editor. 

Now this data is the same getting data from a SQL Server table or any other typical source.  You can have conditional splits, multicast etc, depending on your requirement to do modify above data set to your need.

For, example if you want to have files which have a file size greater than 50KB, you can include a conditional split with FileSize >= 50 condition.  Date Dimension
Apart from the above example, another place where you can use this method is to generate data for a date dimension. As you are aware, date dimensions have attributes like date, month year etc. Rather than entering thse dates from a script you can use .Net code to do it for you. We can start the package as we did for the previous exampale.  The Only change will be the output columns and the script. Columns name, type and length will be as following table.

 Column Name  Data Type & Length 
 FullDate  Database date
 Month  Four byte signed integer
 Year  Four byte signed integer
 DayofWeek  Four byte signed integer
 DayofYear  Four byte signed integer
 Day  Four byte signed integer

Next is to write the script.  I have written very simple script just to get the basic attributes required. You can modify this script get more attributes depending on your requirement.

                    Dim i As Integer

        Dim dt As Date

        dt = Now()

        While i < 10000
            DateBuffer.AddRow()
            DateBuffer.FullDate = dt.Date()
            DateBuffer.Month = dt.Month
            DateBuffer.Year = dt.Year
            DateBuffer.DayofWeek = dt.DayOfWeek
            DateBuffer.DayofYear = dt.DayOfYear
            DateBuffer.Day = dt.Day
            dt = DateAdd(DateInterval.Day, 1, dt)
            i = i + 1
        End While

As we did for the previous example, let us view the output of this example. Conclusion
I am sure that now if you are asked what the data source that can be used in SSIS are, you will have a very different answer.  Simply it can be anything.

 

 

]]>

Leave a comment

Your email address will not be published.