Using The Data Profiler Task and FTP Task in SQL Server 2008 Integration Services

Introduction

SQL Server Integration Services (SSIS) is a
Business Intelligence tool which can be used by database developers or
administrators to perform Extract, Transform & Load (ETL) operations. In my
previous article entitled Using Web Service and XML Task in
SQL Server 2008 Integration Services
I discussed how to use the Web Service Task and XML Task which are available
in SQL Server 2008 Integration Services. In
this article I will examine the use of the Data Profiling Task and FTP Task.

Data Preparation Tasks in SSIS

The
data preparation tasks within SSIS can be used to retrieve data or to validate
the quality of data. The different types of data preparation tasks which are
available in SSIS are the File
System Task
, Web Service Task, XML Task, Data
Profiling Task and FTP Task. If you are new to SQL Server Integration Services I
would recommend you to start with my previous article entitled SQL Server 2008 Integration Services Tasks to
understand different types of Integration Services Tasks which are available in
SQL Server 2008 Integration Services.

Overview of the Data
Profiling Task

The Data Profiling Task is a
new task which was introduced with SQL Server 2008 Integration Services. This
task can be used by SSIS developers to easily and effectively profile through
source system to understand the quality of data before it can be finally loaded
to a data warehouse using the ETL process. There are eight inbuilt data
profiles available within Data Profiling Task. Among the eight profiles
available, five can be used to analyze individual columns and the remaining
three can be used to analyze multiple columns or relationships existing between
columns and tables. The Data Profiling Task internally uses the ADO.Net
connection to connect with SQL Server. The Data Profiling Task returns an XML
output which can either be saved as an XML file or to an SSIS variable. If you
are storing the returning XML output to an XML file you can view the XML file
at a later stage using Data Profiler Viewer utility which is available at “C:Program FilesMicrosoft SQL
Server100DTSBinnDataProfileViewer.exe”.

Below are the
five profiles which can be used to analyze individual columns.

  1. Column Length
    Distribution
  2. Column Pattern
  3. Column NULL
    Ratio
  4. Column
    Statistics
  5. Column Value
    Distribution

Below are the
three profiles which can be used to analysis multiple columns or relationships
between columns and tables.

  1. Candidate Key
  2. Functional Dependency
  3. Value Inclusion

Worked example using the Data Profiling Task

We will run through an example using
Column NULL Ratio” to profile the dbo.DimCustomer table of AdventureWorksDW2008 database. Using Column
Null Ratio you can identify the percentage of Null Values within selected
columns.

1.    Create a new SQL
Server Integration Services Project and rename the default package DataProfilingTaskExample.dtsx

2.    Double click the DataProfilingTaskExample.dtsx
package to open it in design mode.

3.    Create a new folder
named DataProfiling within the C Drive (the XML output from the Data
Profiling Task will be saved in this folder).

4.    Right click within
the Connection Managers window and select New File Connections…. This will open
up File Connection Manager Editor and provide the details as shown in the
snippet below and click OK to save the File Connection.


5.       Drag and drop the Data
Profiling Task
from the toolbox to the Control Flow window and rename it   “DimCustomer
– Data Profiling Task
” , then double click the Data Profiling Task
to open   the editor window as shown in the snippet below.

Within the Data
Profiling Task Editor, click on General tab and provide the details as shown in
the snippet above.

6.    Click  the Quick
Profile… button which will open  the  “Single Table Quick Profile Form” dialog
where you can quickly configure the Data Profiling Task. Create an ADO.Net
Connection and then choose the table or view against which you want to use “Column
Null Ration Profile”. Finally, click OK to save the changes.

7.    Select the Profile
Request Page on the left of the panel to view   detailed information related to
the Column Null Ration Profile, in this screen you can even add any new Profiles
or make changes to the default settings for the current profile. Click OK to
save the changes to the Data Profiling Task.

8.    Finally, to execute
the package, right click the DataProfilingTaskExample.dtsx package from
the solution explorer and select the Execute Package option from the drop down
list. Once the package has successfully executed you will be able to see the
below screen within the control flow designer.

9.    Once the package
has successfully executed you will be able to see theDimCustomerDataProfilingResults.XML
files within the C:DataProfiling folder. This file can be viewed using the
Data Profiler Viewer utility which is available at “C:Program
FilesMicrosoft SQL Server100DTSBinnDataProfileViewer.exe”

 

Continues…

Pages: 1 2




Array

No comments yet... Be the first to leave a reply!

Software Reviews | Book Reviews | FAQs | Tips | Articles | Performance Tuning | Audit | BI | Clustering | Developer | Reporting | DBA | ASP.NET Ado | Views tips | | Developer FAQs | Replication Tips | OS Tips | Misc Tips | Index Tuning Tips | Hints Tips | High Availability Tips | Hardware Tips | ETL Tips | Components Tips | Configuration Tips | App Dev Tips | OLAP Tips | Admin Tips | Software Reviews | Error | Clustering FAQs | Performance Tuning FAQs | DBA FAQs |