Know your Data with Data Profiling

When the OK button is clicked, you will be taken to the Profile
Requests
page.

We can use the Column Null Ratio Profile Request profiler
type as an example.

Since you have to create this from Quick Profile all
the fields are filled automatically.  Normally, you will not be looking at the
all the columns but by default all columns (*) will be selected. The best
option would be to select only the columns required to avoid unnecessary overhead.

Please note that you can have multiple profiles for several
tables.

The next step is to execute the task and analyze the results.

Analyzing Profiler Results

If you open the XML file you will see something as below:.

This pure XML so you can use any query language or any
other mechanism to analyze this file However, there is an tool for this which
ships with SQL Server 2008 called Data Profile Viewer.

After launching this tool you need to open the xml file via
this tool and you will see following screen.

By clicking the relevant profiler for the relevant table you see
the profiler information.

The below  is the Column Length Distribution. By clicking a
column, you can do a detailed analysis on the column length distribution.

For example, if you are anticipating at least 3-4 characters
for first name. If you are seen, first name with lesser lengths for substantial
number of records there is an issue. In most  data warehousing
environments, you will   pass the data if that is in low percentages. So, if percentage
is high (which should be defined by analyzing the environment) you should be
reject the data set or flag them for future intervention.

Below is the profiler for NULL records.

Similarly to  FirstName, you can define boundary values for
NULL and reject data based on that.

Continues…

Leave a comment

Your email address will not be published.