Know your Data with Data Profiling

When the OK button is clicked, you will be taken to the Profile Requests page.

We can use the Column Null Ratio Profile Request profiler type as an example.

Since you have to create this from Quick Profile all the fields are filled automatically.  Normally, you will not be looking at the all the columns but by default all columns (*) will be selected. The best option would be to select only the columns required to avoid unnecessary overhead.

Please note that you can have multiple profiles for several tables.

The next step is to execute the task and analyze the results.

Analyzing Profiler Results

If you open the XML file you will see something as below:.

This pure XML so you can use any query language or any other mechanism to analyze this file However, there is an tool for this which ships with SQL Server 2008 called Data Profile Viewer.

After launching this tool you need to open the xml file via this tool and you will see following screen.

By clicking the relevant profiler for the relevant table you see the profiler information.

The below  is the Column Length Distribution. By clicking a column, you can do a detailed analysis on the column length distribution.

For example, if you are anticipating at least 3-4 characters for first name. If you are seen, first name with lesser lengths for substantial number of records there is an issue. In most  data warehousing environments, you will   pass the data if that is in low percentages. So, if percentage is high (which should be defined by analyzing the environment) you should be reject the data set or flag them for future intervention.

Below is the profiler for NULL records.

Similarly to  FirstName, you can define boundary values for NULL and reject data based on that.


Pages: 1 2 3


No comments yet... Be the first to leave a reply!