Know your Data with Data Profiling
Below is the output for the column statistics profiler which only provides Minimum, Maximum, Mean or Average and Standard Deviations.
From the user point view, they would prefer to have additional statistics such as Variance.
Below is the Column Value distribution profiler.
Pattern profiling is very important since it will identify
what patterns you have and simple coding will not be enough to identify the
patterns of your data.
Above is the pattern of the product number where there are
three types of patterns. If there are different patterns in large percentages you could stop the data loading.
- You cannot have customized profiler techniques hence you
have to live with what SSIS profiler ships with.
- You need to use SQL Server 2000 or a later version. In case
you need to use other sources, you need to use SQL Server as a staging
environment (where you insert to a SQL Server table temporary) and perform profiling on staging data.
- There is no direct way to verify your profile. For
example, let’s say you want to stop data being inserted if null value
percentage is more than 70% in a column. For this, you need to have some knowledge about XML and scripting which we will discuss in later article.
- You cannot profile data which are flowing from the data flow.
You have to insert data to a temporary table and then get that data.
This article describes what
the data profiling control task offers. We will next discuss how
you can use this profiler output along with the script component. Also, since this
control at a basic level, we will explore some other tools available in the
Also, if you are wish to have any
example related to profiling which are not discussed in this article please
feel free to send your thoughts to firstname.lastname@example.org