Know your Data with Data Profiling

Below is the output for the column statistics profiler which only provides Minimum, Maximum, Mean or Average and Standard Deviations. From the user point view, they would prefer to have additional statistics such as Variance.

Below is the Column Value distribution profiler.

Pattern profiling is very important since it will identify what patterns you have and simple coding will not be enough to identify the patterns of your data.

Above is the pattern of the product number where there are three types of patterns. If there are different patterns in large percentages you could stop the data loading.

Constraints

  • You cannot have customized profiler techniques hence you have to live with what SSIS profiler ships with.
  • You need to use SQL Server 2000 or a later version. In case you need to use other sources, you need to use SQL Server as a staging environment (where you insert to a SQL Server table temporary) and perform profiling on staging data.
  • There is no direct way to verify your profile. For example, let’s say you want to stop data being inserted if null value percentage is more than 70% in a column. For this, you need to have some knowledge about XML and scripting which we will discuss in later article.
  • You cannot profile data which are flowing from the data flow. You have to insert data to a temporary table and then get that data.

What next

This article describes what the data profiling control task offers. We will next discuss how you can use this profiler output along with the script component. Also, since this control at a basic level, we will explore some other tools available in the market. 

Also, if you are wish to have any example related to profiling which are not discussed in this article please feel free to send your thoughts to dineshasanka@dbfriend.net

Pages: 1 2 3




Related Articles :

No comments yet... Be the first to leave a reply!

Software Reviews | Book Reviews | FAQs | Tips | Articles | Performance Tuning | Audit | BI | Clustering | Developer | Reporting | DBA | ASP.NET Ado | Views tips | | Developer FAQs | Replication Tips | OS Tips | Misc Tips | Index Tuning Tips | Hints Tips | High Availability Tips | Hardware Tips | ETL Tips | Components Tips | Configuration Tips | App Dev Tips | OLAP Tips | Admin Tips | Software Reviews | Error | Clustering FAQs | Performance Tuning FAQs | DBA FAQs |