Introduction to Data Mining with SQL Server

How Effective is the Data Mining Model?

Evaluating the Data Mining Model

Evaluating the data mining model is integral to producing a reliable data mining solution. Once the mining model has been trained, it is necessary to gauge the effectiveness and accuracy of the training against a set of data with a known outcome, pattern or grouping. By creating an evaluation case set, model developers can verify the model by comparing its results with those of a predefined set of evaluation data.

Following the evaluation, a second training set is applied to correct any unexpected predictions, groupings or classifications. This can be repeated iteratively until the model displays accurate results against an evaluation set. Once the mining model produces predictions consistent with the predefined outcomes, it serves as an accurate view of historical data and adds insight into the future.

The best way to confirm that the model has been properly refined is to run prediction queries against the model using the evaluation case set. When the query result sets match the known outcomes from the evaluation case set, the model has been properly refined.
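The comparison described above can be sketched in a few lines of Python. This is only an illustration and not part of Analysis Services: the function name `compare_with_known_outcomes`, the case identifiers and the outcome labels are all hypothetical, and the predictions are assumed to have already been retrieved by a prediction query.

```python
from typing import Dict, List, Tuple


def compare_with_known_outcomes(
    predicted: Dict[str, str], known: Dict[str, str]
) -> Tuple[int, List[str]]:
    """Count matching cases and list the case ids whose prediction differs.

    Both dictionaries map a case id to an outcome label. A case that is
    missing from `predicted` is treated as a mismatch.
    """
    mismatched = [
        case_id
        for case_id, outcome in known.items()
        if predicted.get(case_id) != outcome
    ]
    matches = len(known) - len(mismatched)
    return matches, mismatched


# Hypothetical evaluation case set with known outcomes.
known = {"c1": "buy", "c2": "no_buy", "c3": "buy", "c4": "no_buy"}
# Hypothetical results returned by a prediction query against the model.
predicted = {"c1": "buy", "c2": "buy", "c3": "buy", "c4": "no_buy"}

matches, mismatched = compare_with_known_outcomes(predicted, known)
print(f"{matches} of {len(known)} cases matched; mismatched: {mismatched}")
```

The list of mismatched cases is a natural starting point for the next training iteration, since it identifies exactly which evaluation cases the model still gets wrong.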

Effectiveness Measurement

Based on the results of the evaluation set, several mathematical methods are used to calculate the statistical effectiveness of the data mining model. These calculations measure the model's ability to produce reliable prediction results in relation to the actual data.

Statistical measures that contribute to effectiveness measurement include Accuracy (the percentage of total predictions that returned the correct value), Error Rate (the percentage of predictions that were incorrect), Mean Squared Error (the average of the squared differences between predicted and actual values), Lift (the ratio of the response rate among cases selected by the model to the rate expected from random selection), and Profit (return on investment). If an oversampled model is being evaluated, these calculations must also be corrected, because the data was pre-selected to over-represent cases sharing a certain characteristic.
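As a rough illustration of how these measures are computed, the following Python sketch implements Accuracy, Error Rate, Mean Squared Error and Lift using their common textbook definitions. The function names and sample data are hypothetical, Lift is computed for the top-scoring cases only, and no oversampling correction is applied.

```python
from typing import Sequence


def accuracy(actual: Sequence, predicted: Sequence) -> float:
    """Fraction of predictions that equal the known outcome."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


def error_rate(actual: Sequence, predicted: Sequence) -> float:
    """Fraction of predictions that differ from the known outcome."""
    return 1.0 - accuracy(actual, predicted)


def mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of the squared differences between predicted and actual values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def lift(actual: Sequence[int], scores: Sequence[float], top_n: int) -> float:
    """Response rate among the top_n highest-scored cases, divided by the
    overall response rate (what random selection would be expected to yield).
    `actual` holds 1 for a positive outcome and 0 otherwise."""
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(outcome for _, outcome in ranked[:top_n]) / top_n
    overall_rate = sum(actual) / len(actual)
    return top_rate / overall_rate


# Hypothetical evaluation set: known outcomes and model scores for ten cases.
actual = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.6, 0.3, 0.2, 0.1, 0.05]
predicted = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off

print("accuracy:", accuracy(actual, predicted))
print("error rate:", error_rate(actual, predicted))
print("lift (top 4):", lift(actual, scores, top_n=4))
print("MSE:", mean_squared_error([3.0, 5.0, 2.0], [2.0, 5.0, 4.0]))
```

A lift greater than 1 means the model's top-ranked cases contain proportionally more positive outcomes than a random selection of the same size would.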

Using visualization tools such as the Analysis Services Data Mining Model Browser and the Dependency Network Browser, developers can open the constructed model and visually examine the statistical information it contains. These tools allow inspection of decision trees, groupings and relationships, as well as clustering and classification of key data elements.

Data Mining’s Role in Business Intelligence

Where Business Intelligence is Going

For many enterprises, business intelligence solutions consist of an enterprise data warehouse and may include data marts along with some type of reporting application. The reporting application can be third party and multidimensional in nature (using multidimensional OLAP cubes), or can rely entirely on relational data access. In any case, the enterprise data warehouse has become a necessity in many organizations.

Many organizations are realizing that true decision support only begins with the enterprise data warehouse initiative. The enterprise data warehouse delivers enormous value to the organization by arranging operational data into meaningful information that the business can act upon. However, much of the information an enterprise needs in order to act proactively cannot be obtained simply through organized views of historical data.

Data mining is the next cycle of business intelligence. It allows a business to navigate empirically toward profitability, while setting the focus of the organization and adding insight into its processes, customers and products. Data mining frees the information inherent in operational data and presents it, through applications, to analysts, users, decision makers and other business processes, providing true decision support.

References

  • SQL Server 2000 Books Online Product Documentation
  • SQL Server Resource Kit Chapter 24 – Effective Strategies for Data Mining
  • MSDN SQL Server Newsgroup Discussion Forum

Published with the express written permission of the author. Copyright 2002 by the author.

