Introduction to Data Mining with SQL Server
Data Mining Defined
To understand enhanced Decision Support Systems, let alone implement them in an organization, it is necessary to understand the requirements, methodologies and business drivers behind the need to obtain information. Data mining makes it possible to gather information about an organization from various sources and coalesce it into a format in which business decision makers and analysts can gain insight into business practices and obtain information that helps plot the next course of action for the business.
Data mining represents the next step beyond the historical, aggregate-based world of information that the data warehouse makes available to users. Data mining allows organizations to collect vital information about business processes, customers, sales and marketing, and to arrange it so that business users can make predictive decisions about where the business should focus its resources. This advantage allows business decision makers to “steer” the focus of an organization and facilitate the continued success of the enterprise. Once the information an organization gathers through its business processes can be analyzed in a data mining environment, discoveries can be made that help uncover business and market trends, such as information about customers and how to target business processes to maximize revenue from core customers.
Data mining is not an “intelligence” tool or framework. Business intelligence, typically drawn from an enterprise data warehouse, is used to analyze and uncover information about past performance at an aggregate level. Data warehousing and business intelligence give users a way to anticipate future trends by analyzing past patterns in organizational data. Data mining goes a step further, providing insight beyond what data warehousing offers. An implementation of data mining in an organization serves as a guide to uncovering inherent trends and tendencies in historical information, and allows for statistical predictions, groupings and classifications of data.
Typical data warehousing implementations allow users to ask and answer questions such as “How many sales were made, by territory, by sales person, between the months of May and June in 1999?” Data mining allows business decision makers to ask and answer questions such as “Who is my core customer that purchases a particular product we sell?” or “Geographically, how well would a line of products sell in a particular region and who would purchase them, given the sale of similar products in that region?”
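The difference between the two kinds of question can be sketched in a few lines of Python. The records, field names and the `core_customer_profile` helper below are hypothetical and exist only for illustration; they are not part of SQL Server or Analysis Services, and the simple frequency count is a stand-in for a real mining algorithm.

```python
from collections import Counter, defaultdict

# Hypothetical sales records; every field name and value is made up for illustration.
SALES = [
    {"territory": "East", "sales_person": "Avery", "product": "router",
     "age_band": "30-39", "segment": "small business"},
    {"territory": "East", "sales_person": "Avery", "product": "router",
     "age_band": "30-39", "segment": "small business"},
    {"territory": "West", "sales_person": "Blake", "product": "router",
     "age_band": "40-49", "segment": "home office"},
    {"territory": "West", "sales_person": "Blake", "product": "switch",
     "age_band": "30-39", "segment": "small business"},
]


def sales_count_by_territory_and_person(sales):
    """Warehouse-style question: how many sales, by territory, by sales person?"""
    counts = defaultdict(int)
    for row in sales:
        counts[(row["territory"], row["sales_person"])] += 1
    return dict(counts)


def core_customer_profile(sales, product):
    """Mining-style question: which customer profile buys this product most often?

    A plain frequency count, used here as a stand-in for a real mining model.
    """
    profiles = Counter(
        (row["age_band"], row["segment"])
        for row in sales
        if row["product"] == product
    )
    if not profiles:
        return None
    return profiles.most_common(1)[0][0]


aggregate = sales_count_by_territory_and_person(SALES)
profile = core_customer_profile(SALES, "router")
print(aggregate)
print(profile)
```

The first function only summarizes what has already happened, while the second describes the kind of customer behind a product, which is the sort of answer a mining model is expected to supply.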
With tools such as SQL Server 2000 Analysis Services, and with data warehousing implementations built within the Microsoft Data Warehousing Framework, data mining can be implemented successfully and leveraged as a next step toward uncovering essential business decision data.
Data Mining Methodologies
Data mining is achieved through the use of a data mining model. This model serves as a “sifter” that catches and reveals important information, which business decision makers use to obtain predictions about future trends. The data mining model is a key component for understanding and predicting trends based on gathered business data. Within an organization, data mining can serve several business functions, either operationally or as decision support for focusing business activities. Both uses require a data mining model.
Closed Loop Data Mining
Closed loop data mining represents an endless cycle in which each part of the process improves on itself. In this closed loop process, each refinement to one section of the loop encourages a refinement in the next section of the loop. In this manner, as the organization's operations progress over time, the data mining process refines the information gathered from its operating procedures. This information is then transformed into OLAP data through the enterprise data warehouse and fed to the data mining model. In turn, the trends uncovered by evaluating the organization’s data through the data mining model define or refine the information gathered by the business processes (for example, an OLTP application). The following example outlines how a business that sells hardware can use data mining in a closed loop fashion.
Hardware Sales Example
A business that sells hardware has a transactional application that facilitates the sale of the hardware to organizations and individuals. This application can take the form of a web-based interface where users purchase the hardware in an e-commerce environment, while a call center accepts traditional phone orders. Each of these transactional processes writes to the same OLTP data store. The e-commerce application writes to the data store through an n-tier architecture. Similarly, the call center operators use a data entry application that writes to the same data store.
As part of the hardware sales and registration process, information about the customer is gathered at the time of sale. This information includes demographic and geographic information, as well as intended use, company information, etc. It is extracted nightly from the OLTP data store, transformed and loaded into the organizational data warehouse, which subsequently propagates new data to OLAP cubes. The data warehouse and its OLAP cubes then provide users with the ability to analyze data about the purchasers of hardware and uncover important aggregate-level information about sales, marketing and other areas.
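The nightly extract, transform and load step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the row layout, the cleaning rules and the function names are hypothetical, and an in-memory list and dictionary stand in for the warehouse and an OLAP cube.

```python
from collections import defaultdict

# Hypothetical OLTP rows written by the e-commerce site and the call center.
OLTP_ORDERS = [
    {"order_id": 1, "channel": "web", "region": " east ",
     "intended_use": "Office", "amount": 120.0},
    {"order_id": 2, "channel": "phone", "region": "East",
     "intended_use": "home", "amount": 80.0},
    {"order_id": 3, "channel": "web", "region": "WEST",
     "intended_use": "office", "amount": 200.0},
]


def extract(orders, already_loaded_ids):
    """Pull only the rows that a previous nightly run has not loaded yet."""
    return [row for row in orders if row["order_id"] not in already_loaded_ids]


def transform(rows):
    """Normalize free-text fields so that equal values aggregate together."""
    return [
        {
            **row,
            "region": row["region"].strip().title(),
            "intended_use": row["intended_use"].strip().lower(),
        }
        for row in rows
    ]


def load(warehouse, rows):
    """Append the cleaned rows to the warehouse fact list."""
    warehouse.extend(rows)


def build_cube(warehouse):
    """Pre-aggregate sales amount by (region, intended_use), like a tiny cube."""
    cube = defaultdict(float)
    for row in warehouse:
        cube[(row["region"], row["intended_use"])] += row["amount"]
    return dict(cube)


warehouse = []
loaded_ids = {row["order_id"] for row in warehouse}
load(warehouse, transform(extract(OLTP_ORDERS, loaded_ids)))
cube = build_cube(warehouse)
print(cube)
```

Keeping extraction, transformation and loading as separate functions mirrors the stages named in the text, so a change to one stage (for example, a new cleaning rule) does not disturb the others.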
Subsequent to the population of the warehouse, the new data is processed against a data mining model. Business decision makers then utilize the data mining model to uncover information about the focus of the business and attempt to make educated guesses about upcoming trends, product placement and marketing efforts regarding their hardware products.
In the course of this analysis, the business decision makers may notice that missing or bad information about a demographic of their customers is preventing them from making an important prediction or trend analysis. As a result, the managers of application development are instructed to modify the applications (both the e-commerce site and the call center application) to acquire this missing piece of information. Similarly, the data warehouse manager is responsible for altering the warehouse to ensure that this data will be transformed and pushed through a new iteration of the data mining model.
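One way to surface the kind of gap described above is to measure how often each customer attribute is missing in the warehouse. The sketch below assumes hypothetical field names, treats `None` and the empty string as missing, and uses an arbitrary 50% threshold; none of these choices come from the original text.

```python
# Hypothetical customer rows; None or "" marks a value that was never captured.
CUSTOMERS = [
    {"customer_id": 1, "region": "East", "age_band": "30-39", "industry": None},
    {"customer_id": 2, "region": "West", "age_band": "", "industry": None},
    {"customer_id": 3, "region": "East", "age_band": "40-49", "industry": "retail"},
    {"customer_id": 4, "region": "West", "age_band": "30-39", "industry": None},
]


def missing_rate(rows, field):
    """Return the fraction of rows in which the field is absent or empty."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(field) in (None, ""))
    return missing / len(rows)


def fields_needing_capture(rows, fields, threshold=0.5):
    """List fields missing often enough that the applications should collect them.

    The threshold is an illustrative assumption, not a recommended value.
    """
    return [field for field in fields if missing_rate(rows, field) >= threshold]


gaps = fields_needing_capture(CUSTOMERS, ["region", "age_band", "industry"])
print(gaps)
```

A report like this gives the application and warehouse managers a concrete list of attributes to add to the e-commerce site, the call center application and the warehouse load, which closes the loop.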