Shopping Basket Analysis in SQL Server
A famous super market chain in USA once observed that men who are buying beer for weekend tend to buy nappies for their kids. This revelation enabled the chain in increase sales volume and revenue by placing the items in close proximity to eachother. An alternative approach would have been to move the items apart to encourage store exploration.
An online example of this is Amazon – as the above image demonstrates.
Shopping Basket Analysis is mainly used to analyze shopping basket. However, there are other things you can do it from this algorithm. In sports, you can determine the optimum combinations among players.
There two important parameters in shopping basket analysis.
Association Rules are often written as X->Y meaning that whenever X appears Y also tends to appear. X is often referred to as the rule’s Antecedent and Y as the Consequent . Suppose, X and Y appear together in only 10% of the transactions but whenever X appears there is an 80% chance that Y appears. 10% presence of X and Y together is called the Support (or prevalence) of the rule and the 80% chance is called Confidence (or predictability) of the rule.
Support (X) = (Number of times X appears) / N= P(X) Support (XY) = (Number of times X and Y appears) / N = P(X?Y) Confidence ( X -> Y) = Support (XY) / Support (X) = P(X?Y) / P(X) = P(Y/X)
In some articles confidence is stated as importance as well.
SQL Server Implementation
For the demo implementation, the AdventureWorkDW database is used.
As usual, create a SSAS project and add the data source with sufficient permissions. Then add a Data Source views with following tables.
After adding two views (vAssocSeqLineItem and vAssocSeqOrders), you will get the following data source views.
In these two data set, vAssocSeqOrders contains the order data as shown below.
For the other view, vAssocSeqLineItems contains items for each order in which sample is shown below.
Since there is a link between these two data sets, the data sets can be joined by dragging and dropping.
If you are using two tables with defined foreign key constraints, you don’t need to manually create the relationship as it will be created automatically. Since the above are views, you need to manually join them.
After above steps, data mining project solution explorer looks as below.
Let’s create the data mining model by right clicking the mining model and selecting new model. Since we are creating a model from an existing relational database, we need to select the first option as below.
Since we are creating the mining structure from an Association rule, we need to select the Microsoft Association Rule from the list.
After selecting Association Rule from the list, next is to select the data source views. Since we have only one data source view, we have to select that data source view.
The next step is to select Case and Nested tables. If you can remember, in the previous data mining algorithms like clustering etc. we had only one table, Case table to select. However, in this case, there are two tables. Data must be fed to a data mining algorithm as a series of cases that are contained within a case table. However, not all cases can be described by a single row of data. That data will be stored in the Nested table. For more information check https://technet.microsoft.com/en-us/library/ms175659.aspx .
So in the above dialog, the Case table is vAssocSeqOrders while the Nested table is vAssocSeqLineItems.
The next task is to choose Input, Predict and Key columns. Obviously, Order Number Is the key column where the product model will be Input and the Predict column.
After selecting the columns, the next screen will be to select the data types. Both columns are Text data type columns.
As last time, we will select training data using a percentage. We will go with the 30% as testing data set.
With that the Association mode creation is complete. Next is to process the model.
When the model build is completed, the screen should appear as above. After the model is built, next is to view the model using the Mining Model Viewer tab.
From the rules tab, you can find out what are the selling items are most closely associated.
The above image shows that the Touring Tire Tube and Touring Tires are a common combined purchase.
From the item set tab, we can view the number of transactions that has this combination. The Minimum Item Set Size is set to 2, so that you can view meaning full data.
From the above image, you can see that the Mountain Bottle Case and Water Bottles have been sold together 690 times in the selected data set. The below image shows the dependency network of the items which shows the graphical presentation of relationship between items.
Different colors shown different relationships between items.
There is an option to query the data from the model that was processed. From the mining model prediction tab, you can find out the products which are most likely that customers will buy for a given product.
First, select the mining model by clicking the Select Model… button.
Since there is only one Mining Model in this project, you will be selecting that mining model as shown above.
Next is to select singleton query as shown in the below image.
Then, we need to select a product for the value column.
When clicking the button next to Value, you will get the option of selecting a product from the list.
Next is to select the prediction function. Select Predict Association as shown in the below screen. For the Criteria/Argument, you need to select model and number which is 4 in this case. This means we are requesting the model to give us most the frequently selling items with Cycling Cap. Four means we are limiting the list to four.
Results will be as below.
This means, that Sport-100, Water Bottle, Patch Kit and Mountain Tire Tube are the most frequently selling items with Cycling Cap.
Implementation in Excel
Similarly, the shopping basket analysis can be done from Microsoft Excel if you have installed the Microsoft SQL Server 2012 Data Mining Add-in. In Excel you can select, Shopping Basket Analysis, from the Analyze ribbon in the Table tools.
The below, is the data set for which we are going to apply the market basket analysis.
After selecting Shopping Basket Analysis button, you will be taken through a wizard.
In this screen, you have to select the TransactionID which is Order Number. In the Item selection drop down, you can select the column for products. In this scenario, product category is selected. So the target is to find out what are common categories selling together. Also, there is an option selection which you can select the Item Value.
When the analysis is complete, two additional sheets are included to the Excel workbook – namely Shopping Basket Bundled Item and Shopping Basket Recommendation.
Shopping Basket Bundled Item
This sheet gives the most frequently selling items. In addition, it gives additional information like number of sales and sales value. Sales value is also important factor as business users want not only the quantity but also amount.
Shopping Basket Recommendation.
Navigate to the Shopping Basket Recommendation sheet to access the recommendations.
There are a few parameters where you can tweak the given algorithm to suit your data set.
MINIMUM_SUPPORT defines the minimum support requirement that items must to qualify as a frequent item set. If this value set too low, the algorithm may take much longer to process and require much more memory. The default value is 0
MIXIMUM_SUPPORT defines the maximum support threshold of a frequent item set. It defines the maximum support threshold of a frequent item set.
MAXIMUM_ITEMSET_SIZE defines the maximum size of an item set. Reducing the maximum item set size reduce the processing time because the algorithm can save further iterations over the data set when the candidate item set size reaches the limit.
MINIMUM_ITEMSET_SIZE defines the minimum size of an item set.
MAXIMUM_ITEMSET_COUNTspecifies the maximum number of itemsets to produce. If no number is specified, the algorithm generates all possible itemsets.