Shopping Basket Analysis in SQL Server
A famous super
market chain in USA once observed that men who are buying beer for weekend tend
to buy nappies for their kids. This revelation enabled the chain in increase sales volume and revenue by placing the items
in close proximity to eachother. An alternative approach would have been to move the items apart to encourage store exploration.
An online example of this is Amazon – as the above image demonstrates.
Shopping Basket Analysis is mainly used to
analyze shopping basket. However, there are other things you can do it from
this algorithm. In sports, you can determine the optimum combinations among
There two important parameters in shopping
Association Rules are often written as
X->Y meaning that whenever X appears Y also tends to appear. X is often
referred to as the rule’s Antecedent and Y as the
. Suppose, X and Y
appear together in only 10% of the transactions but whenever X appears there is
an 80% chance that Y appears. 10% presence of X and Y together is called the
Support (or prevalence)
of the rule and the 80% chance is called
of the rule.
Support (X) = (Number of times X appears) / N= P(X) Support (XY) = (Number of times X and Y appears) / N = P(X?Y) Confidence ( X -> Y) = Support (XY) / Support (X) = P(X?Y) / P(X) = P(Y/X)
In some articles confidence is stated as
importance as well.
SQL Server Implementation
For the demo implementation, the
AdventureWorkDW database is used.
As usual, create a SSAS project and add the
data source with sufficient permissions. Then add a Data Source views with
After adding two views (vAssocSeqLineItem
and vAssocSeqOrders), you will get the following data source views.
In these two data set, vAssocSeqOrders
contains the order data as shown below.
For the other view, vAssocSeqLineItems
contains items for each order in which sample is shown below.
Since there is a link between these two
data sets, the data sets can be joined by dragging and dropping.
If you are using two tables with defined
foreign key constraints, you don’t need to manually create the relationship as
it will be created automatically. Since the above are views, you need to manually
After above steps, data mining project solution
explorer looks as below.
Let’s create the data mining model by right
clicking the mining model and selecting new model. Since we are creating a model from an
existing relational database, we need to select the first option as below.
Since we are creating the mining structure
from an Association rule, we need to select the Microsoft Association Rule from
After selecting Association Rule from the
list, next is to select the data source views. Since we have only one data
source view, we have to select that data source view.
The next step is to select Case and Nested
tables. If you can remember, in the previous data mining algorithms like
clustering etc. we had only one table, Case table to select. However, in this
case, there are two tables. Data must be fed to a data mining algorithm as a
series of cases that are contained within a case table. However, not all cases
can be described by a single row of data. That data will be stored in the
Nested table. For more information check https://technet.microsoft.com/en-us/library/ms175659.aspx
So in the above dialog, the Case table is
vAssocSeqOrders while the Nested table is vAssocSeqLineItems.
The next task is to choose Input, Predict and
Key columns. Obviously, Order Number Is the key column where the product model
will be Input and the Predict column.
After selecting the columns, the next screen
will be to select the data types. Both columns are Text data type columns.
As last time, we will select training data using
a percentage. We will go with the 30% as testing data set.
With that the Association mode creation is
complete. Next is to process the model.
When the model build is completed, the screen
should appear as above. After the model is built, next is to view the model
using the Mining Model Viewer tab.
From the rules tab, you can find out what
are the selling items are most closely associated.
The above image shows that the Touring Tire
Tube and Touring Tires are a common combined purchase.
From the item set tab, we can view the
number of transactions that has this combination. The Minimum Item Set Size is set
to 2, so that you can view meaning full data.
From the above image, you can see that the
Mountain Bottle Case and Water Bottles have been sold together 690 times in
the selected data set.
The below image shows the dependency
network of the items which shows the graphical presentation of relationship between
Different colors shown different
relationships between items.
There is an option to query the data from the
model that was processed. From the mining model prediction tab, you can find
out the products which are most likely that customers will buy for a given
First, select the mining model by clicking
the Select Model… button.
Since there is only one Mining Model in
this project, you will be selecting that mining model as shown above.
Next is to select singleton query as shown
in the below image.
Then, we need to select a product for the
When clicking the button next to Value, you
will get the option of selecting a product from the list.
Next is to select the prediction function.
Select Predict Association as shown in the below screen. For the
Criteria/Argument, you need to select model and number which is 4 in this case.
This means we are requesting the model to give us most the frequently selling items
with Cycling Cap. Four means we are limiting the list to four.
Results will be as below.
This means, that Sport-100, Water Bottle, Patch
Kit and Mountain Tire Tube are the most frequently selling items with Cycling
Implementation in Excel
Similarly, the shopping basket analysis can be
done from Microsoft Excel if you have installed the Microsoft SQL Server 2012 Data
Mining Add-in. In Excel you can select, Shopping
Basket Analysis, from the Analyze ribbon in the Table tools.
The below, is the data set for which we are
going to apply the market basket analysis.
After selecting Shopping Basket Analysis
button, you will be taken through a wizard.
In this screen, you have to select the
TransactionID which is Order Number. In the Item selection drop down, you can
select the column for products. In this scenario, product category is selected.
So the target is to find out what are common categories selling together. Also,
there is an option selection which you can select the Item Value.
When the analysis is complete, two additional
sheets are included to the Excel workbook – namely Shopping Basket Bundled Item
and Shopping Basket Recommendation.
Shopping Basket Bundled Item
This sheet gives the most frequently
selling items. In addition, it gives additional information like number of sales and
sales value. Sales value is also important factor as business users want not
only the quantity but also amount.
Shopping Basket Recommendation.
Navigate to the Shopping Basket Recommendation sheet to access the recommendations.
There are a few parameters where you can
tweak the given algorithm to suit your data set.
MINIMUM_SUPPORT defines the minimum support requirement
that items must to qualify as a frequent item set. If this value set too low,
the algorithm may take much longer to process and require much more memory. The
default value is 0
MIXIMUM_SUPPORT defines the maximum support threshold of
a frequent item set. It defines the maximum support threshold of a frequent
MAXIMUM_ITEMSET_SIZE defines the maximum size of an item set.
Reducing the maximum item set size reduce the processing time because the
algorithm can save further iterations over the data set when the candidate item
set size reaches the limit.
MINIMUM_ITEMSET_SIZE defines the minimum size of an item set.
MAXIMUM_ITEMSET_COUNTspecifies the maximum number of itemsets to
produce. If no number is specified, the algorithm generates all possible