Hola~ Welcome back, folks! It’s 2018, and today’s topic is market basket analysis, one of the most fundamental transaction analyses you will want to have within your company. I promise to keep it brief but still cover everything that matters.

Market basket analysis works for both online stores and in-store retail. It gives you a small glimpse of data science, at a very basic level, and it is the foundation of recommendation systems. It’s not easy, but it’s not too hard if you put in the time and effort. It gets harder later when you want to apply it to more advanced problems, but here is a promise: *it’s fun.*

We will use R because I personally prefer it to Python :p. Don’t be scared of coding: just six lines of code can already tell you a lot, and as you have more and more fun with it, your skills will follow. Also, this article is meant to help you understand the concepts rather than implement them, but I do provide some lines of code that I personally find helpful. If you need a tutorial, I recommend you go to YouTube instead.

Let’s start with the basics.

**Know the jargon.**

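**Density:** the percentage of non-empty cells in your transaction-by-item table. It is a key metric to use as an indicator for campaigns that encourage customers to buy more: for a fixed set of items, a higher density means more items per transaction. [Code: summary(dataname)]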
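To make density concrete, here is a minimal sketch in base R. The five baskets and the item names are made up for illustration, and the table is built by hand rather than loaded from real data.

```r
# Hypothetical toy data: 5 transactions (rows) x 4 items (columns).
# TRUE means the item appears in that transaction.
baskets <- matrix(
  c(1, 1, 0, 0,
    1, 1, 1, 0,
    1, 0, 1, 0,
    0, 1, 0, 1,
    1, 1, 0, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("t", 1:5), c("milk", "bread", "cereal", "coffee"))
) == 1

# Density = filled cells divided by all cells in the table.
basket_density <- sum(baskets) / (nrow(baskets) * ncol(baskets))
print(basket_density)  # 11 filled cells out of 20
```

On real data you would not count cells by hand; this only shows what the number means.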

**Support:** in association rules, ‘support’ tells you how frequently an item occurs in the data, as a percentage of all transactions. (Code: itemFrequency(dataname[,N1]); multiply it by the total number of transactions if you want to see how many times N1 appears.)

Note: to see the support of the first six items, use itemFrequency(dataname[,1:6])

To look at all items that show up in at least 20% of transactions, use Code: itemFrequencyPlot(dataname, support=0.20)

To show the 20 items with the highest support, use Code: itemFrequencyPlot(dataname, topN=20)
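If you want to see what support means without any package, here is a small base R sketch. The toy baskets and the 50% threshold are assumptions chosen only to keep the numbers easy to check.

```r
# Hypothetical toy data: 5 transactions (rows) x 4 items (columns).
baskets <- matrix(
  c(1, 1, 0, 0,
    1, 1, 1, 0,
    1, 0, 1, 0,
    0, 1, 0, 1,
    1, 1, 0, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("t", 1:5), c("milk", "bread", "cereal", "coffee"))
) == 1

# Support of each item = fraction of transactions that contain it.
item_support <- colMeans(baskets)
print(item_support)

# Multiply by the number of transactions to get raw counts.
item_counts <- item_support * nrow(baskets)
print(item_counts)

# Items that reach a minimum support of 50% (an arbitrary threshold).
frequent_items <- names(item_support)[item_support >= 0.5]
print(frequent_items)
```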

**Confidence:** the proportion of transactions containing one item or set of items that also contain another set of items (a conditional probability). Put more simply: it tells you, when customers buy item A, how often they also buy item B, which you can then compare with A and C, A and D, and so on.

Let’s work out, mathematically, the probability that a customer who buys A and B also buys C.

conf({A,B} -> {C}) = support({A,B,C}) / support({A,B}) = of all transactions that contain both A and B, the proportion that also contain C.
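The confidence formula can be checked by hand on a toy table. This is a sketch in base R; the baskets and the helper function `support()` are hypothetical names introduced here for illustration, not part of any package.

```r
# Hypothetical toy data: 5 transactions (rows) x 4 items (columns).
baskets <- matrix(
  c(1, 1, 0, 0,
    1, 1, 1, 0,
    1, 0, 1, 0,
    0, 1, 0, 1,
    1, 1, 0, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("t", 1:5), c("milk", "bread", "cereal", "coffee"))
) == 1

# Illustrative helper: fraction of transactions containing every item in `items`.
support <- function(items) {
  mean(rowSums(baskets[, items, drop = FALSE]) == length(items))
}

supp_abc <- support(c("milk", "bread", "cereal"))  # 1 of 5 transactions
supp_ab  <- support(c("milk", "bread"))            # 3 of 5 transactions

# conf({milk, bread} -> {cereal}) = support({milk, bread, cereal}) / support({milk, bread})
conf_ab_c <- supp_abc / supp_ab
print(conf_ab_c)
```

Here one in three of the baskets that contain both milk and bread also contain cereal.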

Easier way: [Code: m1 <- apriori(dataname, parameter=list(support=0.07, confidence=0.25, minlen=2)) so the rules that meet these thresholds are now stored in m1.] = *keep only the rules whose items appear together in at least 7% of all transactions, whose confidence is at least 25%, and which contain 2 or more items.*
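Here is a small end-to-end sketch of that call, assuming the arules package is installed. The baskets are invented, and the thresholds (support 0.3, confidence 0.6) are deliberately loose because the toy data has only five transactions; they are not recommendations.

```r
library(arules)  # assumes install.packages("arules") has been run

# Hypothetical toy data: each vector is one transaction.
baskets <- list(
  c("milk", "bread"),
  c("milk", "bread", "cereal"),
  c("milk", "cereal"),
  c("bread", "coffee"),
  c("milk", "bread")
)
trans <- as(baskets, "transactions")

# Mine rules with at least 2 items that pass both thresholds.
m1 <- apriori(trans,
              parameter = list(support = 0.3, confidence = 0.6, minlen = 2))

# Show the surviving rules, strongest confidence first.
inspect(sort(m1, by = "confidence"))
```

On real data you would replace `trans` with your own transactions object and tune the thresholds until the number of rules is manageable.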

**Lift:** how much more likely an item is to be purchased together with another particular item, relative to its general purchase rate: lift(A -> B) = confidence(A -> B) / support(B). If the lift is 4.70 for products A and B, customers who bought product A are 4.7 times as likely to buy product B as customers in general, which is a 370 percent increase over the support of B alone.
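A quick worked example of lift in base R, again on made-up baskets. The variable names are mine, chosen for readability.

```r
# Hypothetical toy data: 5 transactions (rows) x 4 items (columns).
baskets <- matrix(
  c(1, 1, 0, 0,
    1, 1, 1, 0,
    1, 0, 1, 0,
    0, 1, 0, 1,
    1, 1, 0, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("t", 1:5), c("milk", "bread", "cereal", "coffee"))
) == 1

supp_milk   <- mean(baskets[, "milk"])                        # 4 of 5
supp_cereal <- mean(baskets[, "cereal"])                      # 2 of 5
supp_both   <- mean(baskets[, "milk"] & baskets[, "cereal"])  # 2 of 5

# Confidence of milk -> cereal: share of milk baskets that also have cereal.
conf_milk_cereal <- supp_both / supp_milk

# Lift compares that share with how often cereal is bought overall.
lift_milk_cereal <- conf_milk_cereal / supp_cereal
print(lift_milk_cereal)
```

In this toy table the lift is 1.25, so milk buyers are 1.25 times as likely to buy cereal as the average customer, a 25 percent increase. A lift of 1 would mean no relationship, and a lift below 1 would mean the two items tend not to be bought together.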

Here are 5 things market basket analysis can tell you and your boss, and how they can benefit your marketing strategy in the future.

**1. Basic indicators for performance or period analysis**

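Total items sold = Density × total cells = Density × rows (transactions) × columns (items), counting each distinct item once per transaction

Most frequently bought items, and how many times each was bought

Length distribution (the number of items per transaction)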
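These three indicators can be sketched in a few lines of base R. The toy table below is hypothetical, and each item counts once per transaction regardless of quantity.

```r
# Hypothetical toy data: 5 transactions (rows) x 4 items (columns).
baskets <- matrix(
  c(1, 1, 0, 0,
    1, 1, 1, 0,
    1, 0, 1, 0,
    0, 1, 0, 1,
    1, 1, 0, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("t", 1:5), c("milk", "bread", "cereal", "coffee"))
) == 1

# Total items sold: filled cells, which equals density * rows * columns.
basket_density <- mean(baskets)
total_items <- basket_density * nrow(baskets) * ncol(baskets)
print(total_items)

# Most frequently bought items and how many times each appears.
top_items <- sort(colSums(baskets), decreasing = TRUE)
print(top_items)

# Length distribution: how many transactions have 1, 2, 3, ... items.
length_dist <- table(rowSums(baskets))
print(length_dist)
```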

**2. Rule length distribution** for the rules that meet your support and confidence thresholds

Rule length (number of items): 2, 3, 4

Number of rules: 440, 240, 45

This means there are 440 rules with two items (A implies B), 240 rules with three items (A and B imply C), and 45 rules with four items (A, B, and C imply D) among all the rules that meet our support and confidence thresholds.
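To see where such a distribution comes from, here is a sketch with the arules package (assumed to be installed). The baskets and thresholds are invented for illustration; `size()` returns the number of items in each rule, counting both sides.

```r
library(arules)  # assumes the arules package is installed

# Hypothetical toy data: each vector is one transaction.
baskets <- list(
  c("milk", "bread"),
  c("milk", "bread", "cereal"),
  c("milk", "cereal"),
  c("bread", "coffee"),
  c("milk", "bread", "cereal")
)
trans <- as(baskets, "transactions")

# Loose thresholds chosen only because the toy data is tiny.
m1 <- apriori(trans,
              parameter = list(support = 0.3, confidence = 0.6, minlen = 2))

# Count how many rules have 2 items, 3 items, and so on.
rule_lengths <- table(size(m1))
print(rule_lengths)
```

summary(m1) prints the same distribution along with other statistics about the rule set.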

**3. Top support pairs of items (in this example I’ll show the top 5)**

Code: inspect(sort(setname, by="support")[1:5])

{Milk} => {Bread} support: 0.09 confidence.. lift..

{Milk} => {Cereal} support: 0.08 confidence.. lift..

{Milk} => {Coffee} support: 0.078 confidence.. lift..

{Milk} => {Almond} support: 0.075 confidence.. lift..

{Milk} => {Raisin} support: 0.070 confidence.. lift..

People purchase milk together with bread in 9% of all transactions, and so on. You get the idea, right? This tells you how often each pair actually shows up together.

**4. Top confidence pairs of items**

Code: inspect(sort(setname, by="confidence")[1:5])

{Milk} => {Bread} support: 0.09 confidence: 0.3.. lift..

{Milk} => {Cereal} support: 0.08 confidence: 0.2.. lift..

{Milk} => {Coffee} support: 0.078 confidence: 0.05.. lift..

{Milk} => {Almond} support: 0.075 confidence: 0.05.. lift..

{Milk} => {Raisin} support: 0.070 confidence: 0.04.. lift..

Among transactions that contain milk, 30% also contain bread, 20% also contain cereal, 5% also contain coffee, and so on. This shows how strongly two products are related as complementary goods.

**5. Top lift pairs of items**

Code: inspect(sort(setname, by="lift")[1:5])

Try promoting product A, and maybe sales of product B will increase too. Of course, no one knows the results before testing, but this analysis gives you plenty of clues and evidence, so you can plan your next step more confidently. For example, you could set a goal of increasing the support of item A by 20%, then see whether the support of item B also increases without any effort on product B, and by how much: significantly, or just within the typical swing.
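One possible way to judge "significant or just a typical swing" is a two-proportion test, which base R provides as prop.test(). This is my own suggestion, not something the analysis itself prescribes, and every number below is invented purely for illustration.

```r
# Hypothetical counts: transactions containing item B before and after
# a campaign on item A. None of these numbers come from real data.
n_before <- 1000  # transactions in the period before the campaign
b_before <- 90    # of those, how many contain item B
n_after  <- 1000  # transactions in the period after the campaign
b_after  <- 120   # of those, how many contain item B

support_before <- b_before / n_before
support_after  <- b_after / n_after
print(c(before = support_before, after = support_after))

# Two-sided test of whether the two supports differ more than chance suggests.
result <- prop.test(x = c(b_before, b_after), n = c(n_before, n_after))
print(result$p.value)
```

A small p-value suggests the change in B's support is unlikely to be random noise, but it cannot tell you the campaign caused it; seasonality and other promotions can move the numbers too.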

Experiment and Enjoy!

An article about automation will definitely come soon; let me set aside some time for it. I hope this article is helpful for you and your company, and thank you for your support. Have a magical 2018! 🙂