Association Rule Mining (Apriori Algorithm)
Note: Association Rule Mining is an Unsupervised Machine Learning technique. It is mainly used for Market Basket Analysis to discover relationships between items frequently purchased together.
🟦 Program Aim
Aim:
To implement the Association Rule Mining (Apriori Algorithm) using Python and identify products that are frequently purchased together.
🟩 Algorithm Used
Apriori Algorithm
🟨 Problem Statement
A supermarket wants to analyze customer shopping patterns. By examining previous transactions, the store aims to identify products that are frequently purchased together. This information helps improve product placement, cross-selling, and promotional strategies.
🟪 Step 1: Install Required Library
Install the mlxtend package (only once).
pip install mlxtend
Explanation
- mlxtend stands for Machine Learning Extensions.
- It provides the Apriori algorithm and functions for generating association rules.
🟦 Step 2: Import Required Libraries
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules
Explanation
- pandas → Used to create and manipulate data.
- TransactionEncoder → Converts transaction data into a True/False matrix.
- apriori() → Finds frequent itemsets.
- association_rules() → Generates association rules from frequent itemsets.
🟩 Step 3: Create the Transaction Dataset
transactions = [
    ["Milk", "Bread", "Butter"],
    ["Milk", "Bread"],
    ["Milk", "Butter"],
    ["Bread", "Butter"],
    ["Milk", "Bread", "Butter", "Eggs"],
    ["Bread", "Eggs"],
    ["Milk", "Eggs"]
]
Explanation
Each inner list represents one customer's shopping basket.
| Customer | Purchased Items |
|---|---|
| 1 | Milk, Bread, Butter |
| 2 | Milk, Bread |
| 3 | Milk, Butter |
| 4 | Bread, Butter |
| 5 | Milk, Bread, Butter, Eggs |
| 6 | Bread, Eggs |
| 7 | Milk, Eggs |
🟨 Step 4: Convert Transactions into Binary Format
encoder = TransactionEncoder()
encoded_data = encoder.fit(transactions).transform(transactions)
df = pd.DataFrame(encoded_data, columns=encoder.columns_)
Explanation
The apriori() function in mlxtend expects a one-hot encoded DataFrame (True/False or 1/0), with one row per transaction and one column per item.
The dataset becomes:
| Bread | Butter | Eggs | Milk |
|---|---|---|---|
| True | True | False | True |
| True | False | False | True |
| False | True | False | True |
| True | True | False | False |
| True | True | True | True |
| True | False | True | False |
| False | False | True | True |
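To make the encoding step less of a black box, here is a minimal sketch of the same idea in plain Python. It is not how TransactionEncoder is implemented internally; the names `items` and `matrix` are illustrative only. It collects the distinct items in sorted order, then flags for each basket whether each item is present.

```python
transactions = [
    ["Milk", "Bread", "Butter"],
    ["Milk", "Bread"],
    ["Milk", "Butter"],
    ["Bread", "Butter"],
    ["Milk", "Bread", "Butter", "Eggs"],
    ["Bread", "Eggs"],
    ["Milk", "Eggs"],
]

# Every distinct item, sorted so the columns match the table above.
items = sorted({item for basket in transactions for item in basket})

# One row per basket, one True/False flag per item column.
matrix = [[item in basket for item in items] for basket in transactions]

print(items)
for row in matrix:
    print(row)
```

Each printed row should line up with the corresponding row of the True/False table.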
🟦 Step 5: Display the Dataset
print(df)
Explanation
Displays the converted transaction matrix used for mining frequent itemsets.
🟩 Step 6: Find Frequent Itemsets
frequent_items = apriori(df, min_support=0.3, use_colnames=True)
print(frequent_items)
Explanation
- min_support=0.3 means an itemset must appear in at least 30% of all transactions. With 7 transactions, that is at least 3 baskets.
- use_colnames=True displays product names instead of column indices.
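Example Output (support rounded to two decimals; the row order printed by mlxtend may differ):
| Support | Itemsets |
|---|---|
| 0.71 | {Milk} |
| 0.71 | {Bread} |
| 0.57 | {Butter} |
| 0.43 | {Eggs} |
| 0.43 | {Milk, Bread} |
| 0.43 | {Milk, Butter} |
| 0.43 | {Bread, Butter} |
Pairs such as {Milk, Eggs} appear in only 2 of the 7 baskets (0.29), so they fall below the threshold and are dropped.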
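The support figures can be checked by hand. The sketch below is a brute-force count in plain Python, not the Apriori search itself; the helper `support` and the dictionary `frequent` are names introduced here for illustration. It tries every item combination and keeps those that reach the 0.3 threshold.

```python
from itertools import combinations

transactions = [
    ["Milk", "Bread", "Butter"],
    ["Milk", "Bread"],
    ["Milk", "Butter"],
    ["Bread", "Butter"],
    ["Milk", "Bread", "Butter", "Eggs"],
    ["Bread", "Eggs"],
    ["Milk", "Eggs"],
]
baskets = [set(t) for t in transactions]
items = sorted(set().union(*baskets))


def support(itemset):
    """Fraction of baskets that contain every item in `itemset`."""
    wanted = set(itemset)
    return sum(wanted <= basket for basket in baskets) / len(baskets)


# Brute force: test every combination of items against the threshold.
min_support = 0.3
frequent = {}
for size in range(1, len(items) + 1):
    for combo in combinations(items, size):
        s = support(combo)
        if s >= min_support:
            frequent[combo] = s

for combo, s in frequent.items():
    print(combo, round(s, 2))
```

Brute force is fine for four items but grows exponentially with the number of products, which is the problem Apriori's pruning is meant to address.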
🟨 Step 7: Generate Association Rules
rules = association_rules(
    frequent_items,
    metric="confidence",
    min_threshold=0.7
)
print(rules)
Explanation
This step generates association rules using:
- Metric = Confidence
- Minimum Confidence = 70%
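Example Rule:
Butter → Milk
Meaning:
Customers who buy Butter are likely to buy Milk as well: 3 of the 4 baskets containing Butter also contain Milk, a confidence of 75%. By contrast, Milk → Bread reaches only 60% (3 of the 5 Milk baskets contain Bread), so it does not pass the 70% threshold.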
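Confidence depends on the direction of a rule, which is easy to overlook. As a rough illustration in plain Python (the helpers `count` and `confidence` are hypothetical names, not part of mlxtend), the same pair of items gives two different values:

```python
transactions = [
    ["Milk", "Bread", "Butter"],
    ["Milk", "Bread"],
    ["Milk", "Butter"],
    ["Bread", "Butter"],
    ["Milk", "Bread", "Butter", "Eggs"],
    ["Bread", "Eggs"],
    ["Milk", "Eggs"],
]
baskets = [set(t) for t in transactions]


def count(*items):
    """Number of baskets that contain all of the given items."""
    wanted = set(items)
    return sum(wanted <= basket for basket in baskets)


def confidence(antecedent, consequent):
    """Share of baskets with `antecedent` that also contain `consequent`."""
    return count(antecedent, consequent) / count(antecedent)


print(confidence("Butter", "Milk"))  # 3 of the 4 Butter baskets contain Milk
print(confidence("Milk", "Butter"))  # 3 of the 5 Milk baskets contain Butter
```

Only the first direction reaches a 0.7 threshold, so a rule and its reverse have to be evaluated separately.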
🟥 Step 8: Display Selected Columns
print(rules[['antecedents',
             'consequents',
             'support',
             'confidence',
             'lift']])
Explanation
This displays the most important measures. For this dataset, two rules meet the 70% confidence threshold:
| Antecedent | Consequent | Support | Confidence | Lift |
|---|---|---|---|---|
| Butter | Bread | 0.43 | 0.75 | 1.05 |
| Butter | Milk | 0.43 | 0.75 | 1.05 |
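Lift compares how often two items occur together with how often they would if they were independent. The following sketch computes it directly from basket counts; `support` and `lift` are illustrative helper names and do not come from mlxtend.

```python
transactions = [
    ["Milk", "Bread", "Butter"],
    ["Milk", "Bread"],
    ["Milk", "Butter"],
    ["Bread", "Butter"],
    ["Milk", "Bread", "Butter", "Eggs"],
    ["Bread", "Eggs"],
    ["Milk", "Eggs"],
]
baskets = [set(t) for t in transactions]


def support(*items):
    """Fraction of baskets that contain all of the given items."""
    wanted = set(items)
    return sum(wanted <= basket for basket in baskets) / len(baskets)


def lift(antecedent, consequent):
    """Joint support divided by the product of the individual supports."""
    joint = support(antecedent, consequent)
    return joint / (support(antecedent) * support(consequent))


print(round(lift("Butter", "Milk"), 2))  # above 1: bought together more than chance
print(round(lift("Butter", "Eggs"), 2))  # below 1: bought together less than chance
```

Unlike confidence, lift is symmetric: swapping antecedent and consequent gives the same value.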
🟪 Complete Python Program
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [
    ["Milk", "Bread", "Butter"],
    ["Milk", "Bread"],
    ["Milk", "Butter"],
    ["Bread", "Butter"],
    ["Milk", "Bread", "Butter", "Eggs"],
    ["Bread", "Eggs"],
    ["Milk", "Eggs"]
]

encoder = TransactionEncoder()
encoded_data = encoder.fit(transactions).transform(transactions)
df = pd.DataFrame(encoded_data, columns=encoder.columns_)
print("Transaction Dataset")
print(df)

frequent_items = apriori(df,
                         min_support=0.3,
                         use_colnames=True)
print("\nFrequent Itemsets")
print(frequent_items)

rules = association_rules(frequent_items,
                          metric="confidence",
                          min_threshold=0.7)
print("\nAssociation Rules")
print(rules[['antecedents',
             'consequents',
             'support',
             'confidence',
             'lift']])
🟩 Sample Output
Transaction Dataset
Bread Butter Eggs Milk
0 True True False True
1 True False False True
2 False True False True
3 True True False False
4 True True True True
5 True False True False
6 False False True True
Frequent Itemsets
support itemsets
0.71 {Milk}
0.71 {Bread}
0.57 {Butter}
0.43 {Eggs}
0.43 {Milk, Bread}
...
Association Rules
Butter → Bread
Butter → Milk
🟦 Step-by-Step Working of the Algorithm
Transaction Data
│
▼
Convert into Binary Matrix
│
▼
Apply Apriori Algorithm
│
▼
Find Frequent Itemsets
│
▼
Generate Association Rules
│
▼
Display Support, Confidence & Lift
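The "Apply Apriori Algorithm" box hides the level-wise search that gives the algorithm its name. The sketch below is a simplified, unoptimized version of that search in plain Python, written under the assumption that baskets fit in memory; `apriori_sketch` is a hypothetical name and this is not mlxtend's implementation. It relies on the Apriori property: an itemset can only be frequent if all of its subsets are frequent.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {itemset: support} for every frequent itemset (illustrative only)."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(itemset <= basket for basket in baskets) / n

    # Level 1: single items that meet the threshold.
    singles = {frozenset([item]) for basket in baskets for item in basket}
    current = {s for s in singles if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        frequent.update({s: support(s) for s in current})
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


transactions = [
    ["Milk", "Bread", "Butter"],
    ["Milk", "Bread"],
    ["Milk", "Butter"],
    ["Bread", "Butter"],
    ["Milk", "Bread", "Butter", "Eggs"],
    ["Bread", "Eggs"],
    ["Milk", "Eggs"],
]
result = apriori_sketch(transactions, min_support=0.3)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
```

On this dataset the search stops at level 3, because the only three-item candidate, {Bread, Butter, Milk}, appears in just 2 of the 7 baskets.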
🟨 Important Terms
| Term | Description |
|---|---|
| Support | Fraction of all transactions that contain the itemset. |
| Confidence | Of the transactions that contain item A, the fraction that also contain item B. |
| Lift | Confidence of A → B divided by the support of B. A value greater than 1 indicates a positive association, 1 indicates independence, and a value below 1 indicates a negative association. |
| Frequent Itemset | A group of items whose support is at least the minimum support threshold. |
| Association Rule | A rule showing the relationship between two or more items (e.g., Milk → Bread). |
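For revision, the three measures can also be written as formulas. The notation is an assumption made here for compactness: T is the set of all transactions, and A and B are itemsets.

```latex
\begin{align*}
\operatorname{support}(A) &= \frac{\lvert \{\, t \in T : A \subseteq t \,\} \rvert}{\lvert T \rvert} \\[4pt]
\operatorname{confidence}(A \rightarrow B) &= \frac{\operatorname{support}(A \cup B)}{\operatorname{support}(A)} \\[4pt]
\operatorname{lift}(A \rightarrow B) &= \frac{\operatorname{confidence}(A \rightarrow B)}{\operatorname{support}(B)}
\end{align*}
```

Worked on the seven baskets from Step 3 for Butter → Milk: support = 3/7 ≈ 0.43, confidence = (3/7) / (4/7) = 0.75, and lift = 0.75 / (5/7) = 1.05.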
🌍 Real-Life Applications
- 🛒 Market Basket Analysis
- 🛍 Product Recommendation Systems
- 🏪 Store Shelf Arrangement
- 💳 Banking Product Recommendations
- 🎬 Movie Recommendation Systems
- 🌐 E-commerce Websites (Amazon, Flipkart)
- 🍔 Restaurant Combo Offers
🎯 Viva Questions
- What is Association Rule Mining?
- What is the Apriori Algorithm?
- Define Support, Confidence, and Lift.
- What is a Frequent Itemset?
- Why is TransactionEncoder used?
- What is the purpose of min_support?
- What is the purpose of min_threshold in association rules?
- Give two real-life applications of Association Rule Mining.
⭐ One-Line Revision
Association Rule Mining, here with the Apriori algorithm, discovers frequently occurring item combinations and generates rules such as "If a customer buys Butter, they are also likely to buy Milk."