Monday, June 29, 2026

🌳 Decision Tree in Machine Learning Using Python

🌳 Decision Tree in Machine Learning

🎯 Aim

Aim:
To implement the Decision Tree classification algorithm in Python and predict whether a customer's loan will be approved based on the customer's age.

📖 Problem Statement

A bank has customer records containing Age and Loan Approval Status.

The bank wants to predict whether a new customer will get a loan based on the customer's age.

🟦 Step 1: Import Required Library


from sklearn.tree import DecisionTreeClassifier

🔍 Explanation

sklearn → Scikit-Learn library used for Machine Learning.
tree → Module containing Decision Tree algorithms.
DecisionTreeClassifier → Used for solving classification problems.

🟩 Step 2: Create the Training Dataset


X = [
    [22],
    [25],
    [35],
    [40],
    [28],
    [50]
]

🔍 Explanation

X represents the Independent Variable (Input Feature).

Here, the feature is Age.

Customer	Age
1	22
2	25
3	35
4	40
5	28
6	50

The model learns patterns from these age values.

🟨 Step 3: Create the Target Variable


y = [
    "Reject",
    "Reject",
    "Approve",
    "Approve",
    "Reject",
    "Approve"
]

🔍 Explanation

y represents the Dependent Variable (Target Output).

Age	Loan Status
22	Reject
25	Reject
35	Approve
40	Approve
28	Reject
50	Approve

The model learns the relationship between Age and Loan Status.

🟪 Step 4: Create the Decision Tree Model


model = DecisionTreeClassifier()

🔍 Explanation

This line creates a Decision Tree Classifier object.

No training happens here.

It only creates an empty model.
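To see that the new object really is untrained, the short sketch below tries to predict before calling fit(). It assumes a recent scikit-learn release, where an unfitted estimator raises NotFittedError; the variable name raised is only for illustration.

```python
from sklearn.exceptions import NotFittedError
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier()

# Predicting before fit() should fail, because no tree has been built yet.
try:
    model.predict([[30]])
    raised = False
except NotFittedError:
    raised = True

print("Raised NotFittedError:", raised)
```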

🟦 Step 5: Train the Model


model.fit(X, y)

🔍 Explanation

The fit() method trains the model.

Syntax:


model.fit(input, output)

Here,

Input → X (Age)
Output → y (Loan Status)

During training, the Decision Tree:

Reads all training data.
Finds the best splitting condition.
Creates decision rules.
Builds the tree.
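The "best splitting condition" step can be sketched by hand. The example below is a simplified illustration, not scikit-learn's actual implementation: it tries every midpoint between consecutive ages and keeps the threshold with the lowest weighted Gini impurity (Gini is scikit-learn's default criterion). The helper names gini and best_split are made up for this sketch.

```python
from collections import Counter

ages = [22, 25, 35, 40, 28, 50]
labels = ["Reject", "Reject", "Approve", "Approve", "Reject", "Approve"]


def gini(group):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    if not group:
        return 0.0
    total = len(group)
    return 1.0 - sum((count / total) ** 2 for count in Counter(group).values())


def best_split(values, targets):
    """Try each midpoint between consecutive sorted values and return the
    (threshold, weighted impurity) pair with the lowest impurity."""
    pairs = sorted(zip(values, targets))
    best = None
    for (left_value, _), (right_value, _) in zip(pairs, pairs[1:]):
        if left_value == right_value:
            continue
        threshold = (left_value + right_value) / 2
        left = [t for v, t in pairs if v <= threshold]
        right = [t for v, t in pairs if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


threshold, impurity = best_split(ages, labels)
print(f"Best split: Age <= {threshold} (weighted Gini = {impurity})")
```

On this dataset the search settles on 31.5, the midpoint between ages 28 and 35, because that split separates the two classes perfectly.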

🟩 Step 6: Predict New Data

Suppose a new customer is 30 years old.


prediction = model.predict([[30]])

🔍 Explanation

The model compares the new customer's age with the learned decision rules and predicts the loan status.
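predict() accepts several rows at once, so more than one customer can be checked in a single call. A minimal sketch, assuming the same training data as above; the ages 24, 30 and 45 and the random_state value are chosen only for illustration.

```python
from sklearn.tree import DecisionTreeClassifier

X = [[22], [25], [35], [40], [28], [50]]
y = ["Reject", "Reject", "Approve", "Approve", "Reject", "Approve"]

model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)

# Each inner list is one customer (one row with a single feature: Age).
new_ages = [[24], [30], [45]]
predictions = model.predict(new_ages)

for (age,), status in zip(new_ages, predictions):
    print(f"Age {age}: {status}")
```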

🟨 Step 7: Display the Result


print("Loan Status =", prediction[0])

🔍 Explanation

predict() returns a NumPy array containing one predicted label for each input row.

Example:


['Reject']

prediction[0] extracts the first element.

Output:


Loan Status = Reject

📌 Complete Python Program


# Decision Tree Classification Example

from sklearn.tree import DecisionTreeClassifier

# Training Data (Input Feature)
X = [
    [22],
    [25],
    [35],
    [40],
    [28],
    [50]
]

# Target Output
y = [
    "Reject",
    "Reject",
    "Approve",
    "Approve",
    "Reject",
    "Approve"
]

# Create Decision Tree Model
model = DecisionTreeClassifier()

# Train the Model
model.fit(X, y)

# Predict Loan Status for Age = 30
prediction = model.predict([[30]])

# Display Result
print("Loan Status =", prediction[0])

💻 Sample Output


Loan Status = Reject

🌳 How the Decision Tree Works

On this dataset, scikit-learn places the split halfway between the oldest rejected customer (28) and the youngest approved customer (35), so the trained tree looks like this:


                Age
                 │
        Age ≤ 31.5 ?
         /        \
      Yes          No
      │             │
  Reject       Approve

Explanation

If Age ≤ 31.5, predict Reject.
If Age > 31.5, predict Approve.

For a customer aged 30:


30 ≤ 31.5

➡ Prediction = Reject

For a customer aged 40:


40 > 31.5

➡ Prediction = Approve
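The rules the model actually learned can be printed rather than assumed. The sketch below uses scikit-learn's export_text helper and reads the root threshold from the fitted tree; the feature name "Age" and random_state=0 are assumptions added for readability and repeatability.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[22], [25], [35], [40], [28], [50]]
y = ["Reject", "Reject", "Approve", "Approve", "Reject", "Approve"]

model = DecisionTreeClassifier(random_state=0).fit(X, y)

# export_text renders the learned rules as an indented text tree.
rules_text = export_text(model, feature_names=["Age"])
print(rules_text)

# The threshold stored at the root node (node 0) of the fitted tree.
root_threshold = model.tree_.threshold[0]
print("Root threshold:", root_threshold)
```

With this training data the root threshold is 31.5, the midpoint between 28 and 35, so an age of 30 falls on the Reject side.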

⚙️ Step-by-Step Working


Start
   │
   ▼
Import DecisionTreeClassifier
   │
   ▼
Create Training Dataset (X and y)
   │
   ▼
Create Decision Tree Model
   │
   ▼
Train the Model using fit()
   │
   ▼
Provide New Customer Data
   │
   ▼
Predict Loan Status
   │
   ▼
Display Result
   │
   ▼
End

📊 Explanation of Important Functions

Function	Purpose
`DecisionTreeClassifier()`	Creates the Decision Tree model
`fit(X, y)`	Trains the model using the training dataset
`predict()`	Predicts the class of new data
`print()`	Displays the prediction

🌍 Real-Life Applications

🏦 Loan Approval
🏥 Disease Diagnosis
📧 Spam Email Detection
🎓 Student Performance Prediction
🛒 Customer Purchase Prediction
🚗 Car Insurance Approval
🌾 Crop Recommendation
💳 Credit Risk Analysis

✅ Advantages

Easy to understand and interpret.
Requires little data preprocessing.
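Can handle both numerical and categorical data in principle (scikit-learn's implementation needs categorical features encoded as numbers first).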
Works for classification and regression.
Can visualize decision-making as a tree.

❌ Limitations

Can overfit the training data.
Sensitive to small changes in the dataset.
Large trees become difficult to interpret.
May not perform well with very complex datasets.

🎯 Viva Questions

What is a Decision Tree?
Why is it called a Decision Tree?
What is DecisionTreeClassifier?
What is the purpose of fit()?
What is the purpose of predict()?
What are independent and dependent variables?
What are the advantages of Decision Trees?
What are the limitations of Decision Trees?
Give two real-life applications of Decision Trees.
Differentiate between Decision Tree Classification and Decision Tree Regression.

📝 University Exam Definition

Decision Tree is a supervised machine learning algorithm used for classification and regression. It predicts the output by splitting data into smaller subsets using decision rules based on input features, forming a tree-like structure.

⭐ One-Line Revision

Decision Tree builds a tree-like model by asking a series of questions about the input data and predicts the final output based on the learned decision rules.

Association Rule Mining (Apriori Algorithm) Using Python

Association Rule Mining (Apriori Algorithm)

Note: Association Rule Mining is an Unsupervised Machine Learning technique. It is mainly used for Market Basket Analysis to discover relationships between items frequently purchased together.

🟦 Program Aim

Aim:

To implement the Association Rule Mining (Apriori Algorithm) using Python and identify products that are frequently purchased together.

🟩 Algorithm Used

Apriori Algorithm

🟨 Problem Statement

A supermarket wants to analyze customer shopping patterns. By examining previous transactions, the store aims to identify products that are frequently purchased together. This information helps improve product placement, cross-selling, and promotional strategies.

🟪 Step 1: Install Required Library

Install the mlxtend package (only once).


pip install mlxtend

Explanation

mlxtend stands for Machine Learning Extensions.
It provides the Apriori algorithm and functions for generating association rules.

🟦 Step 2: Import Required Libraries


import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

Explanation

pandas → Used to create and manipulate data.
TransactionEncoder → Converts transaction data into a True/False matrix.
apriori() → Finds frequent itemsets.
association_rules() → Generates association rules from frequent itemsets.

🟩 Step 3: Create the Transaction Dataset


transactions = [
    ["Milk", "Bread", "Butter"],
    ["Milk", "Bread"],
    ["Milk", "Butter"],
    ["Bread", "Butter"],
    ["Milk", "Bread", "Butter", "Eggs"],
    ["Bread", "Eggs"],
    ["Milk", "Eggs"]
]

Explanation

Each inner list represents one customer's shopping basket.

Customer	Purchased Items
1	Milk, Bread, Butter
2	Milk, Bread
3	Milk, Butter
4	Bread, Butter
5	Milk, Bread, Butter, Eggs
6	Bread, Eggs
7	Milk, Eggs

🟨 Step 4: Convert Transactions into Binary Format


encoder = TransactionEncoder()

encoded_data = encoder.fit(transactions).transform(transactions)

df = pd.DataFrame(encoded_data, columns=encoder.columns_)

Explanation

The Apriori algorithm requires data in binary (True/False or 1/0) format.

The dataset becomes:

Bread	Butter	Eggs	Milk
True	True	False	True
True	False	False	True
False	True	False	True
True	True	False	False
True	True	True	True
True	False	True	False
False	False	True	True
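What the encoder does can be reproduced in a few lines of plain Python. This is only a sketch of the idea, not mlxtend's implementation; the names columns and matrix are illustrative.

```python
transactions = [
    ["Milk", "Bread", "Butter"],
    ["Milk", "Bread"],
    ["Milk", "Butter"],
    ["Bread", "Butter"],
    ["Milk", "Bread", "Butter", "Eggs"],
    ["Bread", "Eggs"],
    ["Milk", "Eggs"],
]

# Collect every distinct item and sort alphabetically to fix the column order.
columns = sorted({item for basket in transactions for item in basket})

# One row per basket: True when the basket contains the column's item.
matrix = [[item in basket for item in columns] for basket in transactions]

print(columns)
for row in matrix:
    print(row)
```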

🟦 Step 5: Display the Dataset


print(df)

Explanation

Displays the converted transaction matrix used for mining frequent itemsets.

🟩 Step 6: Find Frequent Itemsets


frequent_items = apriori(df, min_support=0.3, use_colnames=True)

print(frequent_items)

Explanation

min_support = 0.3 means an itemset must appear in at least 30% of all transactions.
use_colnames=True displays product names instead of column numbers.

Example Output:

Support	Itemsets
0.71	{Milk}
0.71	{Bread}
0.57	{Butter}
0.43	{Eggs}
0.43	{Milk, Bread}
0.43	{Milk, Butter}
0.43	{Bread, Butter}
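The support values can be checked by hand. The sketch below counts how many of the seven baskets contain each itemset and compares the result with min_support; the support helper is written for this illustration and is not part of mlxtend.

```python
transactions = [
    ["Milk", "Bread", "Butter"],
    ["Milk", "Bread"],
    ["Milk", "Butter"],
    ["Bread", "Butter"],
    ["Milk", "Bread", "Butter", "Eggs"],
    ["Bread", "Eggs"],
    ["Milk", "Eggs"],
]
baskets = [set(basket) for basket in transactions]
min_support = 0.3


def support(*items):
    """Fraction of baskets that contain every one of the given items."""
    hits = sum(1 for basket in baskets if set(items) <= basket)
    return hits / len(baskets)


for itemset in [("Milk",), ("Butter",), ("Eggs",),
                ("Milk", "Bread"), ("Bread", "Butter"), ("Milk", "Eggs")]:
    value = support(*itemset)
    verdict = "frequent" if value >= min_support else "below min_support"
    print(f"{', '.join(itemset)}: {value:.2f} ({verdict})")
```

Here {Milk, Eggs} appears in only 2 of 7 baskets (about 0.29), so it falls below the 0.3 threshold and is dropped.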

🟨 Step 7: Generate Association Rules


rules = association_rules(
    frequent_items,
    metric="confidence",
    min_threshold=0.7
)

print(rules)

Explanation

This step generates association rules using:

Metric = Confidence
Minimum Confidence = 70%

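Example Rule:


Butter → Bread

Meaning:

Customers buying Butter are likely to buy Bread as well: 3 of the 4 baskets that contain Butter also contain Bread (confidence 0.75). Milk → Bread reaches only 0.60 confidence on this dataset, so the 70% threshold filters it out.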

🟥 Step 8: Display Selected Columns


print(rules[['antecedents',
             'consequents',
             'support',
             'confidence',
             'lift']])

Explanation

This displays the most important measures:

Antecedent	Consequent	Support	Confidence	Lift
Butter	Bread	0.43	0.75	1.05
Butter	Milk	0.43	0.75	1.05
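Support, confidence and lift can be recomputed directly from the transactions. The sketch below only considers rules with one item on each side, which is enough here because no three-item set reaches the 0.3 support threshold; the support helper and the tuple layout of rules are assumptions made for this example.

```python
from itertools import permutations

transactions = [
    ["Milk", "Bread", "Butter"],
    ["Milk", "Bread"],
    ["Milk", "Butter"],
    ["Bread", "Butter"],
    ["Milk", "Bread", "Butter", "Eggs"],
    ["Bread", "Eggs"],
    ["Milk", "Eggs"],
]
baskets = [set(basket) for basket in transactions]
items = sorted(set().union(*baskets))
min_support = 0.3
min_confidence = 0.7


def support(*wanted):
    """Fraction of baskets that contain every one of the given items."""
    return sum(1 for basket in baskets if set(wanted) <= basket) / len(baskets)


rules = []
for antecedent, consequent in permutations(items, 2):
    pair_support = support(antecedent, consequent)
    if pair_support < min_support:
        continue  # the pair itself is not a frequent itemset
    confidence = pair_support / support(antecedent)
    lift = confidence / support(consequent)
    if confidence >= min_confidence:
        rules.append((antecedent, consequent, pair_support, confidence, lift))

for antecedent, consequent, s, c, l in rules:
    print(f"{antecedent} -> {consequent}: "
          f"support={s:.2f}, confidence={c:.2f}, lift={l:.2f}")
```

Only Butter → Bread and Butter → Milk pass the 0.7 confidence threshold; every rule starting from Milk or Bread stops at 0.60.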

🟪 Complete Python Program


import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [
    ["Milk", "Bread", "Butter"],
    ["Milk", "Bread"],
    ["Milk", "Butter"],
    ["Bread", "Butter"],
    ["Milk", "Bread", "Butter", "Eggs"],
    ["Bread", "Eggs"],
    ["Milk", "Eggs"]
]

encoder = TransactionEncoder()

encoded_data = encoder.fit(transactions).transform(transactions)

df = pd.DataFrame(encoded_data, columns=encoder.columns_)

print("Transaction Dataset")
print(df)

frequent_items = apriori(df,
                         min_support=0.3,
                         use_colnames=True)

print("\nFrequent Itemsets")
print(frequent_items)

rules = association_rules(frequent_items,
                          metric="confidence",
                          min_threshold=0.7)

print("\nAssociation Rules")
print(rules[['antecedents',
             'consequents',
             'support',
             'confidence',
             'lift']])

🟩 Sample Output


Transaction Dataset

   Bread  Butter   Eggs   Milk
0   True    True  False   True
1   True   False  False   True
2  False    True  False   True
3   True    True  False  False
4   True    True   True   True
5   True   False   True  False
6  False   False   True   True

Frequent Itemsets

support     itemsets
0.71        {Milk}
0.71        {Bread}
0.57        {Butter}
0.43        {Eggs}
0.43        {Milk, Bread}
...

Association Rules

Butter → Bread
Butter → Milk

🟦 Step-by-Step Working of the Algorithm


Transaction Data
        │
        ▼
Convert into Binary Matrix
        │
        ▼
Apply Apriori Algorithm
        │
        ▼
Find Frequent Itemsets
        │
        ▼
Generate Association Rules
        │
        ▼
Display Support, Confidence & Lift
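The "Apply Apriori Algorithm" step works level by level: frequent single items are found first, then only combinations of frequent itemsets are tested at the next size. The following is a teaching sketch of that idea in plain Python, not the mlxtend implementation; all names are illustrative.

```python
from itertools import combinations

transactions = [
    ["Milk", "Bread", "Butter"],
    ["Milk", "Bread"],
    ["Milk", "Butter"],
    ["Bread", "Butter"],
    ["Milk", "Bread", "Butter", "Eggs"],
    ["Bread", "Eggs"],
    ["Milk", "Eggs"],
]
baskets = [frozenset(basket) for basket in transactions]
min_support = 0.3


def support(itemset):
    """Fraction of baskets that contain the whole itemset."""
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)


# Level 1: frequent single items.
items = sorted(set().union(*baskets))
current = [frozenset([item]) for item in items
           if support(frozenset([item])) >= min_support]
frequent = {itemset: support(itemset) for itemset in current}

size = 2
while current:
    # Join step: merge frequent itemsets into candidates of the next size.
    candidates = {a | b for a in current for b in current if len(a | b) == size}
    # Prune step: every smaller subset of a candidate must already be frequent.
    candidates = {
        c for c in candidates
        if all(frozenset(sub) in frequent for sub in combinations(c, size - 1))
    }
    current = [c for c in candidates if support(c) >= min_support]
    frequent.update({c: support(c) for c in current})
    size += 1

for itemset, value in sorted(frequent.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(f"{value:.2f}  {sorted(itemset)}")
```

The prune step is what gives Apriori its name: an itemset can only be frequent if all of its subsets are frequent, so most large combinations never need to be counted.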

🟨 Important Terms

Term	Description
Support	Fraction of all transactions that contain the itemset.
Confidence	Probability that customers who buy item A also buy item B.
Lift	Measures the strength of the relationship between two items. A lift value greater than 1 indicates a positive association.
Frequent Itemset	A group of items that appears frequently in the dataset.
Association Rule	A rule showing the relationship between two or more items (e.g., Milk → Bread).
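For reference, the three measures are commonly written as follows, where A and B are itemsets (standard textbook definitions, stated here as a reminder rather than taken from the program):

```latex
\begin{aligned}
\text{support}(A) &= \frac{\text{number of transactions containing } A}{\text{total number of transactions}} \\[4pt]
\text{confidence}(A \rightarrow B) &= \frac{\text{support}(A \cup B)}{\text{support}(A)} \\[4pt]
\text{lift}(A \rightarrow B) &= \frac{\text{confidence}(A \rightarrow B)}{\text{support}(B)}
\end{aligned}
```

Worked example with the seven transactions used here: Butter and Bread appear together in 3 baskets, Butter in 4 and Bread in 5. So confidence(Butter → Bread) = (3/7) / (4/7) = 0.75 and lift = 0.75 / (5/7) = 1.05.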

🌍 Real-Life Applications

🛒 Market Basket Analysis
🛍 Product Recommendation Systems
🏪 Store Shelf Arrangement
💳 Banking Product Recommendations
🎬 Movie Recommendation Systems
🌐 E-commerce Websites (Amazon, Flipkart)
🍔 Restaurant Combo Offers

🎯 Viva Questions

What is Association Rule Mining?
What is the Apriori Algorithm?
Define Support, Confidence, and Lift.
What is a Frequent Itemset?
Why is TransactionEncoder used?
What is the purpose of min_support?
What is the purpose of min_threshold in association rules?
Give two real-life applications of Association Rule Mining.

⭐ One-Line Revision

Association Rule Mining uses the Apriori algorithm to discover frequently occurring item combinations and generate rules such as "If a customer buys Milk, they are also likely to buy Bread."

Naïve Bayes Algorithm in Machine Learning Using Python

Naïve Bayes Algorithm in Machine Learning

🟦 Program Aim

Aim:

To implement the Gaussian Naïve Bayes Algorithm using Python and predict whether a patient has Diabetes or is Healthy based on their blood sugar level.

🟩 Algorithm Used

Gaussian Naïve Bayes (GaussianNB)

🟨 Problem Statement

A hospital wants to predict whether a patient is Healthy or has Diabetes based on the patient's Blood Sugar Level.

🟪 Step 1: Import Required Library


from sklearn.naive_bayes import GaussianNB

Explanation

sklearn is the Scikit-learn library.
naive_bayes is the module that contains Naïve Bayes algorithms.
GaussianNB is used for continuous numerical data (e.g., blood sugar, age, height, weight).

🟦 Step 2: Create the Training Dataset


X = [
    [85],
    [90],
    [95],
    [140],
    [150],
    [160]
]

Explanation

X represents the input feature (Independent Variable).

Each value is the patient's Blood Sugar Level (mg/dL).

Patient	Blood Sugar
Patient 1	85
Patient 2	90
Patient 3	95
Patient 4	140
Patient 5	150
Patient 6	160

The algorithm uses these values for learning.

🟩 Step 3: Create the Output Labels


y = [
    "Healthy",
    "Healthy",
    "Healthy",
    "Diabetes",
    "Diabetes",
    "Diabetes"
]

Explanation

y represents the target variable (Dependent Variable).

Blood Sugar	Output
85	Healthy
90	Healthy
95	Healthy
140	Diabetes
150	Diabetes
160	Diabetes

The algorithm learns the relationship between blood sugar levels and health status.

🟨 Step 4: Create the Gaussian Naïve Bayes Model


model = GaussianNB()

Explanation

This line creates an object of the Gaussian Naïve Bayes classifier.

The model is now ready to be trained.

🟪 Step 5: Train the Model


model.fit(X, y)

Explanation

The fit() function trains the model using the training data.

X = Input data (Blood Sugar)
y = Output labels (Healthy / Diabetes)

During training, the model:

Calculates the prior probability of each class.
Estimates the mean and variance of the blood sugar values for each class.

At prediction time, it combines these with Bayes' Theorem to estimate the probability of each class.
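What training stores can be computed by hand for this small dataset. The sketch below is a simplified illustration: it works out the prior, mean and variance for each class, and omits the small variance-smoothing term that GaussianNB adds (its var_smoothing parameter). The params dictionary is a name invented for this example.

```python
from collections import defaultdict
from statistics import fmean, pvariance

X = [85, 90, 95, 140, 150, 160]
y = ["Healthy", "Healthy", "Healthy", "Diabetes", "Diabetes", "Diabetes"]

# Group the blood sugar values by class label.
values_by_class = defaultdict(list)
for value, label in zip(X, y):
    values_by_class[label].append(value)

params = {}
for label, values in values_by_class.items():
    params[label] = {
        "prior": len(values) / len(X),    # share of training rows in this class
        "mean": fmean(values),            # average blood sugar for the class
        "variance": pvariance(values),    # population variance (divide by n)
    }

for label, stats in params.items():
    print(label, stats)
```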

🟦 Step 6: Predict for a New Patient


prediction = model.predict([[145]])

Explanation

The patient's blood sugar level is 145 mg/dL.

The model calculates:

Probability of Healthy
Probability of Diabetes

It selects the class with the higher probability.
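The two probabilities can be worked out by hand as a check. This sketch assumes the per-class mean and variance computed from the six training values (Healthy: mean 90, variance 50/3; Diabetes: mean 150, variance 200/3) and ignores GaussianNB's small smoothing term; gaussian_pdf is a helper written for this example.

```python
import math

# Per-class statistics derived from the training data (assumed, see lead-in).
params = {
    "Healthy": {"prior": 0.5, "mean": 90.0, "variance": 50 / 3},
    "Diabetes": {"prior": 0.5, "mean": 150.0, "variance": 200 / 3},
}


def gaussian_pdf(x, mean, variance):
    """Density of a normal distribution with the given mean and variance."""
    exponent = -((x - mean) ** 2) / (2 * variance)
    return math.exp(exponent) / math.sqrt(2 * math.pi * variance)


x_new = 145

# Bayes' Theorem: posterior is proportional to prior times likelihood.
scores = {
    label: p["prior"] * gaussian_pdf(x_new, p["mean"], p["variance"])
    for label, p in params.items()
}
total = sum(scores.values())
posteriors = {label: score / total for label, score in scores.items()}

prediction = max(posteriors, key=posteriors.get)
print(posteriors)
print("Prediction =", prediction)
```

A reading of 145 sits close to the Diabetes mean of 150 and far from the Healthy mean of 90, so almost all of the probability goes to Diabetes.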

🟩 Step 7: Display the Prediction


print("Prediction =", prediction[0])

Explanation

predict() returns a NumPy array with one label per input row.

Using [0] retrieves the first (and only) predicted result.

Output:


Prediction = Diabetes

🟥 Step 8: Complete Python Program


# Import Gaussian Naïve Bayes
from sklearn.naive_bayes import GaussianNB

# Training Data (Blood Sugar Levels)
X = [
    [85],
    [90],
    [95],
    [140],
    [150],
    [160]
]

# Output Labels
y = [
    "Healthy",
    "Healthy",
    "Healthy",
    "Diabetes",
    "Diabetes",
    "Diabetes"
]

# Create Model
model = GaussianNB()

# Train Model
model.fit(X, y)

# Predict New Patient
prediction = model.predict([[145]])

# Display Result
print("Prediction =", prediction[0])

🟦 Sample Output


Prediction = Diabetes

🟩 Step-by-Step Workflow


Start
   │
   ▼
Import GaussianNB
   │
   ▼
Create Training Dataset (X)
   │
   ▼
Create Output Labels (y)
   │
   ▼
Create GaussianNB Model
   │
   ▼
Train Model using fit()
   │
   ▼
Enter New Blood Sugar Value
   │
   ▼
Predict using predict()
   │
   ▼
Display Prediction
   │
   ▼
End

🟨 Line-by-Line Explanation

Line	Code	Description
1	`from sklearn.naive_bayes import GaussianNB`	Imports the Gaussian Naïve Bayes classifier.
2	`X = [...]`	Creates the input feature (blood sugar values).
3	`y = [...]`	Creates the output labels (Healthy/Diabetes).
4	`model = GaussianNB()`	Creates the Naïve Bayes model.
5	`model.fit(X, y)`	Trains the model using the training data.
6	`prediction = model.predict([[145]])`	Predicts the class for a new patient.
7	`print("Prediction =", prediction[0])`	Displays the predicted class.

🟪 Why Gaussian Naïve Bayes?

Gaussian Naïve Bayes is suitable because the feature (blood sugar level) is a continuous numerical value.

Examples of continuous data include:

Blood Sugar
Age
Height
Weight
Salary
Temperature
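For a continuous feature x and a class c, the Gaussian assumption is usually written as below, where the mean and variance are estimated from the training rows of class c (standard textbook form, given as a reminder):

```latex
P(x \mid c) = \frac{1}{\sqrt{2\pi\sigma_c^{2}}}
              \exp\!\left(-\frac{(x - \mu_c)^{2}}{2\sigma_c^{2}}\right),
\qquad
P(c \mid x) \propto P(c)\, P(x \mid c)
```

The predicted class is the one with the largest value of P(c | x).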

🟦 Advantages

✔ Easy to implement
✔ Fast training and prediction
✔ Works well with small datasets
✔ Handles continuous numerical data
✔ Effective for classification problems

🟥 Limitations

❌ Assumes all features are independent of each other given the class.
❌ Performance may decrease if features are highly correlated.
❌ Sensitive to the quality of training data.

🟩 Applications

🏥 Disease Diagnosis
📧 Spam Email Detection
😊 Sentiment Analysis
📰 News Classification
🌐 Language Detection
💳 Fraud Detection

📝 Viva Questions

What is Naïve Bayes?
Why is it called Naïve?
What is Gaussian Naïve Bayes?
What is the purpose of fit()?
What is the purpose of predict()?
What is the difference between Gaussian, Multinomial, and Bernoulli Naïve Bayes?
Why is prediction[0] used?
Which Python library provides the Naïve Bayes algorithm?

🎯 Key Points for Exams

Algorithm: Gaussian Naïve Bayes
Library: sklearn.naive_bayes
Model Class: GaussianNB()
Training Method: fit()
Prediction Method: predict()
Input: Continuous numerical values
Output: Predicted class (e.g., Healthy or Diabetes)

⭐ One-Line Revision

Gaussian Naïve Bayes is a supervised machine learning algorithm that uses Bayes' Theorem to classify continuous numerical data, assuming that each feature follows a normal distribution within each class and that the features are independent of each other given the class.
