Monday, June 29, 2026

Naïve Bayes Algorithm in Machine Learning Using Python

Naïve Bayes Algorithm in Machine Learning

🟦 Program Aim

Aim:

To implement the Gaussian Naïve Bayes Algorithm using Python and predict whether a patient has Diabetes or is Healthy based on their blood sugar level.

🟩 Algorithm Used

Gaussian Naïve Bayes (GaussianNB)

🟨 Problem Statement

A hospital wants to predict whether a patient is Healthy or has Diabetes based on the patient's Blood Sugar Level.

🟪 Step 1: Import Required Library


from sklearn.naive_bayes import GaussianNB

Explanation

sklearn is the Scikit-learn library.
naive_bayes is the module that contains Naïve Bayes algorithms.
GaussianNB is used for continuous numerical data (e.g., blood sugar, age, height, weight).

🟦 Step 2: Create the Training Dataset


X = [
    [85],
    [90],
    [95],
    [140],
    [150],
    [160]
]

Explanation

X represents the input feature (Independent Variable).

Each value is the patient's Blood Sugar Level (mg/dL).

Patient	Blood Sugar
Patient 1	85
Patient 2	90
Patient 3	95
Patient 4	140
Patient 5	150
Patient 6	160

The algorithm uses these values for learning.

🟩 Step 3: Create the Output Labels


y = [
    "Healthy",
    "Healthy",
    "Healthy",
    "Diabetes",
    "Diabetes",
    "Diabetes"
]

Explanation

y represents the target variable (Dependent Variable).

Blood Sugar	Output
85	Healthy
90	Healthy
95	Healthy
140	Diabetes
150	Diabetes
160	Diabetes

The algorithm learns the relationship between blood sugar levels and health status.

🟨 Step 4: Create the Gaussian Naïve Bayes Model


model = GaussianNB()

Explanation

This line creates an object of the Gaussian Naïve Bayes classifier.

The model is now ready to be trained.

🟪 Step 5: Train the Model


model.fit(X, y)

Explanation

The fit() function trains the model using the training data.

X = Input data (Blood Sugar)
y = Output labels (Healthy / Diabetes)

During training, the model:

Calculates the prior probability of each class.
Estimates the mean and variance of the blood sugar values within each class.
Stores these statistics so that Bayes' Theorem can be applied at prediction time.
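The quantities learned during training can be reproduced by hand. The following is a minimal sketch using only the Python standard library; the helper name summarize is an assumption made for illustration and is not part of scikit-learn. It uses the population variance, and it leaves out the small smoothing term that scikit-learn adds to the variances (its var_smoothing parameter).

```python
from statistics import fmean, pvariance

# Same training data as the tutorial, flattened to one value per patient.
X = [85, 90, 95, 140, 150, 160]
y = ["Healthy", "Healthy", "Healthy", "Diabetes", "Diabetes", "Diabetes"]


def summarize(values, labels):
    """Return {class: (prior, mean, variance)} for a single feature."""
    stats = {}
    for label in sorted(set(labels)):
        members = [v for v, l in zip(values, labels) if l == label]
        prior = len(members) / len(values)  # share of samples in this class
        stats[label] = (prior, fmean(members), pvariance(members))
    return stats


class_stats = summarize(X, y)
for label, (prior, mean, var) in class_stats.items():
    print(f"{label}: prior={prior:.2f}, mean={mean:.1f}, variance={var:.2f}")
# Diabetes: prior=0.50, mean=150.0, variance=66.67
# Healthy: prior=0.50, mean=90.0, variance=16.67
```

Each class ends up described by just three numbers, which is why training is fast even on larger datasets.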

🟦 Step 6: Predict for a New Patient


prediction = model.predict([[145]])

Explanation

The patient's blood sugar level is 145 mg/dL.

The model calculates:

Probability of Healthy, given a blood sugar level of 145
Probability of Diabetes, given a blood sugar level of 145

It selects the class with the higher probability.
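The comparison of the two probabilities can be sketched by hand. This is an illustrative calculation, not scikit-learn's internal code: the dictionary class_stats holds the prior, mean, and population variance of each class computed from the six training values, and the function names gaussian_pdf and posterior are assumptions made for this example.

```python
import math

# (prior, mean, population variance) per class, from the training data:
# Healthy = 85, 90, 95 and Diabetes = 140, 150, 160.
class_stats = {
    "Healthy": (0.5, 90.0, 50.0 / 3),
    "Diabetes": (0.5, 150.0, 200.0 / 3),
}


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def posterior(x, stats):
    """Apply Bayes' Theorem: prior * likelihood, normalised over all classes."""
    joint = {c: prior * gaussian_pdf(x, mean, var)
             for c, (prior, mean, var) in stats.items()}
    total = sum(joint.values())
    return {c: p / total for c, p in joint.items()}


probs = posterior(145, class_stats)
predicted = max(probs, key=probs.get)  # class with the higher probability
print(predicted)  # Diabetes
```

A reading of 145 is only 5 mg/dL from the Diabetes mean but 55 mg/dL from the Healthy mean, so almost all of the probability goes to Diabetes.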

🟩 Step 7: Display the Prediction


print("Prediction =", prediction[0])

Explanation

prediction is returned as a NumPy array containing one predicted label for each input row.

Using [0] retrieves the first (and only) predicted result.

Possible Output:


Prediction = Diabetes

🟥 Step 8: Complete Python Program


# Import Gaussian Naïve Bayes
from sklearn.naive_bayes import GaussianNB

# Training Data (Blood Sugar Levels)
X = [
    [85],
    [90],
    [95],
    [140],
    [150],
    [160]
]

# Output Labels
y = [
    "Healthy",
    "Healthy",
    "Healthy",
    "Diabetes",
    "Diabetes",
    "Diabetes"
]

# Create Model
model = GaussianNB()

# Train Model
model.fit(X, y)

# Predict New Patient
prediction = model.predict([[145]])

# Display Result
print("Prediction =", prediction[0])

🟦 Sample Output


Prediction = Diabetes

🟩 Step-by-Step Workflow


Start
   │
   ▼
Import GaussianNB
   │
   ▼
Create Training Dataset (X)
   │
   ▼
Create Output Labels (y)
   │
   ▼
Create GaussianNB Model
   │
   ▼
Train Model using fit()
   │
   ▼
Enter New Blood Sugar Value
   │
   ▼
Predict using predict()
   │
   ▼
Display Prediction
   │
   ▼
End

🟨 Line-by-Line Explanation

Line	Code	Description
1	`from sklearn.naive_bayes import GaussianNB`	Imports the Gaussian Naïve Bayes classifier.
2	`X = [...]`	Creates the input feature (blood sugar values).
3	`y = [...]`	Creates the output labels (Healthy/Diabetes).
4	`model = GaussianNB()`	Creates the Naïve Bayes model.
5	`model.fit(X, y)`	Trains the model using the training data.
6	`prediction = model.predict([[145]])`	Predicts the class for a new patient.
7	`print("Prediction =", prediction[0])`	Displays the predicted class.

🟪 Why Gaussian Naïve Bayes?

Gaussian Naïve Bayes is suitable because the feature (blood sugar level) is a continuous numerical value.

Examples of continuous data include:

Blood Sugar
Age
Height
Weight
Salary
Temperature

🟦 Advantages

✔ Easy to implement
✔ Fast training and prediction
✔ Works well with small datasets
✔ Handles continuous numerical data
✔ Effective for classification problems

🟥 Limitations

❌ Assumes all features are independent of one another within each class.
❌ Performance may decrease if features are highly correlated.
❌ Sensitive to the quality of training data.
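The independence assumption in the first limitation can be sketched for a model with two features. In this illustration the second feature (age) and all of its numbers are hypothetical and are not taken from the dataset above; the function names are also assumptions. The point is only that the per-feature likelihoods are multiplied together as if the features did not influence each other.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


# Hypothetical (mean, variance) pairs for the Healthy class.
# The age values are invented purely for illustration.
healthy_params = {
    "blood_sugar": (90.0, 50.0 / 3),
    "age": (35.0, 100.0),
}


def naive_likelihood(sample, params):
    """Multiply per-feature likelihoods, treating features as independent."""
    return math.prod(
        gaussian_pdf(sample[name], mean, var)
        for name, (mean, var) in params.items()
    )


sample = {"blood_sugar": 92.0, "age": 40.0}
joint = naive_likelihood(sample, healthy_params)
print(joint)
```

If two features are strongly correlated, the same evidence is effectively counted twice in this product, which is why performance can drop in that case.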

🟩 Applications

🏥 Disease Diagnosis
📧 Spam Email Detection
😊 Sentiment Analysis
📰 News Classification
🌐 Language Detection
💳 Fraud Detection

📝 Viva Questions

What is Naïve Bayes?
Why is it called Naïve?
What is Gaussian Naïve Bayes?
What is the purpose of fit()?
What is the purpose of predict()?
What is the difference between Gaussian, Multinomial, and Bernoulli Naïve Bayes?
Why is prediction[0] used?
Which Python library provides the Naïve Bayes algorithm?

🎯 Key Points for Exams

Algorithm: Gaussian Naïve Bayes
Library: sklearn.naive_bayes
Model Class: GaussianNB()
Training Method: fit()
Prediction Method: predict()
Input: Continuous numerical values
Output: Predicted class (e.g., Healthy or Diabetes)

⭐ One-Line Revision

Gaussian Naïve Bayes is a supervised machine learning algorithm that uses Bayes' Theorem to classify continuous numerical data, assuming that the input features are independent within each class and follow a normal (Gaussian) distribution.

k-Nearest Neighbors (KNN) Algorithm Using Python

k-Nearest Neighbors (KNN) Algorithm

🟦 Program Aim

Aim:

To implement the K-Nearest Neighbors (KNN) Classification Algorithm using Python and classify a student as Short or Tall based on height.

🟩 Algorithm Used

K-Nearest Neighbors (KNN) Classifier

🟨 Problem Statement

A school wants to classify students into two categories:

Short
Tall

based on their Height (in cm) using the K-Nearest Neighbors (KNN) algorithm.

🟪 Step 1: Import Required Library

First, import the KNeighborsClassifier class from the sklearn.neighbors module.


from sklearn.neighbors import KNeighborsClassifier

Explanation

sklearn is the Scikit-learn library.
neighbors contains the KNN algorithm.
KNeighborsClassifier() is used for classification problems.

🟦 Step 2: Create the Training Dataset


X = [
    [150],
    [160],
    [170],
    [180]
]

Explanation

X represents the input feature (Independent Variable).

Here, the input is the Height of students.

Student	Height (cm)
Student 1	150
Student 2	160
Student 3	170
Student 4	180

The KNN algorithm stores these training examples.

🟩 Step 3: Create the Output Labels


y = [
    "Short",
    "Short",
    "Tall",
    "Tall"
]

Explanation

y represents the output labels (Dependent Variable).

Height	Category
150	Short
160	Short
170	Tall
180	Tall

These are the correct answers used to train the model.

🟨 Step 4: Create the KNN Model


model = KNeighborsClassifier(n_neighbors=3)

Explanation

KNeighborsClassifier() creates the KNN model.
n_neighbors=3 means the model will consider the 3 nearest neighbors while making a prediction.

Why choose K = 3?

With K = 3, the algorithm checks the three closest training data points and predicts the class that appears most frequently among them. An odd value of K also avoids tied votes when there are only two classes.

🟦 Step 5: Train the Model


model.fit(X, y)

Explanation

The fit() method trains the model.

Syntax:


model.fit(input_data, output_labels)

Here,

X → Heights of students
y → Categories (Short/Tall)

During training, KNN stores the dataset instead of creating a mathematical model.
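The idea that training only stores the data can be sketched with a toy classifier. The class LazyKNN below is a hypothetical teaching example for one feature, not scikit-learn's implementation: its fit() only memorises the examples, and all of the computation happens inside predict().

```python
from collections import Counter


class LazyKNN:
    """Toy one-feature KNN classifier (illustrative sketch only)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" just keeps a copy of the examples; nothing is computed.
        self.X_ = [row[0] for row in X]
        self.y_ = list(y)
        return self

    def predict(self, X):
        results = []
        for row in X:
            # Rank the stored examples by distance to the new point.
            ranked = sorted(zip(self.X_, self.y_),
                            key=lambda pair: abs(pair[0] - row[0]))
            # Majority vote among the k closest examples.
            votes = Counter(label for _, label in ranked[: self.k])
            results.append(votes.most_common(1)[0][0])
        return results


toy = LazyKNN(k=3).fit([[150], [160], [170], [180]],
                       ["Short", "Short", "Tall", "Tall"])
print(toy.predict([[175]])[0])  # Tall
```

Because every prediction scans the stored examples, this style of learning is cheap to train but becomes slow to query as the dataset grows.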

🟩 Step 6: Predict a New Data Point


prediction = model.predict([[175]])

Explanation

We want to predict the category of a student whose height is 175 cm.

The model calculates the distance between 175 cm and all training data points.

🟨 Step 7: Display the Result


print("Prediction =", prediction[0])

Output


Prediction = Tall

Explanation

Since the majority of the nearest neighbors are classified as Tall, the algorithm predicts:

Prediction = Tall

🟦 Complete Python Program


from sklearn.neighbors import KNeighborsClassifier

# Training Data (Height in cm)
X = [
    [150],
    [160],
    [170],
    [180]
]

# Output Labels
y = [
    "Short",
    "Short",
    "Tall",
    "Tall"
]

# Create KNN Model
model = KNeighborsClassifier(n_neighbors=3)

# Train the Model
model.fit(X, y)

# Predict for a New Student
prediction = model.predict([[175]])

# Display the Result
print("Prediction =", prediction[0])

🟪 Step-by-Step Working of KNN

Step 1️⃣ Import the KNN library

⬇

Step 2️⃣ Create the training dataset

⬇

Step 3️⃣ Create the output labels

⬇

Step 4️⃣ Choose the value of K

⬇

Step 5️⃣ Train the model using `fit()`

⬇

Step 6️⃣ Enter a new data point

⬇

Step 7️⃣ Calculate the distance from the new point to all training points

⬇

Step 8️⃣ Select the K nearest neighbors

⬇

Step 9️⃣ Count the majority class (Majority Voting)

⬇

Step 🔟 Display the predicted result

🟥 Workflow


        Training Data
             │
             ▼
   Choose Value of K (K=3)
             │
             ▼
      Train the Model
             │
             ▼
      New Data (175 cm)
             │
             ▼
   Calculate Distances
             │
             ▼
 Find 3 Nearest Neighbors
             │
             ▼
     Majority Voting
             │
             ▼
      Final Prediction
          (Tall)

🟩 Distance Calculation Example

Suppose the new student's height is 175 cm.

Training Height	Distance from 175	Category
150	25	Short
160	15	Short
170	5	Tall
180	5	Tall

The 3 nearest neighbors are:

Height	Category
170	Tall
180	Tall
160	Short

Majority Voting

Tall = 2 votes
Short = 1 vote

➡ Final Prediction = Tall
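The table and the vote count above can be reproduced step by step. This is a small sketch using only the standard library; the variable names are chosen for this example.

```python
from collections import Counter

heights = [150, 160, 170, 180]
labels = ["Short", "Short", "Tall", "Tall"]
new_height = 175
k = 3

# Distance from the new point to every training point.
# With a single feature, the distance is the absolute difference.
table = [(h, abs(h - new_height), c) for h, c in zip(heights, labels)]

# Sort by distance and keep the k closest rows.
neighbors = sorted(table, key=lambda row: row[1])[:k]

# Count how many of the neighbors belong to each class.
votes = Counter(c for _, _, c in neighbors)

for h, d, c in table:
    print(h, d, c)
print(dict(votes))  # {'Tall': 2, 'Short': 1}
```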

🟦 Expected Output


Prediction = Tall

🟨 Explanation of Important Functions

Function	Description
`KNeighborsClassifier()`	Creates the KNN classifier model
`n_neighbors=3`	Selects the 3 nearest neighbors
`fit(X, y)`	Stores the training dataset
`predict()`	Predicts the category for new data

🟩 Advantages

✔ Simple and easy to understand
✔ No complex training process
✔ Suitable for classification and regression
✔ Works well with small datasets
✔ Easy to implement

🟥 Limitations

❌ Slow for large datasets
❌ Sensitive to noisy data
❌ Choosing the correct value of K is important
❌ Performance decreases with high-dimensional data

🟦 Applications

🏥 Disease Diagnosis
📧 Spam Email Detection
😊 Face Recognition
🎬 Movie Recommendation
🛒 Product Recommendation
🌸 Flower Classification
👤 Customer Segmentation

📝 Viva Questions

Q1. What is KNN?

Answer:
K-Nearest Neighbors (KNN) is a supervised machine learning algorithm that predicts the class of a new data point by analyzing the K nearest training examples.

Q2. What does K represent?

Answer:
K represents the number of nearest neighbors considered while making a prediction.

Q3. Why is an odd value of K preferred?

Answer:
An odd value (e.g., 3, 5, 7) helps avoid ties during majority voting in binary classification.
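The effect of an even K can be seen on the height data from this program. The sketch below counts the votes for K = 3 and K = 4; the helper names vote_counts and is_tied are assumptions made for illustration. It only detects the tie and does not model how a particular library would break it.

```python
from collections import Counter

heights = [150, 160, 170, 180]
labels = ["Short", "Short", "Tall", "Tall"]


def vote_counts(new_height, k):
    """Votes per class among the k training points closest to new_height."""
    ranked = sorted(zip(heights, labels),
                    key=lambda pair: abs(pair[0] - new_height))
    return Counter(label for _, label in ranked[:k])


def is_tied(counts):
    """True when the two most common classes have the same number of votes."""
    top = counts.most_common(2)
    return len(top) == 2 and top[0][1] == top[1][1]


odd_votes = vote_counts(175, k=3)   # 2 Tall, 1 Short
even_votes = vote_counts(175, k=4)  # 2 Tall, 2 Short
print(is_tied(odd_votes), is_tied(even_votes))  # False True
```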

Q4. Does KNN require a training phase?

Answer:
KNN has no explicit training phase. It simply stores the training data and performs calculations during prediction.

⭐ One-Line Revision

K-Nearest Neighbors (KNN) is a supervised machine learning algorithm that classifies a new data point by finding the K nearest neighbors using a distance metric and assigning the class based on majority voting (classification) or average value (regression).

Support Vector Machine (SVM) Using Python

Support Vector Machine (SVM)

🟦 Program Aim

Aim:

To implement the Support Vector Machine (SVM) algorithm using Python and classify fruits as Small or Large based on their weight.

🟩 Algorithm Used

Support Vector Machine (SVM) Classifier

🟨 Problem Statement

A fruit shop wants to classify fruits into two categories:

🍎 Small Fruit
🍉 Large Fruit

The classification is based on the weight of the fruit.

🟪 Step 1: Import the Required Library

First, import the SVC (Support Vector Classifier) class from the sklearn.svm module.


from sklearn.svm import SVC

Explanation

sklearn is the Scikit-learn machine learning library.
svm is the module that contains Support Vector Machine algorithms.
SVC() is used for classification problems.

🟦 Step 2: Create the Training Dataset


X = [
    [2],
    [3],
    [4],
    [5]
]

Explanation

X represents the input feature (Independent Variable).

Here, each value represents the weight of a fruit (in kg).

Fruit	Weight (kg)
Fruit 1	2
Fruit 2	3
Fruit 3	4
Fruit 4	5

The SVM algorithm learns from these weight values.

🟩 Step 3: Create the Output Labels


y = [
    "Small",
    "Small",
    "Large",
    "Large"
]

Explanation

y represents the target labels (Dependent Variable).

Weight	Category
2	Small
3	Small
4	Large
5	Large

The model learns which weight belongs to which category.

🟨 Step 4: Create the SVM Model


model = SVC(kernel="linear")

Explanation

SVC() creates the Support Vector Machine model.
kernel="linear" tells the model to use a Linear Kernel.
The model will find the best linear boundary (hyperplane) between the two categories. With a single feature such as weight, this boundary is a single threshold value.
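A linear kernel measures the similarity of two samples as their plain dot product. The following is a minimal sketch of that idea; the function name linear_kernel is chosen for this example and the code is not taken from scikit-learn.

```python
def linear_kernel(a, b):
    """Dot product of two feature vectors: K(a, b) = a . b"""
    return sum(ai * bi for ai, bi in zip(a, b))


# Fruit weights from the training data, one feature per sample.
weights = [[2], [3], [4], [5]]

# Kernel value for every pair of training samples.
gram = [[linear_kernel(a, b) for b in weights] for a in weights]

for row in gram:
    print(row)
# [4, 6, 8, 10]
# [6, 9, 12, 15]
# [8, 12, 16, 20]
# [10, 15, 20, 25]
```

Other kernels replace this dot product with a different similarity function, which is how SVM handles data that a straight boundary cannot separate.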

🟪 Step 5: Train the Model


model.fit(X, y)

Explanation

The fit() function trains the SVM model.

Syntax


model.fit(X, y)

Where:

X = Input data
y = Output labels

During training, the algorithm:

Reads the training data.
Finds the support vectors.
Calculates the maximum margin.
Determines the optimal hyperplane that separates the two classes.
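For this tiny one-feature dataset the training steps can be worked out by hand. The sketch below assumes an idealised hard-margin linear SVM on perfectly separable data. SVC uses a soft margin by default (C=1.0), so the coefficients it actually fits may differ from these values.

```python
small = [2, 3]  # weights labelled Small
large = [4, 5]  # weights labelled Large

# Support vectors: the closest points from the two classes.
sv_small = max(small)  # 3
sv_large = min(large)  # 4

# The maximum-margin boundary sits halfway between them.
threshold = (sv_small + sv_large) / 2  # 3.5
margin_width = sv_large - sv_small     # 1

# Hard-margin conditions: w * sv_large + b = +1 and w * sv_small + b = -1
w = 2 / margin_width  # 2.0
b = -w * threshold    # -7.0

print(threshold, margin_width, w, b)  # 3.5 1 2.0 -7.0
```

Only the two support vectors (3 kg and 4 kg) determine the boundary; moving the 2 kg or 5 kg fruit further away would not change it.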

🟦 Step 6: Predict New Data

Suppose a new fruit has a weight of 4 kg.


prediction = model.predict([[4]])

Explanation

predict() is used to classify new data.

Syntax


model.predict([[value]])

Here,


[[4]]

means the weight of the new fruit is 4 kg.

The model predicts whether it is Small or Large.

🟩 Step 7: Display the Result


print("Prediction =", prediction[0])

Explanation

prediction is returned as a NumPy array.

Example:


['Large']

To print only the predicted class, use:


prediction[0]

Output


Prediction = Large

🟨 Complete Python Program


# Import Support Vector Machine
from sklearn.svm import SVC

# Training Data (Fruit Weight)
X = [
    [2],
    [3],
    [4],
    [5]
]

# Output Labels
y = [
    "Small",
    "Small",
    "Large",
    "Large"
]

# Create SVM Model
model = SVC(kernel="linear")

# Train the Model
model.fit(X, y)

# Predict New Fruit
prediction = model.predict([[4]])

# Display Result
print("Prediction =", prediction[0])

🟥 Expected Output


Prediction = Large

🟦 Step-by-Step Working of the Program


Step 1
Import SVC Class
        │
        ▼
Step 2
Create Training Dataset (X)
        │
        ▼
Step 3
Create Output Labels (y)
        │
        ▼
Step 4
Create SVM Model
(kernel = "linear")
        │
        ▼
Step 5
Train Model
(model.fit)
        │
        ▼
Step 6
Predict New Data
(model.predict)
        │
        ▼
Step 7
Display Prediction

🟩 How SVM Makes the Decision

Suppose the training data is:

Weight	Category
2	Small
3	Small
4	Large
5	Large

The SVM finds the best boundary:


Small Fruits           Large Fruits

2      3      |      4      5
○------○------|------●------●
               ↑
         Best Hyperplane

When a new fruit with weight = 4 kg is given:

It lies on the Large side of the hyperplane.
Therefore, the model predicts Large.
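Deciding which side of the hyperplane a point lies on amounts to checking the sign of w * x + b. The values w = 2 and b = -7 used below are an assumption: they are the idealised hard-margin solution for this data (they give -1 at 3 kg and +1 at 4 kg), not coefficients read from a fitted SVC. The function name classify is also chosen for this example.

```python
# Assumed idealised hard-margin coefficients for the fruit data.
w, b = 2.0, -7.0


def classify(weight_kg):
    """Return the side of the boundary (3.5 kg) on which the weight falls."""
    score = w * weight_kg + b
    # A score of exactly 0 lies on the boundary; this sketch calls it Small.
    return "Large" if score > 0 else "Small"


for weight in (2.5, 3.6, 4):
    print(weight, classify(weight))
# 2.5 Small
# 3.6 Large
# 4 Large
```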

🟪 Advantages of SVM

✔ High accuracy
✔ Effective for classification problems
✔ Works well with high-dimensional data
✔ Handles both linear and non-linear data (using kernels)
✔ Less prone to overfitting, because it maximizes the margin between classes

🟥 Limitations of SVM

❌ Training is slower for very large datasets
❌ Choosing the correct kernel can be difficult
❌ Sensitive to noisy data
❌ Requires careful parameter tuning

🌍 Real-Life Applications

🏥 Disease Diagnosis
📧 Spam Email Detection
😊 Face Recognition
✍️ Handwriting Recognition
💳 Credit Card Fraud Detection
🚗 Traffic Sign Recognition
📱 Image Classification

📝 Viva Questions

What is Support Vector Machine (SVM)?
What is a hyperplane in SVM?
What are support vectors?
What is the role of the kernel in SVM?
What is the difference between Linear SVM and Non-Linear SVM?
Why is SVM considered a powerful classification algorithm?

⭐ One-Line Revision

Support Vector Machine (SVM) is a supervised machine learning algorithm that classifies data by finding the optimal hyperplane with the maximum margin between different classes.
