Monday, June 29, 2026

Naïve Bayes Algorithm in Machine Learning Using Python

 

Naïve Bayes Algorithm in Machine Learning



🟦 Program Aim

Aim:

To implement the Gaussian Naïve Bayes Algorithm using Python and predict whether a patient has Diabetes or is Healthy based on their blood sugar level.


🟩 Algorithm Used

Gaussian Naïve Bayes (GaussianNB)


🟨 Problem Statement

A hospital wants to predict whether a patient is Healthy or has Diabetes based on the patient's Blood Sugar Level.


🟪 Step 1: Import Required Library

from sklearn.naive_bayes import GaussianNB

Explanation

  • sklearn is the Scikit-learn library.
  • naive_bayes is the module that contains Naïve Bayes algorithms.
  • GaussianNB is used for continuous numerical data (e.g., blood sugar, age, height, weight).

🟦 Step 2: Create the Training Dataset

X = [
[85],
[90],
[95],
[140],
[150],
[160]
]

Explanation

X represents the input feature (Independent Variable).

Each value is the patient's Blood Sugar Level (mg/dL).

Patient | Blood Sugar (mg/dL)
Patient 1 | 85
Patient 2 | 90
Patient 3 | 95
Patient 4 | 140
Patient 5 | 150
Patient 6 | 160

The algorithm uses these values for learning.


🟩 Step 3: Create the Output Labels

y = [
"Healthy",
"Healthy",
"Healthy",
"Diabetes",
"Diabetes",
"Diabetes"
]

Explanation

y represents the target variable (Dependent Variable).

Blood Sugar (mg/dL) | Output
85 | Healthy
90 | Healthy
95 | Healthy
140 | Diabetes
150 | Diabetes
160 | Diabetes

The algorithm learns the relationship between blood sugar levels and health status.


🟨 Step 4: Create the Gaussian Naïve Bayes Model

model = GaussianNB()

Explanation

This line creates an object of the Gaussian Naïve Bayes classifier.

The model is now ready to be trained.


🟪 Step 5: Train the Model

model.fit(X, y)

Explanation

The fit() function trains the model using the training data.

  • X = Input data (Blood Sugar)
  • y = Output labels (Healthy / Diabetes)

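During training, the model:

  • Calculates the prior probability of each class from the label frequencies.
  • Estimates the mean and variance of the blood sugar values within each class.

At prediction time, it plugs these estimates into a Gaussian likelihood and applies Bayes' Theorem to obtain the probability of each class.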
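The quantities learned during training can be inspected on the fitted model. The following is a minimal sketch, assuming a recent scikit-learn version in which the fitted GaussianNB exposes classes_, class_prior_, and theta_ (the per-class feature means). The helper class_mean is a hypothetical name introduced only for this illustration.

```python
from sklearn.naive_bayes import GaussianNB

X = [[85], [90], [95], [140], [150], [160]]
y = ["Healthy", "Healthy", "Healthy", "Diabetes", "Diabetes", "Diabetes"]

model = GaussianNB()
model.fit(X, y)


def class_mean(label):
    """Mean blood sugar of the training rows with this label (illustrative helper)."""
    values = [row[0] for row, target in zip(X, y) if target == label]
    return sum(values) / len(values)


# classes_ lists the labels in sorted order; the other arrays follow that order.
for label, prior, mean in zip(model.classes_, model.class_prior_, model.theta_):
    print(label, "| prior =", prior, "| learned mean =", mean[0],
          "| manual mean =", class_mean(label))
```

Each class has three of the six training rows, so both priors come out equal, and the learned means should match the hand-computed ones.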

🟦 Step 6: Predict for a New Patient

prediction = model.predict([[145]])

Explanation

The patient's blood sugar level is 145 mg/dL.

The model calculates:

  • Probability of Healthy
  • Probability of Diabetes

It selects the class with the higher probability.
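To make the probability comparison concrete, here is a rough hand calculation in plain Python. It is only a sketch: it assumes the population variance within each class and ignores the small variance-smoothing term that scikit-learn adds, so the numbers may differ slightly from the library's. The function names gaussian_pdf and posterior are hypothetical.

```python
import math

levels = [85, 90, 95, 140, 150, 160]
labels = ["Healthy"] * 3 + ["Diabetes"] * 3


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def posterior(x):
    """Return {class: probability} for blood sugar x using Bayes' Theorem."""
    scores = {}
    for label in sorted(set(labels)):
        values = [v for v, t in zip(levels, labels) if t == label]
        prior = len(values) / len(levels)
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)  # population variance
        scores[label] = prior * gaussian_pdf(x, mean, var)
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


probs = posterior(145)
predicted = max(probs, key=probs.get)
print(probs)
print("Predicted class:", predicted)
```

A reading of 145 mg/dL is far from the Healthy mean (90) and close to the Diabetes mean (150), so nearly all of the probability goes to Diabetes.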


🟩 Step 7: Display the Prediction

print("Prediction =", prediction[0])

Explanation

prediction is returned as a NumPy array containing one label per input row.

Using [0] retrieves the first (and only) predicted result.

Possible Output:

Prediction = Diabetes
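Because predict() accepts several rows at once, more than one patient can be classified in a single call. The sketch below shows this; the three blood sugar readings are made-up values chosen only for illustration.

```python
from sklearn.naive_bayes import GaussianNB

X = [[85], [90], [95], [140], [150], [160]]
y = ["Healthy", "Healthy", "Healthy", "Diabetes", "Diabetes", "Diabetes"]

model = GaussianNB().fit(X, y)

# Hypothetical readings for three new patients (one row per patient).
new_patients = [[88], [145], [155]]
predictions = model.predict(new_patients)

# The result is a NumPy array with one label per input row.
print(type(predictions).__name__, predictions.shape)
for (level,), label in zip(new_patients, predictions):
    print(level, "mg/dL ->", label)
```

With a single input row, the array has exactly one element, which is why prediction[0] is used in the program.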

🟥 Step 8: Complete Python Program

# Import Gaussian Naïve Bayes
from sklearn.naive_bayes import GaussianNB

# Training Data (Blood Sugar Levels)
X = [
[85],
[90],
[95],
[140],
[150],
[160]
]

# Output Labels
y = [
"Healthy",
"Healthy",
"Healthy",
"Diabetes",
"Diabetes",
"Diabetes"
]

# Create Model
model = GaussianNB()

# Train Model
model.fit(X, y)

# Predict New Patient
prediction = model.predict([[145]])

# Display Result
print("Prediction =", prediction[0])

🟦 Sample Output

Prediction = Diabetes

🟩 Step-by-Step Workflow

Start
↓
Import GaussianNB
↓
Create Training Dataset (X)
↓
Create Output Labels (y)
↓
Create GaussianNB Model
↓
Train Model using fit()
↓
Enter New Blood Sugar Value
↓
Predict using predict()
↓
Display Prediction
↓
End

🟨 Line-by-Line Explanation

Line | Code | Description
1 | from sklearn.naive_bayes import GaussianNB | Imports the Gaussian Naïve Bayes classifier.
2 | X = [...] | Creates the input feature (blood sugar values).
3 | y = [...] | Creates the output labels (Healthy/Diabetes).
4 | model = GaussianNB() | Creates the Naïve Bayes model.
5 | model.fit(X, y) | Trains the model using the training data.
6 | prediction = model.predict([[145]]) | Predicts the class for a new patient.
7 | print("Prediction =", prediction[0]) | Displays the predicted class.

🟪 Why Gaussian Naïve Bayes?

Gaussian Naïve Bayes is suitable because the feature (blood sugar level) is a continuous numerical value.

Examples of continuous data include:

  • Blood Sugar
  • Age
  • Height
  • Weight
  • Salary
  • Temperature

🟦 Advantages

  • ✔ Easy to implement
  • ✔ Fast training and prediction
  • ✔ Works well with small datasets
  • ✔ Handles continuous numerical data
  • ✔ Effective for classification problems

🟥 Limitations

  • ❌ Assumes all features are conditionally independent given the class.
  • ❌ Performance may decrease if features are highly correlated.
  • ❌ Sensitive to the quality of training data.

🟩 Applications

  • 🏥 Disease Diagnosis
  • 📧 Spam Email Detection
  • 😊 Sentiment Analysis
  • 📰 News Classification
  • 🌐 Language Detection
  • 💳 Fraud Detection

📝 Viva Questions

  1. What is Naïve Bayes?
  2. Why is it called Naïve?
  3. What is Gaussian Naïve Bayes?
  4. What is the purpose of fit()?
  5. What is the purpose of predict()?
  6. What is the difference between Gaussian, Multinomial, and Bernoulli Naïve Bayes?
  7. Why is prediction[0] used?
  8. Which Python library provides the Naïve Bayes algorithm?

🎯 Key Points for Exams

  • Algorithm: Gaussian Naïve Bayes
  • Library: sklearn.naive_bayes
  • Model Class: GaussianNB()
  • Training Method: fit()
  • Prediction Method: predict()
  • Input: Continuous numerical values
  • Output: Predicted class (e.g., Healthy or Diabetes)

⭐ One-Line Revision

Gaussian Naïve Bayes is a supervised machine learning algorithm that uses Bayes' Theorem to classify continuous numerical data, assuming that each feature is normally distributed within each class and that the features are independent of one another given the class.

k-Nearest Neighbors (KNN) Algorithm Using Python

 

k-Nearest Neighbors (KNN) Algorithm



🟦 Program Aim

Aim:

To implement the K-Nearest Neighbors (KNN) Classification Algorithm using Python and classify a student as Short or Tall based on height.


🟩 Algorithm Used

K-Nearest Neighbors (KNN) Classifier


🟨 Problem Statement

A school wants to classify students into two categories:

  • Short
  • Tall

based on their Height (in cm) using the K-Nearest Neighbors (KNN) algorithm.


🟪 Step 1: Import Required Library

First, import the KNeighborsClassifier class from the sklearn.neighbors module.

from sklearn.neighbors import KNeighborsClassifier

Explanation

  • sklearn is the Scikit-learn library.
  • neighbors contains the KNN algorithm.
  • KNeighborsClassifier() is used for classification problems.

🟦 Step 2: Create the Training Dataset

X = [
[150],
[160],
[170],
[180]
]

Explanation

X represents the input feature (Independent Variable).

Here, the input is the Height of students.

Student | Height (cm)
Student 1 | 150
Student 2 | 160
Student 3 | 170
Student 4 | 180

The KNN algorithm stores these training examples.


🟩 Step 3: Create the Output Labels

y = [
"Short",
"Short",
"Tall",
"Tall"
]

Explanation

y represents the output labels (Dependent Variable).

Height (cm) | Category
150 | Short
160 | Short
170 | Tall
180 | Tall

These are the correct answers used to train the model.


🟨 Step 4: Create the KNN Model

model = KNeighborsClassifier(n_neighbors=3)

Explanation

  • KNeighborsClassifier() creates the KNN model.
  • n_neighbors=3 means the model will consider the 3 nearest neighbors while making a prediction.

Why choose K = 3?

With K = 3, the algorithm checks the three closest training data points and predicts the class that appears most frequently among them. Because there are only two classes, an odd K means the vote cannot end in a tie.


🟦 Step 5: Train the Model

model.fit(X, y)

Explanation

The fit() method trains the model.

Syntax:

model.fit(input_data, output_labels)

Here,

  • X → Heights of students
  • y → Categories (Short/Tall)

During training, KNN stores the dataset instead of creating a mathematical model.


🟩 Step 6: Predict a New Data Point

prediction = model.predict([[175]])

Explanation

We want to predict the category of a student whose height is 175 cm.

The model calculates the distance between 175 cm and all training data points.
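The neighbors that the model actually uses can be inspected with the kneighbors() method of the fitted classifier. This is a small sketch, assuming the default distance metric (Euclidean, which for a single feature is just the absolute difference in height).

```python
from sklearn.neighbors import KNeighborsClassifier

X = [[150], [160], [170], [180]]
y = ["Short", "Short", "Tall", "Tall"]

model = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# kneighbors() returns the distances to, and the row indices of, the K nearest
# training points for each query row.
distances, indices = model.kneighbors([[175]])

for dist, idx in zip(distances[0], indices[0]):
    print(X[idx][0], "cm |", y[idx], "| distance =", dist)
```

The two training points at equal distance (170 cm and 180 cm) may be listed in either order.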


🟨 Step 7: Display the Result

print("Prediction =", prediction[0])

Output

Prediction = Tall

Explanation

Since the majority of the nearest neighbors are classified as Tall, the algorithm predicts:

Prediction = Tall


🟦 Complete Python Program

from sklearn.neighbors import KNeighborsClassifier

# Training Data (Height in cm)
X = [
[150],
[160],
[170],
[180]
]

# Output Labels
y = [
"Short",
"Short",
"Tall",
"Tall"
]

# Create KNN Model
model = KNeighborsClassifier(n_neighbors=3)

# Train the Model
model.fit(X, y)

# Predict for a New Student
prediction = model.predict([[175]])

# Display the Result
print("Prediction =", prediction[0])

🟪 Step-by-Step Working of KNN

Step 1️⃣ Import the KNN library

Step 2️⃣ Create the training dataset

Step 3️⃣ Create the output labels

Step 4️⃣ Choose the value of K

Step 5️⃣ Train the model using fit()

Step 6️⃣ Enter a new data point

Step 7️⃣ Calculate the distance from the new point to all training points

Step 8️⃣ Select the K nearest neighbors

Step 9️⃣ Count the majority class (Majority Voting)

Step 🔟 Display the predicted result


🟥 Workflow

Training Data
↓
Choose Value of K (K=3)
↓
Train the Model
↓
New Data (175 cm)
↓
Calculate Distances
↓
Find 3 Nearest Neighbors
↓
Majority Voting
↓
Final Prediction (Tall)

🟩 Distance Calculation Example

Suppose the new student's height is 175 cm.

Training Height (cm) | Distance from 175 | Category
150 | 25 | Short
160 | 15 | Short
170 | 5 | Tall
180 | 5 | Tall

The 3 nearest neighbors are:

Height (cm) | Category
170 | Tall
180 | Tall
160 | Short

Majority Voting

  • Tall = 2 votes
  • Short = 1 vote

Final Prediction = Tall
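The distance table and the vote above can be reproduced with a few lines of plain Python. This is a simplified sketch of the idea rather than how scikit-learn implements it internally; the function name knn_predict is hypothetical.

```python
from collections import Counter

heights = [150, 160, 170, 180]
labels = ["Short", "Short", "Tall", "Tall"]


def knn_predict(new_height, k=3):
    """Return the majority label among the k closest heights, plus those neighbors."""
    # Sort the training points by absolute distance to the new height.
    by_distance = sorted(zip(heights, labels),
                         key=lambda pair: abs(pair[0] - new_height))
    nearest = by_distance[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0], nearest


prediction, nearest = knn_predict(175)
print("Nearest neighbors:", nearest)
print("Prediction =", prediction)
```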


🟦 Expected Output

Prediction = Tall

🟨 Explanation of Important Functions

Function / Parameter | Description
KNeighborsClassifier() | Creates the KNN classifier model
n_neighbors=3 | Selects the 3 nearest neighbors
fit(X, y) | Stores the training dataset
predict() | Predicts the category for new data

🟩 Advantages

  • ✔ Simple and easy to understand
  • ✔ No complex training process
  • ✔ Suitable for classification and regression
  • ✔ Works well with small datasets
  • ✔ Easy to implement

🟥 Limitations

  • ❌ Slow for large datasets
  • ❌ Sensitive to noisy data
  • ❌ Choosing the correct value of K is important
  • ❌ Performance decreases with high-dimensional data

🟦 Applications

  • 🏥 Disease Diagnosis
  • 📧 Spam Email Detection
  • 😊 Face Recognition
  • 🎬 Movie Recommendation
  • 🛒 Product Recommendation
  • 🌸 Flower Classification
  • 👤 Customer Segmentation

📝 Viva Questions

Q1. What is KNN?

Answer:
K-Nearest Neighbors (KNN) is a supervised machine learning algorithm that predicts the class of a new data point by analyzing the K nearest training examples.


Q2. What does K represent?

Answer:
K represents the number of nearest neighbors considered while making a prediction.


Q3. Why is an odd value of K preferred?

Answer:
An odd value (e.g., 3, 5, 7) helps avoid ties during majority voting in binary classification.
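A quick way to see the tie problem is to count votes for an odd and an even K on the height data from this program. The sketch below uses plain Python; vote_counts is a hypothetical helper name.

```python
from collections import Counter

heights = [150, 160, 170, 180]
labels = ["Short", "Short", "Tall", "Tall"]


def vote_counts(new_height, k):
    """Count the class labels among the k training heights closest to new_height."""
    ranked = sorted(zip(heights, labels),
                    key=lambda pair: abs(pair[0] - new_height))
    return Counter(label for _, label in ranked[:k])


votes_k3 = vote_counts(175, 3)
votes_k4 = vote_counts(175, 4)
print("K = 3:", dict(votes_k3))
print("K = 4:", dict(votes_k4))
```

With K = 3 one class must win, while K = 4 uses all four points and splits the vote evenly between Short and Tall.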


Q4. Does KNN require a training phase?

Answer:
KNN has no explicit training phase. It simply stores the training data and performs calculations during prediction.



⭐ One-Line Revision

K-Nearest Neighbors (KNN) is a supervised machine learning algorithm that classifies a new data point by finding the K nearest neighbors using a distance metric and assigning the class based on majority voting (classification) or average value (regression).

Support Vector Machine (SVM) Using Python

 

Support Vector Machine (SVM) 



🟦 Program Aim

Aim:

To implement the Support Vector Machine (SVM) algorithm using Python and classify fruits as Small or Large based on their weight.


🟩 Algorithm Used

Support Vector Machine (SVM) Classifier


🟨 Problem Statement

A fruit shop wants to classify fruits into two categories:

  • 🍎 Small Fruit
  • 🍉 Large Fruit

The classification is based on the weight of the fruit.


🟪 Step 1: Import the Required Library

First, import the SVC (Support Vector Classifier) class from the sklearn.svm module.

from sklearn.svm import SVC

Explanation

  • sklearn is the Scikit-learn machine learning library.
  • svm is the module that contains Support Vector Machine algorithms.
  • SVC() is used for classification problems.

🟦 Step 2: Create the Training Dataset

X = [
[2],
[3],
[4],
[5]
]

Explanation

X represents the input feature (Independent Variable).

Here, each value represents the weight of a fruit (in kg).

Fruit | Weight (kg)
Fruit 1 | 2
Fruit 2 | 3
Fruit 3 | 4
Fruit 4 | 5

The SVM algorithm learns from these weight values.


🟩 Step 3: Create the Output Labels

y = [
"Small",
"Small",
"Large",
"Large"
]

Explanation

y represents the target labels (Dependent Variable).

Weight (kg) | Category
2 | Small
3 | Small
4 | Large
5 | Large

The model learns which weight belongs to which category.


🟨 Step 4: Create the SVM Model

model = SVC(kernel="linear")

Explanation

  • SVC() creates the Support Vector Machine model.
  • kernel="linear" tells the model to use a Linear Kernel.
  • The model will find the best linear boundary (hyperplane) between the two categories. With a single feature, this boundary is one threshold value on the weight axis.

🟪 Step 5: Train the Model

model.fit(X, y)

Explanation

The fit() function trains the SVM model.

Syntax

model.fit(X, y)

Where:

  • X = Input data
  • y = Output labels

During training, the algorithm:

  • Reads the training data.
  • Finds the support vectors.
  • Calculates the maximum margin.
  • Draws the optimal hyperplane.
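After training, the learned boundary can be read back from the fitted model. The sketch below assumes a linear kernel (coef_ is only available in that case) and the default regularization setting of SVC, so the exact numbers depend on that setting. The variable names w, b, and boundary are introduced here only for illustration.

```python
from sklearn.svm import SVC

X = [[2], [3], [4], [5]]
y = ["Small", "Small", "Large", "Large"]

model = SVC(kernel="linear").fit(X, y)

# For a linear kernel the decision function is w * weight + b.
w = model.coef_[0][0]
b = model.intercept_[0]

# The boundary is the weight at which the decision function equals zero.
boundary = -b / w

print("Support vectors:", model.support_vectors_.ravel())
print("Decision boundary at weight =", boundary, "kg")
```

The boundary should fall between the heaviest Small fruit (3 kg) and the lightest Large fruit (4 kg).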

🟦 Step 6: Predict New Data

Suppose a new fruit has a weight of 4 kg.

prediction = model.predict([[4]])

Explanation

predict() is used to classify new data.

Syntax

model.predict([[value]])

Here,

[[4]]

means the weight of the new fruit is 4 kg.

The model predicts whether it is Small or Large.
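To see which side of the boundary a fruit falls on, the signed score from decision_function() can be printed next to the predicted label. This is a sketch; the three test weights are made-up values. For a two-class SVC, a positive score corresponds to the second entry of model.classes_ and a negative score to the first.

```python
from sklearn.svm import SVC

X = [[2], [3], [4], [5]]
y = ["Small", "Small", "Large", "Large"]

model = SVC(kernel="linear").fit(X, y)

# Hypothetical fruit weights in kg (one row per fruit).
weights = [[2.5], [4], [4.5]]
scores = model.decision_function(weights)
predicted = model.predict(weights)

print("Classes:", list(model.classes_))
for (weight,), score, label in zip(weights, scores, predicted):
    print(weight, "kg | score =", round(float(score), 3), "->", label)
```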


🟩 Step 7: Display the Result

print("Prediction =", prediction[0])

Explanation

prediction is returned as a NumPy array.

Example:

['Large']

To print only the predicted class, use:

prediction[0]

Output

Prediction = Large

🟨 Complete Python Program

# Import Support Vector Machine
from sklearn.svm import SVC

# Training Data (Fruit Weight)
X = [
[2],
[3],
[4],
[5]
]

# Output Labels
y = [
"Small",
"Small",
"Large",
"Large"
]

# Create SVM Model
model = SVC(kernel="linear")

# Train the Model
model.fit(X, y)

# Predict New Fruit
prediction = model.predict([[4]])

# Display Result
print("Prediction =", prediction[0])

🟥 Expected Output

Prediction = Large

🟦 Step-by-Step Working of the Program

Step 1
Import SVC Class


Step 2
Create Training Dataset (X)


Step 3
Create Output Labels (y)


Step 4
Create SVM Model
(kernel = "linear")


Step 5
Train Model
(model.fit)


Step 6
Predict New Data
(model.predict)


Step 7
Display Prediction

🟩 How SVM Makes the Decision

Suppose the training data is:

Weight (kg) | Category
2 | Small
3 | Small
4 | Large
5 | Large

The SVM finds the best boundary:

Small Fruits             Large Fruits

2      3      |      4      5
○------○------|------●------●
              ↑
        Best Hyperplane

When a new fruit with weight = 4 kg is given:

  • It lies on the Large side of the hyperplane.
  • Therefore, the model predicts Large.
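For one-feature data that is perfectly separable, the maximum-margin idea can be sketched without any library: the boundary sits midway between the closest fruits of opposite classes. This is a simplified hard-margin illustration. SVC with its default settings uses a soft margin and is not guaranteed to reproduce these exact numbers. The names boundary, margin, and classify are hypothetical.

```python
weights = [2, 3, 4, 5]
labels = ["Small", "Small", "Large", "Large"]

# The closest points of opposite classes play the role of support vectors.
largest_small = max(w for w, label in zip(weights, labels) if label == "Small")
smallest_large = min(w for w, label in zip(weights, labels) if label == "Large")

# The maximum-margin boundary lies halfway between them.
boundary = (largest_small + smallest_large) / 2
margin = (smallest_large - largest_small) / 2


def classify(weight):
    """Assign the class based on which side of the boundary the weight falls."""
    return "Large" if weight > boundary else "Small"


print("Boundary =", boundary, "kg, margin =", margin, "kg")
print("4 kg ->", classify(4))
```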

🟪 Advantages of SVM

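  • ✔ Often achieves high accuracy when the classes are well separated
  • ✔ Effective for classification problems
  • ✔ Works well with high-dimensional data
  • ✔ Handles both linear and non-linear data (using kernels)
  • ✔ Less prone to overfitting, because maximizing the margin acts as a form of regularization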

🟥 Limitations of SVM

  • ❌ Training is slower for very large datasets
  • ❌ Choosing the correct kernel can be difficult
  • ❌ Sensitive to noisy data
  • ❌ Requires careful parameter tuning

🌍 Real-Life Applications

  • 🏥 Disease Diagnosis
  • 📧 Spam Email Detection
  • 😊 Face Recognition
  • ✍️ Handwriting Recognition
  • 💳 Credit Card Fraud Detection
  • 🚗 Traffic Sign Recognition
  • 📱 Image Classification

📝 Viva Questions

  1. What is Support Vector Machine (SVM)?
  2. What is a hyperplane in SVM?
  3. What are support vectors?
  4. What is the role of the kernel in SVM?
  5. What is the difference between Linear SVM and Non-Linear SVM?
  6. Why is SVM considered a powerful classification algorithm?

⭐ One-Line Revision

Support Vector Machine (SVM) is a supervised machine learning algorithm that classifies data by finding the optimal hyperplane with the maximum margin between different classes.