Naïve Bayes Algorithm in Machine Learning
🟦 Program Aim
Aim:
To implement the Gaussian Naïve Bayes Algorithm using Python and predict whether a patient has Diabetes or is Healthy based on their blood sugar level.
🟩 Algorithm Used
Gaussian Naïve Bayes (GaussianNB)
🟨 Problem Statement
A hospital wants to predict whether a patient is Healthy or has Diabetes based on the patient's Blood Sugar Level.
🟪 Step 1: Import Required Library
from sklearn.naive_bayes import GaussianNB
Explanation
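- sklearn is the import name of the scikit-learn library.
- naive_bayes is the module that contains the Naïve Bayes algorithms.
- GaussianNB is the Naïve Bayes classifier for continuous numerical data (e.g., blood sugar, age, height, weight). It assumes each feature follows a normal (Gaussian) distribution within each class.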
🟦 Step 2: Create the Training Dataset
X = [
[85],
[90],
[95],
[140],
[150],
[160]
]
Explanation
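X holds the input feature (independent variable).
Each inner list is one patient with a single value: the Blood Sugar Level in mg/dL. Scikit-learn expects X to be two-dimensional (rows = samples, columns = features), so this X has 6 rows and 1 column.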
| Patient | Blood Sugar (mg/dL) |
|---|---|
| Patient 1 | 85 |
| Patient 2 | 90 |
| Patient 3 | 95 |
| Patient 4 | 140 |
| Patient 5 | 150 |
| Patient 6 | 160 |
The algorithm uses these values for learning.
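The nested-list layout of X matters because scikit-learn's fit() and predict() expect a 2-D input and reject a plain 1-D list of numbers. A minimal sketch, assuming NumPy is installed (it is a scikit-learn dependency), that checks the shape and shows how a flat list would be reshaped into the same single-feature column:

```python
import numpy as np

# Training data from Step 2: one inner list per patient.
X = [[85], [90], [95], [140], [150], [160]]
X_array = np.asarray(X)
print(X_array.shape)  # (6, 1): 6 patients, 1 feature

# A flat list is 1-D; reshape(-1, 1) turns it into a single-feature column.
flat = [85, 90, 95, 140, 150, 160]
X_from_flat = np.asarray(flat).reshape(-1, 1)
print(X_from_flat.shape)  # (6, 1)

# Both layouts now hold the same values in the same shape.
print((X_from_flat == X_array).all())  # True
```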
🟩 Step 3: Create the Output Labels
y = [
"Healthy",
"Healthy",
"Healthy",
"Diabetes",
"Diabetes",
"Diabetes"
]
Explanation
y holds the target variable (dependent variable). The label at each position belongs to the patient at the same position in X.
| Blood Sugar (mg/dL) | Output |
|---|---|
| 85 | Healthy |
| 90 | Healthy |
| 95 | Healthy |
| 140 | Diabetes |
| 150 | Diabetes |
| 160 | Diabetes |
The algorithm learns the relationship between blood sugar levels and health status.
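Because X and y are matched by position, a quick sanity check is to pair them up and count the labels. The sketch below uses only the standard library; the class fractions it prints are the same values the model will use as prior probabilities when it is trained in Step 5.

```python
from collections import Counter

# Training data from Steps 2 and 3.
X = [[85], [90], [95], [140], [150], [160]]
y = ["Healthy", "Healthy", "Healthy", "Diabetes", "Diabetes", "Diabetes"]

# Every sample needs exactly one label, listed in the same order.
assert len(X) == len(y)

# Print each (blood sugar, label) pair, row by row.
for (sugar,), label in zip(X, y):
    print(sugar, label)

# Count the training examples in each class and turn the counts into fractions.
counts = Counter(y)
fractions = {label: n / len(y) for label, n in counts.items()}
print(fractions)  # {'Healthy': 0.5, 'Diabetes': 0.5}
```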
🟨 Step 4: Create the Gaussian Naïve Bayes Model
model = GaussianNB()
Explanation
This line creates an object of the Gaussian Naïve Bayes classifier.
The model is now ready to be trained.
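At this point the object only holds its settings; it has not learned anything. A small sketch that prints the default hyperparameters and shows that scikit-learn refuses to predict with an untrained model (it raises NotFittedError from sklearn.exceptions):

```python
from sklearn.exceptions import NotFittedError
from sklearn.naive_bayes import GaussianNB

model = GaussianNB()

# Default hyperparameters: priors=None (learn class priors from the data) and
# var_smoothing=1e-09 (a tiny fraction of the largest feature variance,
# added to every class variance for numerical stability).
print(model.get_params())

# The new object has not been trained, so predict() raises an error.
not_fitted = False
try:
    model.predict([[145]])
except NotFittedError:
    not_fitted = True
    print("Call fit() before predict().")
```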
🟪 Step 5: Train the Model
model.fit(X, y)
Explanation
The fit() function trains the model using the training data.
- X = input data (blood sugar)
- y = output labels (Healthy / Diabetes)
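During training, the model:
- Calculates the prior probability of each class from the label counts (3 of 6 patients per class, so 0.5 each).
- Calculates the mean and variance of blood sugar within each class. These two numbers define the normal (Gaussian) curve used to compute the likelihood of any blood sugar value for that class.
Bayes' Theorem is applied later, during prediction, to combine the priors and likelihoods into posterior probabilities.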
🟦 Step 6: Predict for a New Patient
prediction = model.predict([[145]])
Explanation
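The patient's blood sugar level is 145 mg/dL. The value is written as [[145]] because predict() expects the same two-dimensional shape as X: one row (patient) with one feature.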
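The model calculates the posterior probability of each class for this value:
- Probability of Healthy, P(Healthy | 145)
- Probability of Diabetes, P(Diabetes | 145)
It selects the class with the higher posterior probability.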
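To make the posterior calculation concrete, here is a from-scratch sketch of the same Gaussian Naïve Bayes decision using only Python's standard library. It is an illustration, not scikit-learn's internal code: scikit-learn works with log probabilities and adds a small var_smoothing term to each variance, but the decision for this data is the same. The helper names gaussian_pdf and posteriors are hypothetical.

```python
import math
from statistics import mean, pvariance

# Training data from Steps 2 and 3, grouped by class.
data = {
    "Healthy": [85, 90, 95],
    "Diabetes": [140, 150, 160],
}

def gaussian_pdf(x, mu, var):
    """Normal density N(x; mu, var), used as the likelihood P(x | class)."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def posteriors(x):
    """Return P(class | x) for every class using Bayes' Theorem."""
    n_total = sum(len(values) for values in data.values())
    joint = {}
    for label, values in data.items():
        prior = len(values) / n_total                                   # P(class)
        likelihood = gaussian_pdf(x, mean(values), pvariance(values))   # P(x | class)
        joint[label] = prior * likelihood                               # numerator of Bayes' Theorem
    evidence = sum(joint.values())                                      # P(x), the normalizer
    return {label: p / evidence for label, p in joint.items()}

probs = posteriors(145)
print(probs)
print(max(probs, key=probs.get))  # Diabetes
```

pvariance (population variance) is used because it matches how scikit-learn estimates the per-class variance, apart from the smoothing term.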
🟩 Step 7: Display the Prediction
print("Prediction =", prediction[0])
Explanation
predict() returns a NumPy array containing one predicted label per input row.
Because only one patient was passed in, [0] retrieves that single predicted label.
Output:
Prediction = Diabetes
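predict() can also classify several patients at once, and predict_proba() exposes the probabilities behind each decision. A short sketch (the value 80 mg/dL is an extra, hypothetical patient added for illustration); the probability columns follow the order of model.classes_:

```python
from sklearn.naive_bayes import GaussianNB

X = [[85], [90], [95], [140], [150], [160]]
y = ["Healthy", "Healthy", "Healthy", "Diabetes", "Diabetes", "Diabetes"]
model = GaussianNB().fit(X, y)  # fit() returns the trained model itself

# One row per patient in, one label per patient out (a NumPy array).
predictions = model.predict([[80], [145]])
print(type(predictions).__name__)  # ndarray
print(predictions)                 # ['Healthy' 'Diabetes']

# One row of probabilities per patient; columns follow model.classes_.
proba = model.predict_proba([[145]])
for label, p in zip(model.classes_, proba[0]):
    print(f"P({label} | 145) = {p:.4f}")
```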
🟥 Step 8: Complete Python Program
# Import Gaussian Naïve Bayes
from sklearn.naive_bayes import GaussianNB
# Training Data (Blood Sugar Levels)
X = [
[85],
[90],
[95],
[140],
[150],
[160]
]
# Output Labels
y = [
"Healthy",
"Healthy",
"Healthy",
"Diabetes",
"Diabetes",
"Diabetes"
]
# Create Model
model = GaussianNB()
# Train Model
model.fit(X, y)
# Predict New Patient
prediction = model.predict([[145]])
# Display Result
print("Prediction =", prediction[0])
🟦 Sample Output
Prediction = Diabetes
🟩 Step-by-Step Workflow
Start
│
▼
Import GaussianNB
│
▼
Create Training Dataset (X)
│
▼
Create Output Labels (y)
│
▼
Create GaussianNB Model
│
▼
Train Model using fit()
│
▼
Enter New Blood Sugar Value
│
▼
Predict using predict()
│
▼
Display Prediction
│
▼
End
🟨 Line-by-Line Explanation
| Line | Code | Description |
|---|---|---|
| 1 | from sklearn.naive_bayes import GaussianNB | Imports the Gaussian Naïve Bayes classifier. |
| 2 | X = [...] | Creates the input feature (blood sugar values). |
| 3 | y = [...] | Creates the output labels (Healthy/Diabetes). |
| 4 | model = GaussianNB() | Creates the Naïve Bayes model. |
| 5 | model.fit(X, y) | Trains the model using the training data. |
| 6 | prediction = model.predict([[145]]) | Predicts the class for a new patient. |
| 7 | print("Prediction =", prediction[0]) | Displays the predicted class. |
🟪 Why Gaussian Naïve Bayes?
Gaussian Naïve Bayes is suitable because the feature (blood sugar level) is a continuous numerical value.
Examples of continuous data include:
- Blood Sugar
- Age
- Height
- Weight
- Salary
- Temperature
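The formulas behind the model can be written out explicitly. In the sketch below, $\mu_c$ and $\sigma_c^2$ are the mean and population variance of the feature within class $c$ (scikit-learn also adds a tiny var_smoothing term to $\sigma_c^2$, ignored here). The last lines plug in this dataset for $x = 145$: both priors are 0.5, so the far smaller exponent for Diabetes dominates and the model predicts Diabetes.

```latex
% Gaussian likelihood of a value x under class c
P(x \mid c) = \frac{1}{\sqrt{2\pi\sigma_c^2}}\,\exp\!\left(-\frac{(x-\mu_c)^2}{2\sigma_c^2}\right)

% Naive (conditional independence) assumption for n features x_1, ..., x_n
P(x_1, \dots, x_n \mid c) = \prod_{i=1}^{n} P(x_i \mid c)

% Bayes' Theorem and the decision rule
P(c \mid x) = \frac{P(c)\,P(x \mid c)}{\sum_{k} P(k)\,P(x \mid k)}, \qquad
\hat{c} = \arg\max_{c} \; P(c)\,P(x \mid c)

% Exponent terms for x = 145
% Healthy:  \mu = 90,  \sigma^2 = 50/3
\frac{(145-90)^2}{2 \cdot 50/3} = 90.75
% Diabetes: \mu = 150, \sigma^2 = 200/3
\frac{(145-150)^2}{2 \cdot 200/3} = 0.1875
```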
🟦 Advantages
- ✔ Easy to implement
- ✔ Fast training and prediction
- ✔ Works well with small datasets
- ✔ Handles continuous numerical data
- ✔ Effective for classification problems
🟥 Limitations
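- ❌ Assumes all features are conditionally independent given the class (the "naïve" assumption).
- ❌ Performance may decrease if features are highly correlated.
- ❌ Assumes each feature is normally distributed within each class, which real data may not satisfy.
- ❌ Sensitive to the quality of training data.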
🟩 Applications
- 🏥 Disease Diagnosis
- 📧 Spam Email Detection
- 😊 Sentiment Analysis
- 📰 News Classification
- 🌐 Language Detection
- 💳 Fraud Detection
📝 Viva Questions
- What is Naïve Bayes?
- Why is it called Naïve?
- What is Gaussian Naïve Bayes?
- What is the purpose of fit()?
- What is the purpose of predict()?
- What is the difference between Gaussian, Multinomial, and Bernoulli Naïve Bayes?
- Why is prediction[0] used?
- Which Python library provides the Naïve Bayes algorithm?
🎯 Key Points for Exams
- Algorithm: Gaussian Naïve Bayes
- Library: scikit-learn (module sklearn.naive_bayes)
- Model Class: GaussianNB()
- Training Method: fit()
- Prediction Method: predict()
- Input: Continuous numerical values
- Output: Predicted class (e.g., Healthy or Diabetes)
⭐ One-Line Revision
Gaussian Naïve Bayes is a supervised machine learning algorithm that uses Bayes' Theorem to classify continuous numerical data, assuming that the input features are conditionally independent given the class and that each feature is normally distributed within each class.