k-Nearest Neighbors (KNN) Algorithm
🟦 Program Aim
Aim:
To implement the K-Nearest Neighbors (KNN) classification algorithm in Python and use it to classify a person as Short or Tall based on their height.
🟩 Algorithm Used
K-Nearest Neighbors (KNN) Classifier
🟨 Problem Statement
A school wants to classify students into two categories:
- Short
- Tall
based on their Height (in cm) using the K-Nearest Neighbors (KNN) algorithm.
🟪 Step 1: Import Required Library
First, import the KNeighborsClassifier class from the sklearn.neighbors module.
from sklearn.neighbors import KNeighborsClassifier
Explanation
- sklearn is the Scikit-learn library.
- neighbors contains the KNN algorithm.
- KNeighborsClassifier() is used for classification problems.
🟦 Step 2: Create the Training Dataset
X = [
[150],
[160],
[170],
[180]
]
Explanation
X represents the input feature (Independent Variable).
Here, the input is the Height of students.
| Student | Height (cm) |
|---|---|
| Student 1 | 150 |
| Student 2 | 160 |
| Student 3 | 170 |
| Student 4 | 180 |
The KNN algorithm stores these training examples.
🟩 Step 3: Create the Output Labels
y = [
"Short",
"Short",
"Tall",
"Tall"
]
Explanation
y represents the output labels (Dependent Variable).
| Height | Category |
|---|---|
| 150 | Short |
| 160 | Short |
| 170 | Tall |
| 180 | Tall |
These are the correct answers used to train the model.
🟨 Step 4: Create the KNN Model
model = KNeighborsClassifier(n_neighbors=3)
Explanation
- KNeighborsClassifier() creates the KNN model.
- n_neighbors=3 means the model will consider the 3 nearest neighbors while making a prediction.
Why choose K = 3?
With K = 3, the algorithm checks the three closest training data points and predicts the class that appears most frequently among them. Because there are only two classes here, an odd K also means the vote cannot end in a tie.
🟦 Step 5: Train the Model
model.fit(X, y)
Explanation
The fit() method trains the model.
Syntax:
model.fit(input_data, output_labels)
Here,
- X → Heights of students
- y → Categories (Short/Tall)
During training, KNN simply stores the dataset. It does not learn parameters such as weights or coefficients; the real work is deferred until prediction.
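To make the "store now, compute later" idea concrete, here is a minimal from-scratch sketch. The class name SimpleKNN and its predict_one method are hypothetical names invented for this illustration; this is not how Scikit-learn implements KNN internally, only a simplified picture of the same idea for a single numeric feature.

```python
from collections import Counter


class SimpleKNN:
    """Hypothetical, illustration-only KNN classifier for one numeric feature."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" only memorises the examples; nothing is computed here.
        self.X = [row[0] for row in X]
        self.y = list(y)
        return self

    def predict_one(self, value):
        # All the work happens at prediction time: measure, sort, vote.
        by_distance = sorted(
            zip(self.X, self.y), key=lambda pair: abs(pair[0] - value)
        )
        nearest_labels = [label for _, label in by_distance[: self.k]]
        return Counter(nearest_labels).most_common(1)[0][0]


X = [[150], [160], [170], [180]]
y = ["Short", "Short", "Tall", "Tall"]

simple_model = SimpleKNN(k=3).fit(X, y)
simple_prediction = simple_model.predict_one(175)
print("Prediction =", simple_prediction)
```

Notice that fit() contains no arithmetic at all, while predict_one() does the distance calculation, sorting, and voting.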
🟩 Step 6: Predict a New Data Point
prediction = model.predict([[175]])
Explanation
We want to predict the category of a student whose height is 175 cm.
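The model calculates the distance between 175 cm and every training data point. By default, Scikit-learn uses Euclidean distance, which for a single feature is just the absolute difference between the two heights.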
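If you want to see which training points the model actually treats as nearest, the fitted classifier's kneighbors() method returns the distances and the row positions of the K closest training examples. The sketch below rebuilds the same model so that it runs on its own; the loop variable names are just illustrative.

```python
from sklearn.neighbors import KNeighborsClassifier

X = [[150], [160], [170], [180]]
y = ["Short", "Short", "Tall", "Tall"]

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X, y)

# kneighbors() returns two arrays: the distances to the K nearest
# training points, and the row indices of those points in X.
distances, indices = model.kneighbors([[175]])

for dist, idx in zip(distances[0], indices[0]):
    print(f"height={X[idx][0]} cm, distance={dist:.0f}, label={y[idx]}")
```

The heights 170 and 180 are equally close to 175, so the order in which those two rows are listed may vary.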
🟨 Step 7: Display the Result
print("Prediction =", prediction[0])
Output
Prediction = Tall
Explanation
Two of the three nearest neighbors (170 cm and 180 cm) are labeled Tall and only one (160 cm) is labeled Short, so the algorithm predicts:
Prediction = Tall
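The predict() method accepts a list of rows, so several students can be classified in one call. In the sketch below, 155 cm is an extra example height chosen for illustration; it does not appear in the original program.

```python
from sklearn.neighbors import KNeighborsClassifier

X = [[150], [160], [170], [180]]
y = ["Short", "Short", "Tall", "Tall"]

model = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# Each inner list is one student; 155 is an illustrative extra height.
new_heights = [[155], [175]]
predictions = model.predict(new_heights)

for (height,), label in zip(new_heights, predictions):
    print(f"{height} cm -> {label}")
```

For 155 cm the three nearest neighbors are 150, 160, and 170, which gives two Short votes against one Tall vote.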
🟦 Complete Python Program
from sklearn.neighbors import KNeighborsClassifier
# Training Data (Height in cm)
X = [
[150],
[160],
[170],
[180]
]
# Output Labels
y = [
"Short",
"Short",
"Tall",
"Tall"
]
# Create KNN Model
model = KNeighborsClassifier(n_neighbors=3)
# Train the Model
model.fit(X, y)
# Predict for a New Student
prediction = model.predict([[175]])
# Display the Result
print("Prediction =", prediction[0])
🟪 Step-by-Step Working of KNN
Step 1️⃣ Import the KNN library
⬇
Step 2️⃣ Create the training dataset
⬇
Step 3️⃣ Create the output labels
⬇
Step 4️⃣ Choose the value of K
⬇
Step 5️⃣ Train the model using fit()
⬇
Step 6️⃣ Enter a new data point
⬇
Step 7️⃣ Calculate the distance from the new point to all training points
⬇
Step 8️⃣ Select the K nearest neighbors
⬇
Step 9️⃣ Count the majority class (Majority Voting)
⬇
Step 🔟 Display the predicted result
🟥 Workflow
Training Data
│
▼
Choose Value of K (K=3)
│
▼
Train the Model
│
▼
New Data (175 cm)
│
▼
Calculate Distances
│
▼
Find 3 Nearest Neighbors
│
▼
Majority Voting
│
▼
Final Prediction
(Tall)
🟩 Distance Calculation Example
Suppose the new student's height is 175 cm.
| Training Height | Distance from 175 | Category |
|---|---|---|
| 150 | 25 | Short |
| 160 | 15 | Short |
| 170 | 5 | Tall |
| 180 | 5 | Tall |
The 3 nearest neighbors are:
| Height | Category |
|---|---|
| 170 | Tall |
| 180 | Tall |
| 160 | Short |
Majority Voting
- Tall = 2 votes
- Short = 1 vote
➡ Final Prediction = Tall
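The same vote split can be read from the fitted Scikit-learn model. With the default uniform weighting, predict_proba() reports the fraction of the K neighbors that belong to each class, in the order given by model.classes_. This is a small sketch that rebuilds the model so it can be run on its own.

```python
from sklearn.neighbors import KNeighborsClassifier

X = [[150], [160], [170], [180]]
y = ["Short", "Short", "Tall", "Tall"]

model = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# classes_ holds the label order used for the columns of predict_proba().
vote_shares = model.predict_proba([[175]])[0]

for label, share in zip(model.classes_, vote_shares):
    print(f"{label}: {share:.2f}")
```

A share of 0.67 for Tall corresponds to the 2 out of 3 votes counted above.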
🟦 Expected Output
Prediction = Tall
🟨 Explanation of Important Functions
| Function / Parameter | Description |
|---|---|
| KNeighborsClassifier() | Creates the KNN classifier model |
| n_neighbors=3 | Sets the number of nearest neighbors to 3 |
| fit(X, y) | Stores the training dataset |
| predict() | Predicts the category for new data |
🟩 Advantages
- ✔ Simple and easy to understand
- ✔ No complex training process
- ✔ Suitable for classification and regression
- ✔ Works well with small datasets
- ✔ Easy to implement
🟥 Limitations
- ❌ Slow at prediction time for large datasets, because each new point is compared with the stored training points
- ❌ Sensitive to noisy data
- ❌ Results depend heavily on the chosen value of K
- ❌ Performance decreases with high-dimensional data
🟦 Applications
- 🏥 Disease Diagnosis
- 📧 Spam Email Detection
- 😊 Face Recognition
- 🎬 Movie Recommendation
- 🛒 Product Recommendation
- 🌸 Flower Classification
- 👤 Customer Segmentation
📝 Viva Questions
Q1. What is KNN?
Answer:
K-Nearest Neighbors (KNN) is a supervised machine learning algorithm that predicts the class of a new data point by analyzing the K nearest training examples.
Q2. What does K represent?
Answer:
K represents the number of nearest neighbors considered while making a prediction.
Q3. Why is an odd value of K preferred?
Answer:
An odd value (e.g., 3, 5, 7) helps avoid ties during majority voting in binary classification.
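A quick way to see the tie problem is to count the votes by hand for an odd and an even K. The sketch below uses plain Python on the height data from this program; count_votes is a hypothetical helper written only for this illustration.

```python
from collections import Counter

heights = [150, 160, 170, 180]
labels = ["Short", "Short", "Tall", "Tall"]
new_height = 175


def count_votes(k):
    """Return the label counts among the k training heights closest to new_height."""
    ranked = sorted(
        zip(heights, labels), key=lambda pair: abs(pair[0] - new_height)
    )
    return Counter(label for _, label in ranked[:k])


votes_k3 = count_votes(3)
votes_k4 = count_votes(4)
print("K=3:", dict(votes_k3))
print("K=4:", dict(votes_k4))
```

With K = 3 the vote is 2 to 1 in favor of Tall. With K = 4 every training point votes, the result is 2 to 2, and some tie-breaking rule would be needed to pick a class.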
Q4. Does KNN require a training phase?
Answer:
KNN has no explicit training phase. It simply stores the training data and performs calculations during prediction.
K-Nearest Neighbors (KNN) is a supervised machine learning algorithm that classifies a new data point by finding the K nearest neighbors using a distance metric and assigning the class based on majority voting (classification) or average value (regression).
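For the regression case, Scikit-learn provides KNeighborsRegressor, which averages the target values of the K nearest neighbors instead of taking a vote. The sketch below reuses the four heights; the numeric targets in weights_kg are made-up values used purely for illustration and are not data from this program.

```python
from sklearn.neighbors import KNeighborsRegressor

X = [[150], [160], [170], [180]]        # heights in cm, as in the program above
weights_kg = [45.0, 55.0, 65.0, 75.0]   # made-up targets, for illustration only

regressor = KNeighborsRegressor(n_neighbors=3)
regressor.fit(X, weights_kg)

# The 3 nearest heights to 175 are 170, 180 and 160, so the prediction
# is the mean of their targets: (65 + 75 + 55) / 3.
estimate = regressor.predict([[175]])[0]
print(f"Estimated value = {estimate:.1f}")
```

The neighbor search is the same as in classification; only the final step changes from majority voting to averaging.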