Total Pageviews

Monday, June 29, 2026

k-Nearest Neighbors (KNN) Algorithm Using Python

 

k-Nearest Neighbors (KNN) Algorithm



🟦 Program Aim

Aim:

To implement the K-Nearest Neighbors (KNN) Classification Algorithm using Python and predict whether a person's height is classified as Short or Tall.


🟩 Algorithm Used

K-Nearest Neighbors (KNN) Classifier


🟨 Problem Statement

A school wants to classify students into two categories:

  • Short
  • Tall

based on their Height (in cm) using the K-Nearest Neighbors (KNN) algorithm.


🟪 Step 1: Import Required Library

First, import the KNeighborsClassifier class from the sklearn.neighbors module.

from sklearn.neighbors import KNeighborsClassifier

Explanation

  • sklearn is the Scikit-learn library.
  • neighbors contains the KNN algorithm.
  • KNeighborsClassifier() is used for classification problems.

🟦 Step 2: Create the Training Dataset

X = [
[150],
[160],
[170],
[180]
]

Explanation

X represents the input feature (Independent Variable).

Here, the input is the Height of students.

StudentHeight (cm)
Student 1150
Student 2160
Student 3170
Student 4180

The KNN algorithm stores these training examples.


🟩 Step 3: Create the Output Labels

y = [
"Short",
"Short",
"Tall",
"Tall"
]

Explanation

y represents the output labels (Dependent Variable).

HeightCategory
150Short
160Short
170Tall
180Tall

These are the correct answers used to train the model.


🟨 Step 4: Create the KNN Model

model = KNeighborsClassifier(n_neighbors=3)

Explanation

  • KNeighborsClassifier() creates the KNN model.
  • n_neighbors=3 means the model will consider the 3 nearest neighbors while making a prediction.

Why choose K = 3?

The algorithm checks the three closest training data points and predicts the class that appears most frequently among them.


🟦 Step 5: Train the Model

model.fit(X, y)

Explanation

The fit() method trains the model.

Syntax:

model.fit(input_data, output_labels)

Here,

  • X → Heights of students
  • y → Categories (Short/Tall)

During training, KNN stores the dataset instead of creating a mathematical model.


🟩 Step 6: Predict a New Data Point

prediction = model.predict([[175]])

Explanation

We want to predict the category of a student whose height is 175 cm.

The model calculates the distance between 175 cm and all training data points.


🟨 Step 7: Display the Result

print("Prediction =", prediction[0])

Output

Prediction = Tall

Explanation

Since the majority of the nearest neighbors are classified as Tall, the algorithm predicts:

Prediction = Tall


🟦 Complete Python Program

from sklearn.neighbors import KNeighborsClassifier

# Training Data (Height in cm)
X = [
[150],
[160],
[170],
[180]
]

# Output Labels
y = [
"Short",
"Short",
"Tall",
"Tall"
]

# Create KNN Model
model = KNeighborsClassifier(n_neighbors=3)

# Train the Model
model.fit(X, y)

# Predict for a New Student
prediction = model.predict([[175]])

# Display the Result
print("Prediction =", prediction[0])

🟪 Step-by-Step Working of KNN

Step 1️⃣ Import the KNN library

Step 2️⃣ Create the training dataset

Step 3️⃣ Create the output labels

Step 4️⃣ Choose the value of K

Step 5️⃣ Train the model using fit()

Step 6️⃣ Enter a new data point

Step 7️⃣ Calculate the distance from the new point to all training points

Step 8️⃣ Select the K nearest neighbors

Step 9️⃣ Count the majority class (Majority Voting)

Step 🔟 Display the predicted result


🟥 Workflow

        Training Data


Choose Value of K (K=3)


Train the Model


New Data (175 cm)


Calculate Distances


Find 3 Nearest Neighbors


Majority Voting


Final Prediction
(Tall)

🟩 Distance Calculation Example

Suppose the new student's height is 175 cm.

Training HeightDistance from 175Category
15025Short
16015Short
1705Tall
1805Tall

The 3 nearest neighbors are:

HeightCategory
170Tall
180Tall
160Short

Majority Voting

  • Tall = 2 votes
  • Short = 1 vote

Final Prediction = Tall


🟦 Expected Output

Prediction = Tall

🟨 Explanation of Important Functions

FunctionDescription
KNeighborsClassifier()Creates the KNN classifier model
n_neighbors=3Selects the 3 nearest neighbors
fit(X, y)Stores the training dataset
predict()Predicts the category for new data

🟩 Advantages

  • ✔ Simple and easy to understand
  • ✔ No complex training process
  • ✔ Suitable for classification and regression
  • ✔ Works well with small datasets
  • ✔ Easy to implement

🟥 Limitations

  • ❌ Slow for large datasets
  • ❌ Sensitive to noisy data
  • ❌ Choosing the correct value of K is important
  • ❌ Performance decreases with high-dimensional data

🟦 Applications

  • 🏥 Disease Diagnosis
  • 📧 Spam Email Detection
  • 😊 Face Recognition
  • 🎬 Movie Recommendation
  • 🛒 Product Recommendation
  • 🌸 Flower Classification
  • 👤 Customer Segmentation

📝 Viva Questions

Q1. What is KNN?

Answer:
K-Nearest Neighbors (KNN) is a supervised machine learning algorithm that predicts the class of a new data point by analyzing the K nearest training examples.


Q2. What does K represent?

Answer:
K represents the number of nearest neighbors considered while making a prediction.


Q3. Why is an odd value of K preferred?

Answer:
An odd value (e.g., 3, 5, 7) helps avoid ties during majority voting in binary classification.


Q4. Does KNN require a training phase?

Answer:
KNN has no explicit training phase. It simply stores the training data and performs calculations during prediction.



K-Nearest Neighbors (KNN) is a supervised machine learning algorithm that classifies a new data point by finding the K nearest neighbors using a distance metric and assigning the class based on majority voting (classification) or average value (regression).

No comments:

Post a Comment