🟦🟩🟨 Logistic Regression Using Python
🎯 Aim
To implement the Logistic Regression algorithm using Python to classify whether a patient has a disease based on their age.
🧠 Theory
Logistic Regression is a Supervised Machine Learning algorithm used for classification problems. Unlike Linear Regression, which predicts continuous values, Logistic Regression predicts categories or classes.
Examples
- 📧 Spam / Not Spam
- 🏥 Disease / Healthy
- 🎓 Pass / Fail
- 💳 Fraud / Not Fraud
The output is usually:
- 0 → No
- 1 → Yes
Logistic Regression uses the Sigmoid Function to convert predictions into probabilities between 0 and 1.
🌍 Real-Life Example
A hospital wants to predict whether a patient has diabetes based only on Age.
Training Data
| Age | Disease |
|---|---|
| 20 | 0 |
| 25 | 0 |
| 30 | 0 |
| 35 | 0 |
| 40 | 1 |
| 45 | 1 |
| 50 | 1 |
| 55 | 1 |
Here,
- 0 = Healthy
- 1 = Disease
📝 Step 1: Import Required Libraries
from sklearn.linear_model import LogisticRegression
Explanation
-
sklearn→ Machine Learning library. -
linear_model→ Contains regression algorithms. -
LogisticRegression→ Used for classification.
📝 Step 2: Create Input Data
X = [
[20],
[25],
[30],
[35],
[40],
[45],
[50],
[55]
]
Explanation
X stores the input feature (Age).
Each value is written inside another list because Scikit-learn expects 2D input.
X
20
25
30
35
40
45
50
55
📝 Step 3: Create Output Data
y = [
0,
0,
0,
0,
1,
1,
1,
1
]
Explanation
y stores the output labels.
| Value | Meaning |
|---|---|
| 0 | Healthy |
| 1 | Disease |
📝 Step 4: Create the Model
model = LogisticRegression()
Explanation
This creates an empty Logistic Regression model.
At this stage:
✔ No learning
✔ No prediction
✔ Just an empty model
📝 Step 5: Train the Model
model.fit(X, y)
Explanation
fit() teaches the model using the training data.
The model learns:
- Relationship between Age and Disease
- Probability of Disease
This is called the Training Phase.
Training Data
│
▼
Model Learning
│
▼
Trained Model
📝 Step 6: Predict New Data
Suppose a new patient is 42 years old.
prediction = model.predict([[42]])
Explanation
The model compares age 42 with the learned pattern and predicts:
Either
0
or
1
📝 Step 7: Display Prediction
print(prediction)
Output
[1]
Meaning
Disease
📝 Step 8: Display User-Friendly Output
if prediction[0] == 1:
print("Patient has Disease")
else:
print("Patient is Healthy")
Output
Patient has Disease
✅ Complete Python Program
# Logistic Regression Example
from sklearn.linear_model import LogisticRegression
# Input Data (Age)
X = [
[20],
[25],
[30],
[35],
[40],
[45],
[50],
[55]
]
# Output Data
# 0 = Healthy
# 1 = Disease
y = [
0,
0,
0,
0,
1,
1,
1,
1
]
# Create Model
model = LogisticRegression()
# Train Model
model.fit(X, y)
# Predict
prediction = model.predict([[42]])
# Display Prediction
if prediction[0] == 1:
print("Patient has Disease")
else:
print("Patient is Healthy")
💻 Expected Output
Patient has Disease
🔍 Step-by-Step Flow
Import Library
│
▼
Create Dataset
│
▼
Create Logistic Regression Model
│
▼
Train Model using fit()
│
▼
Predict using predict()
│
▼
Display Result
📌 Understanding fit()
model.fit(X, y)
This is the most important line.
It means:
Teach the computer using
X → Input
y → Output
After this line, the model becomes trained.
📌 Understanding predict()
model.predict([[42]])
Meaning:
Predict the output for a new patient whose age is 42 years.
📌 Understanding prediction[0]
predict() returns a list (or array).
Example:
prediction = [1]
To access the first value:
prediction[0]
Result
1
📊 Training Data Visualization
| Age | Disease |
|---|---|
| 20 | Healthy |
| 25 | Healthy |
| 30 | Healthy |
| 35 | Healthy |
| 40 | Disease |
| 45 | Disease |
| 50 | Disease |
| 55 | Disease |
The model learns that higher ages in this small example are associated with the "Disease" class.
📈 Why Logistic Regression?
| Linear Regression | Logistic Regression |
|---|---|
| Predicts numbers | Predicts categories |
| Output can be any value | Output is a probability (0–1) and a class |
| Used for Regression | Used for Classification |
⭐ Advantages
- ✔ Easy to implement
- ✔ Fast training
- ✔ Works well for binary classification
- ✔ Produces probability estimates
- ✔ Easy to interpret
❌ Limitations
- ❌ Assumes a linear relationship between features and the log-odds
- ❌ Less effective for highly complex, non-linear data
- ❌ Sensitive to outliers in some situations
🌍 Applications
- 🏥 Disease Prediction
- 📧 Spam Detection
- 💳 Credit Card Fraud Detection
- 🎓 Student Pass/Fail Prediction
- 🛒 Customer Purchase Prediction
- 🏦 Loan Approval Prediction
🎯 Viva Questions
- What is Logistic Regression?
- Why is it called "Regression" if it is used for classification?
- What is the role of the Sigmoid Function?
-
What does
fit()do? -
What does
predict()do? -
Why is the input written as
[[42]]instead of[42]? -
What is the meaning of
prediction[0]? - What types of problems can Logistic Regression solve?
📝 One-Line Revision
Logistic Regression is a supervised machine learning algorithm that predicts the probability of a data point belonging to a particular class and is mainly used for binary classification problems.
No comments:
Post a Comment