🌳 Decision Tree in Machine Learning
🎯 Aim
To implement the Decision Tree classification algorithm in Python and predict whether a customer is eligible for loan approval based on their age.
📖 Problem Statement
A bank has customer records containing Age and Loan Approval Status.
The bank wants to predict whether a new customer will get a loan based on the customer's age.
🟦 Step 1: Import Required Library
from sklearn.tree import DecisionTreeClassifier
🔍 Explanation
- sklearn → Scikit-Learn library used for Machine Learning.
- tree → Module containing Decision Tree algorithms.
- DecisionTreeClassifier → Used for solving classification problems.
🟩 Step 2: Create the Training Dataset
X = [
[22],
[25],
[35],
[40],
[28],
[50]
]
🔍 Explanation
X represents the Independent Variable (Input Feature).
Here, the feature is Age.
| Customer | Age |
|---|---|
| 1 | 22 |
| 2 | 25 |
| 3 | 35 |
| 4 | 40 |
| 5 | 28 |
| 6 | 50 |
The model learns patterns from these age values.
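Each age sits in its own inner list because scikit-learn estimators expect a 2D input of shape (n_samples, n_features). The following is a minimal sketch to confirm that shape, assuming NumPy is available (it is a dependency of scikit-learn); the names `X_array`, `n_samples`, and `n_features` are introduced here for illustration only.

```python
import numpy as np

# Same training ages as above: one inner list per customer.
X = [[22], [25], [35], [40], [28], [50]]

# Convert to an array only to inspect the shape; fit() accepts the list as-is.
X_array = np.asarray(X)
n_samples, n_features = X_array.shape

print("Samples:", n_samples)    # number of customers
print("Features:", n_features)  # number of columns (only Age here)
```

If a second feature such as income were added, each inner list would hold two values and the second dimension would grow accordingly.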
🟨 Step 3: Create the Target Variable
y = [
"Reject",
"Reject",
"Approve",
"Approve",
"Reject",
"Approve"
]
🔍 Explanation
y represents the Dependent Variable (Target Output).
| Age | Loan Status |
|---|---|
| 22 | Reject |
| 25 | Reject |
| 35 | Approve |
| 40 | Approve |
| 28 | Reject |
| 50 | Approve |
The model learns the relationship between Age and Loan Status.
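Before training, it can help to check that every input row has a label and to see how the classes are distributed. A small sketch using only the standard library; `class_counts` is a name chosen here for illustration.

```python
from collections import Counter

X = [[22], [25], [35], [40], [28], [50]]
y = ["Reject", "Reject", "Approve", "Approve", "Reject", "Approve"]

# Every input row needs exactly one label.
assert len(X) == len(y), "X and y must have the same number of entries"

# Count how many customers fall into each class.
class_counts = Counter(y)
for label, count in class_counts.items():
    print(label, "->", count)
```

In this dataset the two classes are balanced, with three customers in each.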
🟪 Step 4: Create the Decision Tree Model
model = DecisionTreeClassifier()
🔍 Explanation
This line creates a Decision Tree Classifier object.
No training happens here.
It only creates an empty model.
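One way to see that nothing has been learned yet is to look for the `tree_` attribute, which scikit-learn sets only during `fit()`. The sketch below also shows a few optional constructor parameters; the specific values (`max_depth=3`, `random_state=0`) are illustrative assumptions and are not part of the original program.

```python
from sklearn.tree import DecisionTreeClassifier

# Default settings, as in the step above.
model = DecisionTreeClassifier()

# The learned tree lives in `tree_`, which does not exist before fit().
print("Has a learned tree:", hasattr(model, "tree_"))

# Optional settings (illustrative values only):
# criterion    -> measure used to score candidate splits
# max_depth    -> upper limit on the depth of the tree
# random_state -> seed that makes tie-breaking reproducible
tuned_model = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
print("Configured max_depth:", tuned_model.get_params()["max_depth"])
```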
🟦 Step 5: Train the Model
model.fit(X, y)
🔍 Explanation
The fit() method trains the model.
Syntax:
model.fit(input, output)
Here,
- Input → X (Age)
- Output → y (Loan Status)
During training, the Decision Tree:
- Reads all training data.
- Finds the best splitting condition.
- Creates decision rules.
- Builds the tree.
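To make "finds the best splitting condition" concrete, the sketch below scores every candidate age threshold by hand using Gini impurity (the default criterion of `DecisionTreeClassifier`) and compares the winner with the threshold scikit-learn stores at the root node. The helper names `gini` and `weighted_gini` are hypothetical and written for this illustration; this is a simplified view of the search, not scikit-learn's internal implementation.

```python
from sklearn.tree import DecisionTreeClassifier

ages = [22, 25, 35, 40, 28, 50]
labels = ["Reject", "Reject", "Approve", "Approve", "Reject", "Approve"]


def gini(group):
    """Gini impurity of a list of class labels (0.0 means the group is pure)."""
    if not group:
        return 0.0
    return 1.0 - sum((group.count(c) / len(group)) ** 2 for c in set(group))


def weighted_gini(threshold):
    """Size-weighted impurity of the two groups produced by `age <= threshold`."""
    left = [lab for age, lab in zip(ages, labels) if age <= threshold]
    right = [lab for age, lab in zip(ages, labels) if age > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Candidate thresholds: midpoints between neighbouring sorted ages.
sorted_ages = sorted(ages)
candidates = [(a + b) / 2 for a, b in zip(sorted_ages, sorted_ages[1:])]

# The best split is the one with the lowest weighted impurity.
best_threshold = min(candidates, key=weighted_gini)
print("Best hand-computed threshold:", best_threshold)

# Compare with the threshold scikit-learn stores at the root node.
model = DecisionTreeClassifier(random_state=0).fit([[a] for a in ages], labels)
print("Root threshold from scikit-learn:", model.tree_.threshold[0])
```

A split at the midpoint between 28 and 35 separates the two classes perfectly, so a single decision rule is enough for this dataset.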
🟩 Step 6: Predict New Data
Suppose a new customer is 30 years old.
prediction = model.predict([[30]])
🔍 Explanation
The model compares the new customer's age with the learned decision rules and predicts the loan status.
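`predict()` accepts several rows at once, and `predict_proba()` reports the class proportions in the leaf each row lands in. A short sketch under the same training data; `new_ages` is a hypothetical batch of customers, and `random_state=0` is added here only for reproducibility.

```python
from sklearn.tree import DecisionTreeClassifier

X = [[22], [25], [35], [40], [28], [50]]
y = ["Reject", "Reject", "Approve", "Approve", "Reject", "Approve"]
model = DecisionTreeClassifier(random_state=0).fit(X, y)

# One inner list per new customer, same layout as the training data.
new_ages = [[30], [40]]
predictions = model.predict(new_ages)
probabilities = model.predict_proba(new_ages)

# Columns of predict_proba follow the order of model.classes_.
print("Class order:", model.classes_)
for age, label, probs in zip(new_ages, predictions, probabilities):
    print("Age", age[0], "->", label, probs)
```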
🟨 Step 7: Display the Result
print("Loan Status =", prediction[0])
🔍 Explanation
prediction is returned as a NumPy array, which can be indexed like a list.
Example:
['Reject']
prediction[0] extracts the first element.
Output:
Loan Status = Reject
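The value returned by `predict()` is a NumPy array with one entry per input row, which is why indexing with `[0]` works here. A brief sketch to inspect it; the variable `status` is introduced for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = [[22], [25], [35], [40], [28], [50]]
y = ["Reject", "Reject", "Approve", "Approve", "Reject", "Approve"]
model = DecisionTreeClassifier(random_state=0).fit(X, y)

prediction = model.predict([[30]])

print(type(prediction))   # the container type returned by predict()
print(prediction.shape)   # one entry per row passed to predict()

# Convert the single element to a plain Python string before using it elsewhere.
status = str(prediction[0])
print("Loan Status =", status)
```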
📌 Complete Python Program
# Decision Tree Classification Example
from sklearn.tree import DecisionTreeClassifier
# Training Data (Input Feature)
X = [
[22],
[25],
[35],
[40],
[28],
[50]
]
# Target Output
y = [
"Reject",
"Reject",
"Approve",
"Approve",
"Reject",
"Approve"
]
# Create Decision Tree Model
model = DecisionTreeClassifier()
# Train the Model
model.fit(X, y)
# Predict Loan Status for Age = 30
prediction = model.predict([[30]])
# Display Result
print("Loan Status =", prediction[0])
💻 Sample Output
Loan Status = Reject
🌳 How the Decision Tree Works
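Suppose the trained model creates the following decision tree. The threshold of 30 is a simplified illustration; scikit-learn places the split at the midpoint between the neighbouring training ages 28 and 35, which is 31.5, and both thresholds give the same predictions for the ages used here.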
        Age
         │
    Age ≤ 30 ?
     /      \
   Yes       No
    │         │
 Reject    Approve
Explanation
- If Age ≤ 30, predict Reject.
- If Age > 30, predict Approve.
For a customer aged 30:
30 ≤ 30
➡ Prediction = Reject
For a customer aged 40:
40 > 30
➡ Prediction = Approve
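The rules the model actually learned can be printed with `export_text` from `sklearn.tree`. In this sketch the feature name "Age" is supplied by hand, since the training data is a plain list with no column names. Expect the printed threshold to be the midpoint between 28 and 35 rather than exactly 30.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[22], [25], [35], [40], [28], [50]]
y = ["Reject", "Reject", "Approve", "Approve", "Reject", "Approve"]
model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Render the fitted tree as indented text rules.
rules = export_text(model, feature_names=["Age"])
print(rules)
```

Printing the rules is a quick way to check that the tree matches the mental picture before trusting its predictions.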
⚙️ Step-by-Step Working
Start
│
▼
Import DecisionTreeClassifier
│
▼
Create Training Dataset (X and y)
│
▼
Create Decision Tree Model
│
▼
Train the Model using fit()
│
▼
Provide New Customer Data
│
▼
Predict Loan Status
│
▼
Display Result
│
▼
End
📊 Explanation of Important Functions
| Function | Purpose |
|---|---|
| DecisionTreeClassifier() | Creates the Decision Tree model |
| fit(X, y) | Trains the model using the training dataset |
| predict() | Predicts the class of new data |
| print() | Displays the prediction |
🌍 Real-Life Applications
- 🏦 Loan Approval
- 🏥 Disease Diagnosis
- 📧 Spam Email Detection
- 🎓 Student Performance Prediction
- 🛒 Customer Purchase Prediction
- 🚗 Car Insurance Approval
- 🌾 Crop Recommendation
- 💳 Credit Risk Analysis
✅ Advantages
- Easy to understand and interpret.
- Requires little data preprocessing.
- Handles both numerical and categorical data (in scikit-learn, categorical features must first be encoded as numbers).
- Works for classification and regression.
- Can visualize decision-making as a tree.
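As a sketch of the regression case, the same ages can be paired with a numeric target and fitted with `DecisionTreeRegressor`. The loan amounts below are hypothetical values made up for illustration; they do not come from the bank dataset in this post.

```python
from sklearn.tree import DecisionTreeRegressor

X = [[22], [25], [35], [40], [28], [50]]

# Hypothetical loan amounts (in thousands), invented for this example only.
loan_amounts = [10.0, 12.0, 30.0, 35.0, 15.0, 50.0]

regressor = DecisionTreeRegressor(random_state=0)
regressor.fit(X, loan_amounts)

# A regression tree predicts the mean target value of the leaf the input falls into.
predicted_amount = float(regressor.predict([[30]])[0])
print("Predicted loan amount:", predicted_amount)
```

The workflow is the same as for classification; only the estimator class and the type of target change.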
❌ Limitations
- Can overfit the training data.
- Sensitive to small changes in the dataset.
- Large trees become difficult to interpret.
- May not perform well with very complex datasets.
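The overfitting risk can be sketched by comparing an unrestricted tree with a depth-limited one on noisy data. The dataset here is synthetic, generated with `make_classification`, and every setting (500 samples, 10% flipped labels, `max_depth=3`) is an assumption chosen for illustration. The unrestricted tree typically memorizes the training set while scoring lower on held-out data, though exact numbers depend on the data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 10% of labels flipped to simulate noise (illustrative only).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=4, flip_y=0.1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# An unrestricted tree keeps splitting until every training leaf is pure.
deep_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Limiting the depth is one common way to reduce overfitting.
shallow_tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

for name, tree in [("unrestricted", deep_tree), ("max_depth=3", shallow_tree)]:
    print(
        name,
        "| depth:", tree.get_depth(),
        "| train accuracy:", round(tree.score(X_train, y_train), 3),
        "| test accuracy:", round(tree.score(X_test, y_test), 3),
    )
```

Other parameters such as `min_samples_leaf` serve a similar purpose and can be tuned in the same way.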
🎯 Viva Questions
- What is a Decision Tree?
- Why is it called a Decision Tree?
- What is DecisionTreeClassifier?
- What is the purpose of fit()?
- What is the purpose of predict()?
- What are independent and dependent variables?
- What are the advantages of Decision Trees?
- What are the limitations of Decision Trees?
- Give two real-life applications of Decision Trees.
- Differentiate between Decision Tree Classification and Decision Tree Regression.
📝 University Exam Definition
Decision Tree is a supervised machine learning algorithm used for classification and regression. It predicts the output by splitting data into smaller subsets using decision rules based on input features, forming a tree-like structure.
⭐ One-Line Revision
Decision Tree builds a tree-like model by asking a series of questions about the input data and predicts the final output based on the learned decision rules.