Decision Tree Algorithm
The ID3 (Iterative Dichotomiser 3) algorithm is a Decision Tree classification algorithm that selects attributes based on Information Gain (IG) to maximize data purity at each step.
Step 1: Understanding the Key Concepts
Before implementing ID3, let's understand the core concepts:
1. Entropy (H)
Entropy measures the impurity or uncertainty in the dataset:
- If all samples belong to one class, entropy = 0 (pure dataset).
- If the dataset is evenly split between classes, entropy = 1 (most uncertain).
Formula for Entropy:

H(S) = -Σ p_i * log2(p_i)

Where:
- p_i = Probability of class i in dataset S
- log2 = Logarithm base 2
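Both boundary cases are easy to verify numerically. Below is a minimal sketch of the entropy calculation (the `entropy` helper is illustrative, not part of any library):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

print(entropy(['Yes'] * 7))               # pure dataset: entropy is 0
print(entropy(['Yes', 'No'] * 5))         # 50/50 split: entropy is 1
print(entropy(['Yes'] * 9 + ['No'] * 5))  # 9 Yes / 5 No, as in the example below
```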
2. Information Gain (IG)
Information Gain tells us how much entropy is reduced after splitting the data using an attribute.
Formula for Information Gain:

IG(S, A) = H(S) - H(S | A)

Where:
- H(S) = Entropy before splitting
- H(S | A) = Weighted entropy after splitting by attribute A
The attribute with the highest IG is chosen for splitting.
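The same idea in code: information gain is just the drop in entropy once the data is partitioned. A minimal sketch, with illustrative helper names `entropy` and `information_gain`:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, groups):
    """IG = entropy before the split minus the weighted entropy of the subsets."""
    n = sum(len(g) for g in groups)
    weighted = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - weighted

# Toy example: a perfect split removes all uncertainty
labels = ['Yes', 'Yes', 'No', 'No']
groups = [['Yes', 'Yes'], ['No', 'No']]
print(information_gain(labels, groups))  # 1.0
```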
Step 2: Example Dataset
Let’s use an example dataset where we classify whether a person buys a computer based on Age, Income, Student Status, and Credit Rating.
| Age | Income | Student | Credit Rating | Buys Computer? |
|---|---|---|---|---|
| Youth | High | No | Fair | No |
| Youth | High | No | Excellent | No |
| Middle | High | No | Fair | Yes |
| Senior | Medium | No | Fair | Yes |
| Senior | Low | Yes | Fair | Yes |
| Senior | Low | Yes | Excellent | No |
| Middle | Low | Yes | Excellent | Yes |
| Youth | Medium | No | Fair | No |
| Youth | Low | Yes | Fair | Yes |
| Senior | Medium | Yes | Fair | Yes |
| Youth | Medium | Yes | Excellent | Yes |
| Middle | Medium | No | Excellent | Yes |
| Middle | High | Yes | Fair | Yes |
| Senior | Medium | No | Excellent | No |
Here, "Buys Computer?" is the target variable (Yes/No).
Step 3: Calculate Initial Entropy
First, calculate the entropy of the target variable.
- Total samples = 14
- Yes = 9, No = 5
Using the entropy formula:

H(S) = -(9/14) * log2(9/14) - (5/14) * log2(5/14) ≈ 0.940
Step 4: Calculate Entropy for Each Attribute
Now, calculate entropy after splitting for each attribute.
Splitting by "Age"
We split the dataset into three groups: Youth, Middle, and Senior.
| Age | Yes | No | Total | Entropy |
|---|---|---|---|---|
| Youth | 2 | 3 | 5 | 0.971 |
| Middle | 4 | 0 | 4 | 0 |
| Senior | 3 | 2 | 5 | 0.971 |
Weighted Entropy:

H(S | Age) = (5/14)(0.971) + (4/14)(0) + (5/14)(0.971) ≈ 0.694

Information Gain for Age:

IG(Age) = 0.940 - 0.694 = 0.247
Similarly, we compute IG for Income, Student, and Credit Rating:
- IG(Age) = 0.247
- IG(Income) = 0.029
- IG(Student) = 0.151
- IG(Credit Rating) = 0.048
Since Age has the highest IG, it becomes the root node.
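These gains can be verified numerically. The sketch below recomputes the information gain of every attribute over the full 14-row dataset; the column names mirror the table above, and results may differ from the quoted values in the last rounded digit:

```python
import numpy as np
import pandas as pd

def entropy(col):
    """Shannon entropy (base 2) of a pandas Series of class labels."""
    p = col.value_counts(normalize=True)
    return -np.sum(p * np.log2(p))

data = pd.DataFrame({
    'Age': ['Youth', 'Youth', 'Middle', 'Senior', 'Senior', 'Senior', 'Middle',
            'Youth', 'Youth', 'Senior', 'Youth', 'Middle', 'Middle', 'Senior'],
    'Income': ['High', 'High', 'High', 'Medium', 'Low', 'Low', 'Low',
               'Medium', 'Low', 'Medium', 'Medium', 'Medium', 'High', 'Medium'],
    'Student': ['No', 'No', 'No', 'No', 'Yes', 'Yes', 'Yes',
                'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No'],
    'CreditRating': ['Fair', 'Excellent', 'Fair', 'Fair', 'Fair', 'Excellent', 'Excellent',
                     'Fair', 'Fair', 'Fair', 'Excellent', 'Excellent', 'Fair', 'Excellent'],
    'BuysComputer': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes',
                     'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No'],
})

base = entropy(data['BuysComputer'])  # initial entropy, ~0.940
for feature in ['Age', 'Income', 'Student', 'CreditRating']:
    weighted = sum(len(g) / len(data) * entropy(g['BuysComputer'])
                   for _, g in data.groupby(feature))
    print(f"IG({feature}) = {base - weighted:.3f}")
```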
Step 5: Choose the Best Attribute and Split the Data
Now that Age is the root, we repeat the process for each subset.
Step 6: Repeat Until the Tree is Complete
Step 6.1: Splitting the "Youth" Subset
| Age | Income | Student | Credit Rating | Buys Computer? |
|---|---|---|---|---|
| Youth | High | No | Fair | No |
| Youth | High | No | Excellent | No |
| Youth | Medium | No | Fair | No |
| Youth | Low | Yes | Fair | Yes |
| Youth | Medium | Yes | Excellent | Yes |
Entropy Calculation:

H(Youth) = -(2/5) * log2(2/5) - (3/5) * log2(3/5) ≈ 0.971

Splitting by Student separates this subset perfectly: Student = No → all No, Student = Yes → all Yes, so IG(Student) = 0.971, the maximum possible.
Best Split: "Student"
Now, we split Youth by Student.
Step 6.2: Splitting the "Middle" Subset
| Age | Income | Student | Credit Rating | Buys Computer? |
|---|---|---|---|---|
| Middle | High | No | Fair | Yes |
| Middle | Low | Yes | Excellent | Yes |
| Middle | Medium | No | Excellent | Yes |
| Middle | High | Yes | Fair | Yes |
Since all instances are Yes, entropy = 0, so no further splitting is needed.
Step 6.3: Splitting the "Senior" Subset
| Age | Income | Student | Credit Rating | Buys Computer? |
|---|---|---|---|---|
| Senior | Medium | No | Fair | Yes |
| Senior | Low | Yes | Fair | Yes |
| Senior | Low | Yes | Excellent | No |
| Senior | Medium | No | Excellent | No |
| Senior | Medium | Yes | Fair | Yes |
Entropy Calculation:

H(Senior) = -(3/5) * log2(3/5) - (2/5) * log2(2/5) ≈ 0.971

Best Split: "Credit Rating"

Splitting by Credit Rating separates this subset perfectly (Fair → all Yes, Excellent → all No), giving IG(Credit Rating) = 0.971, while splitting by Student leaves both of its branches mixed.
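A quick numerical check of the candidate splits on the five Senior rows (column names mirror the dataset table; the `entropy` helper is illustrative):

```python
import numpy as np
import pandas as pd

def entropy(col):
    """Shannon entropy (base 2) of a pandas Series of class labels."""
    p = col.value_counts(normalize=True)
    return -np.sum(p * np.log2(p))

# The five Senior rows from the dataset (Age column dropped: it is constant here)
senior = pd.DataFrame({
    'Income': ['Medium', 'Low', 'Low', 'Medium', 'Medium'],
    'Student': ['No', 'Yes', 'Yes', 'No', 'Yes'],
    'CreditRating': ['Fair', 'Fair', 'Excellent', 'Excellent', 'Fair'],
    'BuysComputer': ['Yes', 'Yes', 'No', 'No', 'Yes'],
})

base = entropy(senior['BuysComputer'])  # ~0.971 (3 Yes, 2 No)
for feature in ['Income', 'Student', 'CreditRating']:
    weighted = sum(len(g) / len(senior) * entropy(g['BuysComputer'])
                   for _, g in senior.groupby(feature))
    print(f"IG({feature}) = {base - weighted:.3f}")
# Credit Rating yields pure branches (Fair -> all Yes, Excellent -> all No),
# so its gain equals the full subset entropy and it wins this split.
```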
Final Decision Tree
- Youth is split by Student.
- Middle is pure (Yes).
- Senior is split by Credit Rating.
Step 7: Classify New Data
If a new person has:
- Age = Youth
- Income = Medium
- Student = Yes
- Credit Rating = Fair
Following the tree:
- Age = Youth → Split by Student
- Student = Yes → Predict "Yes" (Buys Computer)
Thus, the final prediction is "Yes".
PROGRAM
```python
import numpy as np
import pandas as pd

class ID3DecisionTree:
    def __init__(self):
        self.tree = None  # Store the final decision tree

    def entropy(self, target_col):
        """Calculate entropy of a dataset."""
        values, counts = np.unique(target_col, return_counts=True)
        probabilities = counts / len(target_col)
        return -np.sum(probabilities * np.log2(probabilities))

    def information_gain(self, data, feature, target):
        """Calculate the Information Gain of a feature."""
        total_entropy = self.entropy(data[target])
        # Calculate entropy for each subset after splitting by feature
        values, counts = np.unique(data[feature], return_counts=True)
        weighted_entropy = np.sum([
            (counts[i] / np.sum(counts)) * self.entropy(data[data[feature] == values[i]][target])
            for i in range(len(values))
        ])
        return total_entropy - weighted_entropy  # Information Gain

    def best_feature_to_split(self, data, target):
        """Find the feature with the highest Information Gain."""
        features = data.columns.drop(target)
        gains = {feature: self.information_gain(data, feature, target) for feature in features}
        return max(gains, key=gains.get)  # Return the feature with highest IG

    def build_tree(self, data, target):
        """Build the decision tree using the ID3 algorithm."""
        target_values = np.unique(data[target])
        # Base case: If all target values are the same, return that value
        if len(target_values) == 1:
            return target_values[0]
        # Base case: If no features left, return the most common target value
        if len(data.columns) == 1:
            return data[target].mode()[0]
        # Choose the best feature for splitting
        best_feature = self.best_feature_to_split(data, target)
        tree = {best_feature: {}}
        # Split data and recursively build subtrees
        for value in np.unique(data[best_feature]):
            subset = data[data[best_feature] == value].drop(columns=[best_feature])
            tree[best_feature][value] = self.build_tree(subset, target)
        return tree

    def fit(self, data, target):
        """Train the model by building the tree."""
        self.tree = self.build_tree(data, target)

    def predict(self, sample):
        """Classify a new sample using the trained decision tree."""
        def classify(tree, sample):
            if not isinstance(tree, dict):
                return tree  # Leaf node reached
            root = next(iter(tree))  # Get root node
            value = sample.get(root, None)  # Get value for the root node
            if value in tree[root]:  # Traverse the tree
                return classify(tree[root][value], sample)
            else:
                return "Unknown"  # If value is not present in the tree
        return classify(self.tree, sample)

# Sample dataset
data = pd.DataFrame({
    'Age': ['Youth', 'Youth', 'Middle', 'Senior', 'Senior', 'Senior', 'Middle', 'Youth', 'Youth', 'Senior', 'Youth', 'Middle', 'Middle', 'Senior'],
    'Income': ['High', 'High', 'High', 'Medium', 'Low', 'Low', 'Low', 'Medium', 'Low', 'Medium', 'Medium', 'Medium', 'High', 'Medium'],
    'Student': ['No', 'No', 'No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No'],
    'CreditRating': ['Fair', 'Excellent', 'Fair', 'Fair', 'Fair', 'Excellent', 'Excellent', 'Fair', 'Fair', 'Fair', 'Excellent', 'Excellent', 'Fair', 'Excellent'],
    'BuysComputer': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']
})

# Train the decision tree model
tree_model = ID3DecisionTree()
tree_model.fit(data, 'BuysComputer')

# Print the decision tree
print("Decision Tree:", tree_model.tree)

# Example: Classify a new sample
new_sample = {'Age': 'Youth', 'Income': 'Medium', 'Student': 'Yes', 'CreditRating': 'Fair'}
prediction = tree_model.predict(new_sample)
print(f"Prediction for {new_sample}: {prediction}")
```