Decision Tree Algorithm
The ID3 (Iterative Dichotomiser 3) algorithm is a Decision Tree classification algorithm that selects attributes based on Information Gain (IG) to maximize data purity at each step.
Step 1: Understanding the Key Concepts
Before implementing ID3, let's understand the core concepts:
1. Entropy (H)
Entropy measures the impurity or uncertainty in the dataset:
- If all samples belong to one class, entropy = 0 (pure dataset).
- If the dataset is evenly split between classes, entropy = 1 (most uncertain).
Formula for Entropy:

H(S) = -Σ p_i * log2(p_i)

Where:
- p_i = Probability of class i in dataset S
- log2 = Logarithm base 2
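Both boundary cases are easy to verify numerically. Below is a minimal sketch of the entropy calculation (the `entropy` helper is illustrative, not part of any library):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

print(entropy(['Yes'] * 7))               # pure dataset: entropy is 0
print(entropy(['Yes', 'No'] * 5))         # 50/50 split: entropy is 1
print(entropy(['Yes'] * 9 + ['No'] * 5))  # 9 Yes / 5 No, as in the example below
```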
2. Information Gain (IG)
Information Gain tells us how much entropy is reduced after splitting the data using an attribute.
Formula for Information Gain:

IG(S, A) = H(S) - H(S | A)

Where:
- H(S) = Entropy before splitting
- H(S | A) = Weighted entropy after splitting by attribute A
The attribute with the highest IG is chosen for splitting.
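The same idea in code: information gain is just the drop in entropy once the data is partitioned. A minimal sketch, with illustrative helper names `entropy` and `information_gain`:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, groups):
    """IG = entropy before the split minus the weighted entropy of the subsets."""
    n = sum(len(g) for g in groups)
    weighted = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - weighted

# Toy example: a perfect split removes all uncertainty
labels = ['Yes', 'Yes', 'No', 'No']
groups = [['Yes', 'Yes'], ['No', 'No']]
print(information_gain(labels, groups))  # 1.0
```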
Step 2: Example Dataset
Let’s use an example dataset where we classify whether a person buys a computer based on Age, Income, Student Status, and Credit Rating.
| Age | Income | Student | Credit Rating | Buys Computer? |
|---|---|---|---|---|
| Youth | High | No | Fair | No |
| Youth | High | No | Excellent | No |
| Middle | High | No | Fair | Yes |
| Senior | Medium | No | Fair | Yes |
| Senior | Low | Yes | Fair | Yes |
| Senior | Low | Yes | Excellent | No |
| Middle | Low | Yes | Excellent | Yes |
| Youth | Medium | No | Fair | No |
| Youth | Low | Yes | Fair | Yes |
| Senior | Medium | Yes | Fair | Yes |
| Youth | Medium | Yes | Excellent | Yes |
| Middle | Medium | No | Excellent | Yes |
| Middle | High | Yes | Fair | Yes |
| Senior | Medium | No | Excellent | No |
Here, "Buys Computer?" is the target variable (Yes/No).
Step 3: Calculate Initial Entropy
First, calculate the entropy of the target variable.
- Total samples = 14
- Yes = 9, No = 5
Using the entropy formula:

H(S) = -(9/14) * log2(9/14) - (5/14) * log2(5/14) ≈ 0.940
Step 4: Calculate Entropy for Each Attribute
Now, calculate entropy after splitting for each attribute.
Splitting by "Age"
We split the dataset into three groups: Youth, Middle, and Senior.
| Age | Yes | No | Total | Entropy |
|---|---|---|---|---|
| Youth | 2 | 3 | 5 | 0.971 |
| Middle | 4 | 0 | 4 | 0 |
| Senior | 3 | 2 | 5 | 0.971 |
Weighted Entropy:

H(S | Age) = (5/14)(0.971) + (4/14)(0) + (5/14)(0.971) ≈ 0.694

Information Gain for Age:

IG(Age) = 0.940 - 0.694 = 0.247
Similarly, we compute IG for Income, Student, and Credit Rating:
- IG(Age) = 0.247
- IG(Income) = 0.029
- IG(Student) = 0.151
- IG(Credit Rating) = 0.048
Since Age has the highest IG, it becomes the root node.
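These gains can be verified numerically. The sketch below recomputes the information gain of every attribute over the full 14-row dataset; the column names mirror the table above, and results may differ from the quoted values in the last rounded digit:

```python
import numpy as np
import pandas as pd

def entropy(col):
    """Shannon entropy (base 2) of a pandas Series of class labels."""
    p = col.value_counts(normalize=True)
    return -np.sum(p * np.log2(p))

data = pd.DataFrame({
    'Age': ['Youth', 'Youth', 'Middle', 'Senior', 'Senior', 'Senior', 'Middle',
            'Youth', 'Youth', 'Senior', 'Youth', 'Middle', 'Middle', 'Senior'],
    'Income': ['High', 'High', 'High', 'Medium', 'Low', 'Low', 'Low',
               'Medium', 'Low', 'Medium', 'Medium', 'Medium', 'High', 'Medium'],
    'Student': ['No', 'No', 'No', 'No', 'Yes', 'Yes', 'Yes',
                'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No'],
    'CreditRating': ['Fair', 'Excellent', 'Fair', 'Fair', 'Fair', 'Excellent', 'Excellent',
                     'Fair', 'Fair', 'Fair', 'Excellent', 'Excellent', 'Fair', 'Excellent'],
    'BuysComputer': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes',
                     'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No'],
})

base = entropy(data['BuysComputer'])  # initial entropy, ~0.940
for feature in ['Age', 'Income', 'Student', 'CreditRating']:
    weighted = sum(len(g) / len(data) * entropy(g['BuysComputer'])
                   for _, g in data.groupby(feature))
    print(f"IG({feature}) = {base - weighted:.3f}")
```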
Step 5: Choose the Best Attribute and Split the Data
Now that Age is the root, we repeat the process for each subset.
Step 6: Repeat Until the Tree is Complete
Step 6.1: Splitting the "Youth" Subset
| Age | Income | Student | Credit Rating | Buys Computer? |
|---|---|---|---|---|
| Youth | High | No | Fair | No |
| Youth | High | No | Excellent | No |
| Youth | Medium | No | Fair | No |
| Youth | Low | Yes | Fair | Yes |
| Youth | Medium | Yes | Excellent | Yes |
Entropy Calculation:

H(Youth) = -(2/5) * log2(2/5) - (3/5) * log2(3/5) ≈ 0.971

Splitting by Student separates this subset perfectly: Student = No → all No, Student = Yes → all Yes, so IG(Student) = 0.971, the maximum possible.
Best Split: "Student"
Now, we split Youth by Student.
Step 6.2: Splitting the "Middle" Subset
| Age | Income | Student | Credit Rating | Buys Computer? |
|---|---|---|---|---|
| Middle | High | No | Fair | Yes |
| Middle | Low | Yes | Excellent | Yes |
| Middle | Medium | No | Excellent | Yes |
| Middle | High | Yes | Fair | Yes |
Since all instances are Yes, entropy = 0, so no further splitting is needed.
Step 6.3: Splitting the "Senior" Subset
| Age | Income | Student | Credit Rating | Buys Computer? |
|---|---|---|---|---|
| Senior | Medium | No | Fair | Yes |
| Senior | Low | Yes | Fair | Yes |
| Senior | Low | Yes | Excellent | No |
| Senior | Medium | No | Excellent | No |
| Senior | Medium | Yes | Fair | Yes |
Entropy Calculation:

H(Senior) = -(3/5) * log2(3/5) - (2/5) * log2(2/5) ≈ 0.971

Best Split: "Credit Rating"

Splitting by Credit Rating separates this subset perfectly (Fair → all Yes, Excellent → all No), giving IG(Credit Rating) = 0.971, while splitting by Student leaves both of its branches mixed.
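A quick numerical check of the candidate splits on the five Senior rows (column names mirror the dataset table; the `entropy` helper is illustrative):

```python
import numpy as np
import pandas as pd

def entropy(col):
    """Shannon entropy (base 2) of a pandas Series of class labels."""
    p = col.value_counts(normalize=True)
    return -np.sum(p * np.log2(p))

# The five Senior rows from the dataset (Age column dropped: it is constant here)
senior = pd.DataFrame({
    'Income': ['Medium', 'Low', 'Low', 'Medium', 'Medium'],
    'Student': ['No', 'Yes', 'Yes', 'No', 'Yes'],
    'CreditRating': ['Fair', 'Fair', 'Excellent', 'Excellent', 'Fair'],
    'BuysComputer': ['Yes', 'Yes', 'No', 'No', 'Yes'],
})

base = entropy(senior['BuysComputer'])  # ~0.971 (3 Yes, 2 No)
for feature in ['Income', 'Student', 'CreditRating']:
    weighted = sum(len(g) / len(senior) * entropy(g['BuysComputer'])
                   for _, g in senior.groupby(feature))
    print(f"IG({feature}) = {base - weighted:.3f}")
# Credit Rating yields pure branches (Fair -> all Yes, Excellent -> all No),
# so its gain equals the full subset entropy and it wins this split.
```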
Final Decision Tree
- Youth is split by Student.
- Middle is pure (Yes).
- Senior is split by Credit Rating.
Step 7: Classify New Data
If a new person has:
- Age = Youth
- Income = Medium
- Student = Yes
- Credit Rating = Fair
Following the tree:
- Age = Youth → Split by Student
- Student = Yes → Predict "Yes" (Buys Computer)
Thus, the final prediction is "Yes".
PROGRAM
```python
import numpy as np
import pandas as pd

class ID3DecisionTree:
    def __init__(self):
        self.tree = None  # Store the final decision tree

    def entropy(self, target_col):
        """Calculate entropy of a dataset."""
        values, counts = np.unique(target_col, return_counts=True)
        probabilities = counts / len(target_col)
        return -np.sum(probabilities * np.log2(probabilities))

    def information_gain(self, data, feature, target):
        """Calculate the Information Gain of a feature."""
        total_entropy = self.entropy(data[target])
        # Calculate entropy for each subset after splitting by feature
        values, counts = np.unique(data[feature], return_counts=True)
        weighted_entropy = np.sum([
            (counts[i] / np.sum(counts)) * self.entropy(data[data[feature] == values[i]][target])
            for i in range(len(values))
        ])
        return total_entropy - weighted_entropy  # Information Gain

    def best_feature_to_split(self, data, target):
        """Find the feature with the highest Information Gain."""
        features = data.columns.drop(target)
        gains = {feature: self.information_gain(data, feature, target) for feature in features}
        return max(gains, key=gains.get)  # Return the feature with highest IG

    def build_tree(self, data, target):
        """Build the decision tree using the ID3 algorithm."""
        target_values = np.unique(data[target])
        # Base case: If all target values are the same, return that value
        if len(target_values) == 1:
            return target_values[0]
        # Base case: If no features left, return the most common target value
        if len(data.columns) == 1:
            return data[target].mode()[0]
        # Choose the best feature for splitting
        best_feature = self.best_feature_to_split(data, target)
        tree = {best_feature: {}}
        # Split data and recursively build subtrees
        for value in np.unique(data[best_feature]):
            subset = data[data[best_feature] == value].drop(columns=[best_feature])
            tree[best_feature][value] = self.build_tree(subset, target)
        return tree

    def fit(self, data, target):
        """Train the model by building the tree."""
        self.tree = self.build_tree(data, target)

    def predict(self, sample):
        """Classify a new sample using the trained decision tree."""
        def classify(tree, sample):
            if not isinstance(tree, dict):
                return tree  # Leaf node reached
            root = next(iter(tree))  # Get root node
            value = sample.get(root, None)  # Get value for the root node
            if value in tree[root]:  # Traverse the tree
                return classify(tree[root][value], sample)
            else:
                return "Unknown"  # If value is not present in the tree
        return classify(self.tree, sample)

# Sample dataset
data = pd.DataFrame({
    'Age': ['Youth', 'Youth', 'Middle', 'Senior', 'Senior', 'Senior', 'Middle', 'Youth', 'Youth', 'Senior', 'Youth', 'Middle', 'Middle', 'Senior'],
    'Income': ['High', 'High', 'High', 'Medium', 'Low', 'Low', 'Low', 'Medium', 'Low', 'Medium', 'Medium', 'Medium', 'High', 'Medium'],
    'Student': ['No', 'No', 'No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No'],
    'CreditRating': ['Fair', 'Excellent', 'Fair', 'Fair', 'Fair', 'Excellent', 'Excellent', 'Fair', 'Fair', 'Fair', 'Excellent', 'Excellent', 'Fair', 'Excellent'],
    'BuysComputer': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']
})

# Train the decision tree model
tree_model = ID3DecisionTree()
tree_model.fit(data, 'BuysComputer')

# Print the decision tree
print("Decision Tree:", tree_model.tree)

# Example: Classify a new sample
new_sample = {'Age': 'Youth', 'Income': 'Medium', 'Student': 'Yes', 'CreditRating': 'Fair'}
prediction = tree_model.predict(new_sample)
print(f"Prediction for {new_sample}: {prediction}")
```