Information Gain

Information Gain (IG) is a measure used in Decision Trees to quantify how much a feature reduces uncertainty (entropy) about the target variable after splitting the data.

Mathematically, it is the difference between the entropy before the split and the weighted average entropy of the subsets after the split. A high IG means the feature creates a split that separates the classes well: the resulting subsets are purer than the original data, making it more “organized” and easier to classify.

$$IG(S, A) = H(S) - \sum_{v \in Values(A)} \frac{|S_{v}|}{|S|} H(S_{v})$$

Where:

$IG(S, A)$ = Information Gain of attribute $A$
$H(S)$ = Entropy of the original dataset $S$
$S_{v}$ = Subset of data where attribute $A$ has value $v$
$\frac{|S_{v}|}{|S|}$ = Weight (proportion of subset size)
$H(S_{v})$ = Entropy of each subset after splitting
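
The formula above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names `entropy` and `information_gain`, the dictionary-per-row layout, and the toy weather rows are all assumptions made for this example.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (base 2) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def information_gain(rows, attribute, target):
    """IG(S, A) = H(S) - sum over v of |S_v| / |S| * H(S_v)."""
    # Group the target labels by the value the attribute takes in each row.
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[attribute]].append(row[target])

    all_labels = [row[target] for row in rows]
    # Weighted entropy after the split: each subset counts by its share of rows.
    weighted = sum(
        (len(labels) / len(rows)) * entropy(labels)
        for labels in subsets.values()
    )
    return entropy(all_labels) - weighted


# Hypothetical toy dataset, invented for illustration only.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
]

print(information_gain(rows, "outlook", "play"))  # 1.0: the split separates the classes perfectly
print(information_gain(rows, "windy", "play"))    # 0.0: the split tells us nothing about the class
```

In this toy data, splitting on `outlook` leaves two pure subsets, so all of the original entropy is removed, while splitting on `windy` leaves each subset as mixed as the whole dataset.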

Entropy of the dataset

$$H(S) = -\sum_{i=1}^{c} p_{i} \log_{2}(p_{i})$$

Where:

$c$ = number of classes
$p_{i}$ = proportion of class $i$ in dataset $S$

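Entropy measures the uncertainty (disorder) in the whole dataset before splitting. It is 0 when every sample belongs to a single class and reaches its maximum, $\log_{2} c$, when all $c$ classes are equally frequent.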
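
To make the behaviour of the entropy formula concrete, here is a small sketch that computes $H(S)$ directly from per-class sample counts. The helper name `entropy_from_counts` and the example counts are assumptions chosen for illustration.

```python
from math import log2


def entropy_from_counts(counts):
    """H(S) = -sum_i p_i * log2(p_i), with p_i = count_i / total."""
    total = sum(counts)
    # Classes with zero samples are skipped, since p * log2(p) -> 0 as p -> 0.
    probs = [n / total for n in counts if n > 0]
    return sum(-p * log2(p) for p in probs)


print(entropy_from_counts([10, 0]))       # 0.0: one class only, no uncertainty
print(entropy_from_counts([5, 5]))        # 1.0: two equally frequent classes
print(entropy_from_counts([3, 3, 3, 3]))  # 2.0: four equally frequent classes
```

A pure dataset has zero entropy, and the value grows as the classes become more evenly mixed.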

Probability of a class in the dataset

$$p_{i} = \frac{\text{number of samples in class } i}{\text{total samples in } S}$$

Subset Weight

$$\frac{|S_{v}|}{|S|}$$

Where:

$|S_{v}|$ = number of samples in the subset where attribute $A = v$
$|S|$ = total samples in the original dataset

So $|S_{v}|$ is the count of rows where $A = v$, and $|S|$ is the total number of rows.
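
As a rough sketch, the weights can be computed by counting how often each attribute value occurs. The list `attribute_values` below is hypothetical data standing in for one column of a dataset.

```python
from collections import Counter

# Hypothetical values of an attribute A, one entry per row of S.
attribute_values = ["sunny", "sunny", "rainy", "overcast", "sunny", "rainy"]

counts = Counter(attribute_values)  # |S_v| for each value v
total = len(attribute_values)       # |S|
weights = {v: n / total for v, n in counts.items()}

print(weights)
```

Because every row falls into exactly one subset, the weights always add up to 1 (up to floating-point rounding).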

Entropy of each subset

This is the entropy formula applied to the subset $S_{v}$, where $p_{i, v}$ is the proportion of samples in $S_{v}$ that belong to class $i$.

$$H(S_{v}) = -\sum_{i=1}^{c} p_{i, v} \log_{2}(p_{i, v})$$

$$p_{i, v} = \frac{\text{samples of class } i \text{ in subset } S_{v}}{|S_{v}|}$$
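
A worked example with made-up numbers (not taken from any real dataset) shows how the pieces combine. Suppose $S$ has 8 samples, 4 in each of two classes, and attribute $A$ takes two values $a$ and $b$. Assume $S_{a}$ holds 3 samples of the first class and 1 of the second, and $S_{b}$ holds 1 and 3.

$$
\begin{aligned}
H(S) &= -\tfrac{4}{8}\log_{2}\tfrac{4}{8} - \tfrac{4}{8}\log_{2}\tfrac{4}{8} = 1 \\
H(S_{a}) &= -\tfrac{3}{4}\log_{2}\tfrac{3}{4} - \tfrac{1}{4}\log_{2}\tfrac{1}{4} \approx 0.811 \\
H(S_{b}) &= -\tfrac{1}{4}\log_{2}\tfrac{1}{4} - \tfrac{3}{4}\log_{2}\tfrac{3}{4} \approx 0.811 \\
IG(S, A) &= 1 - \left(\tfrac{4}{8} \cdot 0.811 + \tfrac{4}{8} \cdot 0.811\right) \approx 0.189
\end{aligned}
$$

Each subset is somewhat purer than the full dataset, so the split removes about 0.19 bits of uncertainty.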
