Information Gain (IG) is a measure used in Decision Trees to quantify how much a feature reduces uncertainty (Entropy) about the target variable after splitting the data.
Mathematically, IG is the difference between the entropy before the split and the weighted entropy after the split. A high IG means the feature creates a split that separates the classes well, making the data more “organized” and easier to classify.

$$IG(S, A) = H(S) - \sum_{v \in \text{Values}(A)} \frac{|S_v|}{|S|} \, H(S_v)$$

Where:
- $IG(S, A)$ = Information Gain of attribute $A$
- $H(S)$ = Entropy of the original dataset $S$
- $S_v$ = Subset of data where attribute $A$ has value $v$
- $\frac{|S_v|}{|S|}$ = Weight (proportion of subset size)
- $H(S_v)$ = Entropy of each subset after splitting
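
A small worked example may help make the subtraction concrete. The numbers are hypothetical and chosen only for illustration: a dataset of 10 samples with 5 positive and 5 negative labels (entropy 1 bit) is split by a two-valued attribute into a pure subset of 4 samples (entropy 0) and a subset of 6 samples holding 1 positive and 5 negative labels (entropy about 0.650 bits).

$$IG = 1.000 - \left( \frac{4}{10} \times 0 + \frac{6}{10} \times 0.650 \right) \approx 1.000 - 0.390 = 0.610$$

Under these assumed counts, the split removes roughly 0.61 of the original 1 bit of uncertainty.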
Entropy of the dataset

$$H(S) = -\sum_{i=1}^{c} p_i \log_2 p_i$$

Where:
- $c$ = number of classes
- $p_i$ = proportion of class $i$ in dataset $S$
$H(S)$ measures the uncertainty / disorder in the whole dataset before splitting.
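
The entropy of a label list can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `entropy` and the sample labels are assumptions made for the example, and the logarithm is taken in base 2 so the result is in bits.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)  # class -> number of samples with that class
    # p = n / total is the proportion of one class; classes that never
    # occur are absent from the Counter, so log2(0) is never evaluated.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())

pure = entropy(["yes"] * 4)                 # only one class present
mixed = entropy(["yes", "no"] * 3)          # two classes, evenly split
skewed = entropy(["yes"] * 9 + ["no"] * 5)  # hypothetical 9/5 split
print(pure, mixed, round(skewed, 3))
```

A set containing a single class has entropy 0, an even two-class split has entropy 1 bit, and the 9/5 split falls in between at about 0.94 bits.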
Probability of a class in the dataset

$$p_i = \frac{\text{number of samples of class } i \text{ in } S}{|S|}$$
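Subset Weight

$$\frac{|S_v|}{|S|}$$

Where:
- $|S_v|$ = number of samples in the subset where attribute $A = v$
- $|S|$ = total samples in the original dataset
So:

$$\text{Weight of } S_v = \frac{\text{number of samples with } A = v}{\text{total number of samples}}$$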
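
As a rough sketch, the subset weights can be computed by counting how often each attribute value occurs. The helper name `subset_weights` and the `wind` column are hypothetical, introduced only for this example.

```python
from collections import Counter

def subset_weights(attribute_values):
    """Map each attribute value v to its weight: subset size / dataset size."""
    total = len(attribute_values)
    counts = Counter(attribute_values)  # v -> number of samples with A = v
    return {v: n / total for v, n in counts.items()}

# Hypothetical attribute column: 8 samples with "weak", 6 with "strong".
wind = ["weak"] * 8 + ["strong"] * 6
weights = subset_weights(wind)
print(weights)
```

Because every sample falls into exactly one subset, the weights always add up to 1 (up to floating-point rounding).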
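Entropy of each subset

$$H(S_v) = -\sum_{i=1}^{c} p_{i,v} \log_2 p_{i,v}$$

Entropy formula applied to the subset $S_v$, where $p_{i,v}$ is the proportion of class $i$ in $S_v$.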
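
Putting the pieces together, the sketch below computes Information Gain for one categorical attribute: it groups the labels by attribute value, weights each group's entropy by its share of the samples, and subtracts the total from the entropy of the full label list. The function names and the toy `wind` / `play` data are assumptions for illustration, not part of any library.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())

def information_gain(attribute_values, labels):
    """Entropy before the split minus the weighted entropy after it."""
    # Group the labels by the attribute value of each sample.
    subsets = defaultdict(list)
    for v, y in zip(attribute_values, labels):
        subsets[v].append(y)
    total = len(labels)
    # Each subset's entropy is weighted by its share of the samples.
    weighted = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted

# Toy data: 14 samples. "weak" has 6 yes / 2 no, "strong" has 3 yes / 3 no.
wind = ["weak"] * 8 + ["strong"] * 6
play = ["yes"] * 6 + ["no"] * 2 + ["yes"] * 3 + ["no"] * 3
gain = information_gain(wind, play)
print(round(gain, 3))
```

On this toy data the gain is small (about 0.048 bits), because both subsets remain fairly mixed after the split. An attribute that separated the classes perfectly would yield a gain equal to the full entropy of the dataset.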