Information Gain (IG) is a measure used in Decision Trees to quantify how much a feature reduces uncertainty (entropy) about the target variable after the data is split on that feature.

Mathematically, IG is the difference between the entropy before the split and the weighted entropy after the split. A high IG means the feature creates a split that separates the classes well, making the data more “organized” and easier to classify.

$$IG(S, A) = H(S) - \sum_{v \in \text{Values}(A)} \frac{|S_v|}{|S|} \, H(S_v)$$

Where:

  • $IG(S, A)$ = Information Gain of attribute A
  • $H(S)$ = Entropy of the original dataset S
  • $S_v$ = Subset of data where attribute A has value v
  • $\frac{|S_v|}{|S|}$ = Weight (proportion of subset size)
  • $H(S_v)$ = Entropy of each subset after splitting
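
As a small worked illustration of the definition above, assume a hypothetical dataset S of 8 samples (4 positive, 4 negative) and a hypothetical attribute A with two values that splits S into two subsets of 4 samples each, one containing 3 positives and 1 negative, the other 1 positive and 3 negatives. These numbers are invented purely for the example.

$$H(S) = -\left(\tfrac{4}{8}\log_2\tfrac{4}{8} + \tfrac{4}{8}\log_2\tfrac{4}{8}\right) = 1$$

$$H(S_{v_1}) = H(S_{v_2}) = -\left(\tfrac{3}{4}\log_2\tfrac{3}{4} + \tfrac{1}{4}\log_2\tfrac{1}{4}\right) \approx 0.811$$

$$IG(S, A) = 1 - \left(\tfrac{4}{8}\cdot 0.811 + \tfrac{4}{8}\cdot 0.811\right) \approx 0.189$$

The split removes roughly 0.19 bits of uncertainty, so in this example A is only a mildly informative attribute.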

Entropy of the dataset

$$H(S) = -\sum_{i=1}^{c} p_i \log_2 p_i$$

Where:

  • $c$ = number of classes
  • $p_i$ = proportion of class $i$ in dataset S

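This entropy measures the uncertainty (disorder) in the whole dataset before splitting. It is 0 when every sample belongs to the same class and reaches its maximum, $\log_2 c$, when all classes are equally frequent.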
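
The dataset entropy can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `entropy` and the toy label lists are assumptions made for the example, and an empty dataset is treated as having zero entropy by convention.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0  # assumed convention: an empty dataset has zero entropy
    counts = Counter(labels)
    # p * log2(1/p) is the same quantity as -p * log2(p);
    # classes with zero samples never appear in the Counter.
    return sum((n / total) * math.log2(total / n) for n in counts.values())

# Hypothetical toy label lists
print(entropy(["yes", "yes", "no", "no"]))    # evenly mixed: 1.0
print(entropy(["yes", "yes", "yes", "yes"]))  # pure: 0.0
```

An evenly mixed two-class dataset gives the maximum of 1 bit, while a pure dataset gives 0.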

Probability of a class in the dataset

$$p_i = \frac{|S_i|}{|S|}$$

where $S_i$ is the set of samples in S that belong to class $i$.

Subset Weight

$$\text{Weight}(S_v) = \frac{|S_v|}{|S|}$$

Where:

  • $|S_v|$ = number of samples in the subset where attribute $A = v$
  • $|S|$ = total samples in original dataset

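So the weight of a subset is simply the fraction of all samples that fall into it, and the weights over all values of A sum to 1.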
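
A short sketch of how the subset weights might be computed from one attribute column. The helper name `subset_weights` and the `wind` data are hypothetical, chosen only for illustration.

```python
from collections import Counter

def subset_weights(attribute_values):
    """Map each value v of an attribute to its weight |S_v| / |S|."""
    total = len(attribute_values)
    counts = Counter(attribute_values)  # |S_v| for each observed value v
    return {v: n / total for v, n in counts.items()}

# Hypothetical attribute column for 8 samples
wind = ["weak", "weak", "strong", "weak", "strong", "weak", "weak", "strong"]
weights = subset_weights(wind)
print(weights)                # {'weak': 0.625, 'strong': 0.375}
print(sum(weights.values()))  # 1.0
```

Because every sample lands in exactly one subset, the weights always add up to 1.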

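Entropy of each subset

$$H(S_v) = -\sum_{i=1}^{c} p_{i,v} \log_2 p_{i,v}$$

This is the entropy formula applied to the subset $S_v$, where $p_{i,v}$ is the proportion of class $i$ within $S_v$.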
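
Putting the pieces together, the sketch below groups the labels by attribute value, computes the entropy of each subset, weights it by subset size, and subtracts the result from the entropy of the full dataset. The function names and the toy data are assumptions for illustration. Real decision tree libraries typically add details not covered here, such as handling of continuous features.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy, in bits; an empty list yields 0."""
    total = len(labels)
    return sum((n / total) * math.log2(total / n)
               for n in Counter(labels).values())

def information_gain(attribute_values, labels):
    """IG(S, A) = H(S) - sum over v of (|S_v| / |S|) * H(S_v)."""
    # Build each subset S_v as the list of labels whose attribute value is v.
    subsets = defaultdict(list)
    for v, y in zip(attribute_values, labels):
        subsets[v].append(y)
    total = len(labels)
    # Weighted entropy after the split.
    weighted = sum((len(s) / total) * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted

# Hypothetical toy data: attribute A takes the values "x" or "y"
a = ["x", "x", "x", "x", "y", "y", "y", "y"]
y = ["yes", "yes", "yes", "no", "yes", "no", "no", "no"]
print(round(information_gain(a, y), 4))  # 0.1887
```

An attribute that separates the classes perfectly would recover the full dataset entropy as its gain, while an attribute with a single value for every sample would give a gain of 0.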