Descriptive statistics is the branch of statistics concerned with summarizing, organizing, and describing a dataset.

It answers questions like:

  • What is typical?
  • How spread out is the data?
  • Are there extreme values?
  • How is the data distributed?
  • Are there patterns or irregularities?

Importantly, descriptive statistics do not attempt to make predictions or draw conclusions about a larger population. They simply describe the data you have.

For example:

“Daily sales ranged between $1,500 and $3,000.”

This is descriptive statistics.

Main Categories

  • Measures of Central Tendency
  • Measures of Dispersion (Variability)
  • Measures of Position
  • Measures of Shape
  • Measures of Frequency and Distribution

Measures of Central Tendency

These describe the center or “typical” value of the data.

Mean

The arithmetic average of all observations.

Usefulness

Provides a single value representing the dataset.

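Use Cases

  • Average sales
  • Average salary
  • Average response time
  • Average test score

Limitations

Highly sensitive to outliers. For example:

  • Values: 100, 150, 10,000
  • Mean = 3,416.67

A single extreme value pulls the mean far from the other two observations, so it is not representative of the data.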
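The outlier effect above can be checked with a few lines of Python. This is a minimal sketch using the standard library's statistics module; the variable names are illustrative only.

```python
from statistics import mean

# The three values from the example above; 10,000 is the outlier.
values = [100, 150, 10_000]

# Arithmetic mean: sum of the observations divided by their count.
avg = mean(values)
print(round(avg, 2))  # 3416.67

# Dropping the outlier shows how strongly it pulled the mean upward.
avg_without_outlier = mean([100, 150])
print(avg_without_outlier)  # 125
```

With the outlier included, the mean is larger than two of the three observations by a factor of more than twenty.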

Median

The middle value after sorting the data.

Usefulness

Represents the “typical” value when data contains outliers.

Use Cases

  • Income distributions
  • Housing prices
  • Daily sales with occasional spikes

Advantages

Very robust against extreme values.
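To illustrate this robustness, the sketch below compares the mean and the median on a small set of daily sales figures with one spike. The numbers are hypothetical, invented only for this example.

```python
from statistics import mean, median

# Hypothetical daily sales with one unusually large day (9800).
daily_sales = [1200, 1350, 1400, 1500, 9800]

# The median is the middle value of the sorted data, so the spike
# does not move it.
mid = median(daily_sales)
print(mid)  # 1400

# The mean is pulled upward by the spike.
avg = mean(daily_sales)
print(avg)  # 3050

# With an even number of observations, the median is the average
# of the two middle values.
even_mid = median([1200, 1350, 1400, 1500])
print(even_mid)  # 1375.0
```

Here the median stays close to what a normal day looks like, while the mean exceeds every day except the spike.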

Mode

The most frequently occurring value.

Usefulness

Identifies the most common observation.

Use Cases

  • Most purchased product
  • Most common shoe size
  • Most common customer segment
  • Most frequent error code

Limitations

The mode is not always unique. A dataset may have:

  • No mode (for example, when no value repeats)
  • One mode
  • Multiple modes
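The three cases can be sketched in Python as follows. The helper find_modes is hypothetical, written for this example, and it assumes one common convention: a dataset in which no value repeats is treated as having no mode. Other conventions exist (for instance, the standard library's statistics.multimode returns every value in that situation).

```python
from collections import Counter

def find_modes(data):
    """Return the most frequent value(s) in data as a list.

    Assumed convention: if no value occurs more than once,
    the data has no mode and an empty list is returned.
    """
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # no value repeats: no mode under this convention
    # Keep every value tied for the highest count, in first-seen order.
    return [value for value, count in counts.items() if count == top]

print(find_modes([1, 2, 3]))        # []      (no mode)
print(find_modes([7, 8, 8, 9]))     # [8]     (one mode)
print(find_modes([1, 1, 2, 2, 3]))  # [1, 2]  (multiple modes)
```

Because the mode relies on counting rather than arithmetic, the same helper also works on non-numeric data such as product names or error codes.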