Descriptive statistics is the branch of statistics concerned with summarizing, organizing, and describing a dataset.
It answers questions like:
- What is typical?
- How spread out is the data?
- Are there extreme values?
- How is the data distributed?
- Are there patterns or irregularities?
Importantly, descriptive statistics do not attempt to make predictions or draw conclusions about a larger population. They simply describe the data you have.
For example:
“Daily sales averaged $1,500 and peaked at $3,000.”
This is descriptive statistics.
Main Categories
- Measures of Central Tendency
- Measures of Dispersion (Variability)
- Measures of Position
- Measures of Shape
- Measures of Frequency and Distribution
Measures of Central Tendency
These describe the center or “typical” value of the data.
Mean
The arithmetic average of all observations.
Usefulness
Provides a single value representing the dataset.
Use Cases
- Average sales
- Average salary
- Average response time
- Average test score
Limitations
Highly sensitive to outliers. For example, the values 100, 150, and 10,000 have a mean of 3,416.67, which is not representative of any of the three observations.
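The outlier effect above can be checked with a minimal sketch using Python's standard `statistics` module. The variable names are illustrative only; the values are the ones from the example.

```python
from statistics import mean

values = [100, 150, 10000]               # example values from the text

mean_all = mean(values)                  # pulled upward by the single outlier
mean_without_outlier = mean(values[:2])  # same data with the outlier dropped

print(round(mean_all, 2))                # 3416.67
print(mean_without_outlier)              # 125
```

One extreme value moves the mean from 125 to about 3,417, far from both typical observations.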
Median
The middle value after sorting the data. With an even number of observations, it is the average of the two middle values.
Usefulness
Represents the “typical” value when data contains outliers.
Use Cases
- Income distributions
- Housing prices
- Daily sales with occasional spikes
Advantages
Very robust against extreme values.
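As a rough illustration of this robustness, the sketch below compares the mean and the median on a small set of daily sales figures containing one spike. The figures are hypothetical, made up only for this example.

```python
from statistics import mean, median

# Hypothetical daily sales with one unusually large day (9800).
daily_sales = [1200, 1350, 1400, 1500, 9800]

sales_mean = mean(daily_sales)        # dragged upward by the spike
sales_median = median(daily_sales)    # middle value of the sorted data

print(sales_mean)                     # 3050
print(sales_median)                   # 1400

# With an even number of observations, the median is the
# average of the two middle values.
even_median = median([1200, 1350, 1400, 1500])
print(even_median)                    # 1375.0
```

The median stays close to the ordinary days, while the mean lands above four of the five observations.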
Mode
The most frequently occurring value.
Usefulness
Identifies the most common observation.
Use Cases
- Most purchased product
- Most common shoe size
- Most common customer segment
- Most frequent error code
Limitations
A dataset may have:
- No mode
- One mode
- Multiple modes
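The three cases can be sketched with `statistics.multimode` (available in Python 3.8 and later), which returns every value tied for the highest count. The shoe-size lists are hypothetical. Note that when every value occurs exactly once, `multimode` returns all of them; treating that result as "no mode" is a common convention assumed here, not something the function reports itself.

```python
from statistics import multimode

# Hypothetical shoe sizes, one list per case.
one_mode = multimode([8, 9, 9, 10, 9, 8])    # 9 occurs most often
several_modes = multimode([8, 8, 9, 9, 10])  # 8 and 9 are tied
all_unique = multimode([7, 8, 9])            # every value occurs once

print(one_mode)        # [9]
print(several_modes)   # [8, 9]
print(all_unique)      # [7, 8, 9]
```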