Descriptive statistics summarize and organize data so that it can be easily understood. They are typically grouped into measures of central tendency and measures of dispersion, along with a few other statistics that describe a distribution’s shape and position:
- Measures of Central Tendency: These describe the center or typical value of a dataset. Common measures include:
  - Mean: The sum of all data points divided by the number of data points.
  - Median: The middle value when the data is ordered from least to greatest (or the average of the two middle values when the count is even).
  - Mode: The most frequently occurring value(s) in a dataset.
- Measures of Dispersion (or Variability): These describe the spread of the data. Common measures include:
  - Range: The difference between the maximum and minimum values.
  - Variance: The average of the squared differences from the mean. Dividing by n gives the population variance; the sample variance divides by n - 1 instead.
  - Standard Deviation: The square root of the variance. It is expressed in the same units as the data and indicates how far data points typically lie from the mean.
  - Interquartile Range (IQR): The range within the middle 50% of the data, calculated as the difference between the 75th and 25th percentiles.
- Other Descriptive Statistics:
  - Skewness: A measure of the asymmetry of the distribution of values.
  - Kurtosis: A measure of the “tailedness” of the distribution of values.
  - Percentiles: Values below which a certain percentage of data points in a dataset fall.
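
For reference, the measures above can be written as formulas. This is a sketch using the population (divide-by-n) convention; the symbols are notation assumed here for illustration: x_1, ..., x_n are the data points, x̄ is the mean, σ is the standard deviation, and Q_1 and Q_3 are the 25th and 75th percentiles.

$$
\begin{aligned}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i \\
\text{Range} &= \max_i x_i - \min_i x_i \\
\sigma^2 &= \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 \\
\sigma &= \sqrt{\sigma^2} \\
\text{IQR} &= Q_3 - Q_1 \\
\text{Skewness} &= \frac{1}{n}\sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{\sigma}\right)^{3} \\
\text{Kurtosis} &= \frac{1}{n}\sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{\sigma}\right)^{4}
\end{aligned}
$$

A normal distribution has a kurtosis of 3 under this definition, so many tools report excess kurtosis (kurtosis minus 3) instead.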
Example
Let’s say we have a dataset of exam scores: [55, 63, 77, 85, 88, 92, 94, 97, 99, 100].
- Central Tendency:
  - Mean: (55 + 63 + 77 + 85 + 88 + 92 + 94 + 97 + 99 + 100) / 10 = 850 / 10 = 85
  - Median: (88 + 92) / 2 = 90
  - Mode: No mode, as all values are unique.
- Dispersion:
  - Range: 100 – 55 = 45
  - Variance:
    - First, find the mean (85).
    - Calculate each data point’s deviation from the mean, square it, and find the average of those squared deviations.
    - Variance = [(55-85)² + (63-85)² + (77-85)² + (85-85)² + (88-85)² + (92-85)² + (94-85)² + (97-85)² + (99-85)² + (100-85)²] / 10
    - Variance = (900 + 484 + 64 + 0 + 9 + 49 + 81 + 144 + 196 + 225) / 10 = 2152 / 10 = 215.2
    - This is the population variance. Treating the scores as a sample would divide by 9 instead, giving 2152 / 9 ≈ 239.11.
  - Standard Deviation: √215.2 ≈ 14.67
  - IQR:
    - Q1 (25th percentile) = 77, the median of the lower half [55, 63, 77, 85, 88]
    - Q3 (75th percentile) = 97, the median of the upper half [92, 94, 97, 99, 100]
    - IQR = 97 – 77 = 20
- Other Statistics:
  - Skewness: The mean (85) is less than the median (90), which suggests a left-skewed distribution: the low scores 55 and 63 pull the mean down.
  - Kurtosis: Based on the fourth power of each standardized deviation from the mean, so it is tedious to compute by hand and is usually obtained with statistical software.
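
The hand calculations above can be checked with a short script. The following is a minimal sketch using Python's standard `statistics` module; the helper names `median_of_halves` and `standardized_moment` are hypothetical, introduced here for illustration, and the population (divide-by-n) convention is assumed throughout.

```python
import statistics

scores = [55, 63, 77, 85, 88, 92, 94, 97, 99, 100]

# Central tendency
mean = statistics.mean(scores)        # 85
median = statistics.median(scores)    # 90
# multimode returns every value tied for most frequent; here each score
# appears once, so it returns all ten values (i.e. no meaningful mode).
modes = statistics.multimode(scores)

# Dispersion (population formulas: divide by n, as in the worked example)
data_range = max(scores) - min(scores)    # 45
variance = statistics.pvariance(scores)   # 215.2
std_dev = statistics.pstdev(scores)       # about 14.67


def median_of_halves(values):
    """Quartiles as the medians of the lower and upper halves.

    Hypothetical helper matching the hand calculation; for an odd number
    of values the middle value is left out of both halves.
    Needs at least two values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[len(ordered) - half:]
    return statistics.median(lower), statistics.median(upper)


q1, q3 = median_of_halves(scores)   # 77, 97
iqr = q3 - q1                       # 20


def standardized_moment(values, k):
    """k-th population standardized moment (hypothetical helper)."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return sum(((x - mu) / sigma) ** k for x in values) / len(values)


skewness = standardized_moment(scores, 3)   # negative: left-skewed
kurtosis = standardized_moment(scores, 4)   # subtract 3 for excess kurtosis

print(f"mean={mean}, median={median}, range={data_range}")
print(f"variance={variance}, std_dev={std_dev:.2f}, IQR={iqr}")
print(f"skewness={skewness:.3f}, kurtosis={kurtosis:.3f}")
```

Quartiles are the one place where tools commonly disagree with a hand calculation, because several interpolation conventions exist. For example, `statistics.quantiles(scores, n=4)` with its default method returns 73.5 and 97.5 for Q1 and Q3 rather than 77 and 97, so it is worth checking which convention a library uses before comparing IQR values.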
These statistics provide a comprehensive summary of the dataset’s characteristics, giving a clear picture of its central tendency, spread, and overall distribution shape.