Quantitative Methods·Statistics

Section: Statistics and Descriptive Analysis

Estimated study time: 45 minutes

Content:

Descriptive statistics summarize the key characteristics of a dataset, providing the foundation for quantitative analysis in finance. The two primary dimensions are measures of central tendency (what is typical?) and measures of dispersion (how spread out is the data?). Measures of central tendency include the arithmetic mean, geometric mean, weighted mean, harmonic mean, median, and mode. For investment returns, the choice of mean matters significantly. The arithmetic mean is the simple average and is appropriate for estimating the expected return in a single future period. The geometric mean (G = [(1+R1)(1+R2)…(1+Rn)]^(1/n) – 1) is the compound per-period growth rate (the compound annual growth rate when periods are years) and reflects the actual wealth accumulated over a multi-period investment. The geometric mean is always less than or equal to the arithmetic mean; the two are equal only when every period's return is identical, and the gap widens as returns become more volatile.
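
As a minimal sketch, assuming returns are stored as decimals (0.05 for 5%) and using hypothetical data, the central-tendency measures above can be computed with Python's standard library (Python 3.8+ for `math.prod`). The helpers `arithmetic_mean`, `geometric_mean`, and `weighted_mean` are illustrative names, not library functions; `statistics.median`, `statistics.mode`, and `statistics.harmonic_mean` are standard-library calls.

```python
import math
import statistics

def arithmetic_mean(returns):
    """Simple average of the period returns."""
    return sum(returns) / len(returns)

def geometric_mean(returns):
    """Compound per-period return: [(1+R1)(1+R2)...(1+Rn)]^(1/n) - 1."""
    growth = math.prod(1 + r for r in returns)  # total growth factor
    return growth ** (1 / len(returns)) - 1

def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Hypothetical annual returns (decimal form).
returns = [0.10, -0.05, 0.10, 0.20, -0.15]

print(f"Arithmetic mean: {arithmetic_mean(returns):.4f}")
print(f"Geometric mean:  {geometric_mean(returns):.4f}")
print(f"Median:          {statistics.median(returns):.4f}")
print(f"Mode:            {statistics.mode(returns):.4f}")

# Weighted mean: hypothetical 60/40 portfolio of assets returning 8% and 3%.
print(f"Weighted mean:   {weighted_mean([0.08, 0.03], [0.6, 0.4]):.4f}")

# Harmonic mean: average cost per share when investing equal dollar
# amounts at hypothetical prices of 20 and 25.
print(f"Harmonic mean:   {statistics.harmonic_mean([20, 25]):.4f}")
```

With these inputs the geometric mean (about 3.2%) falls below the arithmetic mean (4.0%), consistent with the inequality described above.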

Measures of dispersion quantify the spread around the central tendency. For a population, variance is the average squared deviation from the mean; standard deviation is its square root, expressed in the same units as the original data. For a sample, variance uses (n–1) in the denominator (Bessel's correction) to produce an unbiased estimate of the population variance. The coefficient of variation (CV = standard deviation / mean) normalizes dispersion, allowing comparison across assets with different means. A fund with a 15% mean return and 10% standard deviation has a CV of 0.67, while one with a 5% mean and 6% standard deviation has a CV of 1.20; the second fund has more dispersion per unit of return. Range and mean absolute deviation (MAD) are simpler dispersion measures but are less commonly used in finance.
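
A minimal sketch of these dispersion measures, using the two funds from the text plus a small set of hypothetical returns in percent. It relies on the standard library's `statistics.pvariance` (population, divides by n) and `statistics.variance` / `statistics.stdev` (sample, divides by n–1); `coefficient_of_variation` is an illustrative helper, not a library function.

```python
import statistics

def coefficient_of_variation(std_dev, mean):
    """CV = standard deviation / mean: dispersion per unit of mean return."""
    return std_dev / mean

# The two funds described above (mean and standard deviation in percent).
print(f"Fund 1 CV: {coefficient_of_variation(10, 15):.2f}")  # 10 / 15
print(f"Fund 2 CV: {coefficient_of_variation(6, 5):.2f}")    # 6 / 5

# Hypothetical sample of annual returns, in percent.
returns = [8.0, 12.0, -4.0, 6.0, 3.0]
mean = statistics.mean(returns)

pop_var = statistics.pvariance(returns)   # divides by n
samp_var = statistics.variance(returns)   # divides by n - 1 (Bessel's correction)
samp_sd = statistics.stdev(returns)       # square root of the sample variance

data_range = max(returns) - min(returns)
mad = sum(abs(r - mean) for r in returns) / len(returns)  # mean absolute deviation

print(f"Mean {mean:.2f}, population variance {pop_var:.2f}, sample variance {samp_var:.2f}")
print(f"Sample SD {samp_sd:.2f}, CV {coefficient_of_variation(samp_sd, mean):.2f}")
print(f"Range {data_range:.2f}, MAD {mad:.2f}")
```

On the same five observations, dividing by n–1 gives a sample variance of 36 versus 28.8 from the population formula.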

Skewness and kurtosis describe the shape of a distribution. A symmetric distribution has zero skewness; a positively skewed distribution has a longer right tail, with mean > median > mode; a negatively skewed distribution has a longer left tail, with mean < median < mode. For investment returns, negative skewness is particularly dangerous because it means large losses occur more frequently than a symmetric distribution would predict, as investors experienced during 2008 and other crises. Kurtosis measures the combined weight of a distribution's tails. A normal distribution has kurtosis of 3, so excess kurtosis (kurtosis – 3) expresses tail weight relative to the normal. Leptokurtic distributions (excess kurtosis > 0) have fatter tails and higher peaks, implying more frequent extreme events than a normal distribution would suggest; platykurtic distributions (excess kurtosis < 0) have thinner tails.
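
The sketch below computes skewness and excess kurtosis from central moments, assuming the simple population-moment formulas (skewness = m3 / m2^1.5, excess kurtosis = m4 / m2^2 – 3). This choice is an assumption for illustration; the sample-adjusted versions used in many textbooks and software packages differ slightly in small samples. The function name and data are hypothetical.

```python
def shape_statistics(data):
    """Return (skewness, excess kurtosis) using population central moments."""
    n = len(data)
    mean = sum(data) / n
    deviations = [x - mean for x in data]
    m2 = sum(d ** 2 for d in deviations) / n  # population variance
    m3 = sum(d ** 3 for d in deviations) / n  # third central moment
    m4 = sum(d ** 4 for d in deviations) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3        # a normal distribution gives 0
    return skewness, excess_kurtosis

# Evenly spaced hypothetical data: perfectly symmetric.
symmetric = [1, 2, 3, 4, 5]

# Hypothetical returns with one large loss: a long left tail.
crash_prone = [0.05, 0.06, 0.04, 0.07, 0.05, -0.20]

for label, data in [("symmetric", symmetric), ("crash-prone", crash_prone)]:
    skew, ex_kurt = shape_statistics(data)
    print(f"{label:12s} skewness {skew:+.3f}  excess kurtosis {ex_kurt:+.3f}")
```

The crash-prone series shows negative skewness and positive excess kurtosis (leptokurtic), while the evenly spaced series has zero skewness and negative excess kurtosis (platykurtic).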

The Chebyshev inequality provides a non-parametric result: for any distribution with a finite variance and any k > 1, at least (1 – 1/k^2) of all observations fall within k standard deviations of the mean. For k=2, at least 75% of observations lie within 2 standard deviations; for k=3, at least 89%. For a normal distribution, by comparison, approximately 68% of observations fall within 1 standard deviation, 95.4% within 2, and 99.7% within 3. Chebyshev's bounds are therefore much weaker than the normal distribution's, but they hold regardless of the distribution's shape.

On the CFA exam, candidates frequently need to identify appropriate summary statistics for a given scenario: for example, when to use the median versus the mean (the median is more appropriate for skewed distributions or when outliers are present), and when the geometric versus the arithmetic mean is appropriate.
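
A short sketch comparing Chebyshev's distribution-free lower bound with exact normal-distribution coverage. For a standard normal variable Z, P(|Z| < k) = erf(k / √2), computed here with the standard library's `math.erf`; the function names are illustrative.

```python
import math

def chebyshev_min_share(k):
    """Minimum share of observations within k standard deviations (any distribution)."""
    if k <= 1:
        raise ValueError("Chebyshev's inequality is informative only for k > 1")
    return 1 - 1 / k ** 2

def normal_share(k):
    """Exact share within k standard deviations for a normal distribution."""
    return math.erf(k / math.sqrt(2))

for k in (1.5, 2, 3):
    print(f"k = {k}: Chebyshev >= {chebyshev_min_share(k):.1%}, normal = {normal_share(k):.1%}")
```

The gap between the two columns is the price of making no assumption about the distribution's shape.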

Key Terms:

  • Arithmetic mean: The sum of all values divided by the number of values; appropriate for estimating expected return over a single future period.
  • Geometric mean: The compound per-period (e.g., annual) growth rate; calculated as the nth root of the product of (1 + period returns), minus 1; always less than or equal to the arithmetic mean.
  • Variance: The average of squared deviations from the mean; a measure of dispersion. Sample variance uses n–1 in the denominator.
  • Standard deviation: The square root of variance; expressed in the same units as the data and the most commonly used risk measure in finance.
  • Coefficient of variation (CV): Standard deviation divided by the mean; measures dispersion per unit of return; useful for comparing risk across assets with different expected returns.
  • Skewness: A measure of asymmetry in a distribution; positive skewness indicates a longer right tail, negative skewness a longer left tail.
  • Kurtosis: A measure of the weight of the tails of a distribution relative to the normal; excess kurtosis > 0 (leptokurtic) implies fatter tails and more frequent extreme outcomes.
  • Chebyshev inequality: States that for any distribution with a finite variance and any k > 1, at least (1 – 1/k^2) of observations lie within k standard deviations of the mean.

Quiz Questions:

Q1. A portfolio returned 20% in Year 1 and –10% in Year 2. What is the geometric mean annual return?

A) 5.0%
B) 3.9%
C) 4.4%
D) 2.5%

Answer: B — Geometric mean = [(1.20)(0.90)]^(1/2) – 1 = [1.08]^0.5 – 1 = 1.0392 – 1 = 3.92% ≈ 3.9%. The arithmetic mean would be (20% – 10%)/2 = 5%, which overstates actual wealth accumulation. A $100 investment becomes $120 after Year 1 and $108 after Year 2 — the compound return is 3.92%, not 5%.
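
The wealth-path argument in this answer can be checked with a few lines of Python (variable names are illustrative): compounding at the geometric mean reproduces the actual ending wealth, while compounding at the arithmetic mean overstates it.

```python
import math

start = 100.0
returns = [0.20, -0.10]  # Year 1 and Year 2 returns from Q1

# Actual wealth path: $100 -> $120 -> $108.
wealth = start
for r in returns:
    wealth *= 1 + r

n = len(returns)
arith = sum(returns) / n
geo = math.prod(1 + r for r in returns) ** (1 / n) - 1

print(f"Ending wealth:             {wealth:.2f}")
print(f"Compounded at geometric:   {start * (1 + geo) ** n:.2f}")
print(f"Compounded at arithmetic:  {start * (1 + arith) ** n:.2f}")
```

Compounding $100 at 5% for two years gives $110.25, not the $108 the investor actually holds.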

---

Q2. An analyst is examining a dataset of annual returns for a hedge fund. She notices that the mean return is –2% while the median return is +5%. This pattern is most consistent with:

A) A positively skewed distribution where a few large gains pull the mean above the median
B) A negatively skewed distribution where a few large losses pull the mean below the median
C) A symmetric distribution with a high standard deviation
D) A leptokurtic distribution with excess kurtosis near zero

Answer: B — A mean below the median indicates a negatively skewed distribution. A few extreme negative returns (such as a major loss year) pull the arithmetic mean below the median, which is largely insensitive to outliers. For investment returns, negative skewness is especially harmful because the large losses in the left tail can be devastating to portfolios.
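
To see why the median resists outliers, the sketch below uses a hypothetical six-year return series in percent, constructed to match the –2% mean and +5% median in the question. The third central moment (the numerator of skewness) is computed inline to confirm the negative skew.

```python
import statistics

# Hypothetical annual hedge fund returns in percent; one severe loss year.
returns = [5.0, 6.0, 4.0, 7.0, 5.0, -39.0]

mean = statistics.mean(returns)
median = statistics.median(returns)
third_moment = sum((r - mean) ** 3 for r in returns) / len(returns)

print(f"With the loss year:    mean {mean:+.2f}%, median {median:+.2f}%")
print(f"Third central moment:  {third_moment:+.2f} (negative means left skew)")

# Dropping the single outlier moves the mean far more than the median.
without_loss = [r for r in returns if r != -39.0]
print(f"Without the loss year: mean {statistics.mean(without_loss):+.2f}%, "
      f"median {statistics.median(without_loss):+.2f}%")
```

Removing one observation shifts the mean by 7.4 percentage points (from –2.0% to 5.4%) but leaves the median unchanged at 5.0%.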

---

Q3. Portfolio A has a mean return of 12% and standard deviation of 8%. Portfolio B has a mean return of 6% and standard deviation of 5%. Which portfolio has MORE risk per unit of return?

A) Portfolio A, because its standard deviation is higher in absolute terms
B) Portfolio B, because its coefficient of variation is higher
C) Portfolio A, because its coefficient of variation is higher
D) They have equal risk per unit of return

Answer: B — Coefficient of variation (CV) = standard deviation / mean. CV_A = 8% / 12% = 0.67; CV_B = 5% / 6% = 0.83. Portfolio B has more dispersion per unit of return despite having a lower absolute standard deviation. The CV is the appropriate metric for comparing risk across assets with different return levels.

---

Q4. An investor knows that a return series has an unknown distribution. Using Chebyshev's inequality, what is the minimum percentage of observations that must lie within 2 standard deviations of the mean?

A) 68%
B) 75%
C) 95%
D) 89%

Answer: B — Chebyshev's inequality states that at least (1 – 1/k^2) of observations lie within k standard deviations. For k=2: 1 – 1/4 = 75%. This applies to any distribution. The 95% figure (Option C) applies specifically to the normal distribution and cannot be guaranteed for unknown distributions. Chebyshev's bound is conservative but universally applicable.

---

Q5. A bond return distribution is described as leptokurtic with excess kurtosis of 2.5. Compared to a normal distribution, this means:

A) The distribution has thinner tails and less frequent extreme returns
B) The distribution is positively skewed with most returns above the mean
C) The distribution has fatter tails, implying more frequent extreme returns than a normal distribution predicts
D) The standard deviation is 2.5 times larger than a normal distribution's standard deviation

Answer: C — Excess kurtosis > 0 defines a leptokurtic distribution, which has fatter tails and a higher peak than a normal distribution. This means extreme returns (both positive and negative) occur more frequently than a normal distribution model would suggest — a critical consideration in risk management where tail events drive the worst outcomes. Option A describes a platykurtic distribution (excess kurtosis < 0).

---