CFA Level I·Quantitative Methods·Hypothesis Testing

Section: Hypothesis Testing and Statistical Inference

Estimated study time: 45 minutes

Content:

Hypothesis testing is the formal framework for using sample data to make inferences about a population. In investment analysis, it is used to determine whether a strategy generates statistically significant alpha, whether two portfolio means differ, or whether a regression coefficient is meaningfully different from zero. The process begins with stating two competing hypotheses: the null hypothesis (H0) is the presumption of no effect (e.g., mean return = 0), and the alternative hypothesis (Ha) is what the analyst seeks to establish (e.g., mean return > 0). The alternative can be one-tailed (directional: > or <) or two-tailed (non-directional: ≠). A one-tailed test is used when theory or prior evidence strongly supports a specific direction; a two-tailed test is more conservative and appropriate when the direction is uncertain.
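
To make the one-tailed versus two-tailed distinction concrete, the following is a minimal sketch that compares the critical values of a standard normal test at the 5% level. The helper name z_critical is hypothetical and introduced only for illustration; it relies on Python's standard-library NormalDist.

```python
from statistics import NormalDist

def z_critical(alpha: float, two_tailed: bool) -> float:
    """Return the positive standard-normal critical value for a given significance level."""
    # A two-tailed test splits alpha equally between the two tails.
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)

one_tailed = z_critical(0.05, two_tailed=False)
two_tailed = z_critical(0.05, two_tailed=True)
print(f"one-tailed 5% critical z: {one_tailed:.3f}")  # about 1.645
print(f"two-tailed 5% critical z: {two_tailed:.3f}")  # about 1.960
```

The two-tailed cutoff is further from zero, which is why the same test statistic can be significant in a one-tailed test but not in a two-tailed test at the same significance level.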

The test statistic is calculated from sample data and compared to a critical value from the appropriate distribution to decide whether to reject H0. For tests of population means with known variance, the z-statistic is used: z = (X_bar – μ0) / (σ / √n). For unknown variance (the common case), the t-statistic with n–1 degrees of freedom is used: t = (X_bar – μ0) / (s / √n). The p-value is the probability of obtaining a test statistic at least as extreme as the one observed, assuming H0 is true. If p-value < significance level (α), we reject H0. Commonly used significance levels are 1%, 5%, and 10%, corresponding to 99%, 95%, and 90% confidence levels. The significance level α is the probability of making a Type I error — rejecting H0 when it is actually true (a false positive).
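
The formulas above can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation: the function names and the input numbers are hypothetical, and the p-value helper covers only the z-test because the standard library has no t-distribution (a t-based p-value would need a statistical table or a third-party library).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def z_statistic(x_bar, mu0, sigma, n):
    """z = (X_bar - mu0) / (sigma / sqrt(n)), for a known population sigma."""
    return (x_bar - mu0) / (sigma / sqrt(n))

def t_statistic(sample, mu0):
    """t = (X_bar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(n))

def two_tailed_p_value_z(z):
    """Probability of a z-statistic at least as extreme as |z| when H0 is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical inputs: sample mean 1.5%, assumed known sigma 4%, n = 25.
z = z_statistic(x_bar=1.5, mu0=0.0, sigma=4.0, n=25)
p = two_tailed_p_value_z(z)
print(f"z = {z:.3f}, two-tailed p-value = {p:.4f}")

# Hypothetical monthly returns (in %) for the unknown-variance case.
returns = [1.2, -0.8, 2.5, 0.4, 3.1, -1.5, 0.9, 1.8]
t = t_statistic(returns, mu0=0.0)
print(f"t = {t:.3f} with {len(returns) - 1} degrees of freedom")
```

With these inputs the p-value falls between 5% and 10%, so H0 would be rejected at the 10% level but not at the 5% level.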

Two types of errors arise in hypothesis testing. A Type I error (false positive) occurs when H0 is rejected even though it is true; its probability is α, the chosen significance level. A Type II error (false negative) occurs when H0 is not rejected even though it is false; its probability is β. The power of a test (1 – β) is the probability of correctly rejecting a false H0. There is a fundamental tradeoff: for a given sample size, reducing α (making rejection harder) increases β (making it easier to miss a true effect). The only way to reduce both error probabilities at once is to increase the sample size. In practice, investment researchers often struggle with this tradeoff when evaluating trading strategies: setting α too low may cause them to miss genuine signals, while setting it too high leads to more spurious results.
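
The tradeoff between α and β can be illustrated with a small power calculation. The sketch below assumes a one-tailed z-test (H0: μ = μ0 versus Ha: μ > μ0) with a known population standard deviation, which is a simplification chosen so that only the standard library is needed. The function name and all input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided_z(alpha, effect, sigma, n):
    """Power (1 - beta) of an upper-tailed z-test when the true mean exceeds mu0 by `effect`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)  # rejection threshold under H0
    shift = effect * sqrt(n) / sigma        # how far the true mean moves the statistic
    return 1 - std_normal.cdf(z_crit - shift)

# Hypothetical strategy: true mean 0.5% per month, sigma 4%, 60 monthly observations.
powers = {}
for alpha in (0.10, 0.05, 0.01):
    powers[alpha] = power_one_sided_z(alpha, effect=0.5, sigma=4.0, n=60)
    print(f"alpha = {alpha:.2f}: power = {powers[alpha]:.3f}, beta = {1 - powers[alpha]:.3f}")
```

Lowering α in this sketch lowers the power, so β rises. Increasing n in the same function raises the power for any fixed α.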

Chi-square tests and F-tests extend hypothesis testing to variances. A chi-square test with n–1 degrees of freedom tests whether a population variance equals a hypothesized value, using the sample variance: χ^2 = (n–1)s^2 / σ0^2. An F-test tests the equality of two population variances: F = s1^2 / s2^2, where s1^2 is the larger sample variance (placed in the numerator by convention), with n1–1 and n2–1 degrees of freedom. Both tests assume the underlying populations are normally distributed, and the F-test also assumes the two samples are independent. F-tests are also the basis of ANOVA and regression significance testing. For the CFA Level I exam, candidates should also understand the concept of a confidence interval: a range of values constructed from sample data that is expected to contain the true population parameter at a specified confidence level. A 95% confidence interval for the mean is: X_bar ± t_critical × (s / √n), where t_critical is the two-tailed critical value at the 5% significance level with n–1 degrees of freedom.
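
The three formulas in this paragraph translate directly into code. The following is a minimal sketch with hypothetical function names; critical values are passed in from a table rather than computed, and the hypothesized variance of 0.0100 in the chi-square example is an assumed input chosen only for illustration.

```python
from math import sqrt

def chi_square_statistic(n, sample_var, hypothesized_var):
    """Chi-square statistic (n - 1) * s^2 / sigma0^2, with n - 1 degrees of freedom."""
    return (n - 1) * sample_var / hypothesized_var

def f_statistic(var_a, var_b):
    """Ratio of sample variances with the larger one in the numerator, so F >= 1."""
    return max(var_a, var_b) / min(var_a, var_b)

def confidence_interval(x_bar, s, n, t_critical):
    """X_bar +/- t_critical * (s / sqrt(n)); t_critical comes from a t-table."""
    half_width = t_critical * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width

chi2 = chi_square_statistic(n=25, sample_var=0.0144, hypothesized_var=0.0100)
f = f_statistic(0.0096, 0.0144)
low, high = confidence_interval(x_bar=1.5, s=4.0, n=25, t_critical=2.064)

print(f"chi-square = {chi2:.2f} (24 degrees of freedom)")
print(f"F = {f:.2f}")
print(f"95% CI for the mean: [{low:.2f}%, {high:.2f}%]")
```

Each statistic is then compared with the critical value from the matching distribution table to reach a decision.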

Key Terms:

Null hypothesis (H0): The hypothesis of no effect or no difference that is presumed true until statistical evidence suggests otherwise; the "status quo" assumption.
Alternative hypothesis (Ha): The hypothesis that the analyst seeks to establish; accepted when H0 is rejected based on sample evidence.
Type I error: Rejecting H0 when it is actually true (false positive); the probability of Type I error equals the significance level α.
Type II error: Failing to reject H0 when it is actually false (false negative); probability equals β; reduces test power (1 – β).
p-value: The probability of observing a test statistic at least as extreme as the computed value, assuming H0 is true; reject H0 if p-value < α.
t-statistic: The test statistic used when population variance is unknown, following a t-distribution with n–1 degrees of freedom.
Confidence interval: A range constructed from sample data that is expected to contain the true population parameter at a specified confidence level (e.g., 95%); the confidence level equals 1 – α.
Chi-square test: A test of whether a population variance equals a hypothesized value, based on the sample variance; uses the chi-square distribution with n–1 degrees of freedom.
F-test: A test for the equality of two population variances; the test statistic is the ratio of sample variances.
Power of a test: The probability of correctly rejecting a false null hypothesis; equals 1 – β; increases with a larger sample size, a larger effect size, and a higher significance level α.

Quiz Questions:

Q1. An analyst tests whether a fund manager's mean monthly return differs from zero. The null hypothesis is H0: μ = 0 and the alternative is Ha: μ ≠ 0. Using a sample of 36 monthly returns, the analyst computes a t-statistic of 2.10. At the 5% significance level with 35 degrees of freedom, the two-tailed critical value is approximately 2.03. What conclusion should the analyst draw?

A) Fail to reject H0; there is insufficient evidence that the mean return differs from zero
B) Reject H0; there is sufficient evidence at the 5% level that the mean return differs from zero
C) Reject H0 at the 1% significance level, so the manager has proven skill
D) Fail to reject H0 because a two-tailed test is more conservative than a one-tailed test

Answer: B. The computed t-statistic (2.10) exceeds the critical value (2.03), so we reject H0 at the 5% significance level: there is statistically significant evidence that the mean return differs from zero. Option C is wrong on two counts: the two-tailed 1% critical value for 35 degrees of freedom is approximately 2.72, which 2.10 does not exceed, and a rejection is evidence against H0, not proof of skill. Option D is wrong because the decision depends on comparing the statistic with the critical value, not on the type of test. Rejecting H0 indicates statistical significance but not necessarily economic significance.

---

Q2. A portfolio manager claims her strategy generates positive alpha. An analyst tests this with H0: α = 0 versus Ha: α > 0 using monthly data over 5 years, where α here denotes the strategy's alpha. The analyst sets the significance level at 0.05. The probability of a Type I error in this test is:

A) 50%
B) 5%
C) 95%
D) Cannot be determined without sample data

Answer: B. The probability of a Type I error equals the chosen significance level, 5%. If the manager's true alpha is actually zero (H0 is true), there is a 5% chance the test will incorrectly reject H0 and conclude the manager has skill. The significance level is set by the researcher before observing the data; it is not calculated from the sample.
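
The claim that the Type I error rate equals the significance level can be checked by simulation. The sketch below is illustrative only: it simplifies the test to a one-tailed z-test with a known standard deviation, draws 60 hypothetical monthly observations per trial from a normal distribution whose true mean is zero (so H0 is true by construction), and counts how often H0 is wrongly rejected. The function name, the 4% standard deviation, the trial count, and the seed are all assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist

def simulate_type_i_rate(alpha=0.05, n=60, sigma=4.0, trials=10_000, seed=42):
    """Fraction of simulated samples that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)  # upper-tail critical value
    rejections = 0
    for _ in range(trials):
        # Draw a sample under H0: the true mean is exactly zero.
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        z = (sum(sample) / n) / (sigma / sqrt(n))
        if z > z_crit:
            rejections += 1  # a false positive
    return rejections / trials

rate = simulate_type_i_rate()
print(f"empirical Type I error rate: {rate:.3f}")
```

The empirical rejection rate should land close to 0.05, with some simulation noise.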

---

Q3. An analyst is comparing two portfolio managers' variances to determine if their risk profiles differ. Manager A's return series has a sample variance of 0.0144 and Manager B's has 0.0096, each over 25 observations. The F-statistic for this test is:

A) 0.67
B) 1.50
C) 1.00
D) 2.25

Answer: B. The F-statistic = larger variance / smaller variance = 0.0144 / 0.0096 = 1.50. Placing the larger variance in the numerator ensures F ≥ 1, so only the upper-tail critical value is needed. Because the question asks whether the variances differ (a two-tailed test), the statistic is compared with the upper critical value at α/2, using 24 numerator and 24 denominator degrees of freedom (n – 1 for each sample).

---

Q4. A sample of 25 monthly returns has a mean of 1.5% and a standard deviation of 4%. Construct a 95% confidence interval for the true mean monthly return. The t-critical value for 24 degrees of freedom at 95% confidence is approximately 2.064.

A) [–0.15%, 3.15%]
B) [0.00%, 3.00%]
C) [–1.80%, 4.80%]
D) [–0.07%, 3.07%]

Answer: A. Standard error = s / √n = 4% / √25 = 4% / 5 = 0.80%. Confidence interval = 1.5% ± 2.064 × 0.80% = 1.5% ± 1.651% ≈ [–0.15%, 3.15%]. The interval includes zero, so a two-tailed t-test fails to reject H0: μ = 0 at the 5% level; equivalently, t = 1.5% / 0.80% = 1.875 is below the critical value of 2.064.

---

Q5. A researcher computes a p-value of 0.03 for a test of whether a trading strategy's mean return equals zero. Which interpretation is CORRECT?

A) There is a 3% probability that the trading strategy has a true mean return of zero
B) There is a 3% probability of observing results this extreme or more extreme if the true mean return is zero
C) The strategy has a 97% probability of generating positive returns going forward
D) The null hypothesis is rejected at the 1% significance level

Answer: B. The p-value is the probability of observing a test statistic at least as extreme as the computed value, assuming H0 is true. A p-value of 0.03 means that if the true mean were zero, there would be only a 3% chance of seeing sample results this extreme. Since 0.03 < 0.05, H0 is rejected at the 5% level; since 0.03 > 0.01, it is not rejected at the 1% level, which makes Option D wrong. The p-value is NOT the probability that H0 is true (Option A), nor is it a forecast probability (Option C).

---
