SAT Math — Statistics and Data Analysis

Measures of Center

Mean (Average): Sum of all values ÷ number of values
Example: {3, 7, 9, 11, 15} → Mean = (3 + 7 + 9 + 11 + 15)/5 = 45/5 = 9

Median: The middle value when data is ordered. If there's an even number of values, average the two middle ones.
Example: {3, 7, 9, 11, 15} → Median = 9 (middle value)
Example: {3, 7, 9, 11} → Median = (7 + 9)/2 = 8

Mode: The most frequently occurring value.
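The three definitions above can be checked with a short Python sketch that uses the standard library's statistics module. The first two lists are the example sets from this section; the third list is a hypothetical set, added here only because the examples above contain no repeated value to serve as a mode.

```python
from statistics import mean, median, multimode

odd_set = [3, 7, 9, 11, 15]   # five values: a single middle value
even_set = [3, 7, 9, 11]      # four values: average the two middle ones
repeats = [2, 4, 4, 5, 7]     # hypothetical set (not from the text) with a repeated value

print(mean(odd_set))       # 9   (45 / 5)
print(median(odd_set))     # 9   (the middle value)
print(median(even_set))    # 8.0 ((7 + 9) / 2)
print(multimode(repeats))  # [4] (4 appears twice, every other value once)
```

multimode returns a list because a data set can have more than one mode.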

When to use which:

Mean is affected by outliers (extreme values)
Median is better for skewed distributions or when outliers are present
Use median for home prices, income distributions, etc.
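To see why the median is preferred when outliers are present, here is a minimal sketch with made-up sale prices. All of the dollar figures are hypothetical and chosen only for illustration.

```python
from statistics import mean, median

# Hypothetical sale prices in dollars (illustrative only, not from the text).
typical_homes = [300_000, 320_000, 350_000, 380_000, 400_000]
with_mansion = typical_homes + [3_000_000]  # add one extreme value

print(mean(typical_homes), median(typical_homes))  # 350000 350000
print(round(mean(with_mansion)))                   # 791667
print(median(with_mansion))                        # 365000.0
```

In this sketch, one mansion more than doubles the mean, while the median moves only from 350,000 to 365,000.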

Spread: Range and Standard Deviation

Range: Maximum − Minimum
Example: {3, 7, 9, 11, 15} → Range = 15 − 3 = 12

Standard deviation: Measures how spread out values are from the mean. The SAT tests conceptual understanding, not calculation.

Small standard deviation: Data is clustered close to the mean
Large standard deviation: Data is spread far from the mean
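The SAT does not ask you to compute a standard deviation by hand, but a quick sketch can make the idea concrete. The two comparison sets below are hypothetical: they were chosen to share a mean of 9 while differing in spread. The helper name value_range is also just an illustration, and pstdev computes the population standard deviation.

```python
from statistics import mean, pstdev

def value_range(values):
    """Range = maximum minus minimum."""
    return max(values) - min(values)

example = [3, 7, 9, 11, 15]    # set used in the Range example
clustered = [8, 9, 9, 9, 10]   # hypothetical: values close to the mean
spread = [1, 5, 9, 13, 17]     # hypothetical: values far from the mean

print(value_range(example))           # 12
print(mean(clustered), mean(spread))  # 9 9
print(round(pstdev(clustered), 2))    # 0.63
print(round(pstdev(spread), 2))       # 5.66
```

Both hypothetical sets have the same center, yet the second has a standard deviation roughly nine times as large.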

Sampling and Inference

The SAT tests your ability to evaluate the validity of statistical conclusions.

Representative sample: A sample that reflects the population's characteristics. Random sampling tends to produce representative samples, especially when the sample is reasonably large.

Margin of error: The uncertainty range around a sample estimate. A survey result of "60% ± 3%" means the true value is likely between 57% and 63%.
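The sketch below turns an estimate and its margin of error into an interval, and then shows how the margin shrinks as the sample grows. The formula in approx_margin_of_error is not part of this guide and is not something the SAT asks you to compute. It is a common large-sample approximation at roughly 95% confidence, and it assumes a simple random sample. Both function names are hypothetical.

```python
import math

def interval(estimate, margin):
    """Return the (low, high) range implied by estimate ± margin."""
    return (estimate - margin, estimate + margin)

def approx_margin_of_error(p_hat, n, z=1.96):
    """Common large-sample approximation (an assumption, not from the text):
    z * sqrt(p_hat * (1 - p_hat) / n), with z = 1.96 for about 95% confidence."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

low, high = interval(60, 3)   # in percentage points
print(low, high)              # 57 63

# The same 60% result from samples of different sizes.
for n in (100, 1_000, 10_000):
    print(n, round(approx_margin_of_error(0.60, n), 3))
```

Under this approximation, a margin of about 3 percentage points on a 60% result corresponds to a sample of roughly 1,000 people. Larger samples give smaller margins.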

Bias: When a sample systematically misrepresents the population.

Voluntary response bias: People with strong opinions are more likely to respond
Convenience sampling: Only surveying people who are easy to reach
Undercoverage: Missing segments of the population in the sample

Inference principle: You can generalize results to the population only if the sample was randomly selected from that population.
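A small simulation can illustrate why random selection matters. Everything below is a made-up scenario: the population size, the 30% share of heavy phone users, and the response probabilities are assumptions chosen only to show the effect of voluntary response bias.

```python
import random

rng = random.Random(0)  # fixed seed so reruns give the same numbers

# Hypothetical population: 10,000 students, 30% of them heavy phone users (True).
population = [True] * 3_000 + [False] * 7_000
true_share = sum(population) / len(population)

# Random sample: every student has the same chance of being selected.
random_sample = rng.sample(population, 500)
random_share = sum(random_sample) / len(random_sample)

# Voluntary response: assume heavy users opt in with probability 0.5
# and everyone else with probability 0.1 (both probabilities are assumptions).
volunteers = [heavy for heavy in population
              if rng.random() < (0.5 if heavy else 0.1)]
volunteer_share = sum(volunteers) / len(volunteers)

print(f"true share of heavy users: {true_share:.2f}")
print(f"random sample estimate:    {random_share:.2f}")
print(f"volunteer-only estimate:   {volunteer_share:.2f}")
```

The random sample should land close to the true 30%, while the volunteer-only estimate should come out far higher, because heavy users are overrepresented among those who chose to respond.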

Interpreting Data

The SAT often presents tables or summary statistics and asks what they mean.

Example: A class of 30 students took a test. The average score was 74. Five students scored above 90. What does this tell you?

The mean is 74, but we don't know if the distribution is symmetric
We know 5/30 (about 17%) scored above 90
We can't conclude individual scores without more data
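To make these points concrete, the sketch below builds two hypothetical classes that both match the stated facts (30 students, mean 74, five scores above 90) yet have very different individual scores. The score lists are invented for illustration, and summarize is a hypothetical helper.

```python
from statistics import mean, median

# Two hypothetical classes (invented scores) that fit the same summary.
class_a = [95] * 5 + [70] * 20 + [69] * 5
class_b = [100] * 5 + [90] * 10 + [55] * 10 + [54] * 5

def summarize(scores):
    """Return (number of students, mean, count above 90, median)."""
    above_90 = sum(1 for s in scores if s > 90)
    return len(scores), mean(scores), above_90, median(scores)

print(summarize(class_a))     # (30, 74, 5, 70.0)
print(summarize(class_b))     # (30, 74, 5, 72.5)
print(round(5 / 30 * 100))    # 17 (percent scoring above 90)
```

Both classes produce the same mean and the same count above 90, so those two facts alone cannot tell you what any individual student scored or whether the distribution is symmetric.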

Real-world example: A school surveys 100 randomly selected students and finds 72% prefer a later school start time. Can you conclude that the majority of all students in the school prefer later start times? Yes — because the sample was randomly selected, you can generalize to the larger population (with some margin of error).

---

Key Terms

Mean: Average; sum ÷ count; sensitive to outliers
Median: Middle value in ordered data; resistant to outliers
Mode: Most frequent value
Range: Max − Min; measures spread
Standard deviation: Measure of how spread out data is from the mean
Outlier: An extreme value that differs significantly from other values
Random sample: A sample where every member of the population has an equal chance of selection
Margin of error: The range of uncertainty around a sample estimate
Representative sample: A sample that accurately reflects the population
Bias: Systematic error in sampling that skews results away from the true population value

---

Quiz Questions:

Q1. Five test scores are: 82, 90, 74, 96, 88. What is the mean score?

A) 88
B) 86
C) 84
D) 90

Answer: B — Sum = 82 + 90 + 74 + 96 + 88 = 430. Mean = 430/5 = 86.

---

Q2. A real estate agent reports the average (mean) home sale price in a neighborhood as $650,000. One sale was an unusually expensive mansion at $3,000,000. A buyer asks if this average reflects typical prices. What should the agent say?

A) Yes, the mean accurately represents typical prices in all situations
B) The mean is heavily influenced by the outlier (the mansion), so the median would better represent the typical home price
C) The mean is only valid for populations over 100 homes
D) The mean is accurate because it includes all data points

Answer: B — The mean is pulled upward by extremely high values (outliers). For home prices with extreme outliers, the median is a better measure of the "typical" price.

---

Q3. Two datasets have the same mean. Dataset A has a standard deviation of 2; Dataset B has a standard deviation of 15. Which statement is correct?

A) Dataset A and Dataset B have identical distributions
B) Dataset B has values more spread out from the mean than Dataset A
C) Dataset A has values more spread out from the mean than Dataset B
D) Standard deviation cannot be compared across different datasets

Answer: B — A larger standard deviation means greater spread from the mean. Dataset B (SD = 15) has values spread much further from the mean than Dataset A (SD = 2). Both have the same center (mean) but different spreads.

---

Q4. A researcher surveys 50 randomly selected customers from a store's database of 10,000 customers. 64% say they are satisfied. Which conclusion is best supported?

A) Exactly 6,400 customers are satisfied
B) It is likely that approximately 64% of all 10,000 customers are satisfied, with some margin of error
C) The survey result is invalid because the sample size is too small
D) The conclusion only applies to the 50 surveyed customers

Answer: B — Because the sample was randomly selected, you can generalize to the population — but with some uncertainty (margin of error). Choice A claims exact precision the sample doesn't provide. Choice C is wrong — samples don't need to be huge if they're random. Choice D ignores the purpose of statistical inference.

---

Q5. A school surveys only students who volunteer to participate in a study about phone use. The results show 90% use their phones over 4 hours daily. What is a concern about this study?

A) The sample size might be too large
B) Voluntary response bias: students who volunteer for phone surveys may be heavier users, making the sample unrepresentative of all students
C) The result proves that 90% of all students use phones over 4 hours daily
D) The survey is valid because voluntary participation ensures honest answers

Answer: B — Voluntary response sampling overrepresents people with strong opinions or a stake in the topic (here, heavy phone users might be more likely to participate). This is voluntary response bias, and the results cannot be generalized to all students without this caveat.