Chapter 7: Sampling Distributions - Complete Summary
What Is a Sampling Distribution?
A sampling distribution is the probability distribution of a statistic—like the sample mean or sample proportion—obtained from repeated random samples of the same size drawn from a population. It's not about one sample. It's about what happens when you take many samples and look at the pattern of their results.
Most students crash here because they confuse the sample with the sampling distribution. A single dataset cannot show you a sampling distribution by itself. You need to imagine pulling thousands of samples and watching what their statistics do.
The sampling distribution tells you how much your statistic would vary if you repeated the sampling process infinitely. That variation is what statisticians actually care about.
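The idea of "many samples, one statistic each" can be sketched with a short simulation. This is a minimal illustration, not a prescribed method: the uniform population, the sample size of 25, and the 2,000 repetitions are all assumptions chosen for the example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 values drawn from a uniform(0, 100) distribution.
population = [random.uniform(0, 100) for _ in range(10_000)]

def sample_means(pop, n, reps):
    """Draw `reps` simple random samples of size n and return each sample's mean."""
    return [statistics.fmean(random.sample(pop, n)) for _ in range(reps)]

# Each entry in `means` is the mean of one sample of 25 values.
means = sample_means(population, n=25, reps=2_000)

print("population mean:        ", round(statistics.fmean(population), 2))
print("mean of sample means:   ", round(statistics.fmean(means), 2))
print("sd of sample means:     ", round(statistics.stdev(means), 2))
print("population sd / sqrt(n):", round(statistics.pstdev(population) / 25 ** 0.5, 2))
```

The list `means` is an empirical stand-in for the sampling distribution: its center should sit near the population mean, and its spread should be close to the population standard deviation divided by √n.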
Why Sampling Distributions Matter
Without this concept, you can't do inference. Period. Every confidence interval, every hypothesis test, every p-value you calculate depends on understanding how sample statistics behave under repeated sampling.
Here is what you actually need to know:
- For unbiased statistics such as x̄ and p̂, the mean of the sampling distribution equals the population parameter you're estimating
- The spread of the sampling distribution tells you about sampling variability
- As sample size increases, the spread gets smaller
- The shape of the sampling distribution is often approximately normal under certain conditions
That's it. Everything else in Chapter 7 flows from these four facts.
The Sampling Distribution of the Sample Mean
When you collect a simple random sample of size n from a population with mean μ and standard deviation σ, the sampling distribution of x̄ (the sample mean) has these properties:
Mean and Standard Deviation
The mean of the sampling distribution is μ. This is called unbiasedness—on average, your sample mean hits the true population mean.
The standard deviation of the sampling distribution (called the standard error) is:
σx̄ = σ / √n
Notice the square root in the denominator. This means if you want to cut your standard error in half, you need four times the sample size. Doubling the sample doesn't halve the error. It only reduces it by about 29%, because 1/√2 ≈ 0.71.
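The square-root relationship is easy to check numerically. The sketch below uses a hypothetical helper, `standard_error`, and an illustrative σ of 200; neither comes from the chapter's formula sheet.

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sigma / math.sqrt(n)

sigma = 200  # illustrative population standard deviation

# Compare the standard error at n, 2n, and 4n.
for n in (25, 50, 100):
    print(f"n = {n:3d}  standard error = {standard_error(sigma, n):.2f}")

# Ratio of standard errors when the sample size doubles, and when it quadruples.
print("doubling n multiplies SE by:   ", round(standard_error(sigma, 50) / standard_error(sigma, 25), 3))
print("quadrupling n multiplies SE by:", round(standard_error(sigma, 100) / standard_error(sigma, 25), 3))
```

Quadrupling n multiplies the standard error by one half, while doubling n multiplies it by 1/√2.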
The Central Limit Theorem
Here is the big one. The Central Limit Theorem (CLT) states that for sufficiently large sample sizes, the sampling distribution of x̄ is approximately normal, regardless of the population's shape (as long as the population has a finite standard deviation).
How large is "sufficiently large"? Usually n ≥ 30 works as a rule of thumb. But if the population is heavily skewed or has outliers, you need more. If the population is already normal, the sampling distribution is normal for any n.
What the CLT does not say: that your sample data are normal, or that they become normal as n grows. It says the sampling distribution of the sample mean becomes approximately normal. These are completely different things.
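One way to watch the CLT at work is to sample from a clearly skewed population and track how the skewness of the sample means shrinks as n grows. The sketch below is an illustration under assumptions: the exponential population, the sample sizes 2, 10, and 50, and the helper names `skewness` and `exp_sample_means` are all chosen for this example.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

def skewness(values):
    """Sample skewness: the average cubed z-score (near 0 for a symmetric shape)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

def exp_sample_means(n, reps=5_000):
    """Means of `reps` samples of size n from a right-skewed exponential(1) population."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]

# Skewness of the simulated sampling distribution of the mean at each sample size.
skew_by_n = {n: skewness(exp_sample_means(n)) for n in (2, 10, 50)}

for n, skew in skew_by_n.items():
    print(f"n = {n:2d}  skewness of sample means = {skew:.2f}")
```

The raw exponential data stay skewed no matter how many values you collect. What moves toward symmetry is the distribution of the sample means.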
The Sampling Distribution of the Sample Proportion
When your data is categorical—yes/no, heads/tails, success/failure—the statistic you care about is the sample proportion (p̂).
The sampling distribution of p̂ has:
- Mean = p (the population proportion)
- Standard error = √(p(1-p)/n)
The CLT for proportions requires that both np ≥ 10 and n(1-p) ≥ 10 before you can treat the sampling distribution as approximately normal.
Why 10? It's an arbitrary cutoff that works in practice. Don't overthink it.
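The mean, standard error, and normality check for p̂ fit in a few lines. This is a sketch, with `proportion_sampling_dist` as a hypothetical helper name and the threshold of 10 exposed as a parameter, since some textbooks use a different cutoff.

```python
import math

def proportion_sampling_dist(p, n, threshold=10):
    """Return (mean, standard_error, normal_ok) for the sample proportion.

    normal_ok is True when both n*p and n*(1-p) reach the threshold.
    """
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    mean = p
    se = math.sqrt(p * (1 - p) / n)
    normal_ok = n * p >= threshold and n * (1 - p) >= threshold
    return mean, se, normal_ok

# Illustrative cases: a balanced proportion and a rare event, both with n = 100.
for p in (0.5, 0.02):
    mean, se, ok = proportion_sampling_dist(p, n=100)
    print(f"p = {p}: mean = {mean}, SE = {se:.4f}, normal approximation OK? {ok}")
```

With p = 0.02 and n = 100, np is only 2, so the normal approximation should not be used even though n itself looks large.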
Comparing Sampling Distributions: A Practical Overview
| Statistic | Mean of Distribution | Standard Error | CLT Conditions |
|---|---|---|---|
| Sample mean (x̄) | μ | σ/√n | n ≥ 30 (or population normal) |
| Sample proportion (p̂) | p | √(p(1-p)/n) | np ≥ 10 and n(1-p) ≥ 10 |
| Sample variance (s²) | σ² | σ²·√(2/(n-1)) if the population is normal | Population normal: (n-1)s²/σ² follows a chi-square distribution with n-1 degrees of freedom |
Common Misconceptions That Will Cost You
Wrong: "My sample is normal, so the CLT kicks in."
Right: The CLT applies to the sampling distribution of the statistic, not your raw data.
Wrong: "A bigger sample always gives a more accurate estimate."
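Right: A bigger sample reduces sampling variability, but only in proportion to 1/√n, so diminishing returns hit fast. It also does nothing to fix bias from a bad sampling method.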
Wrong: "The standard deviation of my sample estimates the standard error."
Right: The standard error describes the spread of a sample statistic across many samples. Your single sample standard deviation s estimates the population standard deviation σ. To estimate the standard error of the mean from one sample, divide: s/√n.
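To make the last distinction concrete: the sample standard deviation s estimates σ, and dividing s by √n gives the usual estimate of the standard error of the mean. The data below are made up for illustration.

```python
import math
import statistics

# Made-up sample of 10 measurements (illustrative only).
sample = [12.1, 9.8, 11.4, 10.6, 13.0, 10.2, 11.9, 9.5, 12.4, 11.1]

n = len(sample)
s = statistics.stdev(sample)   # sample standard deviation: estimates sigma
se_hat = s / math.sqrt(n)      # estimated standard error of the sample mean

print(f"sample mean              = {statistics.fmean(sample):.2f}")
print(f"sample sd (s)            = {s:.3f}")
print(f"estimated standard error = {se_hat:.3f}")
```

The two numbers answer different questions: s describes how much individual observations vary, while s/√n describes how much the sample mean would vary from sample to sample.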
How to Calculate Probabilities Using Sampling Distributions
Here's the practical process you need to memorize:
- Verify conditions — Check CLT conditions (sample size, np/n(1-p) thresholds)
- Identify parameters — Find the mean and standard error of the sampling distribution
- Standardize — Convert to a Z-score: Z = (x̄ - μ) / (σ/√n) or Z = (p̂ - p) / √(p(1-p)/n)
- Use the standard normal table — Find your probability from Z
Example: SAT scores have μ = 1050 and σ = 200. If you sample 50 students, what's the probability the sample mean exceeds 1100?
Standard error = 200/√50 ≈ 28.3
Z = (1100 - 1050) / 28.3 ≈ 1.77
P(Z > 1.77) ≈ 0.0384, or about 3.8%
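The same calculation can be checked in code instead of with a printed table. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 1050, 200, 50   # values from the SAT example
threshold = 1100

se = sigma / math.sqrt(n)      # standard error of the sample mean
z = (threshold - mu) / se      # standardized distance from the mean

# Upper-tail probability using the unrounded Z.
p_exact = 1 - NormalDist().cdf(z)

# Same probability using Z rounded to two decimals, as a printed Z table would.
p_table = 1 - NormalDist().cdf(round(z, 2))

print(f"standard error = {se:.2f}")
print(f"Z              = {z:.3f}")
print(f"P(Z > z)       = {p_exact:.4f}")
print(f"table lookup   = {p_table:.4f}")
```

The two probabilities differ slightly in the fourth decimal place because a table forces Z to be rounded before the lookup. Both are about 3.8%.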
The Law of Large Numbers
Closely related to sampling distributions is the Law of Large Numbers: as sample size increases, the sample mean tends to get closer and closer to the population mean.
This is not the same as the CLT. The CLT describes the shape and spread of the sampling distribution. The Law of Large Numbers describes how the sample mean converges to the population mean as you collect more data.
Students mix these up constantly. The CLT is about the distribution of sample means across many samples of fixed size. The Law of Large Numbers is about what happens to a single sample mean as you add more observations to it.
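The Law of Large Numbers can be seen by following one growing sample rather than many fixed-size samples. The sketch below assumes a fair six-sided die as the population (mean 3.5). The checkpoints are arbitrary.

```python
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable

# Hypothetical population: rolls of a fair six-sided die, population mean 3.5.
rolls = [random.randint(1, 6) for _ in range(100_000)]

# Mean of the first n rolls at increasing n: one sample that keeps growing.
for n in (10, 100, 1_000, 10_000, 100_000):
    running_mean = statistics.fmean(rolls[:n])
    print(f"n = {n:6d}  sample mean = {running_mean:.3f}")
```

Early means can wander noticeably. By the last checkpoint the mean should sit very close to 3.5. Contrast this with a CLT simulation, where n stays fixed and the sampling is repeated many times.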
What Comes Next
Once you have sampling distributions locked down, you move directly into confidence intervals and hypothesis testing. Every formula in those chapters is just a sampling distribution with algebra applied to it.
If the CLT isn't second nature by now, go back and practice problems until it is. You cannot skip this foundation and expect to survive the rest of the course.