Significance Test for Proportions: Rules and Examples
What a Significance Test for Proportions Actually Is
🤔 You see a headline: "70% of voters support the new policy." Your gut says that number is off. But gut feelings don't hold up in court — or in statistics.
A significance test for proportions is the tool that lets you call BS on claims like that. It checks whether a sample proportion is far enough from a hypothesized value to mean something real, or if it's just random noise.
You're dealing with categorical data: yes/no, success/failure, click/don't click. One proportion or two. The test spits out a p-value: the probability of seeing a result at least as extreme as yours if the null hypothesis were true. Small p-value, big surprise.
Why Bother? The Brutal Truth
Sample proportions lie. Not on purpose, but random sampling always wiggles. A survey of 100 people might show 55% approval. Another 100 might show 48%. Which one is right?
Neither. They're both samples. The significance test measures whether the wiggle you see is bigger than the normal background noise. If it is, you've got evidence. If not, you've got nothing.
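To see that wiggle for yourself, here is a minimal simulation sketch. The true approval rate of 0.52 is a made-up value chosen for illustration, and `simulate_sample_proportions` is a hypothetical helper name, not part of any library.

```python
import random

def simulate_sample_proportions(true_p, n, num_samples, seed=0):
    """Draw num_samples random samples of size n and return each sample proportion."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    proportions = []
    for _ in range(num_samples):
        successes = sum(rng.random() < true_p for _ in range(n))
        proportions.append(successes / n)
    return proportions

# Assumed population: 52% approval (illustrative value only)
props = simulate_sample_proportions(true_p=0.52, n=100, num_samples=5)
print(props)  # five different proportions from the same population
```

Every sample comes from the same population, yet the proportions differ. That spread is the background noise the test has to account for.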
Businesses use this to A/B test conversion rates. Pollsters use it to fact-check claims. Medical researchers use it to compare treatment success rates. It's everywhere because proportions are everywhere.
The Rules You Can't Skip
Jump straight to the math and you'll crash. These conditions aren't suggestions — they're guardrails.
1. Random Sampling or Random Assignment
Your data must come from a random process. Convenience samples are garbage for inference. If you polled people at a mall on a Saturday, your results only tell you about Saturday mall people. Period.
2. The Success-Failure Condition
This is the big one. For the normal approximation to work, you need enough successes and failures.
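- np₀ ≥ 10 and n(1-p₀) ≥ 10 for one-sample tests, where p₀ is the hypothesized proportion
- For two-sample tests the null hypothesis doesn't give you a value of p, so check the observed successes and failures in each sample instead
Fail this check and the normal approximation can't be trusted, so your z-score and p-value aren't reliable. Use an exact method (such as a binomial test) instead.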
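The check is simple enough to automate. A minimal sketch, assuming the threshold of 10 from the rule above; `success_failure_ok` is a hypothetical helper name.

```python
def success_failure_ok(n, p, threshold=10):
    """Return True if both expected successes and expected failures reach the threshold."""
    expected_successes = n * p
    expected_failures = n * (1 - p)
    return expected_successes >= threshold and expected_failures >= threshold

print(success_failure_ok(200, 0.60))  # 120 expected successes, 80 expected failures
print(success_failure_ok(30, 0.10))   # only about 3 expected successes
```

The first call passes and the second fails, because a sample of 30 with p = 0.10 expects far fewer than 10 successes.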
3. Independence (The 10% Rule)
When you sample without replacement, the sample size must be less than 10% of the population. Drawing a large share of a small group breaks independence between observations. If you're sampling 50 students from a class of 100, don't use this test.
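As a quick sanity check, the 10% rule can be written as a one-line comparison. This is only a sketch; `ten_percent_ok` is a hypothetical name and the population of 50,000 is an illustrative value.

```python
def ten_percent_ok(n, population_size):
    """Return True if the sample is smaller than 10% of the population."""
    return n < 0.10 * population_size

print(ten_percent_ok(50, 100))      # 50 students from a class of 100
print(ten_percent_ok(200, 50_000))  # 200 customers from an assumed base of 50,000
```

The class-of-100 case fails the rule, while the larger assumed population passes easily.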
4. For Two-Sample Tests: Both Samples Must Behave
Each sample needs its own random selection. The groups must be independent of each other. And both samples need to pass their own success-failure check.
| Rule | One-Sample Test | Two-Sample Test |
|---|---|---|
| Randomization | One random sample | Two independent random samples |
| Success-Failure | np ≥ 10, n(1-p) ≥ 10 | Both samples check out |
| Independence | n < 10% of population | n₁ and n₂ each < 10% of their own populations |
| Null Hypothesis | H₀: p = p₀ | H₀: p₁ = p₂ |
One-Sample Test: A Real Example
A company claims 60% of customers are satisfied. You survey 200 random customers. Only 110 say they're satisfied. That's 55%. Is the company lying, or is this just sampling error?
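Step 1: Check the rules. The sample is random, and 200 customers is assumed to be less than 10% of the customer base. n = 200, p₀ = 0.60. Expected successes: 200 × 0.60 = 120. Expected failures: 200 × 0.40 = 80. Both ≥ 10. Good to go.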
Step 2: Set up hypotheses.
- H₀: p = 0.60 (the company is telling the truth)
- Hₐ: p ≠ 0.60 (two-tailed test)
Step 3: Calculate the standard error.
SE = √[p₀(1-p₀)/n] = √[0.60 × 0.40 / 200] = √0.0012 ≈ 0.0346
Step 4: Find the z-score.
z = (p̂ - p₀) / SE = (0.55 - 0.60) / 0.0346 ≈ -1.44
Step 5: Get the p-value. For z = -1.44 in a two-tailed test, p ≈ 0.149.
At α = 0.05, p > 0.05. You fail to reject the null. The 55% result isn't weird enough to prove the company wrong. 📉 You don't have evidence of a lie. A gap this size is well within what sampling error alone can produce.
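The five steps above can be sketched in a few lines of Python using only the standard library. `one_sample_prop_ztest` is a hypothetical helper name, and the function assumes a two-tailed alternative, as in this example. Because it doesn't round intermediate steps, its output may differ in the last digit from a hand calculation.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_prop_ztest(successes, n, p0):
    """Two-tailed one-sample z-test for a proportion. Returns (z, p_value)."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)             # SE uses the null value p0, not p_hat
    z = (p_hat - p0) / se
    p_value = 2 * NormalDist().cdf(-abs(z))  # area in both tails
    return z, p_value

z, p = one_sample_prop_ztest(successes=110, n=200, p0=0.60)
print(f"z = {z:.2f}, p = {p:.3f}")
```

The p-value lands well above 0.05, which matches the fail-to-reject conclusion.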
Two-Sample Test: Comparing Two Groups
City A has a 45% recycling rate. City B claims they do better. You sample 300 households in City A: 135 recycle (45%). You sample 250 in City B: 130 recycle (52%). Is City B actually ahead?
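Step 1: Check conditions. Assume both samples are random, independent of each other, and each under 10% of its city's households. For City A: 135 successes, 165 failures. For City B: 130 successes, 120 failures. All ≥ 10. ✅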
Step 2: Hypotheses.
- H₀: p₁ = p₂ (no difference)
- Hₐ: p₁ < p₂ (one-tailed: City B is higher)
Step 3: Pooled proportion (since we assume they're equal under H₀).
p̂_pool = (135 + 130) / (300 + 250) = 265 / 550 ≈ 0.482
Step 4: Standard error for the difference.
SE = √[p̂_pool(1-p̂_pool) × (1/n₁ + 1/n₂)]
SE = √[0.482 × 0.518 × (1/300 + 1/250)] ≈ √0.00183 ≈ 0.0428
Step 5: z-score.
z = (p̂₂ - p̂₁) / SE = (0.52 - 0.45) / 0.0428 ≈ 1.64, with City A as group 1 and City B as group 2
Step 6: p-value for one-tailed test: ≈ 0.051.
At α = 0.05, p = 0.051 is just barely over the line. You fail to reject. Sorry — City B might be better, but you don't have solid proof. That 7 percentage point gap could still be chance. 🎲
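Here is the same calculation as a minimal Python sketch, standard library only. `two_sample_prop_ztest` is a hypothetical helper name. It assumes group 1 is City A, group 2 is City B, and the alternative is that group 2's proportion is higher.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_prop_ztest(x1, n1, x2, n2):
    """Pooled two-sample z-test for Ha: p1 < p2. Returns (z, p_value)."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # shared proportion assumed under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2_hat - p1_hat) / se
    p_value = 1 - NormalDist().cdf(z)  # upper tail: evidence that p2 > p1
    return z, p_value

z, p = two_sample_prop_ztest(x1=135, n1=300, x2=130, n2=250)
print(f"z = {z:.2f}, p = {p:.3f}")
```

The p-value comes out a hair above 0.05, which is why the call is so close.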
How to Run This Test: A No-Nonsense Guide
Stop overthinking. Here's the exact sequence.
1. State your hypotheses in plain English first. What are you actually testing? Write it down. Then translate to math symbols.
2. Verify the three conditions. Random? Success-failure? Independence? If any fail, stop. Use a different method or collect better data.
3. Calculate the standard error. Use p₀ for one-sample tests. Use the pooled proportion for two-sample tests.
4. Compute the z-score. (Sample stat - null value) / SE. Don't mix up proportions and percentages. 0.55, not 55.
5. Find the p-value. Use a z-table, calculator, or software. Match it to your alternative hypothesis — one-tailed or two-tailed.
6. Compare to alpha and decide. p < α? Reject H₀. p ≥ α? Fail to reject. Never say "accept the null." You just didn't find enough evidence against it.
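Steps 5 and 6 are where the tail direction matters most, so here is a small sketch of just those two steps. The function names and the `alternative` labels ("less", "greater", "two-sided") are illustrative choices, not a fixed convention.

```python
from statistics import NormalDist

def p_value_from_z(z, alternative="two-sided"):
    """Convert a z-score to a p-value for the chosen alternative hypothesis."""
    std_normal = NormalDist()
    if alternative == "less":       # Ha: parameter is below the null value
        return std_normal.cdf(z)
    if alternative == "greater":    # Ha: parameter is above the null value
        return 1 - std_normal.cdf(z)
    if alternative == "two-sided":  # Ha: parameter differs in either direction
        return 2 * std_normal.cdf(-abs(z))
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")

def decide(p_value, alpha=0.05):
    """Apply the rule from step 6: reject only when p < alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

print(decide(p_value_from_z(1.64, "greater")))
```

Note that `decide` never returns "accept H0", in line with the wording rule in step 6.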
Where People Screw This Up
- Using the sample proportion p̂ instead of p₀ in the SE formula for one-sample tests. Wrong. The null hypothesis gives you the SE.
- Running a two-tailed test when the question was one-directional from the start. You're giving up power and making life harder. Just pick the direction before you look at the data, not after.
- Forgetting the pooled proportion in two-sample tests. The math only works if you assume the groups are equal under the null.
- Declaring "no difference" when they fail to reject. You found no evidence. That's not the same as proving equality.
- Ignoring the success-failure condition with small samples. Your "z-score" is fiction if np < 10.
FAQ: Short Answers to Annoying Questions
What's the difference between a confidence interval and a significance test?
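A test gives a yes/no decision about a specific claim. A confidence interval gives a range of plausible values. They're related: a two-tailed test at α = 0.05 roughly corresponds to asking whether the claimed value falls inside a 95% confidence interval.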
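For comparison, here is a sketch of a 95% confidence interval for the customer satisfaction example (110 satisfied out of 200). It uses the simple Wald interval, which is one of several interval methods and is an assumption on my part; `wald_interval` is a hypothetical helper name. Unlike the test, the interval builds its standard error from the sample proportion, since there is no null value to plug in.

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(successes, n, confidence=0.95):
    """Wald confidence interval for a proportion. Returns (low, high)."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)          # SE uses p_hat here
    return p_hat - margin, p_hat + margin

low, high = wald_interval(successes=110, n=200)
print(f"({low:.3f}, {high:.3f})")
```

The claimed 60% sits inside this range, which lines up with failing to reject the company's claim.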
Can I use this for sample sizes under 30?
You can if you pass the success-failure condition. The "n ≥ 30" rule is for means, not proportions. Proportions use the normal approximation based on expected counts.
What if my p-value is exactly 0.05?
With the p < α rule used above, p = 0.05 exactly means you fail to reject at α = 0.05, though some textbooks reject when p ≤ α. Either way, that's a borderline result. Don't bet your career on it. Report the exact p-value and let people decide.
Why "fail to reject" instead of "accept"?
Because absence of evidence isn't evidence of absence. Your test might have missed a real difference due to small sample size or bad luck.
Can I do this in Excel?
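Yes. For an upper-tail p-value use =1-NORM.S.DIST(z,TRUE). For a lower-tail p-value use =NORM.S.DIST(z,TRUE). For two-tailed, use =2*(1-NORM.S.DIST(ABS(z),TRUE)) so that negative z-scores work too. Or use any stats software: R, Python, graphing calculators. The math is the same everywhere.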