Statistical Hypothesis Testing: A Complete Beginner's Guide
What Hypothesis Testing Actually Is
You have a claim. Hypothesis testing is the method you use to check if that claim holds up against real data. That's it. Nothing fancy.
You start with what you believe is true (the alternative hypothesis). Then you assume the opposite is true (the null hypothesis). You run statistical tests to see if your data gives you enough evidence to reject that null hypothesis.
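If the evidence is strong enough, you reject the null. If not, you fail to reject it. You never "prove" either hypothesis. Rejecting the null means your data would be surprising if the null were true. Failing to reject it means only that the evidence wasn't strong enough, not that the null is correct.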
Why This Matters
Every A/B test, every medical trial, every quality control check uses hypothesis testing. If you make decisions based on data, you need to understand this. Period.
Wrong hypothesis testing = wrong business decisions = lost money. That's the chain.
The Core Concepts You Must Know
Null Hypothesis (H₀)
This is your baseline assumption. It states there's no effect, no difference, no relationship. You're assuming the status quo is correct until proven otherwise.
Example: "This new drug has no effect on patients" or "Customer A and Customer B spend the same amount."
Alternative Hypothesis (H₁ or Hₐ)
This is the claim you're looking for evidence to support. It states there IS an effect, a difference, or a relationship.
Example: "This new drug lowers blood pressure" or "Customer A spends more than Customer B."
P-Value
The p-value tells you the probability of getting results at least as extreme as yours IF the null hypothesis were true. It is not the probability that the null hypothesis is true.
Low p-value = your data is unlikely under the null hypothesis = evidence against the null.
High p-value = data like yours is unsurprising under the null = not enough evidence against the null. It does not show the null is true.
Most people set their threshold at 0.05. That's arbitrary. It's a convention, not a law.
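To make the definition concrete, here is a minimal sketch that computes a p-value directly as a tail probability. The coin-flip data (60 heads in 100 flips) and the function name `binomial_p_value_upper` are made up for illustration, and the test is one-sided.

```python
from math import comb

def binomial_p_value_upper(successes: int, trials: int, p_null: float) -> float:
    """P(X >= successes) when X ~ Binomial(trials, p_null)."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Hypothetical data: 60 heads in 100 flips of a coin assumed fair under H0.
p_value = binomial_p_value_upper(60, 100, 0.5)
print(f"one-sided p-value: {p_value:.4f}")
```

The result is about 0.028: if the coin were fair, you would see 60 or more heads in roughly 3 out of 100 such experiments.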
Significance Level (α)
This is your cutoff. If p-value ≤ α, you reject the null hypothesis.
Common choices: 0.05, 0.01, 0.10. Pick before you collect data. Don't change it after.
Type I and Type II Errors
You're going to make mistakes. Know what they are:
- Type I Error (False Positive): You reject the null when it's actually true. You think you found an effect that doesn't exist. If the null is true, the probability of this is α.
- Type II Error (False Negative): You fail to reject the null when it's actually false. You missed a real effect. Probability = β. Power is 1 − β.
For a fixed α, increasing sample size reduces β, so you catch more real effects. Without more data, lowering α raises β: you trade one error for the other. More data is usually the practical solution.
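Both error rates can be estimated by simulation. The sketch below assumes a two-sided one-sample z-test with known σ = 1 and a made-up true effect of 0.3. The helper names are hypothetical, and the printed rates will vary with the seed and number of trials.

```python
import random
from statistics import NormalDist, fmean

def rejects_null(sample, mu_null, sigma, alpha=0.05):
    """Two-sided one-sample z-test with known sigma; True if H0 is rejected."""
    z = (fmean(sample) - mu_null) / (sigma / len(sample) ** 0.5)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value <= alpha

def rejection_rate(true_mean, n, trials=2000, seed=0):
    """Share of simulated experiments that reject H0: mean = 0 (sigma = 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        hits += rejects_null(sample, mu_null=0.0, sigma=1.0)
    return hits / trials

# H0 true: every rejection is a false positive (Type I error).
type_i_rate = rejection_rate(true_mean=0.0, n=50)
# H0 false: every rejection is correct, so the rate estimates power = 1 - beta.
power_small = rejection_rate(true_mean=0.3, n=20)
power_large = rejection_rate(true_mean=0.3, n=100)

print(f"Type I rate (n=50): {type_i_rate:.3f}")
print(f"Power (n=20):  {power_small:.3f} -> beta = {1 - power_small:.3f}")
print(f"Power (n=100): {power_large:.3f} -> beta = {1 - power_large:.3f}")
```

The Type I rate should land near α regardless of sample size, while power climbs as n grows.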
Test Statistics
Your test produces a number (z-score, t-score, F-value, chi-square). You compare this to a critical value or convert it to a p-value. The test statistic depends on which test you're running.
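As a rough illustration, the sketch below computes a one-sample z statistic and shows that the critical-value route and the p-value route lead to the same decision. The summary statistics are hypothetical, and the test assumes the population standard deviation is known.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(sample_mean, mu_null, sigma, n):
    """z statistic for H0: mean = mu_null when the population sigma is known."""
    return (sample_mean - mu_null) / (sigma / sqrt(n))

alpha = 0.05
# Hypothetical summary statistics, chosen only for illustration.
z = one_sample_z(sample_mean=103.0, mu_null=100.0, sigma=15.0, n=100)

# Route 1: compare |z| with the two-sided critical value.
z_critical = NormalDist().inv_cdf(1 - alpha / 2)
reject_by_critical_value = abs(z) > z_critical

# Route 2: convert z to a two-sided p-value and compare with alpha.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_by_p_value = p_value <= alpha

print(f"z = {z:.2f}, critical value = {z_critical:.2f}, p = {p_value:.4f}")
print(reject_by_critical_value, reject_by_p_value)
```

Here z = 2.00 exceeds the critical value of about 1.96, and the p-value of about 0.0455 falls below 0.05, so both routes reject H₀.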
Types of Hypothesis Tests
Which test you use depends on what you're comparing and your data type.
| Test | Use When | Data Type |
|---|---|---|
| Z-Test | Comparing a mean to a hypothesized value when the population std dev is known | Continuous, large sample (n≥30) |
| T-Test | Comparing means when population std dev is unknown | Continuous, small or large sample |
| Chi-Square Test | Testing relationships between categorical variables | Categorical/frequency |
| ANOVA | Comparing means across 3+ groups | Continuous, 3+ groups |
| Correlation Test | Testing if two variables are related | Two continuous variables |
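
To show what one row of this table looks like in practice, here is a minimal sketch of a chi-square test of independence on a 2x2 table. The counts are invented and no continuity correction is applied. The p-value shortcut works only for 1 degree of freedom. Larger tables need a chi-square distribution function from a statistics library.

```python
from math import sqrt
from statistics import NormalDist

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, chi-square is the square of a standard normal,
    # so the upper-tail probability can be read from the normal CDF.
    p_value = 2 * (1 - NormalDist().cdf(sqrt(statistic)))
    return statistic, p_value

# Hypothetical counts: rows = old/new page, columns = bought/did not buy.
observed = [[30, 70],
            [45, 55]]
statistic, p_value = chi_square_2x2(observed)
print(f"chi-square = {statistic:.2f}, p = {p_value:.4f}")
```

For these counts the statistic is 4.80 with a p-value just under 0.03.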
One-Tailed vs Two-Tailed Tests
Two-tailed test: You're testing for any difference (greater OR less than). Use this when you don't have a directional hypothesis.
One-tailed test: You're testing for a difference in a specific direction only (greater OR less than, not both). Use this when you have a strong theoretical reason for expecting a specific direction.
One-tailed tests give you more power to detect an effect in your direction. But they're also easier to misuse. Most researchers stick with two-tailed tests.
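The difference shows up directly in the p-value. This sketch takes a hypothetical z statistic of 1.8 and computes the p-value under each choice of tail. The function name `z_p_values` is illustrative only.

```python
from statistics import NormalDist

def z_p_values(z):
    """Return (upper-tailed, lower-tailed, two-tailed) p-values for a z statistic."""
    cdf = NormalDist().cdf(z)
    upper = 1 - cdf                      # H1: parameter is greater than the null value
    lower = cdf                          # H1: parameter is less than the null value
    two_sided = 2 * min(upper, lower)    # H1: parameter differs in either direction
    return upper, lower, two_sided

upper, lower, two_sided = z_p_values(1.8)  # hypothetical test statistic
print(f"upper-tailed: {upper:.4f}")
print(f"lower-tailed: {lower:.4f}")
print(f"two-tailed:   {two_sided:.4f}")
```

With z = 1.8, the upper-tailed p-value is about 0.036 (significant at 0.05) while the two-tailed p-value is about 0.072 (not significant). That gap is why the direction has to be chosen before looking at the data.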
How To Run a Hypothesis Test
Step 1: State Your Hypotheses
Write out H₀ and H₁ in plain English and mathematically. Be specific.
Step 2: Choose Your Significance Level
Set α before you collect data. 0.05 is standard. Write it down.
Step 3: Collect Data
Get your sample. Size matters—a small sample might miss real effects.
Step 4: Calculate Your Test Statistic
Run the numbers. Use software (R, Python, Excel, whatever works). Get your t-value, z-value, or whatever statistic your test requires.
Step 5: Find the P-Value or Critical Value
Compare your test statistic to the distribution. Get your p-value.
Step 6: Make Your Decision
If p-value ≤ α → Reject H₀. You have statistically significant results.
If p-value > α → Fail to reject H₀. You don't have enough evidence.
Step 7: Interpret in Context
What does this mean for your actual problem? Don't just report "p < 0.05." Explain what you found.
Common Mistakes Beginners Make
- P-hacking: Running tests until you find p < 0.05, then stopping. This produces fake results. Pre-register your analysis.
- Ignoring effect size: Statistical significance ≠ practical importance. A tiny effect can be "significant" with a large sample.
- Confusing correlation with causation: A hypothesis test can show variables are related. Proving causation requires experimental design.
- Using the wrong test: Don't use a t-test for categorical data. For two groups, a t-test is the simpler choice; ANOVA is for three or more. Match your test to your question.
- Forgetting assumptions: Most tests assume normality, independence, or equal variances. Check them.
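
Two of these mistakes are easy to quantify. The sketch below first shows how the chance of at least one false positive grows when you run many independent tests, then shows a tiny effect becoming "significant" once the sample is large enough. All numbers are hypothetical, and the two-group comparison assumes a common, known standard deviation.

```python
from math import sqrt
from statistics import NormalDist

def chance_of_false_positive(num_tests, alpha=0.05):
    """P(at least one p <= alpha) across independent tests when every H0 is true."""
    return 1 - (1 - alpha) ** num_tests

def two_sample_z(mean_a, mean_b, sigma, n_per_group):
    """Cohen's d and two-sided p-value for two groups with a common known sigma."""
    effect_size = (mean_a - mean_b) / sigma
    z = (mean_a - mean_b) / (sigma * sqrt(2 / n_per_group))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return effect_size, p_value

# P-hacking: more tests means more chances for noise to cross the threshold.
for k in (1, 5, 20):
    print(f"{k:>2} tests -> {chance_of_false_positive(k):.2f} chance of a false positive")

# Effect size: the same 0.2-point gap (sigma = 10) at two sample sizes.
for n in (1_000, 100_000):
    d, p = two_sample_z(50.2, 50.0, sigma=10.0, n_per_group=n)
    print(f"n = {n:>7,} per group: d = {d:.2f}, p = {p:.3g}")
```

With 20 tests the chance of at least one false positive is about 64%. In the second loop the effect size stays at d = 0.02 while the p-value drops from well above 0.05 to far below it.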
Practical Example
Your e-commerce site has a 3% conversion rate. You test a new checkout flow with 1,000 visitors. 42 convert (4.2%).
Step 1: H₀: Conversion rate ≤ 3%. H₁: Conversion rate > 3%.
Step 2: α = 0.05.
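Step 3: You have the data: 42 conversions in 1,000 visitors.
Step 4-5: Run a one-proportion z-test. You get z ≈ 2.22, which gives a one-sided p ≈ 0.013.
Step 6: 0.013 < 0.05. Reject H₀.
Step 7: The new checkout flow's conversion rate is significantly higher than 3%. The observed lift is 1.2 percentage points. If that lift is worth it to the business, deploy it.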
That's hypothesis testing in action. No fluff.
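The calculation can be reproduced in a few lines. This is a minimal sketch of a one-proportion z-test using the normal approximation and the numbers from the example (42 conversions, 1,000 visitors, 3% baseline). The function name is illustrative, and an exact binomial test would give a slightly different p-value.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, p_null):
    """Upper-tailed z-test of H0: p <= p_null against H1: p > p_null."""
    p_hat = successes / n
    standard_error = sqrt(p_null * (1 - p_null) / n)  # standard error under H0
    z = (p_hat - p_null) / standard_error
    p_value = 1 - NormalDist().cdf(z)
    return p_hat, z, p_value

alpha = 0.05
p_hat, z, p_value = one_proportion_z_test(successes=42, n=1000, p_null=0.03)
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"observed rate = {p_hat:.3f}, z = {z:.2f}, p = {p_value:.3f} -> {decision}")
```

Under this approximation z comes out near 2.22 and the one-sided p-value near 0.013, so H₀ is rejected at α = 0.05.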
What to Do When Results Are Significant
Don't just report the p-value. Tell people:
- What you tested
- What you found
- What it means practically
- What limitations exist
Statistical significance doesn't validate your business idea. It just tells you the effect is unlikely to be random noise. Use domain knowledge to decide if the result matters.
The Bottom Line
Hypothesis testing is a tool for making decisions under uncertainty. It won't tell you what's true—it tells you what your data supports.
Set your parameters before you start. Use the right test. Check your assumptions. Report honestly.
That's all there is to it.