Statistical Hypothesis Testing: A Complete Beginner's Guide

What Hypothesis Testing Actually Is

You have a claim. Hypothesis testing is the method you use to check if that claim holds up against real data. That's it. Nothing fancy.

You start with what you believe is true (the alternative hypothesis). Then you assume the opposite is true (the null hypothesis). You run statistical tests to see if your data gives you enough evidence to reject that null hypothesis.

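If the evidence is strong enough, you reject the null. If not, you fail to reject it. You never "prove" a hypothesis. Rejecting the null means your data would be unlikely if the null were true. Failing to reject it means you lack evidence, not that the null is correct.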

Why This Matters

Every A/B test, every medical trial, every quality control check uses hypothesis testing. If you make decisions based on data, you need to understand this. Period.

Wrong hypothesis testing = wrong business decisions = lost money. That's the chain.

The Core Concepts You Must Know

Null Hypothesis (H₀)

This is your baseline assumption. It states there's no effect, no difference, no relationship. You're assuming the status quo is correct until proven otherwise.

Example: "This new drug has no effect on patients" or "Customer A and Customer B spend the same amount."

Alternative Hypothesis (H₁ or Hₐ)

This is the claim you want to find evidence for. It states there IS an effect, a difference, or a relationship.

Example: "This new drug lowers blood pressure" or "Customer A spends more than Customer B."

P-Value

The p-value tells you the probability of getting results at least as extreme as yours IF the null hypothesis were true. It is not the probability that the null hypothesis is true.

Low p-value = your data is unlikely under the null hypothesis = evidence against the null.

High p-value = your data could easily happen by chance = no evidence against the null (which is not the same as evidence for it).

Most people set their threshold at 0.05. That's arbitrary. It's a convention, not a law.
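
A minimal sketch of what a p-value is, using made-up data: 16 heads in 20 flips of a coin that H₀ says is fair. The function name and the numbers are illustrative assumptions. It adds up the probability of every outcome at least as extreme as the one observed.

```python
from math import comb

def binomial_p_value_upper(successes, trials, p_null=0.5):
    """P(X >= successes) when X ~ Binomial(trials, p_null), i.e. under H0."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Hypothetical data: 16 heads in 20 flips, H0 says the coin is fair.
p_value = binomial_p_value_upper(16, 20)
print(f"p-value = {p_value:.4f}")  # p-value = 0.0059
```

A fair coin gives 16 or more heads in 20 flips less than 1% of the time. That is the "unlikely under the null" part.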

Significance Level (α)

This is your cutoff. If p-value ≤ α, you reject the null hypothesis.

Common choices: 0.05, 0.01, 0.10. Pick before you collect data. Don't change it after.

Type I and Type II Errors

You're going to make mistakes. Know what they are:

Type I Error (False Positive): You reject the null when it's actually true. You think you found an effect that doesn't exist. Probability = α.
Type II Error (False Negative): You fail to reject the null when it's actually false. You missed a real effect. Probability = β.

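For a fixed α, increasing sample size reduces β, so you miss fewer real effects. The Type I error rate stays at whatever α you chose, and lowering α without adding data raises β. More data is usually the practical solution.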
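
To see how sample size drives β, here is a rough sketch for the simplest case: a one-sided, one-sample z-test with a known standard deviation. The effect size of 0.3 and the sample sizes are assumptions chosen for illustration, not values from a real study.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(effect, sigma, n, alpha=0.05):
    """Beta for a one-sided, one-sample z-test with known sigma.

    effect is the true shift of the mean away from the null value.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)  # rejection cutoff on the z scale
    # Probability the z statistic lands below the cutoff when the effect is real.
    return std_normal.cdf(z_crit - effect * sqrt(n) / sigma)

# Assumed true effect of 0.3 standard deviations, alpha held at 0.05.
for n in (10, 50, 200):
    print(n, round(type_ii_error(effect=0.3, sigma=1.0, n=n), 3))
```

β shrinks as n grows while α stays where you set it.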

Test Statistics

Your test produces a number (z-score, t-score, F-value, chi-square). You compare this to a critical value or convert it to a p-value. The test statistic depends on which test you're running.
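
As one concrete case, here is a sketch of the one-sample t statistic: the gap between the sample mean and the null value, measured in standard errors. The sample values and the null mean of 5.0 are invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t_statistic(sample, mu_null):
    """t = (sample mean - null mean) / (sample std dev / sqrt(n))."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev uses the n-1 denominator
    return (mean(sample) - mu_null) / standard_error

# Hypothetical measurements; H0 says the true mean is 5.0.
sample = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4, 5.0, 5.5]
t_stat = one_sample_t_statistic(sample, mu_null=5.0)
print(f"t = {t_stat:.2f} with {len(sample) - 1} degrees of freedom")
```

Turning that t into a p-value requires the t distribution, which is where software such as R or a Python statistics package comes in.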

Types of Hypothesis Tests

Which test you use depends on what you're comparing and your data type.

Test	Use When	Data Type
Z-Test	Comparing a mean to a known population mean when the population std dev is known	Continuous, large sample (n≥30)
T-Test	Comparing means when population std dev is unknown	Continuous, small or large sample
Chi-Square Test	Testing relationships between categorical variables	Categorical/frequency
ANOVA	Comparing means across 3+ groups	Continuous, 3+ groups
Correlation Test	Testing if two variables are related	Two continuous variables
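
To make the chi-square row concrete, here is a sketch for a 2x2 table of counts. The counts are made up, the function name is hypothetical, and no continuity correction is applied. The p-value shortcut via erfc only works because a 2x2 table has 1 degree of freedom.

```python
from math import erfc, sqrt

def chi_square_2x2(table):
    """Chi-square test of independence for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value

# Hypothetical counts: rows are two groups, columns are converted / not converted.
chi2, p_value = chi_square_2x2([[30, 70], [45, 55]])
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
```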

One-Tailed vs Two-Tailed Tests

Two-tailed test: You're testing for any difference (greater OR less than). Use this when you don't have a directional hypothesis.

One-tailed test: You're testing for a difference in a specific direction only (greater OR less than, not both). Use this when you have a strong theoretical reason for expecting a specific direction.

One-tailed tests give you more power to detect an effect in your direction. But they're also easier to misuse. Most researchers stick with two-tailed tests.
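
A short sketch of the difference, assuming a test statistic that follows the standard normal distribution under H₀. The z value of 1.8 is an arbitrary illustration.

```python
from statistics import NormalDist

def z_p_values(z):
    """One-tailed and two-tailed p-values for a z statistic."""
    std_normal = NormalDist()
    upper = 1 - std_normal.cdf(z)  # H1: parameter is greater
    lower = std_normal.cdf(z)      # H1: parameter is less
    two_sided = 2 * min(upper, lower)
    return {"greater": upper, "less": lower, "two-sided": two_sided}

p = z_p_values(1.8)
print({name: round(value, 3) for name, value in p.items()})
```

With z = 1.8 the "greater" p-value is about 0.036 and the two-sided p-value is about 0.072. Same data, different conclusion at α = 0.05. That is why the direction has to be chosen before you look at the data.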

How To Run a Hypothesis Test

Step 1: State Your Hypotheses

Write out H₀ and H₁ in plain English and mathematically. Be specific.

Step 2: Choose Your Significance Level

Set α before you collect data. 0.05 is standard. Write it down.

Step 3: Collect Data

Get your sample. Size matters—a small sample might miss real effects.

Step 4: Calculate Your Test Statistic

Run the numbers. Use software (R, Python, Excel, whatever works). Get your t-value, z-value, or whatever statistic your test requires.

Step 5: Find the P-Value or Critical Value

Compare your test statistic to the distribution. Get your p-value.
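
If you go the critical-value route with a z-test, the cutoff comes straight from α. A minimal sketch, with a hypothetical helper name:

```python
from statistics import NormalDist

def z_critical(alpha, two_sided=True):
    """Critical z value: reject H0 when |z| (or z, if one-sided) exceeds it."""
    tail = alpha / 2 if two_sided else alpha  # two-sided splits alpha across both tails
    return NormalDist().inv_cdf(1 - tail)

print(round(z_critical(0.05), 3))                   # 1.96
print(round(z_critical(0.05, two_sided=False), 3))  # 1.645
```

Comparing the statistic to the critical value and comparing the p-value to α always lead to the same decision.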

Step 6: Make Your Decision

If p-value ≤ α → Reject H₀. You have statistically significant results.

If p-value > α → Fail to reject H₀. You don't have enough evidence.
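
The decision rule is simple enough to write down as code. This is only a sketch, and the function name is made up:

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 when p-value <= alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    return "reject H0" if p_value <= alpha else "fail to reject H0"

print(decide(0.03))  # reject H0
print(decide(0.20))  # fail to reject H0
```

Note that the rule never returns "accept H0". Failing to reject is the strongest statement you get.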

Step 7: Interpret in Context

What does this mean for your actual problem? Don't just report "p < 0.05." Explain what you found.

Common Mistakes Beginners Make

P-hacking: Running tests until you find p < 0.05, then stopping. This produces fake results. Pre-register your analysis.
Ignoring effect size: Statistical significance ≠ practical importance. A tiny effect can be "significant" with a large sample.
Confusing correlation with causation: A hypothesis test can show variables are related. Proving causation requires experimental design.
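Using the wrong test: Don't use a t-test for categorical data. With only two groups, ANOVA gives the same answer as a pooled-variance t-test, so just run the t-test. With 3+ groups, use ANOVA instead of many pairwise t-tests. Match your test to your question.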
Forgetting assumptions: Most tests assume normality, independence, or equal variances. Check them.
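
The effect-size mistake is easy to demonstrate. The sketch below assumes two equal-size groups with a known common standard deviation, so a z-test applies. The means, standard deviation, and sample sizes are invented.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(mean_a, mean_b, sigma, n_per_group):
    """Return (Cohen's d, two-sided p-value) for two equal-size groups."""
    d = (mean_a - mean_b) / sigma      # effect size in standard deviations
    z = d / sqrt(2 / n_per_group)      # same as diff / (sigma * sqrt(2/n))
    p_value = 2 * NormalDist().cdf(-abs(z))
    return d, p_value

# Same tiny difference (0.2 on a scale where sigma = 10), two sample sizes.
for n in (100, 100_000):
    d, p_value = two_sample_z(100.2, 100.0, sigma=10.0, n_per_group=n)
    print(f"n = {n}: d = {d:.2f}, p = {p_value:.2g}")
```

The effect size is 0.02 standard deviations in both runs. Only the large sample makes it "significant", and it is just as small either way.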

Practical Example

Your e-commerce site has a 3% conversion rate. You test a new checkout flow with 1,000 visitors. 42 convert (4.2%).

Step 1: H₀: Conversion rate ≤ 3%. H₁: Conversion rate > 3%.

Step 2: α = 0.05.

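Step 3-5: Run a one-proportion z-test. You get z ≈ 2.22 and a one-sided p ≈ 0.013.

Step 6: 0.013 < 0.05. Reject H₀.

Step 7: The new checkout flow shows a statistically significant lift of 1.2 percentage points over the 3% baseline. If that lift is worth the rollout cost, deploy it.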
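
Here is a sketch of the calculation behind this example, using the normal approximation for a one-proportion z-test. The function name is made up. The inputs are the numbers from the example: 42 conversions, 1,000 visitors, 3% baseline.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, p_null):
    """One-sided (greater) z-test for a proportion, normal approximation."""
    p_hat = successes / n
    standard_error = sqrt(p_null * (1 - p_null) / n)  # computed under H0
    z = (p_hat - p_null) / standard_error
    p_value = 1 - NormalDist().cdf(z)                 # upper tail only
    return z, p_value

z, p_value = one_proportion_z_test(successes=42, n=1000, p_null=0.03)
print(f"z = {z:.2f}, one-sided p = {p_value:.3f}")  # z = 2.22, one-sided p = 0.013
```

The standard error uses the null rate of 3%, not the observed 4.2%, because the test asks how surprising the data are if H₀ holds.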

That's hypothesis testing in action. No fluff.

What to Do When Results Are Significant

Don't just report the p-value. Tell people:

What you tested
What you found
What it means practically
What limitations exist

Statistical significance doesn't validate your business idea. It just tells you a result this extreme would be unlikely if there were no real effect. Use domain knowledge to decide if the result matters.

The Bottom Line

Hypothesis testing is a tool for making decisions under uncertainty. It won't tell you what's true—it tells you what your data supports.

Set your parameters before you start. Use the right test. Check your assumptions. Report honestly.

That's all there is to it.