Hypothesis Testing Example: Step-by-Step Statistical Guide

What Hypothesis Testing Actually Is

Hypothesis testing is a statistical method for making decisions about a population based on sample data. You start with an assumption, collect evidence, and then either reject or fail to reject that assumption based on probability rules.

That's it. No magic. No interpretation required beyond the numbers.

The assumption you start with is called the null hypothesis (H₀). The alternative you're testing against is the alternative hypothesis (H₁ or Ha). You assume H₀ is true until the data gives you strong enough evidence to reject it.

The Core Logic: Proof by Contradiction

Statistics works like a courtroom. You assume innocence until proven guilty. The null hypothesis is innocent. The data is your evidence. If the evidence is strong enough, you reject innocence.

If the evidence isn't strong enough, you fail to reject the null. You don't accept it as true. You just didn't have enough proof to throw it out.

People mess this up constantly. You never "accept" the null hypothesis. You either reject it or you don't.

Key Terms You Need to Know

Step-by-Step Hypothesis Testing Example

The Scenario

A coffee shop claims their espresso shots average 30ml. You think they're short-changing customers. You measure 40 randomly selected shots and get an average of 28.5ml with a standard deviation of 4ml.

Is the coffee shop lying?

Step 1: State Your Hypotheses

H₀: μ = 30ml (the claim is correct)

H₁: μ < 30ml (the claim is wrong, they're giving less)

This is a one-tailed test because you're only testing if they're short, not if they're overfilling.

Step 2: Choose Your Significance Level

Use α = 0.05. This is standard practice unless you have a reason to be more or less strict.
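To make α concrete, here is a minimal simulation sketch. It assumes a world where H₀ is actually true (shots drawn from a normal distribution with mean 30 ml and standard deviation 4 ml, both assumptions chosen to match the scenario) and counts how often a left-tailed t-test rejects anyway. The seed and the number of simulated experiments are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05
n_experiments, n_shots = 10_000, 40

# Every simulated shop is honest: true mean 30 ml, spread 4 ml (assumed normal)
samples = rng.normal(loc=30.0, scale=4.0, size=(n_experiments, n_shots))

# Left-tailed one-sample t-test on each row (one row = one experiment of 40 shots)
result = stats.ttest_1samp(samples, popmean=30.0, axis=1, alternative="less")

# Share of experiments that rejected a true H0
false_positive_rate = float(np.mean(result.pvalue < alpha))
print(f"Rejected a true H0 in {false_positive_rate:.1%} of experiments")
```

The rejection rate should land close to α. That is what the significance level buys you: a cap on how often you wrongly reject a true null.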

Step 3: Calculate the Test Statistic

Since the population standard deviation is unknown and we only have the sample standard deviation, use a t-test.

t = (x̄ - μ) / (s / √n)

t = (28.5 - 30) / (4 / √40)

t = -1.5 / (4 / 6.325)

t = -1.5 / 0.632

t = -2.37
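The same arithmetic as a small Python sketch, using only the summary statistics from the scenario. The variable names are illustrative.

```python
import math

# Summary statistics from the espresso scenario
sample_mean = 28.5   # x-bar, in ml
claimed_mean = 30.0  # mu under H0, in ml
sample_sd = 4.0      # s, in ml
n = 40

# Standard error of the mean, then the t statistic
standard_error = sample_sd / math.sqrt(n)
t_stat = (sample_mean - claimed_mean) / standard_error

print(f"standard error = {standard_error:.3f}")
print(f"t = {t_stat:.2f}")
```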

Step 4: Find the Critical Value

Degrees of freedom = n - 1 = 39

For a one-tailed t-test at α = 0.05 with df = 39, the critical value is approximately -1.685
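If you don't have a t-table handy, the critical value can be looked up in code. A minimal sketch, assuming SciPy is installed; `stats.t.ppf` is the inverse CDF of the t distribution.

```python
from scipy import stats

alpha = 0.05
df = 40 - 1  # degrees of freedom, n - 1

# Left-tailed critical value: the point with probability alpha to its left
t_crit = stats.t.ppf(alpha, df)
print(f"critical t = {t_crit:.3f}")
```

For a right-tailed test you would use `stats.t.ppf(1 - alpha, df)`, and for a two-tailed test `stats.t.ppf(1 - alpha / 2, df)` with both signs.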

Step 5: Make Your Decision

Your calculated t = -2.37

Critical t = -1.685

-2.37 < -1.685

Your test statistic falls in the rejection region. Reject the null hypothesis.
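The decision rule for a left-tailed test is a single comparison. A small sketch, with `reject_left_tailed` as a hypothetical helper name:

```python
def reject_left_tailed(t_stat: float, t_crit: float) -> bool:
    """Return True when t_stat lies in the left-tail rejection region."""
    return t_stat < t_crit


# Values from the espresso example
decision = reject_left_tailed(t_stat=-2.37, t_crit=-1.685)
print("Reject H0" if decision else "Fail to reject H0")
```

Note the wording of the second branch: "fail to reject", never "accept".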

Step 6: State Your Conclusion

At the 0.05 significance level, there's sufficient statistical evidence to conclude the coffee shop is giving less than 30ml per shot.

The p-value for t = -2.37 with df = 39 is approximately 0.011. That's less than 0.05, confirming our decision.
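The p-value can be checked the same way. A minimal sketch, assuming SciPy is installed: for a left-tailed test, the p-value is the area under the t distribution to the left of the observed statistic.

```python
import math
from scipy import stats

# Test statistic from the espresso example
t_stat = (28.5 - 30) / (4 / math.sqrt(40))
df = 39

# Left-tailed p-value: P(T <= t_stat) under H0
p_value = stats.t.cdf(t_stat, df)
print(f"p = {p_value:.3f}")
print("Reject H0" if p_value < 0.05 else "Fail to reject H0")
```

The critical-value approach and the p-value approach always agree, because they are two views of the same tail area.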

One-Tailed vs Two-Tailed Tests

This matters more than most people realize.

Test Type | When to Use | Rejection Region
Two-tailed | Testing whether a parameter differs from a value (direction unknown) | Both tails of the distribution
Left-tailed | Testing whether a parameter is less than a value | Left tail only
Right-tailed | Testing whether a parameter is greater than a value | Right tail only

Using a two-tailed test when your hypothesis is directional is a common mistake. It makes H₀ harder to reject, which might be fine if you're being conservative, but it doesn't match the question you're actually asking. The reverse mistake is worse: picking a one-tailed test after seeing which way the data went inflates your false-positive rate. Decide the direction before you collect data.
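To see how much the choice of tail changes the answer, here is a short sketch that computes all three p-values for the espresso statistic. It assumes SciPy is installed; `sf` is the survival function, 1 minus the CDF.

```python
from scipy import stats

t_stat, df = -2.37, 39

p_left = stats.t.cdf(t_stat, df)         # H1: mu < 30
p_right = stats.t.sf(t_stat, df)         # H1: mu > 30
p_two = 2 * stats.t.sf(abs(t_stat), df)  # H1: mu != 30

print(f"left-tailed:  {p_left:.4f}")
print(f"right-tailed: {p_right:.4f}")
print(f"two-tailed:   {p_two:.4f}")
```

Because the t distribution is symmetric, the two-tailed p-value is exactly double the smaller one-tailed p-value, and the two one-tailed p-values sum to 1.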

Common Types of Hypothesis Tests

The example above used a one-sample t-test. Here are the others you need:

One-Sample t-test

Test a population mean against a known or hypothesized value. Use when you have one group and σ is unknown.

Two-Sample t-test (Independent)

Compare means of two independent groups. "Does Group A score higher than Group B?"

Paired t-test

Compare means from the same group at different times or under different conditions. "Did test scores improve after tutoring?"
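Under the hood, a paired t-test is a one-sample t-test on the per-subject differences. A quick sketch that checks this, assuming SciPy and NumPy are installed. The scores are made up for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical test scores for six students, before and after tutoring
before = np.array([72.0, 65.0, 80.0, 58.0, 90.0, 77.0])
after = np.array([75.0, 70.0, 79.0, 66.0, 93.0, 80.0])

# Paired t-test on the two columns
paired = stats.ttest_rel(after, before)

# One-sample t-test on the differences against zero
on_differences = stats.ttest_1samp(after - before, popmean=0.0)

print(f"paired:         t = {paired.statistic:.3f}, p = {paired.pvalue:.3f}")
print(f"on differences: t = {on_differences.statistic:.3f}, p = {on_differences.pvalue:.3f}")
```

This is why the pairing matters: each student acts as their own control, so between-student variation drops out of the comparison.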

Z-test

Like the t-test, but used when σ is known or your sample is large (typically n > 30). Most real-world scenarios don't give you σ, so t-tests are more common.
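A minimal z-test sketch using only the Python standard library. It reuses the espresso numbers and assumes, purely for illustration, that σ = 4 ml is the known population standard deviation rather than a sample estimate.

```python
import math
from statistics import NormalDist

sample_mean, claimed_mean, n = 28.5, 30.0, 40
sigma = 4.0  # assumption: treated here as the known population SD

# Same formula as the t statistic, with sigma in place of s
z_stat = (sample_mean - claimed_mean) / (sigma / math.sqrt(n))

standard_normal = NormalDist()            # mean 0, SD 1
z_crit = standard_normal.inv_cdf(0.05)    # left-tailed critical value
p_value = standard_normal.cdf(z_stat)     # left-tailed p-value

print(f"z = {z_stat:.2f}, critical z = {z_crit:.3f}, p = {p_value:.4f}")
```

The only difference from the t-test is the reference distribution: the normal has thinner tails than the t, so the critical value sits slightly closer to zero.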

Chi-Square Test

Test categorical data. "Is there a relationship between gender and voting preference?"
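A short sketch of a chi-square test of independence, assuming SciPy and NumPy are installed. The counts below are invented for illustration, with rows as two groups and columns as three preference categories.

```python
import numpy as np
from scipy import stats

# Hypothetical contingency table: rows = groups, columns = preferences
observed = np.array([
    [45, 35, 20],
    [30, 40, 30],
])

chi2, p_value, dof, expected = stats.chi2_contingency(observed)

print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p_value:.3f}")
print("Expected counts if the variables were independent:")
print(expected)
```

Degrees of freedom here are (rows - 1) × (columns - 1). The test compares the observed counts with the counts you'd expect if the row and column variables were unrelated.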

ANOVA

Compare means across three or more groups. One-way ANOVA tests if at least one group mean differs from the others.
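A minimal one-way ANOVA sketch, assuming SciPy is installed. The three groups are made-up numbers, chosen so that one group is clearly different.

```python
from scipy import stats

# Hypothetical measurements for three groups
group_a = [20, 21, 19, 22, 20]
group_b = [21, 20, 22, 19, 21]
group_c = [30, 31, 29, 32, 30]

f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.1f}, p = {p_value:.2g}")
```

A small p-value only says that at least one mean differs. It doesn't say which one, so a follow-up comparison is needed to find out.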

Test | Data Type | Groups | What It Tests
One-sample t | Continuous | 1 | Mean vs value
Two-sample t | Continuous | 2 | Mean difference
Paired t | Continuous | 1 (repeated) | Before vs after
Chi-square | Categorical | Any | Independence/fit
ANOVA | Continuous | 3+ | Mean equality

Mistakes That Kill Your Analysis

These errors show up constantly in bad research:

How to Run a Hypothesis Test in Practice

In Python (scipy.stats)

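import numpy as np
from scipy import stats

# Placeholder data for illustration only; substitute your own measurements
rng = np.random.default_rng(0)
data = rng.normal(28.5, 4, size=40)
group1, group2 = rng.normal(50, 10, size=30), rng.normal(55, 10, size=30)
before, after = rng.normal(70, 8, size=25), rng.normal(73, 8, size=25)
population_mean = 30

# One-sample t-test (two-sided by default; pass alternative='less' for a left-tailed test)
t_stat, p_value = stats.ttest_1samp(data, population_mean)

# Two-sample t-test (pooled variance by default; equal_var=False gives Welch's test)
t_stat, p_value = stats.ttest_ind(group1, group2)

# Paired t-test (both arrays must have the same length and order)
t_stat, p_value = stats.ttest_rel(before, after)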

In R

# One-sample t-test
t.test(data, mu = population_mean)

# Two-sample t-test (value = numeric column, group = two-level grouping column)
t.test(value ~ group, data = dataframe)

# Paired t-test
t.test(before, after, paired = TRUE)

In Excel

Use the Data Analysis ToolPak. Select "t-Test: Two-Sample Assuming Equal Variances" or similar options depending on your test type.

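Excel has no built-in one-sample t-test function; T.TEST (formerly TTEST) compares two arrays. For a one-sample test, compute the statistic directly: =(AVERAGE(A2:A41)-30)/(STDEV.S(A2:A41)/SQRT(COUNT(A2:A41))), where A2:A41 is an example range holding your data and 30 is the hypothesized mean. Then get the left-tailed p-value with =T.DIST(t, COUNT(A2:A41)-1, TRUE), where t is the cell containing the statistic.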

What Alpha Level Should You Use?

0.05 is convention, not a law. Here's when to deviate:

Set your alpha before you collect data. Don't change it after seeing results.

The Honest Truth About Hypothesis Testing

Hypothesis testing is a tool, not a conclusion. A significant result doesn't prove your hypothesis is true. It means the data was inconsistent with the null. That's all.

Replicate your results. Check assumptions. Report effect sizes. Consider confidence intervals alongside p-values.
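For the espresso example, an effect size and a confidence interval can be sketched from the same summary statistics. This assumes SciPy is installed and uses Cohen's d (mean difference divided by the sample standard deviation) as the effect size, which is one common choice among several.

```python
import math
from scipy import stats

sample_mean, claimed_mean, sample_sd, n = 28.5, 30.0, 4.0, 40

# Cohen's d: how many standard deviations the sample mean sits from the claim
cohens_d = (sample_mean - claimed_mean) / sample_sd

# Two-sided 95% confidence interval for the true mean
standard_error = sample_sd / math.sqrt(n)
t_multiplier = stats.t.ppf(0.975, n - 1)
ci_low = sample_mean - t_multiplier * standard_error
ci_high = sample_mean + t_multiplier * standard_error

print(f"Cohen's d = {cohens_d:.3f}")
print(f"95% CI for the mean: ({ci_low:.2f}, {ci_high:.2f}) ml")
```

The interval shows the range of plausible mean shot sizes, and the effect size shows how large the shortfall is relative to shot-to-shot variation. Neither is visible from the p-value alone.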

The p-value tells you whether to be surprised. It doesn't tell you whether something matters.