Correlation Practice Problems and Answers - Statistics
What Is Correlation in Statistics? Let's Cut to the Chase
Correlation measures the strength and direction of the relationship between two variables. That's it. Nothing fancy.
You need to know two main types:
- Pearson correlation (r) — measures linear relationships between continuous variables
- Spearman correlation (ρ) — measures monotonic relationships using ranks
Both Pearson and Spearman range from -1 to +1. For Pearson, zero means no linear relationship; for Spearman, zero means no monotonic relationship. The closer to ±1, the stronger the relationship.
Practice Problems with Full Solutions
These are real problems. Work through them before checking the answers.
Problem 1: Calculating Pearson Correlation by Hand
You have data on study hours and exam scores for 5 students:
| Student | Study Hours (X) | Exam Score (Y) |
|---|---|---|
| A | 2 | 70 |
| B | 4 | 85 |
| C | 6 | 90 |
| D | 8 | 95 |
| E | 10 | 100 |
Calculate the Pearson correlation coefficient.
Problem 1 Solution
Using the formula:
r = Σ[(xi - x̄)(yi - ȳ)] / √[Σ(xi - x̄)² × Σ(yi - ȳ)²]
- x̄ = (2+4+6+8+10)/5 = 6
- ȳ = (70+85+90+95+100)/5 = 88
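- Σ(xi - x̄)(yi - ȳ) = 72 + 6 + 0 + 14 + 48 = 140
- Σ(xi - x̄)² = 16 + 4 + 0 + 4 + 16 = 40
- Σ(yi - ȳ)² = 324 + 9 + 4 + 49 + 144 = 530
r = 140 / √(40 × 530) = 140 / √21200 ≈ 140 / 145.6 ≈ 0.96
Very strong positive correlation, but not perfect: the jump from 70 to 85 is bigger than the later 5-point steps, so the points don't fall exactly on a line. Still makes sense: more study hours, higher scores.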
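If you want to check the arithmetic, the sums in the formula can be recomputed in a few lines of Python. This is a minimal sketch using only the standard library; the variable names (`hours`, `scores`, `sum_xy`, and so on) are made up for this example.

```python
from math import sqrt

hours = [2, 4, 6, 8, 10]        # X: study hours from the table
scores = [70, 85, 90, 95, 100]  # Y: exam scores from the table

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n

# Deviation of each value from its own mean
dx = [x - mean_x for x in hours]
dy = [y - mean_y for y in scores]

# The three sums that appear in the formula
sum_xy = sum(a * b for a, b in zip(dx, dy))  # numerator
sum_xx = sum(a * a for a in dx)
sum_yy = sum(b * b for b in dy)

r = sum_xy / sqrt(sum_xx * sum_yy)

print(f"mean x = {mean_x}, mean y = {mean_y}")
print(f"sum of products = {sum_xy}, sum x^2 = {sum_xx}, sum y^2 = {sum_yy}")
print(f"r = {r:.4f}")
```

Printing the intermediate sums makes it easy to spot which step went wrong if your hand calculation disagrees.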
Problem 2: Interpreting a Weak Correlation
A study finds r = 0.23 between ice cream sales and shark attacks.
What does this actually tell you?
Problem 2 Solution
Correlation ≠ causation. This is the most important thing to remember.
r = 0.23 means a weak positive relationship. The two variables tend to rise together, but only loosely. The likely explanation is a confounding variable: summer. Hot weather increases both ice cream sales and beach activity (more shark encounters).
Don't make the mistake of assuming ice cream causes shark attacks.
Problem 3: Negative Correlation
Data on hours of sleep and number of errors made:
| Person | Sleep (hours) | Errors |
|---|---|---|
| 1 | 4 | 12 |
| 2 | 5 | 10 |
| 3 | 6 | 8 |
| 4 | 7 | 6 |
| 5 | 8 | 4 |
Calculate r.
Problem 3 Solution
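When one variable goes up and the other goes down, you get a negative r.
Here x̄ = 6 and ȳ = 8, so Σ(xi - x̄)(yi - ȳ) = -20, Σ(xi - x̄)² = 10, and Σ(yi - ȳ)² = 40.
r = -20 / √(10 × 40) = -20 / 20 = -1.0
Perfect negative correlation. Every extra hour of sleep goes with exactly 2 fewer errors, so the points fall on a straight line. Less sleep = more errors.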
Problem 4: Spearman vs. Pearson
Why might Spearman be better for this data?
| Contestant | Raw Score | Rank |
|---|---|---|
| A | 95 | 1 |
| B | 89 | 2 |
| C | 92 | 3 |
| D | 85 | 4 |
| E | 87 | 5 |
Problem 4 Solution
Spearman (ρ) uses ranks instead of raw values. Use it when:
- Data is ordinal (ranks, ratings)
- Outliers are present
- The relationship isn't linear but is monotonic
For this data, Pearson and Spearman would give similar results. But if Contestant A scored 150 (outlier), Pearson would shift because it depends on the raw values, while Spearman would stay the same because A still has the highest score.
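The outlier claim can be tried out directly. The sketch below assumes the two variables being correlated are the Raw Score and Rank columns of the table (the problem does not say so explicitly), and the helper names `pearson`, `ranks`, and `spearman` are hypothetical. The ranking helper assumes no tied values, which holds for this data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson r computed from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank from smallest (1) to largest (n). Assumes no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman(xs, ys):
    """Spearman rho: Pearson r applied to the ranks."""
    return pearson(ranks(xs), ranks(ys))

rank_col = [1, 2, 3, 4, 5]               # "Rank" column, contestants A to E
scores = [95, 89, 92, 85, 87]            # "Raw Score" column
scores_outlier = [150, 89, 92, 85, 87]   # hypothetical: A scores 150

for label, s in [("original", scores), ("A = 150", scores_outlier)]:
    print(f"{label}: Pearson = {pearson(s, rank_col):.3f}, "
          f"Spearman = {spearman(s, rank_col):.3f}")
```

Both coefficients come out negative here because rank 1 is paired with the highest score. What matters is how much each one moves when A's score changes.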
Correlation Types Comparison
| Type | Range | Best For | Affected by Outliers? |
|---|---|---|---|
| Pearson r | -1 to +1 | Linear relationships, continuous data | Yes |
| Spearman ρ | -1 to +1 | Ranks, ordinal data, monotonic relationships | Less than Pearson |
| Kendall's τ | -1 to +1 | Small samples, tied ranks | Less than Pearson |
How to Calculate Correlation: Step-by-Step
Here's the process for Pearson correlation:
- Collect your pairs — you need (x, y) for each observation
- Calculate means — find x̄ and ȳ
- Find deviations — subtract means from each value
- Multiply deviations — (xi - x̄)(yi - ȳ) for each pair
- Sum the products — this is your numerator
- Square and sum deviations — separately for x and y
- Square root and divide: take the square root of the product of the two sums of squares, then divide the numerator by it
Or just use Excel: =CORREL(array1, array2)
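If you'd rather script it than use a spreadsheet, the same steps translate to Python. This is a sketch under my own naming (`pearson_r` is not a library function), with each step from the list marked in a comment. The input checks are an addition for safety and are not part of the steps above.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation, following the steps listed above."""
    # Step 1: paired observations, so both lists must line up
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    if len(xs) < 2:
        raise ValueError("need at least two pairs")
    n = len(xs)

    # Step 2: means
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Step 3: deviations from the means
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]

    # Steps 4 and 5: multiply paired deviations and sum them (numerator)
    numerator = sum(a * b for a, b in zip(dx, dy))

    # Step 6: squared deviations, summed separately for x and y
    ss_x = sum(a * a for a in dx)
    ss_y = sum(b * b for b in dy)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable is constant")

    # Step 7: divide by the square root of the product of the two sums
    return numerator / sqrt(ss_x * ss_y)

# Sleep and error data from Problem 3
print(pearson_r([4, 5, 6, 7, 8], [12, 10, 8, 6, 4]))
```

On Python 3.10 or later, the standard library's `statistics.correlation(x, y)` computes the same coefficient in one call.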
Common Mistakes That Kill Your Analysis
- Assuming causation — correlation only shows association
- Ignoring outliers — one extreme point can drastically change r
- Using wrong correlation type — don't force Pearson on ranked data
- Forgetting sample size — small samples give unreliable estimates
- Misreading the strength — r = 0.5 is moderate, not strong
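The outlier mistake is easy to see with numbers. The data below is invented purely for illustration, and `pearson_r` is a hypothetical helper, not a library call. The sketch computes r for five points, then again after adding one extreme point.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den

# Made-up data with a weak negative relationship
x = [1, 2, 3, 4, 5]
y = [3, 5, 1, 4, 2]
r_before = pearson_r(x, y)

# Add a single extreme point far from the rest
r_after = pearson_r(x + [20], y + [20])

print(f"without outlier: r = {r_before:.2f}")
print(f"with outlier:    r = {r_after:.2f}")
```

One added point is enough to flip the sign and push r close to +1, which is why a scatter plot is worth a look before trusting the number.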
Quick Reference: Interpreting r Values
| Absolute value of r | Interpretation |
|---|---|
| 0.00 – 0.19 | Very weak |
| 0.20 – 0.39 | Weak |
| 0.40 – 0.59 | Moderate |
| 0.60 – 0.79 | Strong |
| 0.80 – 1.00 | Very strong |
These are guidelines, not rules. Context matters.
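As a rough helper, the table can be turned into a lookup. This sketch assumes the cutoffs sit at 0.20, 0.40, 0.60, and 0.80 (the table leaves small gaps between bands) and that strength is judged on the absolute value, so negative r values get the same labels. The function name `describe_strength` is made up.

```python
def describe_strength(r):
    """Map a correlation coefficient to the labels in the table above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and +1")
    size = abs(r)  # strength ignores the sign
    if size < 0.20:
        return "very weak"
    if size < 0.40:
        return "weak"
    if size < 0.60:
        return "moderate"
    if size < 0.80:
        return "strong"
    return "very strong"

for r in (0.23, 0.5, -0.85):
    direction = "positive" if r > 0 else "negative" if r < 0 else "none"
    print(f"r = {r:+.2f}: {describe_strength(r)}, direction: {direction}")
```

Treat the labels as a starting point only; the same r can be impressive in one field and unremarkable in another.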
Bottom Line
Correlation is a tool. Like any tool, it fails when used wrong.
Know your correlation type. Know your data. Never confuse correlation with causation.