Correlation Practice Problems and Answers - Statistics

What Is Correlation in Statistics? Let's Cut to the Chase

Correlation measures the strength and direction of the relationship between two variables. That's it. Nothing fancy.

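You need to know two main types: Pearson (r), which measures linear relationships between continuous variables, and Spearman (ρ), which measures monotonic relationships using ranks.

Both range from -1 to +1. For Pearson, zero means no linear relationship; for Spearman, zero means no monotonic relationship. The closer to ±1, the stronger the relationship, and the sign gives the direction.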

Practice Problems with Full Solutions

These are real problems. Work through them before checking the answers.

Problem 1: Calculating Pearson Correlation by Hand

You have data on study hours and exam scores for 5 students:

Student | Study Hours (X) | Exam Score (Y)
A | 2 | 70
B | 4 | 85
C | 6 | 90
D | 8 | 95
E | 10 | 100

Calculate the Pearson correlation coefficient.

Problem 1 Solution

Using the formula:

r = Σ[(xi - x̄)(yi - ȳ)] / √[Σ(xi - x̄)² × Σ(yi - ȳ)²]

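With x̄ = 6 and ȳ = 88:

Σ(xi - x̄)(yi - ȳ) = 72 + 6 + 0 + 14 + 48 = 140
Σ(xi - x̄)² = 16 + 4 + 0 + 4 + 16 = 40
Σ(yi - ȳ)² = 324 + 9 + 4 + 49 + 144 = 530

r = 140 / √(40 × 530) = 140 / √21200 ≈ 140 / 145.6 ≈ 0.96

Very strong positive correlation, but not perfect: the first jump in score (70 to 85) is larger than the later ones, so the points do not fall exactly on a straight line. Makes sense: more study hours, higher scores.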

Problem 2: Interpreting a Weak Correlation

A study finds r = 0.23 between ice cream sales and shark attacks.

What does this actually tell you?

Problem 2 Solution

Correlation ≠ causation. This is the most important thing to remember.

r = 0.23 means a weak positive relationship. Both variables increase together, but the relationship is loose. The real cause is a confounding variable — summer. Hot weather increases both ice cream sales and beach activity (more shark encounters).

Don't make the mistake of assuming ice cream causes shark attacks.
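
To make the confounding argument concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the data are synthetic (not real sales or attack counts), the variable names and noise levels are arbitrary, and the partial correlation formula is a standard technique brought in here to "control for" temperature, not something the problem statement specifies.

```python
import math
import random


def pearson(xs, ys):
    """Pearson r from two paired lists of equal length."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_r(xs, ys, zs):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic data, invented for illustration only.
# Temperature drives both series; they never influence each other.
temperature = [rng.uniform(10, 35) for _ in range(500)]
ice_cream = [t + rng.gauss(0, 12) for t in temperature]
shark_encounters = [t + rng.gauss(0, 12) for t in temperature]

r_raw = pearson(ice_cream, shark_encounters)
r_given_temp = partial_r(ice_cream, shark_encounters, temperature)
print(f"raw r = {r_raw:.2f}")
print(f"r controlling for temperature = {r_given_temp:.2f}")
```

The raw correlation should come out weakly positive even though neither series depends on the other, and it should shrink toward zero once temperature is accounted for.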

Problem 3: Negative Correlation

Data on hours of sleep and number of errors made:

Person | Sleep (hours) | Errors
1 | 4 | 12
2 | 5 | 10
3 | 6 | 8
4 | 7 | 6
5 | 8 | 4

Calculate r.

Problem 3 Solution

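When one variable goes up and the other goes down, you get a negative r.

With x̄ = 6 and ȳ = 8:

r = -20 / √(10 × 40) = -20 / √400 = -20 / 20 = -1.0

Perfect negative correlation. Every extra hour of sleep comes with exactly 2 fewer errors (Errors = 20 - 2 × Sleep), so the points fall on a straight line. Less sleep = more errors.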

Problem 4: Spearman vs. Pearson

Why might Spearman be better for this data?

Contestant | Raw Score | Rank
A | 95 | 1
B | 89 | 2
C | 92 | 3
D | 85 | 4
E | 87 | 5

Problem 4 Solution

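Spearman (ρ) uses ranks instead of raw values. Use it when the data are ordinal or already ranked, when the relationship is monotonic but not linear, or when outliers would distort Pearson.

For this data, Pearson and Spearman give similar results (both about -0.80; the sign is negative because lower rank numbers tend to go with higher scores). But if Contestant A scored 150 (an outlier), Pearson would shift to about -0.74 while Spearman stays at -0.80, because the ordering of the scores is unchanged.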
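
The outlier scenario can be checked with a short sketch. It assumes the Raw Score and Rank columns are the two variables being correlated, and that there are no tied values (the `ranks` helper is a hypothetical simplification that does not handle ties). Spearman is computed as Pearson on the ranks.

```python
import math


def pearson(xs, ys):
    """Pearson r from two paired lists of equal length."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank from 1 (smallest) upward. Assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman(xs, ys):
    """Spearman rho: Pearson r computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


scores = [95, 89, 92, 85, 87]  # contestants A to E
rank_col = [1, 2, 3, 4, 5]     # Rank column from the table

print(round(pearson(scores, rank_col), 2), round(spearman(scores, rank_col), 2))
# -0.8 -0.8

outlier_scores = [150, 89, 92, 85, 87]  # Contestant A changed to 150
print(round(pearson(outlier_scores, rank_col), 2),
      round(spearman(outlier_scores, rank_col), 2))
# -0.74 -0.8
```

Changing 95 to 150 leaves A as the top score, so the ranks (and Spearman) are untouched, while Pearson moves because it works on the raw values.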

Correlation Types Comparison

Type | Range | Best For | Affected by Outliers?
Pearson r | -1 to +1 | Linear relationships, continuous data | Yes
Spearman ρ | -1 to +1 | Ranks, ordinal data, monotonic relationships | Much less (rank-based)
Kendall's τ | -1 to +1 | Small samples, tied ranks | Much less (rank-based)
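
Kendall's τ is the only entry in the table not worked elsewhere, so here is a minimal sketch of the idea: count pairs of observations that agree in order (concordant) versus disagree (discordant). This is the simple tau-a variant with no tie correction, chosen for brevity; data with tied ranks would normally call for tau-b. The function name is hypothetical and the data are the Problem 4 columns.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total pairs. No tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables order this pair the same way
        elif sign < 0:
            discordant += 1   # the variables order this pair oppositely
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)


scores = [95, 89, 92, 85, 87]  # Raw Score column from Problem 4
rank_col = [1, 2, 3, 4, 5]     # Rank column from Problem 4

print(kendall_tau_a(scores, rank_col))  # -0.6
```

Of the 10 possible pairs, 2 are concordant and 8 are discordant, giving (2 - 8) / 10 = -0.6.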

How to Calculate Correlation: Step-by-Step

Here's the process for Pearson correlation:

  1. Collect your pairs: you need (x, y) for each observation
  2. Calculate means: find x̄ and ȳ
  3. Find deviations: subtract the mean from each value
  4. Multiply deviations: (xi - x̄)(yi - ȳ) for each pair
  5. Sum the products: this is your numerator
  6. Square and sum deviations: separately for x and y
  7. Take the square root and divide: multiply the two sums from step 6, take the square root, and divide the numerator by the result

Or just use Excel: =CORREL(array1, array2)
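
For readers who would rather script it, the seven steps can be sketched in plain Python. The function name `pearson_by_steps` is hypothetical, and the two data sets are the study-hours and sleep tables from Problems 1 and 3; no input validation is attempted (for example, a constant column would cause a division by zero).

```python
import math


def pearson_by_steps(pairs):
    # Step 1: pairs is a list of (x, y) observations.
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    n = len(pairs)

    # Step 2: means.
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Step 3: deviations from the mean.
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]

    # Steps 4 and 5: multiply deviations pairwise and sum (the numerator).
    numerator = sum(a * b for a, b in zip(dx, dy))

    # Step 6: sums of squared deviations, separately for x and y.
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)

    # Step 7: divide by the square root of the product of the two sums.
    return numerator / math.sqrt(sxx * syy)


study = [(2, 70), (4, 85), (6, 90), (8, 95), (10, 100)]  # Problem 1 data
sleep = [(4, 12), (5, 10), (6, 8), (7, 6), (8, 4)]       # Problem 3 data

print(round(pearson_by_steps(study), 4))  # 0.9615
print(round(pearson_by_steps(sleep), 4))  # -1.0
```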

Common Mistakes That Kill Your Analysis

  1. Treating correlation as causation: a confounder such as summer can drive both variables
  2. Using Pearson when outliers or a curved relationship are present: compare against Spearman first
  3. Reading r = 0 as "no relationship": for Pearson it only rules out a linear one
  4. Treating the interpretation thresholds as hard rules instead of guidelines

Quick Reference: Interpreting r Values

Absolute value of r | Interpretation
0.00 – 0.19 | Very weak
0.20 – 0.39 | Weak
0.40 – 0.59 | Moderate
0.60 – 0.79 | Strong
0.80 – 1.00 | Very strong

These are guidelines, not rules. Context matters.
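
If these labels are needed in a script, the table can be turned into a small lookup. This is a sketch under two assumptions: the bands apply to the absolute value of r, and values between the listed ranges (such as 0.195) fall into the lower band. The function name `describe_strength` is hypothetical.

```python
def describe_strength(r):
    """Return a guideline label for r, based on its absolute value and sign."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and +1")

    magnitude = abs(r)
    if magnitude < 0.20:
        label = "very weak"
    elif magnitude < 0.40:
        label = "weak"
    elif magnitude < 0.60:
        label = "moderate"
    elif magnitude < 0.80:
        label = "strong"
    else:
        label = "very strong"

    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "no direction"
    return f"{label}, {direction}"


print(describe_strength(0.23))  # weak, positive
print(describe_strength(-1.0))  # very strong, negative
```

The cutoffs are the same rough guidelines as the table, so the output is a starting point for interpretation and not a verdict.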

Bottom Line

Correlation is a tool. Like any tool, it fails when used wrong.

Know your correlation type. Know your data. Never confuse correlation with causation.