Scatter Plot Correlation Value: Finding and Interpreting
What Is Correlation in a Scatter Plot?
A scatter plot shows the relationship between two variables. Each point on the chart represents a data pair. Correlation describes how the two variables move together: whether one tends to rise or fall as the other rises.
That's it. You're looking for patterns. Do the points trend upward, downward, or show no pattern at all?
The Three Types of Correlation
Positive correlation: As one variable increases, the other increases. The points trend upward from left to right. Example: hours studied vs. test scores.
Negative correlation: As one variable increases, the other decreases. The points trend downward. Example: age of a car vs. its resale value.
No correlation: The points are scattered randomly, with no upward or downward trend. No linear relationship is visible between the variables. Example: shoe size vs. intelligence.
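The three cases above can be sketched with a few lines of Python. The data here is made up purely for illustration (it is not real measurements), and the helper name `pearson_r` is an assumption of this example; it simply wraps NumPy's `corrcoef`.

```python
import numpy as np

def pearson_r(x, y):
    """Return Pearson's r for two equal-length sequences."""
    return float(np.corrcoef(x, y)[0, 1])

# Made-up illustrative data, not real measurements.
hours_studied = [1, 2, 3, 4, 5]
test_scores = [52, 60, 63, 71, 80]      # rises with hours studied
car_age = [1, 2, 3, 4, 5]
resale_value = [20, 17, 15, 12, 10]     # falls with age (in thousands)
x_flat = [1, 2, 3, 4, 5]
y_flat = [3, 1, 4, 1, 3]                # no upward or downward trend

r_positive = pearson_r(hours_studied, test_scores)
r_negative = pearson_r(car_age, resale_value)
r_none = pearson_r(x_flat, y_flat)

print(f"positive: {r_positive:.2f}")
print(f"negative: {r_negative:.2f}")
print(f"none:     {r_none:.2f}")
```

The sign of each result matches the direction of the trend, and the third dataset lands at essentially zero.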
Understanding the Correlation Coefficient (r)
The correlation coefficient, represented as r, is a number between -1 and +1. This number tells you the strength and direction of the linear relationship between the two variables.
What the Numbers Mean
- r = +1.0: Perfect positive correlation. Every point falls exactly on a straight line going up.
- r = 0: No linear correlation. The points show no upward or downward trend, although a curved pattern is still possible.
- r = -1.0: Perfect negative correlation. Every point falls exactly on a straight line going down.
- r = 0.5 to 0.8: Moderate positive correlation.
- r = -0.8 to -0.5: Moderate negative correlation.
- r = 0.8 or higher (or -0.8 or lower): Strong correlation. Less common in noisy real-world data.
Quick Reference Table
| r Value | Interpretation |
|---|---|
| +0.8 to +1.0 | Strong positive |
| +0.5 to +0.8 | Moderate positive |
| +0.2 to +0.5 | Weak positive |
| -0.2 to +0.2 | Negligible/no correlation |
| -0.5 to -0.2 | Weak negative |
| -0.8 to -0.5 | Moderate negative |
| -1.0 to -0.8 | Strong negative |
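If you label many r values, the table can be turned into a small lookup function. This is a minimal sketch using cutoffs of 0.2, 0.5, and 0.8. The function name `interpret_r` and the handling of values at band edges (exactly 0.5 counts as moderate, exactly 0.8 as strong) are assumptions of this example.

```python
def interpret_r(r: float) -> str:
    """Map a correlation coefficient to a rough verbal label."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and +1")
    strength = abs(r)
    if strength < 0.2:
        return "negligible/no correlation"
    # Assumed edge handling: a value on a cutoff goes to the stronger band.
    if strength < 0.5:
        label = "weak"
    elif strength < 0.8:
        label = "moderate"
    else:
        label = "strong"
    direction = "positive" if r > 0 else "negative"
    return f"{label} {direction}"

for value in (0.93, 0.72, 0.3, 0.05, -0.6, -0.85):
    print(f"r = {value:+.2f}: {interpret_r(value)}")
```

Treat the labels as a rough guide. What counts as a "strong" correlation varies by field.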
How to Find the Correlation Value
Excel Method
Use the CORREL function. Select two columns of data:
=CORREL(A2:A20, B2:B20)
This returns the r value instantly.
Python Method
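from scipy import stats
# Paired observations: x[i] goes with y[i]
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
# pearsonr returns the correlation coefficient and its two-sided p-value
r, p_value = stats.pearsonr(x, y)
print(f"Correlation: {r:.3f} (p-value: {p_value:.3f})")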
Google Sheets Method
Use =CORREL(range1, range2) — same syntax as Excel.
Manual Calculation (Pearson's r)
The formula looks like this:
r = [Σ(x-x̄)(y-ȳ)] / [√(Σ(x-x̄)²) × √(Σ(y-ȳ)²)]
Don't calculate this by hand unless you enjoy pain. Use software instead.
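If you want to see what the software does under the hood, the formula translates almost term by term into plain Python. This is a sketch for understanding rather than production use. The function name `pearson_r_manual` is an assumption of this example, and the sample data is the same small dataset used in the Python method.

```python
import math

def pearson_r_manual(x, y):
    """Compute Pearson's r directly from the deviation formula."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    # Numerator: sum of the products of deviations from the means.
    numerator = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    # Denominator: product of the square roots of the summed squared deviations.
    x_spread = math.sqrt(sum((xi - x_mean) ** 2 for xi in x))
    y_spread = math.sqrt(sum((yi - y_mean) ** 2 for yi in y))
    if x_spread == 0 or y_spread == 0:
        raise ValueError("r is undefined when a variable is constant")
    return numerator / (x_spread * y_spread)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(f"r = {pearson_r_manual(x, y):.4f}")  # prints r = 0.7746
```

For this data the numerator is 6 and the denominator is √10 × √6 = √60, which gives r ≈ 0.7746.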
What to Look for in the Scatter Plot Itself
You can often spot correlation without calculating anything:
- Points form a clear line → strong correlation
- Points cluster loosely around an invisible line → weak correlation
- Points form a curve, not a line → relationship exists but it's not linear
- Points look like a cloud → no correlation
Common Mistakes to Avoid
Confusing correlation with causation: Just because two variables correlate doesn't mean one causes the other. Ice cream sales and drowning rates both increase in summer because hot weather drives both. Ice cream doesn't cause drowning.
Ignoring outliers: One extreme point can dramatically change your r value. Always visualize your data before trusting the number.
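The outlier effect is easy to reproduce. The sketch below uses made-up data: ten points with no real trend, then the same ten points plus one extreme value. The specific numbers are assumptions chosen only to make the effect visible.

```python
import numpy as np

# Made-up data: ten points with no real trend.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([5, 3, 6, 4, 5, 4, 6, 3, 5, 4], dtype=float)
r_without = float(np.corrcoef(x, y)[0, 1])

# Add a single extreme point far from the rest of the data.
x_with = np.append(x, 30.0)
y_with = np.append(y, 30.0)
r_with = float(np.corrcoef(x_with, y_with)[0, 1])

print(f"without outlier: r = {r_without:.2f}")
print(f"with outlier:    r = {r_with:.2f}")
```

Here one added point moves r from roughly zero to above 0.9, even though the other ten points have not changed.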
Assuming linearity: A low r value doesn't mean no relationship exists. The variables might have a curved relationship. Always plot your data first.
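A quick way to convince yourself of this is a perfectly curved relationship. In the sketch below (an illustrative construction, not real data), y is completely determined by x, yet r comes out at essentially zero because the relationship is not a straight line.

```python
import numpy as np

x = np.arange(-5, 6, dtype=float)   # -5, -4, ..., 5
y = x ** 2                          # y is completely determined by x

r = float(np.corrcoef(x, y)[0, 1])
print(f"r = {r:.3f}")
```

The left half of the U-shape trends down and the right half trends up, so the two halves cancel out in the calculation.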
Small sample sizes: With only 5 data points, a large r can easily arise by chance. Bigger samples give more reliable results.
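The small-sample problem can be checked with a rough simulation. The sketch below draws two unrelated random variables many times and counts how often r looks "strong" anyway. The function name `share_of_large_r`, the 0.7 threshold, the trial count, and the fixed seed are all assumptions of this example.

```python
import numpy as np

def share_of_large_r(n_points, threshold=0.7, trials=5000, seed=0):
    """Fraction of trials where two unrelated samples give |r| >= threshold."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        x = rng.normal(size=n_points)
        y = rng.normal(size=n_points)   # drawn independently of x
        r = np.corrcoef(x, y)[0, 1]
        if abs(r) >= threshold:
            hits += 1
    return hits / trials

for n in (5, 20, 50):
    print(f"n = {n:2d}: share with |r| >= 0.7 is {share_of_large_r(n):.3f}")
```

With 5 points, a noticeable share of purely random datasets clears the threshold. With 50 points, it almost never happens.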
When to Use Scatter Plots
Scatter plots work when both variables are continuous and numerical. Good use cases:
- Height vs. weight
- Temperature vs. electricity usage
- Advertising spend vs. sales revenue
Bad use cases:
- Comparing categories (use bar charts instead)
- Time series with trends (use line charts)
R-Squared: The Coefficient of Determination
You might see R² reported alongside r. For a straight-line fit with one predictor, R² is simply r squared. It tells you what proportion of the variation in Y is explained by X.
Example: If r = 0.8, then R² = 0.64. This means 64% of the variation in the dependent variable is explained by the independent variable. The remaining 36% comes from other factors.
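To see why R² equals r squared for a straight-line fit, you can compute it both ways. This sketch fits a least-squares line with NumPy's `polyfit` and compares the explained-variation definition of R² with r². The small dataset is illustrative only.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

r = float(np.corrcoef(x, y)[0, 1])

# Fit y = slope * x + intercept by least squares.
slope, intercept = np.polyfit(x, y, 1)
predicted = slope * x + intercept

ss_residual = float(np.sum((y - predicted) ** 2))   # variation the line misses
ss_total = float(np.sum((y - y.mean()) ** 2))       # total variation in y
r_squared = 1 - ss_residual / ss_total

print(f"r   = {r:.4f}")
print(f"r^2 = {r ** 2:.4f}")
print(f"R^2 = {r_squared:.4f}")
```

Both routes give 0.6 for this data, so 60% of the variation in y is accounted for by the fitted line.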
Practical Example
Let's say you're analyzing study time vs. exam scores for 50 students. You collect the data, plot it, and calculate r = 0.72.
Interpretation: There's a moderate-to-strong positive relationship. Students who study more tend to score higher. But R² = 0.72² ≈ 0.52, so about 48% of exam score variation comes from factors other than study time (natural ability, exam anxiety, quality of notes, etc.).
This is useful. But you can't say studying causes better scores based on this alone.
Getting Started Checklist
- Plot your data first — never trust a number without seeing the visualization
- Check for outliers or curved patterns
- Calculate r using Excel, Python, or Google Sheets
- Square r to get R² if you need explained variation
- Report the p-value alongside r so readers can judge whether the correlation is statistically significant
- Never claim causation without controlled experiments
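The numeric steps of the checklist can be bundled into one helper. This is a minimal sketch, assuming SciPy is installed. The function name `summarize_correlation` and the dictionary keys are hypothetical choices for this example, and the helper does not replace plotting the data first.

```python
from scipy import stats

def summarize_correlation(x, y):
    """Return n, r, R-squared, and the two-sided p-value for paired data."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    r, p_value = stats.pearsonr(x, y)
    r = float(r)
    return {
        "n": len(x),
        "r": r,
        "r_squared": r ** 2,
        "p_value": float(p_value),
    }

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
summary = summarize_correlation(x, y)
for name, value in summary.items():
    print(f"{name}: {value:.4f}" if isinstance(value, float) else f"{name}: {value}")
```

With only five points, this example returns a fairly high r but a p-value above 0.05, which is the small-sample caution in practice.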
Correlation analysis is a starting point, not a conclusion. It tells you that two variables move together. It doesn't tell you why.