Scatter Plot Relationships: Examples and Interpretation
What a Scatter Plot Actually Shows You
A scatter plot displays two variables on an X-Y graph. Each point represents one observation. That's it. Nothing fancy.
The real value? You can see how two things relate to each other at a glance. Does one increase when the other increases? Decrease? Not respond at all?
Most people stare at these graphs and see a blob of dots. That's useless. You need to know what you're looking for.
The Three Types of Relationships
Positive Correlation
When X goes up, Y goes up. The points trend upward from left to right.
Example: hours studied vs. test scores. More studying, higher scores. Makes sense.
Watch out: correlation is not causation. People ignore this constantly. Just because two things move together doesn't mean one causes the other.
Negative Correlation
When X goes up, Y goes down. The points trend downward from left to right.
Example: price vs. demand. Higher price, lower demand. Basic economics.
Again, causation is not guaranteed. Don't assume the direction of influence just from the pattern.
No Correlation
X moves however it wants. Y doesn't care. The points are scattered randomly with no clear direction.
Example: adult shoe size vs. intelligence. Nothing there. Some things genuinely don't relate.
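To put a number on direction, you can compute Pearson's correlation coefficient r, which runs from -1 (perfectly downward) through 0 (no linear trend) to +1 (perfectly upward). Here is a minimal sketch. The helper name `pearson_r` and every data value are invented for illustration; they are not real measurements.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up numbers, chosen only to mimic the three patterns.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 64, 70, 71, 78, 83]            # rises with hours

price = [1, 2, 3, 4, 5, 6, 7, 8]
demand = [95, 88, 80, 76, 65, 60, 52, 45]            # falls as price rises

shoe_size = [6, 7, 8, 9, 10, 11, 12, 13]
test_result = [105, 103, 106, 104, 104, 106, 103, 105]  # no trend

r_positive = pearson_r(hours, scores)
r_negative = pearson_r(price, demand)
r_none = pearson_r(shoe_size, test_result)

print(f"positive: {r_positive:+.2f}")
print(f"negative: {r_negative:+.2f}")
print(f"none:     {r_none:+.2f}")
```

The sign of r matches the direction you see on the plot. It still says nothing about causation.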
Relationship Strength: How Tight Is the Pattern?
The direction tells you one thing. The tightness of the pattern tells you another.
- Strong relationship: points cluster tightly around an invisible line. You can make accurate predictions.
- Weak relationship: points are spread out. The pattern exists but predictions are unreliable.
- No relationship: scattered chaos. No predictive value whatsoever.
A scatter plot with a visible pattern but huge scatter is basically useless for prediction. Don't pretend otherwise.
Outliers: The Points That Break the Rules
Outliers are data points that don't fit the general pattern. They stick out.
Before you ignore them or delete them, ask:
- Is it a data entry error? Fix it or remove it.
- Is it a legitimate anomaly? Investigate it. Sometimes outliers are the interesting part.
- Is it skewing your perception of the relationship? Calculate with and without it.
Outliers tell stories. Figure out what the story is.
Linear vs. Curvilinear Relationships
Most people look for straight lines. Reality doesn't always cooperate.
Linear: points roughly follow a straight line. Simple. Predictable. Easy to model.
Curvilinear: points follow a curve. The slope changes across the range, and the direction can even reverse. More complex. Requires different analysis.
Example of curvilinear: stress vs. performance. A little stress helps. Too much tanks performance. That's a curve, not a line.
Reading a Scatter Plot: Step by Step
- Check the axes first. What are you actually measuring? What units? What ranges? Misreading any of these changes everything.
- Identify the direction. Up, down, or nowhere?
- Assess the strength. Tight cluster or messy spread?
- Look for outliers. Anything that doesn't fit?
- Consider the shape. Line or curve?
- Think critically. Is the relationship meaningful? What could explain it?
Relationship Types at a Glance
| Pattern | Direction | Strength | Example |
|---|---|---|---|
| Strong positive | Upward | Tight clustering | Height vs. shoe size in adults |
| Weak positive | Upward | Scattered points | Study hours vs. income later in life |
| Strong negative | Downward | Tight clustering | Speed vs. time to finish a fixed distance |
| Weak negative | Downward | Scattered points | TV watching vs. grades |
| No correlation | None | Random scatter | Birthday month vs. height |
| Curvilinear | Changes | Varies | Age vs. athletic performance |
Common Mistakes People Make
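Ignoring scale: Axis ranges change how strong a relationship looks. Stretch an axis far beyond the data and the points squash into a tight-looking band. Zoom in on a narrow slice and the same data looks like noise. Check your axes before drawing conclusions.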
Extrapolating beyond the data: You can only make claims within the range you have data for. Don't predict outside that range unless you know what you're doing.
Confusing correlation with causation: Already said it. Said it again because people still get this wrong.
Ignoring confounding variables: Two things can correlate because a third thing drives both. Ice cream sales and drowning rates both increase in summer. The third variable is temperature.
When Scatter Plots Lie to You
Aggregated data hides patterns. A correlation in city-level averages can be weaker, stronger, or even reversed in individual-level data, so check both when you can. Patterns emerge and disappear depending on your unit of analysis.
Small sample sizes mislead. With 10 points, a pattern can appear or vanish by pure chance, so the plot tells you almost nothing. With 10,000 points, a visible pattern is far more likely to be real.
Axis manipulation distorts perception. Does an axis start at zero or somewhere else? Is there a break in it? These choices change how the relationship looks. Know what you're looking at.
Practical How-To: Creating a Basic Scatter Plot
In Excel or Google Sheets:
- Put your X variable in column A, Y variable in column B
- Select both columns
- Insert → Chart → Scatter
- Add axis labels and a title
- Look at it
In Python with matplotlib:
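import matplotlib.pyplot as plt
x_data = [1, 2, 3, 4, 5]   # placeholder values; replace with your X variable
y_data = [2, 4, 5, 4, 6]   # placeholder values; replace with your Y variable
plt.scatter(x_data, y_data)
plt.xlabel('X Variable Name')
plt.ylabel('Y Variable Name')
plt.title('Title')
plt.show()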
In R:
plot(x_data, y_data, main="Title", xlab="X", ylab="Y")
What Scatter Plots Can't Tell You
They can't prove causation. They can't account for time delays between variables. They can't handle more than two variables simultaneously without becoming 3D plots or multiple plots.
For three variables, use a bubble chart (third variable as bubble size) or create multiple scatter plots.
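A bubble chart is just a scatter plot where marker size carries the third variable. Here is a rough sketch: `bubble_sizes` and `draw_bubble_chart` are hypothetical helper names, and the size range of 20 to 400 is an arbitrary choice. The drawing function relies on matplotlib's `s` argument to `plt.scatter`, which sets marker area.

```python
def bubble_sizes(values, min_size=20.0, max_size=400.0):
    """Rescale a third variable linearly into marker areas for plotting."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Every value is the same: use one mid-sized bubble for all points.
        return [(min_size + max_size) / 2 for _ in values]
    scale = (max_size - min_size) / (hi - lo)
    return [min_size + (v - lo) * scale for v in values]

def draw_bubble_chart(x, y, third, x_label="X", y_label="Y"):
    """Scatter plot with the third variable shown as bubble size."""
    import matplotlib.pyplot as plt  # imported here; requires matplotlib
    plt.scatter(x, y, s=bubble_sizes(third), alpha=0.5)
    plt.xlabel(x_label)
    plt.ylabel(y_label)
    plt.show()

# Size calculation on its own, with made-up values.
sizes = bubble_sizes([10, 20, 30])
print(sizes)

# To draw the chart (opens a window):
# draw_bubble_chart([1, 2, 3], [4, 6, 5], [10, 20, 30])
```

Bubble size is harder to read than position, so treat the third variable as a rough cue, not a precise measurement.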
The Bottom Line
Scatter plots are simple tools. You plot points, you look for patterns, you interpret what you see. The math is straightforward.
Where people fail is in the interpretation. They see a pattern and jump to conclusions. They ignore confounding variables. They assume causation where only correlation exists.
Look at the plot. See what's there. Don't see more than what's there.