Scatter Plots - Creation and Interpretation
What Is a Scatter Plot?
A scatter plot is a graph that displays values for two variables on an X-Y coordinate system. Each point on the graph represents a single data observation. That's it. Nothing fancy.
You plot one variable on the horizontal axis (X) and another on the vertical axis (Y). The pattern these points form tells you something about the relationship between the two variables. 📊
When to Use a Scatter Plot
Use a scatter plot when you want to see:
- If two variables have any relationship at all
- What kind of relationship exists (positive, negative, or none)
- How strong that relationship is
- Whether outliers are present
- If the relationship is linear or curved
Don't use a scatter plot to compare categories or to track a single value over time. That's what bar charts and line graphs are for.
Reading a Scatter Plot - The Basics
Start by looking at the overall pattern. Ignore individual points initially. Ask yourself: do the points trend upward, downward, or show no clear direction?
The direction of the pattern tells you whether the variables move together or in opposite directions. The tightness of the pattern, meaning how closely the points hug a line or curve, shows how strongly they're connected. Widely scattered points mean a weak relationship. Points forming a clear line mean a strong one.
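One common way to put a number on direction and tightness is Pearson's correlation coefficient, r, which ranges from -1 to 1: the sign gives the direction, and values near ±1 mean the points sit close to a straight line. The sketch below computes it with the standard library only. The function name `pearson_r` and the study-time data are hypothetical, and r measures linear association only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired observations.

    Returns a value in [-1, 1]: the sign gives the direction of a linear
    trend, the magnitude gives how tightly the points follow a straight line.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two matching (x, y) pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and the two sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical study-time vs. test-score data, for illustration only.
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 55, 61, 64, 70, 74]
print(round(pearson_r(hours, scores), 3))
```

Python 3.10 and later also ship `statistics.correlation`, which computes the same coefficient if you'd rather not write it yourself.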
Types of Correlations
Positive Correlation
When X increases, Y also increases. Height and weight. Study time and test scores. Advertising spend and sales. The points trend upward from left to right.
Negative Correlation
When X increases, Y decreases. Exercise frequency and body fat percentage. Price and demand. The points trend downward from left to right.
No Correlation
No discernible pattern. Changes in X don't go along with any consistent change in Y. Shoe size and intelligence in adults. The points are spread across the graph with no trend.
Non-Linear Correlation
The relationship curves. It might go up then down, or follow some other shape. A straight line won't capture this. You need a polynomial or another non-linear model.
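A quick way to see why a linear summary can mislead: in this hypothetical example y is completely determined by x (y = x²), yet Pearson's r comes out as zero because the falling and rising halves cancel. The `pearson_r` helper is a minimal sketch, not a library function.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient (measures linear association only)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# A perfect but curved relationship: every y is fixed by its x.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(f"Pearson r for y = x^2: {r:.3f}")  # prints 0.000
```

A scatter plot of these points shows an obvious U shape, which is why you look at the plot before trusting a single coefficient.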
Outliers - What They Tell You
Outliers are points that fall far from the main cluster. They matter. A lot.
Outliers can indicate:
- Data entry errors
- Genuinely unusual cases worth investigating
- Variables outside normal ranges
- Limitations in your current model
Don't automatically delete outliers. Look at them first. Sometimes they reveal the most interesting insights. Sometimes they're just typos. Know the difference.
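One simple way to find candidates to look at: fit a least-squares line, then flag points whose residual (vertical distance from the line) is unusually large compared with the rest. The function name, the 2-standard-deviation threshold, and the data below are illustrative assumptions, not a standard rule. A strong outlier also pulls the line toward itself and inflates the spread, so on small data sets a robust method may work better. Flagging only tells you where to look; deciding between a typo and a genuine case is still your call.

```python
import statistics


def flag_outliers(xs, ys, threshold=2.0):
    """Return indices of points whose residual from a least-squares line
    lies more than `threshold` standard deviations from the mean residual.

    The default threshold of 2.0 is an illustrative choice, not a standard.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Vertical distance of each point from the fitted line.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    mean_res = sum(residuals) / n
    sd_res = statistics.pstdev(residuals)
    return [i for i, res in enumerate(residuals)
            if abs(res - mean_res) > threshold * sd_res]


# Hypothetical data roughly following y = 2x + 1, with one suspicious value.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [3.1, 4.9, 7.2, 8.8, 11.1, 40.0, 15.0, 16.9, 19.2, 21.0]
print(flag_outliers(xs, ys))  # the point at x = 6 (index 5) stands out
```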
How to Create a Scatter Plot
Step 1: Gather your data pairs. You need two columns of numbers with matching observations. Each row is one point.
Step 2: Assign your axes. Put your independent variable (the one you control or suspect influences the other) on the X-axis. Put the dependent variable on the Y-axis.
Step 3: Set your scales. Choose axis ranges that fit your data. Don't stretch or compress axes unnecessarily; it distorts how strong the relationship looks.
Step 4: Plot each point. For each observation, find the X value on the horizontal axis and the Y value on the vertical axis. Mark where they meet.
Step 5: Add a trend line if needed. A regression line helps visualize the overall direction. Don't add a straight line if the relationship is clearly non-linear.
Step 6: Label and title. Give the plot a clear title. Label both axes and include units. Make it readable.
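To make the steps concrete without any plotting library, here is a hypothetical text-mode renderer. It scales each axis to the data's range, marks each observation with `*`, optionally traces the least-squares trend line with `.`, and adds a title and axis labels. The function names and data are illustrative; a real tool such as Matplotlib handles ticks, units, and resolution far better.

```python
def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def text_scatter(xs, ys, title, x_label, y_label, width=40, height=12, trend=True):
    """Render paired data as a character-grid scatter plot (illustrative only)."""
    # Step 3: scale each axis to the data's own range rather than forcing zero.
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    x_span = (x_max - x_min) or 1  # avoid dividing by zero when values are constant
    y_span = (y_max - y_min) or 1

    def to_cell(x, y):
        col = round((x - x_min) / x_span * (width - 1))
        # Row 0 prints first, so flip y to put larger values at the top.
        row = (height - 1) - round((y - y_min) / y_span * (height - 1))
        return row, col

    grid = [[" "] * width for _ in range(height)]

    # Step 5 (optional): trace the straight trend line with "." in every column.
    if trend:
        slope, intercept = least_squares_line(xs, ys)
        for col in range(width):
            x = x_min + col / (width - 1) * x_span
            row, _ = to_cell(x, intercept + slope * x)
            if 0 <= row < height:  # skip parts of the line outside the y range
                grid[row][col] = "."

    # Step 4: mark each observation where its x and y values meet.
    for x, y in zip(xs, ys):
        row, col = to_cell(x, y)
        grid[row][col] = "*"

    # Step 6: title, y-axis label, the plot body, the x-axis line and its label.
    lines = [title, y_label]
    lines += ["|" + "".join(cells) for cells in grid]
    lines.append("+" + "-" * width)
    lines.append(x_label.rjust(width + 1))
    return "\n".join(lines)


# Step 1: paired observations (hypothetical data).
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 55, 61, 64, 70, 74]
# Step 2: the suspected influence (study time) goes on X, the outcome on Y.
print(text_scatter(hours, scores, "Study time vs. test score",
                   "Study time (hours)", "Score (points)"))
```

In Matplotlib the same steps map onto `plt.scatter`, `plt.xlabel`, `plt.ylabel`, and `plt.title`.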
Common Mistakes to Avoid
Reversing axes. Putting the wrong variable on X or Y. The independent variable goes on X. The dependent variable goes on Y.
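Distorting axis ranges to exaggerate or hide patterns. Unlike bar charts, scatter plots don't need to start at zero; fitting each axis to the range of the data is standard. The mistake is stretching or truncating an axis so a relationship looks stronger or weaker than it is.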
Ignoring overplotting. When you have thousands of points, they stack on top of each other. Use transparency, smaller points, or sample your data.
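Downsampling, the last fix above, is easy to sketch: draw a random subset of pairs, keeping each (x, y) pair together, and plot only those. The helper `sample_points` and the simulated data are hypothetical. If you plot with Matplotlib, the other two fixes correspond to the `alpha` (transparency) and `s` (marker size) arguments of `plt.scatter`.

```python
import random


def sample_points(xs, ys, k, seed=0):
    """Return k randomly chosen (x, y) pairs, keeping each pair together."""
    pairs = list(zip(xs, ys))
    if k >= len(pairs):
        return pairs  # nothing to thin out
    rng = random.Random(seed)  # fixed seed: the same subset on every run
    return rng.sample(pairs, k)


# Hypothetical large data set: 100,000 noisy points around y = x.
gen = random.Random(42)
xs = [gen.gauss(0, 1) for _ in range(100_000)]
ys = [x + gen.gauss(0, 0.5) for x in xs]

subset = sample_points(xs, ys, k=2_000)
print(len(subset))  # 2000
```

Sampling thins the crowd but can also drop rare outliers, so check the full data for those separately.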
Assuming correlation means causation. Just because two things move together doesn't mean one causes the other. Both could be caused by a third factor.
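A small simulation shows how a third factor can create correlation on its own. Below, a hypothetical hidden variable z drives both x and y; neither is computed from the other, yet their correlation is strong. The data, seed, and noise level are illustrative assumptions.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired observations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)
# Hypothetical hidden common cause (think: summer heat).
z = [float(i) for i in range(100)]
# x and y each depend on z plus their own independent noise; neither uses the other.
x = [zi + rng.uniform(-5, 5) for zi in z]  # e.g. ice-cream sales
y = [zi + rng.uniform(-5, 5) for zi in z]  # e.g. sunburn cases
print(f"r(x, y) = {pearson_r(x, y):.2f}")
```

Plotted against each other, x and y would look tightly linked, even though changing x in this simulation would do nothing to y.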
Tools for Creating Scatter Plots
| Tool | Best For | Learning Curve |
|---|---|---|
| Excel / Google Sheets | Quick basic plots, business reports | Low |
| Python (Matplotlib, Seaborn) | Customization, automation, large datasets | Medium-High |
| R (ggplot2) | Statistical analysis, publications | Medium-High |
| Tableau | Interactive dashboards, presentations | Medium |
| Python (Plotly) | Interactive web visualizations | Medium |
For one-off analyses, Excel or Google Sheets work fine. For anything repetitive or complex, learn Python or R. They're worth the upfront time investment.
The Bottom Line
Scatter plots reveal relationships between two continuous variables. They're simple to read and create. Look at the direction, strength, and form of the pattern. Check for outliers. Don't read causation into correlation.
That's everything you need to know to start using them effectively.