Scatter Plots - Creation and Interpretation

What Is a Scatter Plot?

A scatter plot is a graph that displays values for two variables on an X-Y coordinate system. Each point on the graph represents a single data observation. That's it. Nothing fancy.

You plot one variable on the horizontal axis (X) and another on the vertical axis (Y). The pattern these points form tells you something about the relationship between the two variables. 📊

When to Use a Scatter Plot

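Use a scatter plot when you want to see whether two numeric variables are related, how strong that relationship is and what shape it takes, and whether any observations sit far from the rest.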

Don't use a scatter plot to compare categories or to show a trend over time. That's what bar charts and line graphs are for.

Reading a Scatter Plot - The Basics

Start by looking at the overall pattern. Ignore individual points initially. Ask yourself: do the points trend upward, downward, or show no clear direction?

The direction of the pattern tells you whether the variables move together or in opposite directions. The tightness of the pattern shows how strongly they're connected. Widely scattered points mean a weak relationship. Points that hug a clear line mean a strong one.

Types of Correlations

Positive Correlation

When X increases, Y also increases. Height and weight. Study time and test scores. Advertising spend and sales. The points trend upward from left to right.

Negative Correlation

When X increases, Y decreases. Exercise frequency and body fat percentage. Price and demand. The points trend downward from left to right.

No Correlation

No discernible pattern. X and Y move independently. Shoe size and intelligence. Points are randomly distributed across the graph.

Non-Linear Correlation

The relationship curves. It might go up then down, or follow some other pattern. Linear models won't capture this. You need polynomial or other curved models.
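To see why a straight line falls short here, the sketch below fits both a line and a quadratic to the same curved data and compares their errors. It assumes NumPy is installed. The U-shaped data is synthetic and noise-free, chosen only to make the gap obvious.

```python
import numpy as np

# Hypothetical U-shaped relationship: y falls, then rises, as x increases.
x = np.linspace(-3, 3, 25)
y = 2 + 0.5 * x**2

# Least-squares fits: degree 1 is a straight line, degree 2 is a parabola.
linear_coeffs = np.polyfit(x, y, deg=1)
quadratic_coeffs = np.polyfit(x, y, deg=2)

def rmse(coeffs):
    """Root-mean-square error of a polynomial fit on the data above."""
    predicted = np.polyval(coeffs, x)
    return float(np.sqrt(np.mean((predicted - y) ** 2)))

print(f"linear fit error:    {rmse(linear_coeffs):.3f}")
print(f"quadratic fit error: {rmse(quadratic_coeffs):.3f}")
```

The straight line comes out nearly flat and misses the curve entirely, while the quadratic follows it. A quadratic is just one option; the right curved model depends on the shape you see in the plot.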

Outliers - What They Tell You

Outliers are points that fall far from the main cluster. They matter. A lot.

Outliers can indicate a data-entry error such as a typo, or a genuinely unusual observation that deserves a closer look.

Don't automatically delete outliers. Look at them first. Sometimes they reveal the most interesting insights. Sometimes they're just typos. Know the difference.

How to Create a Scatter Plot

Step 1: Gather your data pairs. You need two columns of numbers with matching observations. Each row is one point.

Step 2: Assign your axes. Put your independent variable (the one you control or suspect influences the other) on the X-axis. Put the dependent variable on the Y-axis.

Step 3: Set your scales. Choose ranges that cover your data with a small margin. Don't stretch or compress an axis unnecessarily; it distorts how steep or tight the relationship looks.

Step 4: Plot each point. For each observation, find the X value on the horizontal axis and the Y value on the vertical axis. Mark where they meet.

Step 5: Add a trend line if needed. A regression line helps visualize the overall direction. Don't add a straight one if the relationship is clearly non-linear.

Step 6: Label and title. Give it a clear title. Add labels for axes with units. Make it readable.
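The six steps above can be sketched in Python with Matplotlib and NumPy, assuming both are installed. The advertising and sales figures, the axis labels, and the output filename scatter.png are all invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file; remove this line to open a window instead
import matplotlib.pyplot as plt
import numpy as np

# Step 1: paired observations (hypothetical values). Each index is one point.
ad_spend = np.array([10, 15, 20, 25, 30, 35, 40, 45])        # thousand USD
sales = np.array([110, 135, 150, 190, 205, 230, 260, 270])   # units sold

fig, ax = plt.subplots()

# Steps 2 and 4: independent variable on X, dependent on Y, one marker per pair.
ax.scatter(ad_spend, sales)

# Step 3: keep a small margin around the data rather than forcing odd ranges.
ax.margins(0.05)

# Step 5: straight least-squares trend line, drawn across the observed X range.
slope, intercept = np.polyfit(ad_spend, sales, deg=1)
line_x = np.array([ad_spend.min(), ad_spend.max()])
ax.plot(line_x, slope * line_x + intercept, linestyle="--", label="Linear trend")

# Step 6: title, axis labels with units, and a legend for the trend line.
ax.set_title("Sales vs. advertising spend")
ax.set_xlabel("Advertising spend (thousand USD)")
ax.set_ylabel("Sales (units)")
ax.legend()

fig.savefig("scatter.png", dpi=150)
```

If the points clearly curve, skip the trend-line lines rather than forcing a straight fit.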

Common Mistakes to Avoid

Reversing axes. Putting the wrong variable on X or Y. The independent variable goes on X. The dependent variable goes on Y.

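Cropping or stretching axes to exaggerate patterns. This is misleading. Unlike a bar chart, a scatter plot doesn't have to start at zero, but the ranges should cover the data with a modest margin and be clearly labeled.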

Ignoring overplotting. When you have thousands of points, they stack on top of each other and hide the pattern. Use transparency, smaller points, or a random sample of your data.
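Here is a sketch of those three remedies side by side, assuming Matplotlib and NumPy are installed. The 50,000 points are randomly generated, and the marker sizes, alpha values, and sample size are arbitrary starting points rather than recommendations.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file; remove this line to open a window instead
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data: 50,000 points with a moderate upward trend.
rng = np.random.default_rng(seed=0)
n = 50_000
x = rng.normal(size=n)
y = 0.6 * x + rng.normal(scale=0.8, size=n)

# Remedy 3: a random sample without replacement, so no point is drawn twice.
sample_idx = rng.choice(n, size=2_000, replace=False)

fig, (ax_all, ax_sample) = plt.subplots(1, 2, figsize=(10, 4), sharex=True, sharey=True)

# Remedies 1 and 2: small, mostly transparent markers let dense regions show as darker areas.
ax_all.scatter(x, y, s=2, alpha=0.05)
ax_all.set_title("All points, small and transparent")

ax_sample.scatter(x[sample_idx], y[sample_idx], s=8, alpha=0.5)
ax_sample.set_title("Random sample of 2,000 points")

fig.savefig("overplotting.png", dpi=150)
```

Sampling is the quickest fix but can drop rare outliers, so check the full data before relying on the sampled view.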

Assuming correlation means causation. Just because two things move together doesn't mean one causes the other. Both could be caused by a third factor.
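A small simulation makes the third-factor case concrete. In the sketch below, which assumes NumPy is installed, a made-up "temperature" variable drives both "ice cream sales" and "sunburns". Neither affects the other, yet they correlate strongly until temperature is accounted for. All names and numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 5_000

# The hidden third factor, plus two variables that each depend only on it.
temperature = rng.normal(size=n)
ice_cream = temperature + rng.normal(scale=0.5, size=n)
sunburns = temperature + rng.normal(scale=0.5, size=n)

# Raw correlation between the two outcomes.
r_raw = np.corrcoef(ice_cream, sunburns)[0, 1]

def remove_linear_effect(values, driver):
    """Subtract the best straight-line prediction of `values` from `driver`."""
    slope, intercept = np.polyfit(driver, values, deg=1)
    return values - (slope * driver + intercept)

# Correlation of what is left once temperature's effect is removed from both.
r_adjusted = np.corrcoef(
    remove_linear_effect(ice_cream, temperature),
    remove_linear_effect(sunburns, temperature),
)[0, 1]

print(f"raw correlation:              {r_raw:.2f}")
print(f"after removing temperature:   {r_adjusted:.2f}")
```

In real data you rarely know the third factor in advance, so this check only works for the candidates you think to measure.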

Tools for Creating Scatter Plots

Tool                         | Best For                                  | Learning Curve
Excel / Google Sheets        | Quick basic plots, business reports       | Low
Python (Matplotlib, Seaborn) | Customization, automation, large datasets | Medium-High
R (ggplot2)                  | Statistical analysis, publications        | Medium-High
Tableau                      | Interactive dashboards, presentations     | Medium
Python (Plotly)              | Interactive web visualizations            | Medium

For one-off analyses, Excel or Google Sheets work fine. For anything repetitive or complex, learn Python or R. They're worth the upfront time investment.

The Bottom Line

Scatter plots reveal relationships between two continuous variables. They're simple to read and create. Look at the direction, strength, and form of the pattern. Check for outliers. Don't read causation into correlation.

That's everything you need to know to start using them effectively.