Scatter Plot and Correlation Coefficient: A Complete Guide

What Is a Scatter Plot?

A scatter plot is a graph that shows individual data points plotted on an x-y coordinate system. Each point represents one observation. That's it. Nothing fancy.

You use scatter plots when you want to see the relationship between two variables. Does studying more hours lead to higher grades? Does more advertising spend increase sales? Scatter plots make these relationships visible.

The real power of scatter plots is their ability to reveal patterns that tables of numbers hide. You can spot trends, clusters, and outliers in seconds.

Understanding the Correlation Coefficient

The correlation coefficient (Pearson's r, written simply as r) is a single number that quantifies how strongly two variables are linearly related. It ranges from -1 to +1.

r = +1 means perfect positive correlation. Every point falls exactly on a straight line that slopes upward.

r = -1 means perfect negative correlation. Every point falls exactly on a straight line that slopes downward.

r = 0 means no linear relationship exists between the variables.

Values between these extremes indicate partial relationships. The closer the absolute value is to 1, the stronger the relationship.
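To make the definition concrete, here is a minimal sketch of how r can be computed from paired data using the standard Pearson formula (sum of products of deviations, divided by the square root of the product of the summed squared deviations). The function name pearson_r and the sample numbers are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]         # hypothetical study hours
scores = [52, 58, 61, 70, 74]   # hypothetical exam scores
r = pearson_r(hours, scores)
print(f"r = {r:.2f}")           # r = 0.99
```

Points that lie exactly on a line give the extremes: pearson_r([1, 2, 3], [2, 4, 6]) returns 1.0, and pearson_r([1, 2, 3], [6, 4, 2]) returns -1.0.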

Types of Correlation

Positive Correlation

When r is between 0 and 1, you have positive correlation. As one variable increases, the other tends to increase. Typical examples: height and weight, study time and exam scores, and, for some luxury goods, price and demand.

The points on your scatter plot will trend upward from left to right.

Negative Correlation

When r is between -1 and 0, you have negative correlation. As one variable increases, the other tends to decrease. Typical examples: exercise frequency and body fat percentage, price and sales volume for most products, and age and reaction speed in adults.

The points trend downward from left to right.

No Correlation

When r is close to 0, there's little or no linear relationship. Think of shoe size and intelligence among adults, or hair length and income. The points scatter with no discernible upward or downward trend.

Interpreting Correlation Strength

Most people overestimate what different r values mean. Here's the reality:

Correlation (|r|)   Strength      What It Means
0.00 - 0.19         Very Weak     Practically no linear relationship
0.20 - 0.39         Weak          Some relationship, but barely useful
0.40 - 0.59         Moderate      Noticeable relationship worth investigating
0.60 - 0.79         Strong        Clear relationship, often useful for prediction
0.80 - 1.00         Very Strong   Near-perfect linear relationship

An r of 0.5 is not "pretty strong." It's moderate. Stop inflating your results.
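If you want to apply these labels consistently, the table can be turned into a small lookup. This is only a sketch: the function name correlation_strength is hypothetical, and it assumes the bands apply to the absolute value of r and that each band runs up to, but not including, the start of the next one.

```python
def correlation_strength(r):
    """Map r to the strength label from the table above, using |r|."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and +1")
    size = abs(r)  # the sign gives direction, not strength
    if size < 0.20:
        return "Very Weak"
    if size < 0.40:
        return "Weak"
    if size < 0.60:
        return "Moderate"
    if size < 0.80:
        return "Strong"
    return "Very Strong"

print(correlation_strength(0.5))    # Moderate
print(correlation_strength(-0.85))  # Very Strong
```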

How to Read a Scatter Plot

Look at the overall cloud of points. Does it lean upward? Downward? Sit flat? That's your first clue about correlation direction.

Then assess the tightness of the cloud. Points forming a thin line indicate strong correlation. Points scattered everywhere indicate weak or no correlation.

Watch for outliers. A single point far from the rest can dramatically affect your correlation coefficient. Always check for data entry errors.
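A quick sketch of how much one point can matter. The data below is invented: eight points with a tight upward trend, then the same data with a ninth point that looks like a typo (1 entered instead of 10). The helper pearson_r is a bare-bones implementation of the standard Pearson formula.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r for two equal-length sequences of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data with a tight upward trend.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2, 3, 5, 4, 6, 7, 8, 9]
r_clean = pearson_r(xs, ys)

# Same data plus one suspicious point, e.g. 10 mistyped as 1.
r_with_outlier = pearson_r(xs + [9], ys + [1])

print(f"without outlier: r = {r_clean:.2f}")         # 0.98
print(f"with outlier:    r = {r_with_outlier:.2f}")  # 0.38
```

One bad entry moves this toy data set from "very strong" to "weak", which is why the point is worth checking before you trust the number.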

Check for nonlinearity. The correlation coefficient measures only linear relationships. A curved pattern can have r near zero even though the relationship is very strong. Always visualize your data first.
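As an illustration with invented data, take a perfect U-shaped curve where y is exactly x squared. Here y is completely determined by x, yet r comes out as zero because the downward left half cancels the upward right half.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r for two equal-length sequences of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]   # y is fully determined by x, but not linearly

r_curve = pearson_r(xs, ys)
print(f"r = {r_curve:.2f}")  # r = 0.00
```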

Common Mistakes to Avoid

Getting Started: Creating Your First Scatter Plot

Step 1: Collect Your Data

You need paired observations: one x value and one y value for each thing you measure. As a rule of thumb, aim for at least 20-30 data points. With fewer than that, a single point can swing r and you're guessing more than analyzing.

Step 2: Plot Your Points

Put your independent variable (the one you think influences the other) on the x-axis. Put your dependent variable on the y-axis. Plot each pair as a single point.
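In practice you would use a spreadsheet chart or a plotting library, but the plotting step itself is simple enough to sketch in plain Python. The function ascii_scatter below is a hypothetical helper, not part of any library, and the data is invented. It scales each pair onto a character grid with x running left to right and y running bottom to top.

```python
def ascii_scatter(xs, ys, width=40, height=12):
    """Return a text scatter plot: x grows to the right, y grows upward."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Fall back to a span of 1 so a constant variable does not divide by zero.
    x_span = (x_max - x_min) or 1
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = "*"   # grid row 0 is the top of the plot
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width

hours = [1, 2, 3, 4, 5, 6, 7, 8]            # x-axis: independent variable
scores = [52, 58, 61, 70, 74, 73, 85, 90]   # y-axis: dependent variable
plot = ascii_scatter(hours, scores)
print(plot)
```

The stars should climb from the bottom-left corner to the top-right corner, which is the upward lean you would expect for positively correlated data.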

Step 3: Calculate r

Use spreadsheet software like Excel or Google Sheets. The formula is =CORREL(array1, array2). Done.

Step 4: Interpret

Check the scatter plot shape first. Calculate r second. Report both the coefficient and a visual of your scatter plot. Never report r alone.

When to Use Each Tool

Use scatter plots when you need to show the relationship to others or when you're exploring data for the first time. Visual patterns are easier to communicate than numbers.

Use the correlation coefficient when you need precise quantification, want to run statistical tests, or are building predictive models. r is a number you can use in further calculations.

Use both together. The scatter plot shows you what's happening. The correlation coefficient tells you how strongly. Neither alone gives you the full picture.