How to Find the Correlation Coefficient: Tutorial
What is the Correlation Coefficient?
The correlation coefficient is a number between -1 and +1 that tells you how strongly two variables move together, and in which direction. That's it. No fancy definitions needed.
A value of +1 means perfect positive correlation: the points fall exactly on an upward-sloping straight line, so when one variable goes up, the other goes up in a perfectly predictable way.
A value of -1 means perfect negative correlation: the points fall exactly on a downward-sloping straight line, so when one variable goes up, the other goes down in a perfectly predictable way.
A value of 0 means no linear relationship exists between the two variables. They can still be related in a curved (nonlinear) way.
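To see the three reference values side by side, here is a minimal sketch using NumPy's `np.corrcoef` on a handful of made-up points (the data and the helper name `r` are illustrative assumptions). The last case shows that r = 0 only rules out a straight-line trend: y = x² is completely determined by x, yet its Pearson r is 0 on this symmetric range.

```python
import numpy as np

# Made-up x values, symmetric around zero
x = np.array([-2, -1, 0, 1, 2], dtype=float)

def r(a, b):
    # np.corrcoef returns a 2x2 matrix; the off-diagonal entry is the coefficient
    return np.corrcoef(a, b)[0, 1]

r_pos = r(x, 2 * x + 1)   # points on a rising straight line
r_neg = r(x, -3 * x)      # points on a falling straight line
r_zero = r(x, x ** 2)     # a perfect curve, but no linear trend

print(f"{r_pos:.2f} {r_neg:.2f} {r_zero:.2f}")
```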
Types of Correlation Coefficients
Most people mean the Pearson correlation when they say "correlation coefficient." It's the standard for measuring linear relationships between continuous variables.
But there are others:
- Pearson (r): measures linear relationships; its significance tests assume roughly normally distributed data
- Spearman's rho: rank-based, measures monotonic relationships, works with ordinal data or non-normal distributions
- Kendall's tau: also rank-based, often preferred for smaller datasets
Stick with Pearson unless you have a specific reason not to.
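The difference between the three shows up on data that always increase but not in a straight line. Below is a hand-rolled sketch, assuming no tied values (real implementations average tied ranks, and Kendall's tau here is the simple tau-a form); the data and function names are illustrative. In practice you would more likely call a library such as `scipy.stats.spearmanr` or `scipy.stats.kendalltau`.

```python
import numpy as np

def pearson(x, y):
    return np.corrcoef(x, y)[0, 1]

def ranks(values):
    # Rank positions 1..n; assumes no tied values (ties would need average ranks)
    order = np.argsort(values)
    result = np.empty(len(values))
    result[order] = np.arange(1, len(values) + 1)
    return result

def spearman(x, y):
    # Spearman's rho is Pearson's r computed on the ranks
    return pearson(ranks(x), ranks(y))

def kendall(x, y):
    # Kendall's tau-a: (concordant - discordant) / total pairs; assumes no ties
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

x = np.arange(1, 7, dtype=float)
y = x ** 3  # always increasing, but curved

print(f"Pearson:  {pearson(x, y):.3f}")   # below 1: the relationship is not a straight line
print(f"Spearman: {spearman(x, y):.3f}")  # 1: the ranks agree perfectly
print(f"Kendall:  {kendall(x, y):.3f}")   # 1: every pair moves in the same direction
```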
The Formula
The Pearson correlation coefficient formula looks like this:
r = [Σ(xi - x̄)(yi - ȳ)] / [√(Σ(xi - x̄)²) × √(Σ(yi - ȳ)²)]
Where:
- xi and yi are the x and y values of the i-th data point (each xi is paired with its yi)
- x̄ and ȳ are the means of each variable
- Σ sums over all n data points
Don't memorize this. Use software. This formula is here so you understand what's actually being calculated.
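To make "what's actually being calculated" concrete, here is a direct, term-by-term translation of the formula into plain Python. It is a teaching sketch (the name `pearson_r` and the sample data are illustrative), not a replacement for a library function.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r computed term by term from the formula above."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator: sum of the products of paired deviations
    num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator: square root of each sum of squared deviations, multiplied
    # (division by zero if either variable is constant: r is undefined then)
    den = sqrt(sum((xi - x_bar) ** 2 for xi in x)) * sqrt(sum((yi - y_bar) ** 2 for yi in y))
    return num / den

print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))  # 0.7746
```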
How to Find the Correlation Coefficient
By Hand (Step-by-Step)
If you're doing this by hand for a small dataset, here's the process:
1. Calculate the mean of both the x and y variables (x̄ and ȳ)
2. Subtract the mean from each x value (xi - x̄)
3. Subtract the mean from each y value (yi - ȳ)
4. Multiply each pair of deviations together
5. Sum all those products
6. Calculate the sum of squared deviations for x: Σ(xi - x̄)²
7. Calculate the sum of squared deviations for y: Σ(yi - ȳ)²
8. Multiply the results of steps 6 and 7, then take the square root
9. Divide the result of step 5 by the result of step 8
For anything over 10 data points, use software.
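Here is the hand calculation carried out on five made-up data points (chosen only for illustration), following the steps above in order:

```latex
\begin{aligned}
&x = (1, 2, 3, 4, 5), \quad y = (2, 4, 5, 4, 5), \quad \bar{x} = 3, \quad \bar{y} = 4 \\
&x_i - \bar{x} = (-2, -1, 0, 1, 2), \quad y_i - \bar{y} = (-2, 0, 1, 0, 1) \\
&\sum (x_i - \bar{x})(y_i - \bar{y}) = 4 + 0 + 0 + 0 + 2 = 6 \\
&\sum (x_i - \bar{x})^2 = 4 + 1 + 0 + 1 + 4 = 10, \quad \sum (y_i - \bar{y})^2 = 4 + 0 + 1 + 0 + 1 = 6 \\
&r = \frac{6}{\sqrt{10} \times \sqrt{6}} = \frac{6}{\sqrt{60}} \approx 0.775
\end{aligned}
```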
In Excel
Excel makes this trivial.
Use the =CORREL(array1, array2) function.
That's it. Select your two columns of data, and Excel spits out the coefficient.
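Make sure your data is clean first. CORREL ignores text, logical values, and empty cells, but it includes zeros, so a 0 typed in place of a missing value will change the result. If the two ranges contain different numbers of data points, CORREL returns #N/A.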
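Blank cells and zeros are not interchangeable. This small NumPy sketch uses made-up data where the sixth observation has no y value, and compares dropping that pair with filling it in as 0. It only illustrates the arithmetic; it is not a model of how Excel processes ranges internally.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
# The sixth y value is missing, represented here as NaN
y = np.array([2, 4, 5, 4, 5, np.nan])

# Option 1: drop any pair where either value is missing
keep = ~np.isnan(x) & ~np.isnan(y)
r_dropped = np.corrcoef(x[keep], y[keep])[0, 1]

# Option 2: (mistakenly) treat the missing value as 0
y_zero = np.nan_to_num(y, nan=0.0)
r_zero = np.corrcoef(x, y_zero)[0, 1]

print(f"pair dropped: {r_dropped:.3f}")  # about 0.775
print(f"zero entered: {r_zero:.3f}")     # about -0.217: the sign flips
```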
In Python
Python's pandas library has a built-in method:
df['column1'].corr(df['column2'])
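Or use NumPy. np.corrcoef returns a 2×2 correlation matrix, so pull out the off-diagonal entry to get the single coefficient:
import numpy as np
np.corrcoef(array1, array2)[0, 1]
Both give you the Pearson coefficient by default (np.corrcoef only computes Pearson). For Spearman, pass method='spearman' to the pandas version: df['column1'].corr(df['column2'], method='spearman').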
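A self-contained version of the two approaches, with made-up numbers in columns named column1 and column2 (placeholder names taken from the text). It also prints the full matrix np.corrcoef returns, which is why the single coefficient has to be indexed out with [0, 1].

```python
import numpy as np
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({
    "column1": [1, 2, 3, 4, 5],
    "column2": [2, 4, 5, 4, 5],
})

# pandas: Series.corr uses method="pearson" unless told otherwise
r_pandas = df["column1"].corr(df["column2"])

# NumPy: a 2x2 matrix with 1s on the diagonal and r off the diagonal
matrix = np.corrcoef(df["column1"], df["column2"])
r_numpy = matrix[0, 1]

print(matrix)
print(round(r_pandas, 4), round(r_numpy, 4))
```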
Using a Calculator
Online tools such as Desmos and GeoGebra can compute it, and so can most graphing calculators and many scientific calculators with a statistics mode.
TI-84 users: go to STAT → TESTS → LinRegTTest. The r-value it returns is your correlation coefficient.
Interpreting the Results
Here's a common rule-of-thumb scale for Pearson correlation (exact cutoffs vary by field):
| Value | Interpretation |
|---|---|
| +0.70 to +1.00 | Strong positive correlation |
| +0.30 to +0.69 | Moderate positive correlation |
| +0.01 to +0.29 | Weak positive correlation |
| 0.00 | No correlation |
| -0.01 to -0.29 | Weak negative correlation |
| -0.30 to -0.69 | Moderate negative correlation |
| -0.70 to -1.00 | Strong negative correlation |
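The table translates directly into a lookup. This is a minimal sketch (the function name `describe_correlation` is illustrative) that rounds to two decimals, as the table does, and then applies its cutoffs; treat the labels as the same rough conventions as the table.

```python
def describe_correlation(r):
    """Label a Pearson r using the rule-of-thumb table above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and +1")
    r = round(r, 2)  # the table is written to two decimal places
    if r == 0:
        return "No correlation"
    size = abs(r)
    if size >= 0.70:
        strength = "Strong"
    elif size >= 0.30:
        strength = "Moderate"
    else:
        strength = "Weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"

print(describe_correlation(0.7746))   # Strong positive correlation
print(describe_correlation(-0.2175))  # Weak negative correlation
```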
Correlation does NOT imply causation. A coefficient of 0.9 between ice cream sales and drowning deaths doesn't mean ice cream causes drowning. Both increase in summer. That's the actual explanation.
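Here is a minimal simulation of that explanation, using entirely made-up numbers: a hypothetical daily temperature drives both series, and neither series depends on the other. The raw correlation comes out strong, but once the straight-line effect of temperature is removed from each series (via a least-squares fit with `np.polyfit`), almost nothing is left.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 200

# Simulated data: temperature drives both series; they never affect each other
temperature = rng.uniform(10, 35, n)
ice_cream_sales = 20 * temperature + rng.normal(0, 30, n)
drownings = 0.3 * temperature + rng.normal(0, 1.5, n)

r_raw = np.corrcoef(ice_cream_sales, drownings)[0, 1]

def residuals(values, driver):
    # Subtract the best-fit straight line in the driver (least squares)
    slope, intercept = np.polyfit(driver, values, 1)
    return values - (slope * driver + intercept)

r_after = np.corrcoef(residuals(ice_cream_sales, temperature),
                      residuals(drownings, temperature))[0, 1]

print(f"raw r: {r_raw:.2f}")
print(f"r after removing temperature: {r_after:.2f}")
```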
Common Mistakes to Avoid
- Assuming linearity — Pearson only measures linear relationships. Check a scatter plot first.
- Ignoring outliers — One extreme value can dramatically skew your coefficient.
- Using the wrong type — Ordinal data? Use Spearman, not Pearson.
- Confusing with slope — Correlation measures strength of relationship, not the slope of the line.
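Two of these mistakes are easy to demonstrate with made-up data: one extreme point can turn almost no correlation into a strong one, and two lines with very different slopes can both have r = 1. A minimal NumPy sketch:

```python
import numpy as np

def r(a, b):
    return np.corrcoef(a, b)[0, 1]

# Outlier: nine points with almost no linear trend (made-up values)
x = np.arange(1, 10, dtype=float)
y = np.array([5, 3, 6, 2, 5, 4, 6, 3, 5], dtype=float)
print(f"without outlier: {r(x, y):.2f}")           # about 0.06

# Add one extreme point far from the rest
x_out = np.append(x, 30.0)
y_out = np.append(y, 30.0)
print(f"with one outlier: {r(x_out, y_out):.2f}")  # about 0.94

# Slope: a steep line and a nearly flat line are both perfectly correlated
steep = 10 * x
flat = 0.1 * x
print(f"slopes: {np.polyfit(x, steep, 1)[0]:.1f} vs {np.polyfit(x, flat, 1)[0]:.1f}")
print(f"r: {r(x, steep):.2f} vs {r(x, flat):.2f}")
```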
Quick Reference
| Tool/Method | Function/Method |
|---|---|
| Excel | =CORREL(range1, range2) |
| Python pandas | df.col1.corr(df.col2) |
| Python numpy | np.corrcoef(arr1, arr2)[0, 1] |
| TI-84 | STAT → TESTS → LinRegTTest |
| Online | Desmos, GeoGebra, rapidtables.com |