How to Find the Correlation Coefficient: Tutorial

What is the Correlation Coefficient?

The correlation coefficient is a number between -1 and +1 that tells you how strongly two variables move together. That's it. No fancy definitions needed.

A value of +1 means perfect positive correlation — when one variable goes up, the other goes up in a perfectly predictable way.

A value of -1 means perfect negative correlation — when one variable goes up, the other goes down in a perfectly predictable way.

A value of 0 means no linear relationship exists between the two variables.
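To make these three cases concrete, here is a small sketch using numpy. The numbers are made up for illustration. Note that in the last case y depends entirely on x, but the relationship is a curve rather than a line, so the coefficient still comes out at zero.

```python
import numpy as np

# Made-up x values, symmetric around zero
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

# y rises in lockstep with x: perfect positive correlation
r_pos = np.corrcoef(x, 2 * x + 1)[0, 1]

# y falls in lockstep as x rises: perfect negative correlation
r_neg = np.corrcoef(x, -3 * x + 5)[0, 1]

# y is fully determined by x, but not linearly: no linear relationship
r_zero = np.corrcoef(x, x ** 2)[0, 1]

print(r_pos, r_neg, r_zero)
```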

Types of Correlation Coefficients

Most people mean the Pearson correlation when they say "correlation coefficient." It's the standard for measuring linear relationships between continuous variables.

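But there are others, such as Spearman's rank correlation and Kendall's tau, both of which work on ranks rather than raw values.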

Stick with Pearson unless you have a specific reason not to.
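One example of a specific reason: the relationship is consistent in direction but curved. The sketch below uses made-up data and computes Spearman's coefficient as the Pearson correlation of the ranks, which is how Spearman is defined. It is an illustration under those assumptions, not a recommendation for any particular dataset.

```python
import pandas as pd

# Made-up data: y always increases with x, but along a curve
x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = x ** 3

# Pearson measures how close the points are to a straight line
pearson = x.corr(y)

# Spearman is Pearson applied to the ranks of the values
spearman = x.rank().corr(y.rank())

print(round(pearson, 3), round(spearman, 3))
```

Pearson comes out below 1 because the points do not sit on a straight line, while the rank-based version sees a perfectly consistent ordering.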

The Formula

The Pearson correlation coefficient formula looks like this:

r = [Σ(xi - x̄)(yi - ȳ)] / [√(Σ(xi - x̄)²) × √(Σ(yi - ȳ)²)]

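Where xi and yi are the individual data points, x̄ and ȳ are the means of the x and y values, and Σ means "sum over all data points."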

Don't memorize this. Use software. This formula is here so you understand what's actually being calculated.
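If it helps to see the formula as code, here is a minimal sketch that computes each piece and checks the result against numpy. The function name pearson_r and the sample values are made up for this example.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson r computed term by term from the formula (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # (xi - x̄)
    dy = y - y.mean()  # (yi - ȳ)
    numerator = np.sum(dx * dy)
    denominator = np.sqrt(np.sum(dx ** 2)) * np.sqrt(np.sum(dy ** 2))
    return numerator / denominator

# Made-up sample values, for illustration only
x = [2, 4, 6, 8, 10]
y = [1, 3, 7, 9, 15]

r_formula = pearson_r(x, y)
r_library = np.corrcoef(x, y)[0, 1]
print(r_formula, r_library)  # the two values should agree
```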

How to Find the Correlation Coefficient

By Hand (Step-by-Step)

If you're doing this by hand for a small dataset, here's the process:

  1. Calculate the mean of both x and y variables
  2. Subtract the mean from each x value (xi - x̄)
  3. Subtract the mean from each y value (yi - ȳ)
  4. Multiply each pair of deviations together
  5. Sum all those products
  6. Calculate the sum of squared deviations for x
  7. Calculate the sum of squared deviations for y
  8. Multiply the results of steps 6 and 7, then take the square root of that product
  9. Divide the sum from step 5 by the result of step 8

For anything over 10 data points, use software.
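To check hand arithmetic, the steps can be traced in plain Python with no libraries beyond the standard math module. The five-point dataset is made up, and the comments show the intermediate values you should get on paper.

```python
from math import sqrt

# Made-up five-point dataset, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Step 1: means of x and y
mean_x = sum(x) / len(x)  # 3.0
mean_y = sum(y) / len(y)  # 4.0

# Steps 2 and 3: deviations from the mean
dev_x = [xi - mean_x for xi in x]
dev_y = [yi - mean_y for yi in y]

# Steps 4 and 5: multiply each pair of deviations, then sum the products
products = [dx * dy for dx, dy in zip(dev_x, dev_y)]
sum_products = sum(products)  # 6.0

# Steps 6 and 7: sums of squared deviations
ss_x = sum(dx ** 2 for dx in dev_x)  # 10.0
ss_y = sum(dy ** 2 for dy in dev_y)  # 6.0

# Step 8: square root of the product of the two sums
denominator = sqrt(ss_x * ss_y)  # sqrt(60), about 7.746

# Step 9: divide the summed products by the result of step 8
r = sum_products / denominator  # about 0.7746
print(r)
```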

In Excel

Excel makes this trivial.

Use the =CORREL(array1, array2) function.

That's it. Select your two columns of data, and Excel spits out the coefficient.

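Make sure your data is clean first. CORREL silently ignores text, logical values, and blank cells, still counts cells that contain zero, and returns #N/A if the two ranges hold different numbers of data points.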

In Python

Python's pandas library has a built-in method:

df['column1'].corr(df['column2'])

Or use numpy:

np.corrcoef(array1, array2)

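The pandas method returns the Pearson coefficient by default. np.corrcoef always computes Pearson and returns a 2x2 correlation matrix, so read the coefficient from position [0, 1]. For Spearman, pass method='spearman' to the pandas version: df['column1'].corr(df['column2'], method='spearman').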
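Here is a runnable sketch that puts the two calls side by side. The DataFrame contents are made up, and column1 and column2 are just placeholder names. It also shows one practical difference: pandas skips a pair when one value is missing, while numpy returns nan.

```python
import numpy as np
import pandas as pd

# Made-up data; the column names are placeholders
df = pd.DataFrame({
    "column1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "column2": [2.1, 3.9, 6.2, 7.8, 10.1, 12.2],
})

# pandas returns a single number
r_pandas = df["column1"].corr(df["column2"])

# numpy returns a 2x2 matrix; the off-diagonal entry is the coefficient
matrix = np.corrcoef(df["column1"], df["column2"])
r_numpy = matrix[0, 1]

# Knock out one value to see how each library treats missing data
df_missing = df.copy()
df_missing.loc[2, "column2"] = np.nan
r_pandas_missing = df_missing["column1"].corr(df_missing["column2"])  # pair skipped
r_numpy_missing = np.corrcoef(df_missing["column1"], df_missing["column2"])[0, 1]  # nan

print(r_pandas, r_numpy, r_pandas_missing, r_numpy_missing)
```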

Using a Calculator

Online tools such as Desmos and GeoGebra can compute this, and so can most scientific calculators with a statistics mode.

TI-84 users: go to STAT → TESTS → LinRegTTest. The r-value it returns is your correlation coefficient.

Interpreting the Results

Here's the practical scale for Pearson correlation:

Value             Interpretation
+0.70 to +1.00    Strong positive correlation
+0.30 to +0.69    Moderate positive correlation
+0.01 to +0.29    Weak positive correlation
0.00              No correlation
-0.01 to -0.29    Weak negative correlation
-0.30 to -0.69    Moderate negative correlation
-0.70 to -1.00    Strong negative correlation
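If you want to apply this scale in code, a minimal sketch might look like the following. The function name describe_correlation is made up for this example. Values that fall between the listed ranges (for instance 0.295) are placed in the lower band, which is an assumption, since the scale itself doesn't say.

```python
def describe_correlation(r):
    """Map a Pearson r to the labels in the scale (illustrative helper)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and +1")
    if r == 0:
        return "No correlation"
    size = abs(r)
    if size >= 0.70:
        strength = "Strong"
    elif size >= 0.30:
        strength = "Moderate"
    else:
        # Assumption: anything below 0.30 counts as weak
        strength = "Weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"

print(describe_correlation(0.85))
print(describe_correlation(-0.45))
```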

Correlation does NOT imply causation. A coefficient of 0.9 between ice cream sales and drowning deaths doesn't mean ice cream causes drowning. Both increase in summer. That's the actual explanation.
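A quick simulation shows how a shared cause can produce this kind of result. Everything below is synthetic: the numbers are generated from a random seed, not taken from real sales or safety data, and the variable names and coefficients are made up. Temperature drives both series, and neither series affects the other.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Synthetic data, not real measurements: temperature drives both series
temperature = rng.uniform(0, 35, size=1000)
ice_cream_sales = 10 * temperature + rng.normal(0, 30, size=1000)
drownings = 0.2 * temperature + rng.normal(0, 1, size=1000)

# The two series correlate strongly even though neither causes the other
r_raw = np.corrcoef(ice_cream_sales, drownings)[0, 1]

def residuals(values, predictor):
    """What is left of `values` after removing a straight-line fit on `predictor`."""
    slope, intercept = np.polyfit(predictor, values, 1)
    return values - (slope * predictor + intercept)

# Remove the effect of temperature from both series, then correlate what remains
r_adjusted = np.corrcoef(
    residuals(ice_cream_sales, temperature),
    residuals(drownings, temperature),
)[0, 1]

print(r_raw, r_adjusted)
```

Once temperature is accounted for, the leftover correlation should sit close to zero, which is what you'd expect when a third variable explains the link.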

Common Mistakes to Avoid

Quick Reference

Tool/Method      Function/Method
Excel            =CORREL(range1, range2)
Python pandas    df.col1.corr(df.col2)
Python numpy     np.corrcoef(arr1, arr2)[0, 1]
TI-84            STAT → TESTS → LinRegTTest
Online           Desmos, GeoGebra, rapidtables.com