How to Find the Correlation Coefficient: Tutorial

What is the Correlation Coefficient?

The correlation coefficient is a number between -1 and +1 that tells you how strongly two variables move together. That's it. No fancy definitions needed.

A value of +1 means perfect positive correlation — when one variable goes up, the other goes up in a perfectly predictable way.

A value of -1 means perfect negative correlation — when one variable goes up, the other goes down in a perfectly predictable way.

A value of 0 means no linear relationship exists between the two variables.

Types of Correlation Coefficients

Most people mean the Pearson correlation when they say "correlation coefficient." It's the standard for measuring linear relationships between continuous variables.

But there are others:

Pearson (r): measures linear relationships between continuous variables; significance tests on r assume roughly normal data
Spearman's rho: rank-based, works with ordinal data or non-normal distributions
Kendall's tau: also rank-based, often preferred for smaller datasets

Stick with Pearson unless you have a specific reason not to.

The Formula

The Pearson correlation coefficient formula looks like this:

r = [Σ(xi - x̄)(yi - ȳ)] / [√(Σ(xi - x̄)²) × √(Σ(yi - ȳ)²)]

Where:

xi and yi are individual data points
x̄ and ȳ are the means of each variable

Don't memorize this. Use software. This formula is here so you understand what's actually being calculated.

How to Find the Correlation Coefficient

By Hand (Step-by-Step)

If you're doing this by hand for a small dataset, here's the process:

1. Calculate the mean of both the x and y variables
2. Subtract the mean from each x value (xi - x̄)
3. Subtract the mean from each y value (yi - ȳ)
4. Multiply each pair of deviations together
5. Sum all those products
6. Calculate the sum of squared deviations for x
7. Calculate the sum of squared deviations for y
8. Multiply the results of steps 6 and 7, then take the square root
9. Divide the result of step 5 by the result of step 8

For anything over 10 data points, use software.
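
The by-hand process can be sketched in plain Python, one calculation at a time. The five data points are invented for illustration, and the variable names are just labels chosen for this example.

```python
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Means of both variables
mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Deviations of each value from its mean
dx = [xi - mean_x for xi in x]
dy = [yi - mean_y for yi in y]

# Multiply each pair of deviations, then sum the products
sum_products = sum(a * b for a, b in zip(dx, dy))

# Sums of squared deviations for x and for y
ss_x = sum(a ** 2 for a in dx)
ss_y = sum(b ** 2 for b in dy)

# Square root of the product of the two sums of squares
denominator = math.sqrt(ss_x * ss_y)

# Divide the summed products by that square root
r = sum_products / denominator
print(round(r, 4))  # 0.7746
```

Here the summed products come to 6 and the two sums of squares are 10 and 6, so r = 6 / √60 ≈ 0.77.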

In Excel

Excel makes this trivial.

Use the =CORREL(array1, array2) function.

That's it. Select your two columns of data, and Excel spits out the coefficient.

Make sure your data is clean first: CORREL silently ignores cells that contain text, logical values, or blanks, and it returns #N/A if the two ranges contain different numbers of data points.

In Python

Python's pandas library has a built-in method:

df['column1'].corr(df['column2'])

Or use numpy:

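np.corrcoef(array1, array2)[0, 1]

Both give you the Pearson coefficient by default. np.corrcoef returns a 2×2 correlation matrix, so the [0, 1] index picks out the coefficient for the pair. For Spearman, pass method='spearman' to the pandas version: df['column1'].corr(df['column2'], method='spearman').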
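
Here is a small runnable sketch of both approaches on the same made-up data, assuming pandas and numpy are installed. The column names mirror the placeholders used above. Note that pandas returns a single number, while numpy returns a 2×2 matrix from which the coefficient has to be indexed.

```python
import numpy as np
import pandas as pd

# Illustrative data; the column names mirror the placeholders used above
df = pd.DataFrame({
    "column1": [1, 2, 3, 4, 5],
    "column2": [2, 4, 5, 4, 5],
})

# pandas: Series.corr returns a single float (Pearson by default)
r_pandas = df["column1"].corr(df["column2"])

# numpy: corrcoef returns a 2x2 matrix; [0, 1] is the entry for the pair
r_numpy = np.corrcoef(df["column1"], df["column2"])[0, 1]

print(round(r_pandas, 4), round(r_numpy, 4))
```

Both values should agree up to floating-point rounding.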

Using a Calculator

Online calculators exist. Desmos, GeoGebra, and most scientific calculators can compute this.

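TI-84 users: go to STAT → TESTS → LinRegTTest. The r-value it returns is your correlation coefficient. (STAT → CALC → LinReg(ax+b) also reports r, but only after you turn diagnostics on with DiagnosticOn.)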

Interpreting the Results

Here's a common rule-of-thumb scale for Pearson correlation (exact cutoffs vary by field):

Value	Interpretation
+0.70 to +1.00	Strong positive correlation
+0.30 to +0.69	Moderate positive correlation
+0.01 to +0.29	Weak positive correlation
0.00	No linear correlation
-0.01 to -0.29	Weak negative correlation
-0.30 to -0.69	Moderate negative correlation
-0.70 to -1.00	Strong negative correlation
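
If you want to label coefficients in code, the scale can be sketched as a small function. describe_r is a hypothetical helper written for this example, not part of any library. It also assumes that values falling between the listed bands (0.295, for instance) belong to the weaker band.

```python
def describe_r(r):
    """Hypothetical helper that maps r onto the rule-of-thumb labels above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "No correlation"
    size = abs(r)
    if size >= 0.70:
        strength = "Strong"
    elif size >= 0.30:
        strength = "Moderate"
    else:
        strength = "Weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"


print(describe_r(0.85))   # Strong positive correlation
print(describe_r(-0.45))  # Moderate negative correlation
print(describe_r(0.10))   # Weak positive correlation
```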

Correlation does NOT imply causation. A coefficient of 0.9 between ice cream sales and drowning deaths doesn't mean ice cream causes drowning. Both increase in summer. That's the actual explanation.

Common Mistakes to Avoid

Assuming linearity — Pearson only measures linear relationships. Check a scatter plot first.
Ignoring outliers — One extreme value can dramatically skew your coefficient.
Using the wrong type — Ordinal data? Use Spearman, not Pearson.
Confusing with slope — Correlation measures strength of relationship, not the slope of the line.

Quick Reference

Tool/Method	Function/Method
Excel	=CORREL(range1, range2)
Python pandas	df.col1.corr(df.col2)
Python numpy	np.corrcoef(arr1, arr2)[0, 1]
TI-84	STAT → TESTS → LinRegTTest
Online	Desmos, GeoGebra, rapidtables.com