Least Squares Regression Line Equation: Complete Guide

What Is the Least Squares Regression Line?

The least squares regression line is the straight line that best fits a scatter plot of data points. It minimizes the sum of the squared vertical distances between the actual data points and the line itself.

That's it. No philosophy. No debate. It's a mathematical tool that finds the one line that comes closest to all your points at once.

You might hear it called the line of best fit, LSRL, or simply "the regression line." Same thing.

Why "Least Squares"?

The word "squares" tells you exactly how the line is chosen. For each data point, you calculate how far it sits from the line vertically. Then you square each distance (to remove negatives) and add them all up.

The regression line is the one that makes this sum as small as possible. Hence: least squares.
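
To make the idea concrete, here is a minimal Python sketch that adds up the squared vertical distances for two candidate lines. The function name `sum_squared_errors` and the four data points are made up for illustration. The first candidate happens to be the least squares line for these points, so any other line, including the second candidate, produces a larger total.

```python
def sum_squared_errors(xs, ys, slope, intercept):
    """Sum of squared vertical distances between the points and a line."""
    total = 0.0
    for x, y in zip(xs, ys):
        predicted = slope * x + intercept
        total += (y - predicted) ** 2
    return total

# Made-up points, for illustration only
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

# Candidate A is the least squares line for these points; candidate B is not
sse_a = sum_squared_errors(xs, ys, slope=1.9, intercept=0.0)
sse_b = sum_squared_errors(xs, ys, slope=1.0, intercept=2.0)

print(f"Candidate A: {sse_a:.2f}")
print(f"Candidate B: {sse_b:.2f}")
```

Candidate A totals about 0.70 and candidate B totals 5.00, so A is the better fit by the least squares criterion.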

The Least Squares Regression Line Equation

The formula looks like this:

ŷ = bx + a

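Where ŷ is the predicted value of y, x is the input (predictor) value, b is the slope, and a is the intercept.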

The Slope Formula (b)

b = Σ[(xi - x̄)(yi - ȳ)] / Σ[(xi - x̄)²]

This tells you how much the predicted y changes for each one-unit change in x. A slope of 2.5 means the predicted y goes up by 2.5 whenever x goes up by 1.

The Intercept Formula (a)

a = ȳ - b(x̄)

Simple. Take the mean of y values, subtract the slope times the mean of x values. This pins the line to the correct vertical position.
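
The slope and intercept formulas translate almost line for line into code. The sketch below is one way to write them in plain Python. The function name `fit_line` is an assumption for illustration, not a library function, and the sketch assumes the x values are not all identical.

```python
def fit_line(xs, ys):
    """Return (b, a) for the least squares line y-hat = b*x + a."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n

    # Numerator of the slope formula: sum of products of deviations
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    # Denominator of the slope formula: sum of squared x deviations
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    b = sxy / sxx
    a = y_mean - b * x_mean
    return b, a

# Small illustrative check: points that lie exactly on y = 2x
b, a = fit_line([1, 2, 3], [2, 4, 6])
print(b, a)
```

For points that sit exactly on a line, the function recovers that line's slope and intercept.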

What Do the Slope and Intercept Actually Mean?

Slope (b): The predicted change in y for each one-unit increase in x. If b is positive, y tends to increase as x increases. If b is negative, y tends to decrease.

Intercept (a): The predicted value of y when x equals zero. Sometimes this is meaningful. Sometimes it's not. If x can never be zero in your context, don't read too much into the intercept.

How to Calculate the Least Squares Regression Line

Let's walk through a concrete example. You want to predict monthly rent (y) based on apartment square footage (x).

| Apartment | Sq Ft (x) | Rent $ (y) |
|---|---|---|
| 1 | 500 | 1200 |
| 2 | 750 | 1600 |
| 3 | 1000 | 2100 |
| 4 | 1250 | 2400 |
| 5 | 1500 | 2800 |

Step 1: Calculate the Means

x̄ = (500 + 750 + 1000 + 1250 + 1500) / 5 = 1000

ȳ = (1200 + 1600 + 2100 + 2400 + 2800) / 5 = 2020

Step 2: Calculate the Slope (b)

Build a table with (xi - x̄), (yi - ȳ), their product, and the squared x deviations:

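| xi | yi | (xi - x̄) | (yi - ȳ) | Product | (xi - x̄)² |
|---|---|---|---|---|---|
| 500 | 1200 | -500 | -820 | 410,000 | 250,000 |
| 750 | 1600 | -250 | -420 | 105,000 | 62,500 |
| 1000 | 2100 | 0 | 80 | 0 | 0 |
| 1250 | 2400 | 250 | 380 | 95,000 | 62,500 |
| 1500 | 2800 | 500 | 780 | 390,000 | 250,000 |
| Totals | | | | 1,000,000 | 625,000 |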

b = 1,000,000 / 625,000 = 1.6

Step 3: Calculate the Intercept (a)

a = 2020 - 1.6(1000) = 2020 - 1600 = 420

Step 4: Write the Equation

ŷ = 1.6x + 420

Interpretation: For every additional square foot, predicted rent goes up by $1.60. A zero-square-foot apartment would theoretically rent for $420 (which makes no real-world sense, but that's the math).
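
If you would rather not do the arithmetic by hand, the four steps can be reproduced in a few lines of Python. This is a sketch that mirrors the hand calculation using the five apartments from the example. The variable names are illustrative choices.

```python
sq_ft = [500, 750, 1000, 1250, 1500]
rent = [1200, 1600, 2100, 2400, 2800]

# Step 1: means
x_mean = sum(sq_ft) / len(sq_ft)
y_mean = sum(rent) / len(rent)

# Step 2: one row per apartment:
# (x deviation, y deviation, product, squared x deviation)
rows = [
    (x - x_mean, y - y_mean, (x - x_mean) * (y - y_mean), (x - x_mean) ** 2)
    for x, y in zip(sq_ft, rent)
]
sum_products = sum(row[2] for row in rows)
sum_sq_x = sum(row[3] for row in rows)
b = sum_products / sum_sq_x

# Step 3: intercept
a = y_mean - b * x_mean

# Step 4: the equation
print(f"y-hat = {b:.2f}x + {a:.2f}")  # prints: y-hat = 1.60x + 420.00
```

The two totals match the hand-built table (1,000,000 and 625,000), which is a quick way to catch arithmetic slips.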

How to Use It in Practice

Plug in any x value to get a predicted y. For example, an 800-square-foot apartment gives ŷ = 1.6(800) + 420 = $1,700.
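
The same substitution can be wrapped in a small function. This is a minimal sketch. The name `predict_rent` and the apartment sizes are hypothetical, and the sizes were picked to stay inside the 500 to 1,500 square foot range the line was fitted on.

```python
def predict_rent(sq_ft, slope=1.6, intercept=420.0):
    """Predicted monthly rent from the fitted line y-hat = 1.6x + 420."""
    return slope * sq_ft + intercept

# Hypothetical apartment sizes, all within the range of the original data
for size in (800, 1100, 1400):
    print(f"{size} sq ft -> ${predict_rent(size):,.0f}")
```

Keeping the slope and intercept as parameters makes it easy to reuse the function after refitting the line on new data.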

What Makes a Good Regression Line?

Two key metrics tell you whether your line fits well:

R-squared (R²)

This tells you what percentage of the variation in y is explained by x. R² of 0.85 means 85% of y's movement is captured by the line. The rest is noise or other factors.

Range: 0 to 1. Higher is better, but not always. An R² of 0.9 in one context might be weak in another.
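
As a rough illustration, R² can be computed as 1 minus the ratio of unexplained variation (squared residuals) to total variation (squared deviations of y from its mean). The sketch below applies that standard definition to the rent example. The function name `r_squared` is an assumption for illustration.

```python
def r_squared(xs, ys, slope, intercept):
    """Share of the variation in y explained by the line."""
    y_mean = sum(ys) / len(ys)
    # Unexplained variation: squared residuals around the line
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    # Total variation: squared deviations around the mean of y
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("y is constant; R-squared is undefined")
    return 1 - ss_res / ss_tot

sq_ft = [500, 750, 1000, 1250, 1500]
rent = [1200, 1600, 2100, 2400, 2800]

r2 = r_squared(sq_ft, rent, slope=1.6, intercept=420.0)
print(f"R-squared = {r2:.3f}")
```

For the rent data this comes out to about 0.995, which reflects how neatly the five example points line up.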

Standard Error of the Estimate (s)

This is the typical prediction error. If s = 200, your predictions are typically off by about $200.

Lower is better. You want predictions to be close to actual values.
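
The text above does not spell out a formula, so the sketch below assumes the common definition for a single-predictor line: the square root of the sum of squared residuals divided by n - 2 (two is subtracted because the slope and the intercept are both estimated from the data). The function name `standard_error` is illustrative.

```python
import math

def standard_error(xs, ys, slope, intercept):
    """Standard error of the estimate, using the n - 2 divisor."""
    n = len(xs)
    if n <= 2:
        raise ValueError("need more than two points")
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(ss_res / (n - 2))

sq_ft = [500, 750, 1000, 1250, 1500]
rent = [1200, 1600, 2100, 2400, 2800]

s = standard_error(sq_ft, rent, slope=1.6, intercept=420.0)
print(f"s = {s:.2f}")
```

For the rent example this gives roughly $52, so predictions from the fitted line miss the actual rents by about that much.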

Common Mistakes to Avoid

When to Use Least Squares Regression

This method works when:

It doesn't work well when the relationship is curved, when you have categorical variables, or when your data has heavy outliers.

Least Squares vs. Other Methods

| Method | Best For | Drawback |
|---|---|---|
| Least Squares | Linear relationships, prediction | Sensitive to outliers |
| Median Regression | Data with extreme values | Harder to interpret |
| Polynomial Regression | Curved relationships | Can overfit easily |
| Robust Regression | Data with outliers | More complex calculations |

Getting Started: Quick Checklist

  1. Plot your data first. Scatter plot. Does a straight line look reasonable? If the pattern is curved, linear regression isn't your answer.
  2. Calculate x̄ and ȳ. Your starting point for everything else.
  3. Compute the slope (b). Use the formula or let software do it. Excel, Google Sheets, R, Python—all have built-in functions.
  4. Find the intercept (a). One subtraction.
  5. Write the equation. ŷ = bx + a.
  6. Check R² and standard error. Does the line actually explain your data?
  7. Validate. Hold out some data points. See how well your line predicts them.
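
With only a handful of points, one way to approach the validation step is leave-one-out: hold out each point in turn, fit the line on the rest, and see how far off the prediction for the held-out point is. The sketch below is one possible take on step 7 rather than the only way to validate. It reuses the rent data from the worked example, and the helper name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Return (b, a) for the least squares line y-hat = b*x + a."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sxy / sxx
    return b, y_mean - b * x_mean

sq_ft = [500, 750, 1000, 1250, 1500]
rent = [1200, 1600, 2100, 2400, 2800]

errors = []
for i in range(len(sq_ft)):
    # Fit on every point except the i-th one
    train_x = sq_ft[:i] + sq_ft[i + 1:]
    train_y = rent[:i] + rent[i + 1:]
    b, a = fit_line(train_x, train_y)

    # Predict the held-out point and record the absolute miss
    predicted = b * sq_ft[i] + a
    errors.append(abs(rent[i] - predicted))

mean_abs_error = sum(errors) / len(errors)
print(f"Mean absolute hold-out error: ${mean_abs_error:.2f}")
```

If the hold-out errors are much larger than the standard error from step 6, the line may be fitting quirks of the sample rather than a stable relationship.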

The Bottom Line

The least squares regression line is a straightforward tool: find the straight line that comes closest to all your points. The math is simple, the interpretation is direct, and the formula has been battle-tested for over two centuries.

Use it when the relationship is linear. Check your R². Don't extrapolate beyond your data. That's all you need.