Interpreting Residual Plot Numbers
What Residual Plots Actually Tell You
Residual plots are one of the first things to check after fitting a regression. They're not optional decoration; they're diagnostic tools that tell you whether your model is lying to you.
The numbers on a residual plot aren't mysterious. Once you know what you're looking at, you'll spot problems that summary statistics hide completely.
The Numbers on Residual Plots: What's Actually There
Every residual plot displays two things:
- Residuals: the difference between actual and predicted values (actual minus predicted), typically on the y-axis. A residual of 50 means your model missed by 50 units.
- Fitted values: your model's predictions, typically on the x-axis.
The scatter pattern of residuals around zero reveals whether your model assumptions hold. That's the whole point.
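As a minimal sketch of where those two quantities come from, the snippet below fits an ordinary least squares line with NumPy and computes both. The arrays `x` and `y` are made-up toy values used only for illustration.

```python
import numpy as np

# Hypothetical toy data: one predictor x and an outcome y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Design matrix with an intercept column, solved by ordinary least squares.
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

fitted = X @ coef          # the model's predictions (x-axis of the plot)
residuals = y - fitted     # actual minus predicted (y-axis of the plot)

for f, r in zip(fitted, residuals):
    print(f"fitted={f:6.2f}  residual={r:+.2f}")
```

With an intercept in the model, least squares residuals sum to zero, which is why a healthy plot is centered on the zero line.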
Reading the Vertical Axis (Residuals)
Residuals are your prediction errors. They cluster around zero when your model works. The spread tells you how wrong your predictions typically are.
A residual of +3.2 means the actual value was 3.2 units higher than predicted. A residual of -1.8 means your model overshot by 1.8 units.
The standard deviation of residuals is your model's typical error magnitude. The range tells you something different: if residuals run from -10 to +10, your worst misses are about 10 units, and most predictions are off by less than that.
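To make the difference between typical error and extreme error concrete, here is a small sketch. It uses simulated residuals (assumed normal with a standard deviation of 3) rather than residuals from a real model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated residuals standing in for a fitted model's errors (assumed normal, sd = 3).
residuals = rng.normal(loc=0.0, scale=3.0, size=500)

spread = residuals.std(ddof=1)                 # typical error magnitude
lo, hi = residuals.min(), residuals.max()      # extremes, i.e. the worst misses
within_one_sd = np.mean(np.abs(residuals) <= spread)

print(f"std of residuals: {spread:.2f}")
print(f"range: {lo:.2f} to {hi:.2f}")
print(f"share within one std: {within_one_sd:.0%}")
```

For residuals taken from an actual regression, dividing the sum of squares by n minus the number of fitted coefficients (rather than n - 1) gives the usual residual standard error.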
Reading the Horizontal Axis (Fitted Values)
Fitted values are your model's predictions. The x-axis usually spans from your lowest predicted value to your highest.
Why does this matter? Because residual patterns often change across different prediction ranges. A model might work great for low values and fall apart for high ones.
What Good Residual Plots Look Like
A well-behaved residual plot has one specific pattern: random scatter around a horizontal line at zero.
No funnels. No curves. No clusters. Just random noise evenly distributed from left to right.
This pattern tells you:
- Your model captured the systematic relationships in your data
- Prediction errors are consistent across all prediction ranges
- Your model isn't systematically over- or under-predicting
Red Flags: Patterns That Signal Problems
1. Funnel Shape (Heteroscedasticity)
Residuals spread out as fitted values increase. Small predictions cluster tightly; large predictions scatter wildly.
This means your model is more reliable for some values than others. Predictions for high-value cases come with huge uncertainty.
Fix it with weighted regression, a transformation of the outcome (such as log or square root), or robust methods.
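A funnel can also be checked numerically. The sketch below is a rough heuristic, not a formal test such as Breusch-Pagan: it splits the residuals at the median fitted value and compares the spread in each half. The data are simulated so that noise grows with `x`.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data whose noise grows with x (an assumption made to produce a funnel).
x = rng.uniform(1, 10, size=400)
y = 5 + 2 * x + rng.normal(0, 0.5 * x)

X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef
residuals = y - fitted

# Split at the median fitted value and compare residual spread in each half.
low = fitted <= np.median(fitted)
sd_low = residuals[low].std(ddof=1)
sd_high = residuals[~low].std(ddof=1)
print(f"residual sd, low half:  {sd_low:.2f}")
print(f"residual sd, high half: {sd_high:.2f}")
print(f"ratio high/low: {sd_high / sd_low:.2f}")
```

A ratio near 1 is consistent with constant variance; a ratio well above 1 matches the funnel described above.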
2. Curved Pattern (Nonlinearity)
Residuals form a U-shape or inverted U rather than scattering randomly. This signals your model missed a nonlinear relationship.
A straight line can't capture curves in your data. The residual pattern shows exactly where your linear assumption breaks down.
Fix it by adding polynomial terms, using splines, or switching to nonlinear regression.
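The following sketch shows the U-shape in numbers, using simulated data with a quadratic relationship (an assumption for illustration). It averages residuals in the lower, middle, and upper thirds of `x`, first for a straight-line fit and then with a squared term added via `np.polyfit`.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated data with a genuinely quadratic relationship (assumed for illustration).
x = np.linspace(0, 10, 200)
y = 1 + 0.5 * x**2 + rng.normal(0, 2, size=x.size)

def binned_mean_residuals(degree):
    """Fit a polynomial of the given degree and average residuals in thirds of x."""
    coefs = np.polyfit(x, y, deg=degree)
    residuals = y - np.polyval(coefs, x)
    # x is sorted, so splitting the residual array splits the x range too.
    return [chunk.mean() for chunk in np.array_split(residuals, 3)]

linear_bins = binned_mean_residuals(1)
quadratic_bins = binned_mean_residuals(2)
print("straight line:", np.round(linear_bins, 2))
print("with x^2 term:", np.round(quadratic_bins, 2))
```

The straight-line fit should show positive, negative, positive averages (the U-shape), while the quadratic fit leaves all three close to zero.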
3. Outliers Pulling the Model
A few points sit far from the rest. These outliers can distort your regression line, especially when they also have extreme predictor values, making it fit the noise rather than the signal.
Check whether the outliers are data entry errors. If they are legitimate, consider robust regression techniques that downweight extreme values.
4. Systematic Positive or Negative Residuals
Residuals cluster above zero in one region and below zero in another. Your model systematically underpredicts in some ranges and overpredicts in others.
This often happens when you've left out an important predictor variable.
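One way to test that suspicion is to compare residuals across the levels of a candidate variable. In this sketch the data are simulated and `flag` is a hypothetical binary predictor that the first model leaves out.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated data: y depends on x and on a binary flag (a hypothetical omitted predictor).
n = 300
x = rng.uniform(0, 10, size=n)
flag = rng.integers(0, 2, size=n)
y = 2 + 1.5 * x + 4 * flag + rng.normal(0, 1, size=n)

def ols_residuals(*columns):
    """Fit OLS with an intercept on the given columns and return the residuals."""
    X = np.column_stack([np.ones(n), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef

res_without = ols_residuals(x)        # flag left out of the model
res_with = ols_residuals(x, flag)     # flag included

for label, res in [("without flag", res_without), ("with flag", res_with)]:
    gap = res[flag == 1].mean() - res[flag == 0].mean()
    print(f"{label}: mean residual gap between groups = {gap:.2f}")
```

A large gap in mean residuals between groups suggests the variable belongs in the model; once it is included, the gap disappears.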
Standardized vs. Raw Residuals: Which Numbers to Use
Raw residuals have different scales depending on your outcome variable. A residual of 50 is huge if your variable ranges from 0 to 100, but tiny if your variable ranges from 0 to 10,000.
Standardized residuals fix this. They divide each residual by its estimated standard deviation, giving you a consistent scale.
With standardized residuals:
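- Values between -2 and +2 are typical; about 95% of residuals fall in this range when errors are roughly normal
- Values beyond ±2 warrant investigation, though about 1 in 20 points will land there by chance alone
- Values beyond ±3 are rare (about 0.3% under normality) and are likely outliers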
Always use standardized residuals for outlier detection. Raw residuals will fool you.
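Here is a sketch of the calculation, assuming ordinary least squares and the common "internally studentized" definition: each residual is divided by the residual standard error times the square root of one minus its leverage. The data are simulated, with one observation deliberately corrupted.

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated data with one deliberately corrupted observation (index 10).
n = 100
x = rng.uniform(0, 10, size=n)
y = 3 + 2 * x + rng.normal(0, 1, size=n)
y[10] += 8

X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ coef

# Leverage h_ii is the diagonal of the hat matrix H = X (X'X)^-1 X'.
leverage = np.einsum("ij,ji->i", X, np.linalg.solve(X.T @ X, X.T))
# Residual standard error with n - p degrees of freedom.
sigma = np.sqrt(residuals @ residuals / (n - X.shape[1]))
standardized = residuals / (sigma * np.sqrt(1 - leverage))

flagged = np.flatnonzero(np.abs(standardized) > 3)
print("indices beyond +/-3:", flagged)
```

Statistical packages usually expose this quantity directly, so the manual version is mainly useful for seeing what the number means.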
Comparing Residual Plots Across Models
You can learn more by plotting residuals from different models on the same axes. Among models of similar complexity, the one with the tightest, most random scatter around zero is usually the better choice.
Less residual variance means more precise predictions, provided the improvement also holds on data the model hasn't seen.
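The comparison can be put in numbers as well. This sketch fits a straight line and a quadratic to simulated data with mild curvature and reports the root mean squared residual for each. Because adding terms never increases the in-sample residual spread of a least squares fit, it also reports the figure on a held-out half of the data; the split and all values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated data with mild curvature, split into training and held-out halves.
x = rng.uniform(0, 10, size=400)
y = 2 + 0.8 * x + 0.3 * x**2 + rng.normal(0, 2, size=x.size)
x_train, y_train = x[:200], y[:200]
x_test, y_test = x[200:], y[200:]

def rmse(residuals):
    """Root mean squared residual: the typical size of a miss."""
    return float(np.sqrt(np.mean(residuals**2)))

spread = {}
for degree in (1, 2):
    coefs = np.polyfit(x_train, y_train, deg=degree)
    train_rmse = rmse(y_train - np.polyval(coefs, x_train))
    test_rmse = rmse(y_test - np.polyval(coefs, x_test))
    spread[degree] = (train_rmse, test_rmse)
    print(f"degree {degree}: train RMSE = {train_rmse:.2f}, held-out RMSE = {test_rmse:.2f}")
```

Here the quadratic model should come out tighter on both halves, which is the situation where preferring the lower-variance model is safe.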
Tools for Creating Residual Plots
Most statistical software generates residual plots automatically. Here's how to access them:
| Tool | Method | Best For |
|---|---|---|
| Python (statsmodels) | plot_regress_exog() or manual scatter | Custom visualizations, automation |
| R | plot(model) or ggplot2 | Quick diagnostics, publication plots |
| SPSS | Analyze → Regression → Plots | Quick analysis without coding |
| Excel | Calculate residuals manually, then scatter plot | Basic checks, no software overhead |
How to Interpret Residual Plots: Practical Walkthrough
Let's walk through a concrete interpretation scenario.
You run a linear regression predicting house prices from square footage and number of bedrooms. Your R-squared looks decent at 0.73.
Then you plot residuals against fitted values and see a clear funnel that widens from left to right. What does this tell you?
Your model predicts expensive houses less accurately than cheap ones. High-value homes have much larger prediction errors than low-value homes.
The funnel shape means prediction intervals that assume constant variance are too narrow for expensive houses. Treat your model's predictions for luxury properties with caution.
This is exactly why you check residual plots. R-squared told you the model was okay. The residual plot revealed it was only okay for certain cases.
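The scenario can be reproduced with simulated listings. Everything in this sketch is synthetic: the price formula, the coefficient values, and the noise level are assumptions chosen so that dollar errors grow with price, and none of it reflects real market data. It reports the residual spread within each quarter of the fitted-price range.

```python
import numpy as np

rng = np.random.default_rng(6)

# Synthetic listings (not real market data); errors are multiplicative,
# so dollar-scale misses grow with price.
n = 600
sqft = rng.uniform(800, 4000, size=n)
bedrooms = rng.integers(1, 6, size=n)
price = np.exp(11.5 + 0.0004 * sqft + 0.05 * bedrooms + rng.normal(0, 0.25, size=n))

X = np.column_stack([np.ones(n), sqft, bedrooms])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
fitted = X @ coef
residuals = price - fitted

# Residual spread within each quarter of the fitted-price range, cheapest first.
order = np.argsort(fitted)
band_sd = [residuals[idx].std(ddof=1) for idx in np.array_split(order, 4)]
for i, sd in enumerate(band_sd, start=1):
    print(f"fitted-price quartile {i}: residual sd = ${sd:,.0f}")
```

A single summary number averages over all four bands, which is how a model can look acceptable overall while missing badly at the top of the market.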
Getting Started: Your First Residual Analysis
Step 1: Run your regression and save the residuals and fitted values.
Step 2: Create a scatter plot with fitted values on the x-axis and residuals on the y-axis.
Step 3: Add a horizontal reference line at zero.
Step 4: Look for four things: random scatter, funnels, curves, and outliers.
Step 5: If you find problems, try transformations (log, square root), add polynomial terms, or switch to robust methods.
Step 6: Replot residuals after changes to confirm improvements.
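The six steps can be sketched end to end as follows. This is one possible arrangement, not a standard API: `fit_ols`, `residual_plot`, and `spread_ratio` are hypothetical helper names, the data are simulated, and the plotting helper assumes matplotlib is installed (it is imported inside the function so the numeric parts run without it).

```python
import numpy as np

def fit_ols(X, y):
    """Step 1: fit OLS with an intercept; return fitted values and residuals."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    return fitted, y - fitted

def residual_plot(fitted, residuals, path="residuals.png"):
    """Steps 2-3: scatter residuals against fitted values with a zero line."""
    import matplotlib.pyplot as plt  # imported here so the rest runs without it

    fig, ax = plt.subplots(figsize=(6, 4))
    ax.scatter(fitted, residuals, s=12, alpha=0.6)
    ax.axhline(0, color="black", linewidth=1)
    ax.set_xlabel("Fitted values")
    ax.set_ylabel("Residuals")
    fig.savefig(path, dpi=150, bbox_inches="tight")
    plt.close(fig)
    return path

def spread_ratio(fitted, residuals):
    """Step 4 aid: residual sd in the upper half of fitted values over the lower half."""
    upper = fitted > np.median(fitted)
    return residuals[upper].std(ddof=1) / residuals[~upper].std(ddof=1)

# Simulated data (assumed): multiplicative noise, so the raw fit should show a funnel.
rng = np.random.default_rng(7)
x = rng.uniform(1, 10, size=300)
y = np.exp(0.3 * x + rng.normal(0, 0.3, size=x.size))

fitted_raw, res_raw = fit_ols(x, y)            # Steps 1-4 on the raw outcome
fitted_log, res_log = fit_ols(x, np.log(y))    # Step 5: log transform, then refit

ratio_raw = spread_ratio(fitted_raw, res_raw)
ratio_log = spread_ratio(fitted_log, res_log)  # Step 6: recheck after the change
print(f"spread ratio before transform: {ratio_raw:.2f}")
print(f"spread ratio after log transform: {ratio_log:.2f}")
```

Calling `residual_plot(fitted_raw, res_raw, "before.png")` and `residual_plot(fitted_log, res_log, "after.png")` writes the two plots to disk so you can compare them side by side.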
The Bottom Line
Residual plot numbers are just prediction errors plotted against predictions. The pattern tells you whether your model assumptions hold.
Random scatter = good. Funnels, curves, or outliers = problems your model hasn't solved.
Never trust regression results without checking the residual plot. Summary statistics lie; residuals don't.