Finding Unbiased Estimates in Statistics

What "Unbiased" Actually Means

An unbiased estimator is a statistic that, on average, hits the true population value. Not close. Not approximately right. The expected value equals the parameter you're estimating.

That's it. That's the whole definition.

If you draw samples forever and average the estimates, the average for an unbiased estimator converges to the true value. For a biased one it doesn't: the average settles above or below the target, no matter how many times you repeat the sampling.

Most textbooks bury this under pages of notation. Here's the plain version: E(θ̂) = θ means unbiased. E(θ̂) ≠ θ means biased. Stop there.
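A minimal simulation sketch of that definition, assuming a normal population with a made-up mean of 50 and standard deviation of 10 (neither value comes from real data). It approximates E(θ̂) by averaging an estimator over many repeated samples, once for the sample mean and once for the sample minimum, which is used here only as an obviously biased contrast.

```python
import random
import statistics

def average_estimate(estimator, mu=50.0, sigma=10.0, n=5, reps=20_000, seed=0):
    """Approximate E(estimator) by averaging it over many simulated samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        estimates.append(estimator(sample))
    return statistics.fmean(estimates)

# The sample mean is unbiased for mu, so this average should sit close to 50.
avg_of_means = average_estimate(statistics.fmean)
# The sample minimum is biased for mu: it undershoots on average.
avg_of_mins = average_estimate(min)

print(f"average of sample means:    {avg_of_means:.2f}")
print(f"average of sample minimums: {avg_of_mins:.2f}")
```

The first average should land near 50 and the second well below it. The gap between the second average and 50 is the bias.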

Why Bias Matters in Real Analysis

Bias isn't some academic concern. It directly affects your conclusions.

Say you're estimating average customer spend. If your estimator systematically underestimates, you'll underprice your product. If it overestimates, you'll misallocate budget based on inflated revenue projections.

Medical research is worse. A biased estimator of treatment effect doesn't just give you a wrong number—it gives you a wrong decision that affects patient outcomes.

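Bias is systematic error. It doesn't average out over repeated samples, and bias that comes from how the data were collected or modeled doesn't shrink as the sample grows. You can have a million observations and still be wrong.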
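A rough illustration of bias that survives large samples, under invented assumptions: customer spend is exponentially distributed with a true mean of 40, and purchases under 20 are never logged. Both numbers and the cutoff rule are hypothetical. For an exponential distribution the mean of the recorded purchases is 20 + 40 = 60, so the estimate should settle near 60 rather than 40 however large n gets.

```python
import random
import statistics

TRUE_MEAN_SPEND = 40.0   # hypothetical population mean
MIN_RECORDED = 20.0      # hypothetical cutoff: smaller purchases are never logged

def biased_sample_mean(n, seed=0):
    """Mean of n recorded purchases when purchases below the cutoff are dropped."""
    rng = random.Random(seed)
    recorded = []
    while len(recorded) < n:
        spend = rng.expovariate(1.0 / TRUE_MEAN_SPEND)
        if spend >= MIN_RECORDED:
            recorded.append(spend)
    return statistics.fmean(recorded)

# More data makes the estimate more stable, but not closer to 40.
for n in (100, 10_000, 200_000):
    print(n, round(biased_sample_mean(n), 2))
```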

The Math Behind Unbiased Estimation

An estimator θ̂ of parameter θ is unbiased if:

E(θ̂ − θ) = 0

This equation says the average error is zero. The estimator overshoots sometimes, undershoots sometimes, but over infinite repetitions, it lands exactly on target.

For the sample mean: E(x̄) = μ. The sample mean is unbiased for the population mean whenever the observations are drawn from that population and the mean exists, whatever the shape of the distribution.

For sample variance, it's trickier. The intuitive formula (sum of squared deviations divided by n) is actually biased. You need to divide by n−1 instead.

The denominator matters. This is why your statistics software asks for "sample variance" and outputs a different number than the population variance formula you'd naively apply.
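The two denominators can be compared directly with a small simulation. This sketch assumes a normal population with variance 4 and samples of size 5 (both chosen arbitrarily), and uses the standard library's statistics.pvariance (divides by n) and statistics.variance (divides by n−1).

```python
import random
import statistics

def compare_variance_estimators(sigma=2.0, n=5, reps=20_000, seed=1):
    """Average the ÷n and ÷(n−1) variance formulas over many small samples."""
    rng = random.Random(seed)
    divide_by_n, divide_by_n_minus_1 = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        divide_by_n.append(statistics.pvariance(sample))         # divides by n
        divide_by_n_minus_1.append(statistics.variance(sample))  # divides by n - 1
    return statistics.fmean(divide_by_n), statistics.fmean(divide_by_n_minus_1)

avg_n, avg_n_minus_1 = compare_variance_estimators()
print(f"true variance:        4.00")
print(f"average with ÷n:      {avg_n:.2f}")
print(f"average with ÷(n−1):  {avg_n_minus_1:.2f}")
```

With n = 5 the ÷n average should come out near 4 × 4/5 = 3.2, while the ÷(n−1) average should sit near 4.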

Common Unbiased Estimators You'll Actually Use

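The estimators you'll encounter most often in practice are the sample mean for μ, the sample variance (÷n−1) for σ², and the sample proportion for π. The first and last are straightforward. The variance is where people trip.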

Sample Variance: The Gotcha

The population variance formula divides by N. The sample variance formula divides by n−1. Why?

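Because dividing by n systematically underestimates σ². The deviations are measured from the sample mean x̄, not the true mean μ, and x̄ is by construction the point that sits closest to the sample, so the squared deviations come out too small on average. Dividing by n−1 corrects for this exactly. It's called Bessel's correction, and it's not optional if you want an unbiased estimate of σ².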
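The reason for n−1 can be written out as a short derivation, assuming independent observations x₁, …, xₙ with mean μ and finite variance σ² (an assumption added here for the sketch):

```latex
\begin{align*}
\sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} (x_i - \mu)^2 - n(\bar{x} - \mu)^2 \\
E\!\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]
  &= n\sigma^2 - n \cdot \frac{\sigma^2}{n} = (n-1)\sigma^2 \\
E\!\left[\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2\right]
  &= \frac{n-1}{n}\,\sigma^2 < \sigma^2 \\
E\!\left[\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2\right]
  &= \sigma^2
\end{align*}
```

The second line uses E[(x̄ − μ)²] = σ²/n. The ÷n version is short by exactly σ²/n, which is large for small samples and fades as n grows.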

Bias vs. Variance — The Real Tradeoff

Every estimator has bias and variance. You usually can't drive both to their minimum at once: pushing one down tends to push the other up. This is the bias-variance tradeoff, and it shapes how you choose estimators in practice.

Scenario                   Bias                        Variance   Result
Sample mean                Zero                        σ²/n       Low MSE, good choice
Population variance (÷n)   Negative (underestimates)   Low        Systematic error
Ridge regression           Nonzero                     Reduced    Often lower MSE than OLS
Naive estimator            High                        Low        Wrong answer consistently

Sometimes a biased estimator is better. If bias is small but variance drops significantly, the mean squared error (MSE) improves. MSE = Bias² + Variance. An unbiased estimator with huge variance can have worse MSE than a slightly biased one with tight variance.

This is why ridge regression and other shrinkage estimators exist. They introduce bias to reduce variance, and the net effect is often better predictions.
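The decomposition MSE = Bias² + Variance can be checked numerically. The sketch below is not ridge regression; it swaps in a much simpler shrinkage estimator (half the sample mean) as a stand-in, with a hypothetical true mean of 1 and standard deviation of 5 chosen so that noise dominates. Under those assumptions the biased estimator should end up with the lower MSE.

```python
import random
import statistics

def bias_variance_mse(estimator, mu=1.0, sigma=5.0, n=10, reps=20_000, seed=2):
    """Simulate an estimator of mu and return (bias, variance, mse)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        estimates.append(estimator(sample))
    bias = statistics.fmean(estimates) - mu
    variance = statistics.pvariance(estimates)
    mse = statistics.fmean((e - mu) ** 2 for e in estimates)
    return bias, variance, mse

def sample_mean(sample):
    return statistics.fmean(sample)

def shrunken_mean(sample, shrink=0.5):
    """Hypothetical shrinkage estimator: pull the sample mean halfway toward 0."""
    return shrink * statistics.fmean(sample)

unbiased = bias_variance_mse(sample_mean)
shrunk = bias_variance_mse(shrunken_mean)

for name, (bias, variance, mse) in [("sample mean", unbiased), ("shrunken mean", shrunk)]:
    print(f"{name}: bias²={bias**2:.3f} variance={variance:.3f} "
          f"sum={bias**2 + variance:.3f} mse={mse:.3f}")
```

The result depends on the setup. If the true mean were far from zero relative to σ/√n, the same shrinkage would add more squared bias than it removes in variance, and the sample mean would win.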

How to Check If Your Estimator Is Unbiased

Step 1: Find the expected value of your estimator. This requires mathematical derivation or known results.

Step 2: Compare to the true parameter. If they're equal, it's unbiased.

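Step 3: If you can't derive it analytically, use simulation. Generate thousands of samples from a known population, compute your estimator each time, and average the results. The average will never match the true parameter exactly, so compare the gap to the simulation noise: if it is within a few Monte Carlo standard errors of zero, the results are consistent with unbiasedness.

The simulation approach works when you have a proposed estimator but can't prove unbiasedness mathematically. It's not a proof, and it only covers the populations you simulated, but it's strong evidence.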
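Step 3 can be sketched as a small helper. The function name simulated_bias, the exponential test population, and the three-standard-error cutoff are all illustrative choices rather than a standard API or rule.

```python
import math
import random
import statistics

def simulated_bias(estimator, draw_sample, true_value, reps=20_000):
    """Return (estimated bias, Monte Carlo standard error of that bias)."""
    estimates = [estimator(draw_sample()) for _ in range(reps)]
    bias = statistics.fmean(estimates) - true_value
    std_error = statistics.stdev(estimates) / math.sqrt(reps)
    return bias, std_error

rng = random.Random(3)

def draw_exponential_sample(n=8, mean=2.0):
    # Known population: exponential with mean 2, so its variance is 4.
    return [rng.expovariate(1.0 / mean) for _ in range(n)]

for name, estimator in [("÷(n−1)", statistics.variance), ("÷n", statistics.pvariance)]:
    bias, se = simulated_bias(estimator, draw_exponential_sample, true_value=4.0)
    verdict = "consistent with unbiased" if abs(bias) < 3 * se else "looks biased"
    print(f"{name}: bias ≈ {bias:+.3f} (± {se:.3f}) -> {verdict}")
```

Reporting the standard error alongside the bias is the point of the design: a gap of 0.02 means nothing until you know whether the simulation noise is 0.001 or 0.1.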

Getting Started With Unbiased Estimation

If you're estimating a population parameter and want unbiasedness, here's your workflow:

  1. Identify the parameter: What are you trying to estimate? Mean, variance, proportion, regression coefficient?
  2. Use the standard unbiased estimator: Sample mean for μ, sample variance (÷n−1) for σ², sample proportion for π
  3. Check assumptions: Unbiasedness depends on assumptions holding. OLS is unbiased only if the model is correctly specified and the errors have zero mean given the regressors (so no correlation with them). Constant variance isn't needed for unbiasedness; it matters for efficiency and for the usual standard errors
  4. Compare MSE: Check whether the unbiased estimator actually has lower MSE than the alternatives
  5. Report confidence intervals: Point estimates without uncertainty bounds are useless. Pair your unbiased estimate with a standard error or an interval

For most standard problems, the unbiased estimators are already known. You're not inventing anything. Use the established formulas, verify your assumptions, and move on.
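For the simplest case, estimating a mean, the workflow above might look like the sketch below. The spend figures are invented for illustration, and the interval uses a normal approximation because the standard library has no t distribution; with only ten observations a t-based interval would be somewhat wider.

```python
import math
import statistics

def mean_with_interval(data, confidence=0.95):
    """Return (mean, standard error, (low, high)) using a normal approximation."""
    n = len(data)
    mean = statistics.fmean(data)
    std_error = statistics.stdev(data) / math.sqrt(n)   # stdev uses the n - 1 divisor
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, std_error, (mean - z * std_error, mean + z * std_error)

# Hypothetical customer-spend figures, made up for illustration.
spend = [42.0, 55.5, 38.0, 61.2, 47.3, 52.8, 44.1, 58.9, 49.0, 51.2]
mean, se, (low, high) = mean_with_interval(spend)
print(f"mean = {mean:.2f}, SE = {se:.2f}, approx. 95% CI = ({low:.2f}, {high:.2f})")
```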

Frequently Asked Questions

Can an unbiased estimator ever be worse than a biased one?

Yes. Unbiasedness is a desirable property, but it's not the only one. An unbiased estimator with high variance can have worse MSE than a biased estimator with low variance. Always evaluate the full picture.

Why is sample variance divided by n−1?

Because dividing by n systematically underestimates the population variance. The deviations are taken from the sample mean, which sits closer to the data than the true mean does, so the squared deviations come out too small on average. Dividing by n−1 corrects this underestimation. It's Bessel's correction.

Are maximum likelihood estimators unbiased?

Not always. Under standard regularity conditions, MLEs are consistent (they converge to the true value as sample size increases) and their bias vanishes asymptotically, but finite samples can have bias. The MLE of the variance for normal data divides by n, not n−1, and is therefore biased.

Does unbiasedness guarantee accuracy?

No. An unbiased estimator can have high variance, meaning individual estimates land far from the truth even though the long-run average is correct. Think of it like this: on average you're right, but each individual shot misses by a lot.

Unbiasedness is about average performance over infinite repetitions. In practice, you often care about single-estimate accuracy, where MSE matters more.
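A small sketch of that distinction, assuming a made-up normal population (mean 100, standard deviation 15) and samples of size 25. It compares two unbiased estimators of the mean: the usual sample mean, and a deliberately wasteful one that reports only the first observation.

```python
import random
import statistics

def simulate(estimator, mu=100.0, sigma=15.0, n=25, reps=10_000, seed=4):
    """Return (mean of estimates, standard deviation of estimates)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        estimates.append(estimator(sample))
    return statistics.fmean(estimates), statistics.pstdev(estimates)

def first_observation(sample):
    """Unbiased for mu, but it ignores all but one data point."""
    return sample[0]

first_avg, first_spread = simulate(first_observation)
mean_avg, mean_spread = simulate(statistics.fmean)

print(f"first observation: average {first_avg:.1f}, typical miss {first_spread:.1f}")
print(f"sample mean:       average {mean_avg:.1f}, typical miss {mean_spread:.1f}")
```

Both averages should sit near 100, so both estimators are unbiased. The typical miss is what separates them: roughly σ = 15 for the single observation against roughly σ/√n = 3 for the sample mean.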