Standard Deviation Explained: Calculation and Use
What Standard Deviation Actually Is
Standard deviation measures how spread out numbers are from their average (mean). That's it. Nothing fancy.
If you have a dataset where every value is close to the mean, your standard deviation will be small. If your values are scattered all over the place, the standard deviation will be large.
This is useful because knowing just the mean tells you nothing about variability. Two datasets can have identical averages but completely different spreads. Standard deviation captures that difference.
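Here is a quick Python check of that point. The two small datasets are made up for illustration, and the sketch uses only the standard-library `statistics` module:

```python
import statistics

# Two illustrative datasets (invented for this example) that share a mean of 5
tight = [4, 5, 5, 5, 6]
spread = [1, 3, 5, 7, 9]

for name, data in [("tight", tight), ("spread", spread)]:
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)  # population standard deviation (divides by N)
    print(f"{name}: mean={mean}, sd={sd:.3f}")
```

Both datasets report a mean of 5, but their standard deviations come out to about 0.632 and 2.828.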
Population vs Sample Standard Deviation
You need to know which one you're calculating because the formula differs.
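Population standard deviation is for when your data cover every member of the group you care about (the entire population). You divide by N (the total count).
Sample standard deviation estimates the standard deviation of a larger population from a sample. You divide by N-1 instead of N. Deviations measured from the sample's own mean are, on average, smaller than deviations from the true population mean, so dividing by N would understate the spread. This correction (Bessel's correction) makes the sample variance an unbiased estimate of the population variance. The standard deviation you get from it is still slightly low on average, but closer than dividing by N.
In practice, many tools default to the sample standard deviation (R's sd() and pandas' .std(), for example), but NumPy's np.std() defaults to the population version. Check which one your tool uses.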
The Formula (And Why It Looks Weird)
The population standard deviation formula is:
σ = √[Σ(xi - μ)² / N]
Here's what each part means:
- σ = standard deviation
- xi = each individual value
- μ = the population mean
- Σ = the sum over all N values
- N = number of values
You subtract the mean from each value, square the result, add them all up, divide by how many values you have, then take the square root. The squaring step removes negatives (a value 5 below the mean contributes the same as one 5 above the mean). The square root at the end brings the units back to the original scale.
The sample standard deviation formula is identical except that it uses the sample mean x̄ and divides by N-1 instead of N: s = √[Σ(xi - x̄)² / (N-1)]
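To connect the formulas to code, here is a minimal from-scratch sketch in plain Python. The function name `standard_deviation` and its `sample` flag are hypothetical, chosen for illustration; in real work you would normally call a library routine instead.

```python
import math

def standard_deviation(values, sample=False):
    """Hypothetical helper that applies the formula directly.

    sample=False -> population SD (divide by N)
    sample=True  -> sample SD (divide by N - 1, Bessel's correction)
    """
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough values for this formula")
    mean = sum(values) / n
    # Sum of squared deviations from the mean: Σ(xi - mean)²
    squared_deviations = sum((x - mean) ** 2 for x in values)
    divisor = n - 1 if sample else n
    return math.sqrt(squared_deviations / divisor)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_deviation(data))               # population: 2.0
print(standard_deviation(data, sample=True))  # sample: about 2.138
```

The only difference between the two versions is the divisor, which is why a single flag is enough here.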
Step-by-Step Calculation
Let's calculate the standard deviation for this dataset: 2, 4, 4, 4, 5, 5, 7, 9
Step 1: Find the Mean
Add everything up and divide by how many numbers you have.
(2 + 4 + 4 + 4 + 5 + 5 + 7 + 9) / 8 = 40 / 8 = 5
Step 2: Subtract the Mean from Each Value
2 - 5 = -3
4 - 5 = -1
4 - 5 = -1
4 - 5 = -1
5 - 5 = 0
5 - 5 = 0
7 - 5 = 2
9 - 5 = 4
Step 3: Square Each Result
9, 1, 1, 1, 0, 0, 4, 16
Step 4: Add the Squared Values
9 + 1 + 1 + 1 + 0 + 0 + 4 + 16 = 32
Step 5: Divide by N (or N-1 for a sample)
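Assuming this is the full population: 32 / 8 = 4. This intermediate result is the variance. If the data were a sample, you would divide by N-1 instead: 32 / 7 ≈ 4.571.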
Step 6: Take the Square Root
√4 = 2
The standard deviation is 2.
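The same six steps can be traced in a few lines of Python. This is a sketch that mirrors the hand calculation, with each comment noting the value that step produces for this dataset:

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

mean = sum(data) / n                       # Step 1: 40 / 8 = 5.0
deviations = [x - mean for x in data]      # Step 2: [-3.0, -1.0, -1.0, -1.0, 0.0, 0.0, 2.0, 4.0]
squared = [d ** 2 for d in deviations]     # Step 3: [9.0, 1.0, 1.0, 1.0, 0.0, 0.0, 4.0, 16.0]
total = sum(squared)                       # Step 4: 32.0
population_variance = total / n            # Step 5, population: 4.0
sample_variance = total / (n - 1)          # Step 5, sample: about 4.571

print("mean:", mean)
print("squared deviations:", squared)
print("population SD:", math.sqrt(population_variance))  # Step 6: 2.0
print("sample SD:", math.sqrt(sample_variance))          # about 2.138
```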
What the Numbers Actually Mean
A standard deviation of 2 on its own tells you nothing. You need context.
In a normal distribution (bell curve), roughly 68% of data falls within one standard deviation of the mean. About 95% falls within two standard deviations. About 99.7% falls within three.
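Using our example: the mean is 5 and the standard deviation is 2. If the data were normally distributed, roughly 68% of values would fall between 3 and 7. (In our eight-value dataset, 6 of the 8 values, or 75%, land in that range; the rule is only approximate for small or non-normal datasets.)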
This is where standard deviation becomes practical. It lets you predict where most values will land.
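The 68-95-99.7 figures can be checked directly. This sketch uses `statistics.NormalDist` for the exact normal-curve coverage, then simulates normal data with mean 5 and standard deviation 2 (an assumption matching the example's numbers, not the example data itself) to count how many values land between 3 and 7:

```python
import random
from statistics import NormalDist

# Exact share of a normal distribution within k standard deviations of the mean
std_normal = NormalDist(mu=0, sigma=1)
for k in (1, 2, 3):
    coverage = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} SD: {coverage:.4f}")

# Empirical check on simulated normal data with mean 5 and SD 2
rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [rng.gauss(5, 2) for _ in range(100_000)]
within_one = sum(3 <= x <= 7 for x in samples) / len(samples)
print(f"simulated share between 3 and 7: {within_one:.3f}")
```

The exact values print as 0.6827, 0.9545 and 0.9973, and the simulated share lands close to 0.68.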
Population vs Sample: When to Use Which
| Situation | Use | Divide By |
|---|---|---|
| You have every data point (entire population) | Population SD | N |
| You're working with a sample to estimate a larger population | Sample SD | N-1 |
| Quality control on all products made | Population SD | N |
| Surveying 500 people to estimate opinions of millions | Sample SD | N-1 |
Most business and research situations involve samples. You're measuring a subset to make inferences about a larger group.
Common Mistakes to Avoid
Using the population formula on samples. On average, this underestimates the population's variability: your standard deviation will come out artificially small, especially with small samples.
Confusing standard deviation with variance. Variance is the standard deviation squared. It's the intermediate step before you take the square root. Standard deviation is in the same units as your data; variance is in squared units.
Forgetting that standard deviation measures spread around the mean. If your data isn't centered around a meaningful average, standard deviation won't tell you much.
Assuming all data follows a normal distribution. You can compute a standard deviation for any numeric data, but the 68-95-99.7 rule assumes a bell curve. If your data are heavily skewed or have outliers, that rule doesn't apply.
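The first mistake is easy to see in a small simulation. This sketch assumes a hypothetical normal population with mean 5 and standard deviation 2 (variance 4), draws many samples of size 5, and averages the variance from each formula. It works with variance (the standard deviation squared) because that is the quantity Bessel's correction makes unbiased.

```python
import random
import statistics

# Hypothetical population: normal with mean 5 and SD 2, so true variance is 4
rng = random.Random(1)  # fixed seed so the run is repeatable
TRUE_VARIANCE = 4.0
n, trials = 5, 10_000

with_n, with_n_minus_1 = [], []
for _ in range(trials):
    sample = [rng.gauss(5, 2) for _ in range(n)]
    with_n.append(statistics.pvariance(sample))          # divides by N
    with_n_minus_1.append(statistics.variance(sample))   # divides by N - 1

print("true variance:", TRUE_VARIANCE)
print("average, dividing by N:", statistics.mean(with_n))
print("average, dividing by N-1:", statistics.mean(with_n_minus_1))
```

Dividing by N averages out near 3.2 (that is, 4 × 4/5), while dividing by N-1 averages out near the true value of 4.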
How to Calculate in Excel or Google Sheets
You don't need to do this by hand. Both Excel and Google Sheets have built-in functions.
- =STDEV.P(range) — population standard deviation
- =STDEV.S(range) — sample standard deviation
Select your data range, pick the right function, done. The P version divides by N. The S version divides by N-1.
How to Calculate in Python
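Python's NumPy library makes this trivial. Note that np.std() uses the population formula unless you pass ddof=1:

```python
import numpy as np

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(np.std(data))          # population SD (divides by N): 2.0
print(np.std(data, ddof=1))  # sample SD (ddof=1 divides by N - 1): about 2.138
```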
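If you'd rather not depend on NumPy, Python's built-in `statistics` module has a separate function for each version. A minimal equivalent:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(statistics.pstdev(data))  # population SD (divides by N): 2.0
print(statistics.stdev(data))   # sample SD (divides by N - 1): about 2.138
```

The same module also provides pvariance() and variance() if you need the variance itself.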
When Standard Deviation Is Useless
Standard deviation fails in several common situations:
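- Highly skewed data: outliers and asymmetric distributions distort it
- Data with natural boundaries: test scores from 0-100, ratings from 1-5, where the mean ± 2 standard deviations can run past the possible values
- Small samples: with fewer than about 30 data points (a common rule of thumb), the estimate becomes noisy and unreliable
- Non-continuous data: standard deviation is meaningless for unordered categories, and for counts or binary outcomes it is often less informative than a rate or proportion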
In these cases, use the interquartile range, median absolute deviation, or other robust measures instead.
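To see why robust measures help, the sketch below compares the population standard deviation with the interquartile range (IQR) and the median absolute deviation (MAD) on the example dataset, with and without one invented outlier. The helper name `spread_summary` is hypothetical. It relies on `statistics.quantiles`, whose default "exclusive" method is only one of several quartile conventions, and the MAD is left unscaled (no 1.4826 normal-consistency factor).

```python
import statistics

def spread_summary(data):
    """Hypothetical helper: return (population SD, IQR, MAD) for a list of numbers."""
    sd = statistics.pstdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)  # default 'exclusive' method
    med = statistics.median(data)
    mad = statistics.median([abs(x - med) for x in data])  # unscaled MAD
    return sd, q3 - q1, mad

clean = [2, 4, 4, 4, 5, 5, 7, 9]
with_outlier = clean + [100]  # one invented extreme value

for label, data in [("clean", clean), ("with outlier", with_outlier)]:
    sd, iqr, mad = spread_summary(data)
    print(f"{label}: SD={sd:.2f}, IQR={iqr:.2f}, MAD={mad:.2f}")
```

Adding the single value 100 pushes the standard deviation from 2 to roughly 30, while the IQR only moves from 2.5 to 4.0 and the MAD from 0.5 to 1.0.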
Quick Reference
| What You Have | Formula | In Excel |
|---|---|---|
| All data points | √[Σ(x-μ)² / N] | =STDEV.P() |
| A sample | √[Σ(x-x̄)² / (N-1)] | =STDEV.S() |
Standard deviation is a tool. Like any tool, it works well in the right situation and poorly in the wrong one. Know when to use it.