Box and Whisker Plot: Displaying Data Distribution
What Is a Box and Whisker Plot?
A box and whisker plot is a graph that shows how data is spread out. It displays the minimum, maximum, median, and quartiles all in one view. That's it. That's the whole point.
You see it as a box with lines sticking out from both sides. The box represents the middle half of your data. The lines (whiskers) show the rest of the spread.
Why Bother With Box Plots?
Bar charts of averages hide the spread. Line graphs bury your outliers. Scatter plots scatter everything everywhere.
Box plots show you:
- Where your data clusters
- Whether you have outliers dragging your averages
- How skewed your data is
- The actual spread, not just the center
If you've ever calculated a mean and thought "that doesn't feel right," a box plot would have shown you why. Outliers pull means around. Box plots expose that distortion.
Anatomy of a Box Plot
The Five Numbers You Must Know
Every box plot encodes five numbers:
- Minimum — lowest value (excluding outliers)
- Q1 (First Quartile) — 25% of data falls below this point
- Median (Q2) — the middle value, 50% below, 50% above
- Q3 (Third Quartile) — 75% of data falls below this point
- Maximum — highest value (excluding outliers)
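Here is a minimal sketch of pulling those five numbers out of a list, using only Python's standard library. The function name and the sample scores are made up for illustration. It reports the raw minimum and maximum with no outlier exclusion, and it uses one particular quartile convention (`method="inclusive"`); other tools may give slightly different quartiles on small samples.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    # Different quartile conventions can disagree slightly on small samples.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]

# Hypothetical sample data, for illustration only
scores = [52, 57, 60, 63, 65, 68, 70, 74, 79]
print(five_number_summary(scores))
```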
The Box Itself
The box stretches from Q1 to Q3. This distance is called the Interquartile Range (IQR). It contains the middle 50% of your data. If your data is a room, the box is where the actual normal people sit.
The Whiskers
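Whiskers extend from the box toward the minimum and maximum values. In the simplest version they reach all the way to the extremes. Most tools cap them instead: each whisker stops at the most extreme data point within 1.5× the IQR of the box edge. Anything beyond that? Those are outliers, shown as dots or asterisks.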
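The 1.5× IQR rule is easy to check by hand. The sketch below is one way to do it with the standard library; the function name, the `k` parameter, and the sample response times are assumptions for illustration, and your plotting tool may compute quartiles slightly differently.

```python
import statistics

def whiskers_and_outliers(values, k=1.5):
    """Apply the k x IQR rule; return (low whisker, high whisker, outliers)."""
    ordered = sorted(values)
    q1, _, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    inside = [v for v in ordered if lower_fence <= v <= upper_fence]
    outliers = [v for v in ordered if v < lower_fence or v > upper_fence]
    # Whiskers stop at the most extreme data points still inside the fences
    return inside[0], inside[-1], outliers

# Hypothetical sample data with one unusually large value
response_ms = [12, 14, 15, 15, 16, 17, 18, 19, 45]
low, high, outliers = whiskers_and_outliers(response_ms)
print(low, high, outliers)
```

Note that the whisker ends are actual data points, not the fences themselves. The fences only decide which points count as outliers.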
Reading the Spread
A short box means your data clusters tightly. A long box means the middle half of your data is all over the place. Whiskers of different lengths? Your data is skewed toward the longer whisker. A median sitting off-center inside the box tells you the same thing.
Comparing Distributions Made Simple
Box plots shine when comparing groups. Put two or more boxes side by side and you can instantly see which group has higher values, more variation, or more outliers.
Compare test scores between classes, sales figures across regions, or response times between servers. Box plots make the comparison visual and immediate.
When Box Plots Fall Short
Box plots hide the sample size. A box built from 10 points looks identical to one built from 10,000. That's dangerous if you don't know your n.
They also hide the distribution shape. A bimodal distribution (two peaks) can produce exactly the same box as a single-peaked one. You won't see that second peak hiding in there.
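To see how shape gets lost, here is a small sketch with two hand-built datasets (both hypothetical, constructed only for this illustration). One piles up at a single value, the other at two, yet they share the same five numbers and neither has points outside the 1.5× IQR fences, so their box plots would look the same.

```python
import statistics
from collections import Counter

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]

# Two hypothetical datasets, built by hand for this illustration
one_peak = [2, 3, 3, 4, 4, 5, 5, 5, 5, 5, 5, 5, 6, 6, 7, 7, 8]
two_peaks = [2, 3, 4, 4, 4, 4, 4, 4, 5, 6, 6, 6, 6, 6, 6, 7, 8]

# Same five numbers for both
print(five_number_summary(one_peak))
print(five_number_summary(two_peaks))

# The value counts tell them apart: one pile at 5 versus piles at 4 and 6
print(sorted(Counter(one_peak).items()))
print(sorted(Counter(two_peaks).items()))
```

If shape matters for your analysis, pair the box plot with a histogram or overlay the raw points.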
For small datasets under 10 points, just show the actual data points. A box plot on 5 values is pointless.
Tools for Creating Box Plots
| Tool | Best For | Learning Curve |
|---|---|---|
| Python (matplotlib/seaborn) | Automating analysis, large datasets | Medium |
| R (ggplot2) | Statistical work, publications | Medium |
| Excel | Quick business charts | Low |
| Google Sheets | Collaborative, free option | Low |
| Tableau | Dashboards, interactive viz | Medium |
| Online generators | One-off plots, no install | Very Low |
How to Create a Box Plot (Getting Started)
In Excel
Select your data. Go to Insert → Insert Statistic Chart → Box and Whisker. Excel does the math for you. Format the chart, add a title, done.
If you don't see the Box and Whisker option, you're probably on a version older than Excel 2016. Upgrade or use a workaround built from a stacked column chart.
In Python
A few lines get you a box plot:
    import matplotlib.pyplot as plt
    data = [3, 5, 7, 8, 12, 13, 14, 18, 21]  # placeholder values; swap in your own
    plt.boxplot(data)
    plt.show()
That's the basic version. Add labels, multiple datasets, horizontal orientation, and notch options as needed.
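As a rough sketch of those extras, the example below draws two notched boxes side by side and labels them. The group names and scores are invented for illustration. Tick labels are set after plotting with `set_xticklabels`, since the labeling keyword on `boxplot` has changed across matplotlib releases.

```python
import matplotlib.pyplot as plt

# Hypothetical sample data for two groups, for illustration only
groups = {
    "Class A": [52, 57, 60, 63, 65, 68, 70, 74, 79],
    "Class B": [61, 64, 66, 67, 69, 70, 72, 73, 95],
}

fig, ax = plt.subplots()
# One box per group; notch=True draws a notch around each median
result = ax.boxplot(list(groups.values()), notch=True)
# Boxes sit at x = 1, 2, ... so the labels are applied in the same order
ax.set_xticklabels(list(groups.keys()))
ax.set_ylabel("Score")
ax.set_title("Test scores by class")
plt.show()
```

For a horizontal layout, `boxplot` has long accepted `vert=False`; newer matplotlib releases may prefer a different keyword, so check the documentation for your version. If you go horizontal, the labels belong on the y-axis instead.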
In Google Sheets
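Google Sheets doesn't offer a dedicated box plot chart type. The usual workaround is a candlestick chart. Compute the minimum, Q1, Q3, and maximum for each group with MIN, QUARTILE, and MAX, lay them out in columns next to a label column, then Select data → Insert → Chart and change the chart type to Candlestick chart in the chart editor. It's close, not the same thing: a candlestick chart draws the box and whiskers but has no median line.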
Reading a Box Plot: A Real Example
Let's say you're looking at annual salaries at two companies:
Company A: Box from $45k to $65k, median at $52k, whiskers from $38k to $85k
Company B: Box from $48k to $58k, median at $53k, whiskers from $46k to $62k
Company A has a wider salary range and a long upper whisker, with someone making bank at $85k. Company B pays more consistently. The medians are similar, but the spread tells a different story. This is what box plots reveal that averages hide.
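The arithmetic behind that reading can be sketched in a few lines. The `BoxSummary` class and its property names are hypothetical helpers for this example. The numbers are the ones given for Company A and Company B, in thousands of dollars.

```python
from dataclasses import dataclass

@dataclass
class BoxSummary:
    low_whisker: float
    q1: float
    median: float
    q3: float
    high_whisker: float

    @property
    def iqr(self):
        # Width of the box: the middle 50% of salaries
        return self.q3 - self.q1

    @property
    def whisker_span(self):
        # Distance from the end of one whisker to the end of the other
        return self.high_whisker - self.low_whisker

    @property
    def upper_whisker(self):
        return self.high_whisker - self.q3

    @property
    def lower_whisker(self):
        return self.q1 - self.low_whisker

# Values in thousands of dollars, taken from the example
company_a = BoxSummary(38, 45, 52, 65, 85)
company_b = BoxSummary(46, 48, 53, 58, 62)

for name, box in [("A", company_a), ("B", company_b)]:
    print(name, box.iqr, box.whisker_span, box.upper_whisker, box.lower_whisker)
```

Company A's box is twice as wide as Company B's ($20k versus $10k), and its upper whisker ($20k) is almost three times its lower one ($7k), which is the skew toward high salaries you'd spot on the chart.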
Horizontal vs. Vertical
Doesn't matter for the data. Matters for your labels. Long category names? Use horizontal. Short labels with many categories? Vertical works fine. Flip it if your audience reads better that way.
The Bottom Line
Box and whisker plots are not fancy. They're not trendy. But they show you the truth about your data's spread in a single glance. Learn to read them. Learn to make them. Use them when you need to compare distributions or find outliers wrecking your analysis.
That's all you need. Start using them.