Box and Whisker Plot: Displaying Data Distribution

What Is a Box and Whisker Plot?

A box and whisker plot is a graph that shows how data is spread out. It displays the minimum, maximum, median, and quartiles all in one view. That's it. That's the whole point.

You see it as a box with lines sticking out from both sides. The box represents the middle half of your data. The lines (whiskers) show the rest of the spread.

Why Bother With Box Plots?

Bar charts of averages hide the spread. Line graphs track trends, not distributions. Scatter plots show every point and summarize none of them.

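Box plots show you the center (the median), the spread (the box and whiskers), the direction of any skew, and the outliers, all in one picture.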

If you've ever calculated a mean and thought "that doesn't feel right," a box plot would have shown you why. Outliers pull means around. Box plots expose that distortion.
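You can see that pull by comparing the mean and the median of a small list that contains one extreme value. The salary figures below are made up for illustration. `mean` and `median` come from Python's standard-library `statistics` module.

```python
from statistics import mean, median

# Hypothetical salaries in $k: seven ordinary values plus one extreme one.
salaries_k = [42, 45, 47, 50, 52, 55, 58, 400]
without_outlier = salaries_k[:-1]  # same list minus the 400

print(mean(salaries_k), median(salaries_k))            # 93.625 51.0
print(mean(without_outlier), median(without_outlier))  # about 49.86 and 50
```

The single 400 drags the mean up by more than 40. The median moves by just 1.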

Anatomy of a Box Plot

The Five Numbers You Must Know

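Every box plot encodes five numbers:

- Minimum: the smallest value
- First quartile (Q1): 25% of the data falls below it
- Median: the middle value; half the data falls below it
- Third quartile (Q3): 75% of the data falls below it
- Maximum: the largest value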

The Box Itself

The box stretches from Q1 to Q3. This distance is called the Interquartile Range (IQR). It contains the middle 50% of your data. If your data were a room, the box would be where the typical people sit.
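To check the numbers behind a box yourself, Python's standard-library `statistics.quantiles` returns the three quartiles directly. The sketch below returns the five numbers and the IQR. The `five_number_summary` name and the scores are hypothetical. Tools don't all compute quartiles the same way, so on small datasets two tools can draw slightly different boxes from the same data. The inclusive method used here should match NumPy's default percentile calculation.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # method="inclusive" interpolates between neighboring data points,
    # which should match NumPy's default percentile calculation.
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

scores = [55, 61, 64, 68, 70, 72, 75, 79, 83, 95]  # hypothetical test scores
low, q1, med, q3, high = five_number_summary(scores)
print(low, q1, med, q3, high)  # 55 65.0 71.0 78.0 95
print("IQR:", q3 - q1)         # IQR: 13.0
```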

The Whiskers

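In the simplest version, the whiskers run from the box out to the minimum and maximum values. Many tools, including matplotlib by default, use a stricter rule: each whisker stops at the most extreme data point that lies within 1.5× the IQR of the box edge. Anything beyond that? Those are outliers, shown as dots or asterisks.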
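Here is a minimal sketch of the 1.5× IQR rule. It computes the two "fences," ends each whisker at the last data point inside them, and flags everything outside as an outlier. The function name `tukey_whiskers` and the response times are hypothetical. The quartiles use the same inclusive method as NumPy's default, so results should line up with matplotlib's default whiskers.

```python
from statistics import quantiles

def tukey_whiskers(values, k=1.5):
    """Whisker ends and outliers under the k x IQR convention (k=1.5 by default).

    Whiskers stop at the most extreme data points that still lie within
    k * IQR of the box edges; anything beyond those fences is an outlier.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return min(inside), max(inside), outliers

response_ms = [120, 125, 128, 130, 132, 135, 138, 140, 145, 400]  # hypothetical
print(tukey_whiskers(response_ms))  # (120, 145, [400])
```

Here Q1 is 128.5 and Q3 is 139.5, so the IQR is 11 and the fences sit at 112 and 156. The 400 falls outside them and gets drawn as a dot.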

Reading the Spread

A short box means your data clusters tightly. A tall box means your data is all over the place. Whiskers of different lengths? Your data is skewed toward the longer whisker.
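The whisker-length reading can be sketched in a few lines. The function below compares the lower whisker (Q1 minus the minimum) with the upper whisker (the maximum minus Q3), using the simple min/max whiskers. The `skew_hint` name, the 10% tolerance, and the delivery times are all hypothetical choices for illustration. Treat the result as a rough reading, not a statistical test.

```python
from statistics import quantiles

def skew_hint(values, tol=0.1):
    """Rough skew reading from whisker lengths (simple min/max whiskers).

    tol is an arbitrary tolerance: whiskers within 10% of each other
    count as roughly symmetric.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    lower = q1 - min(values)   # length of the lower whisker
    upper = max(values) - q3   # length of the upper whisker
    if abs(upper - lower) <= tol * max(upper, lower):
        return "roughly symmetric"
    if upper > lower:
        return "right-skewed (longer upper whisker)"
    return "left-skewed (longer lower whisker)"

delivery_days = [2, 3, 3, 4, 4, 5, 6, 9, 14, 20]  # hypothetical data
print(skew_hint(delivery_days))    # right-skewed (longer upper whisker)
print(skew_hint([1, 2, 3, 4, 5]))  # roughly symmetric
```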

Comparing Distributions Made Simple

Box plots shine when comparing groups. Put two or more boxes side by side and you can instantly see which group has higher values, more variation, or more outliers.

Compare test scores between classes, sales figures across regions, or response times between servers. Box plots make the comparison visual and immediate.

When Box Plots Fall Short

Box plots hide the sample size. A box built from 10 points can look identical to one built from 10,000. That's dangerous if you don't know your n.
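Here is a contrived pair of datasets that shows how little the box says about n. The second dataset repeats the first 1,000 times, so 10 points become 10,000. Both produce the same five numbers and therefore the same box plot. The `five_numbers` helper is a hypothetical name built on `statistics.quantiles` with the inclusive method.

```python
from statistics import quantiles

def five_numbers(values):
    """(min, Q1, median, Q3, max) using inclusive quartiles."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)

small = [1, 2, 3, 4, 5] * 2  # 10 points
large = small * 1000         # 10,000 points: the same values, repeated

print(len(small), five_numbers(small))  # 10 (1, 2.0, 3.0, 4.0, 5)
print(len(large), five_numbers(large))  # 10000 (1, 2.0, 3.0, 4.0, 5)
```

One simple habit guards against this: put n in the label, e.g. "Server A (n=10)".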

They also hide the distribution shape. A bimodal distribution (two peaks) can produce the same box as a single-peaked, bell-shaped one. You won't see that second peak hiding in there.
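Here is a hand-built example of the shape problem. One dataset spreads around a single center at 50. The other splits into two clusters near 25 and 75 with a gap in between. The values are made up so that both share exactly the same five numbers. The `five_numbers` helper is a hypothetical name using inclusive quartiles from `statistics.quantiles`.

```python
from statistics import quantiles

def five_numbers(values):
    """(min, Q1, median, Q3, max) using inclusive quartiles."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)

# Hypothetical values, chosen by hand so both share the same five numbers.
single_center = [0, 10, 20, 30, 45, 50, 55, 70, 80, 90, 100]  # densest around 50
two_clusters = [0, 22, 24, 26, 28, 50, 72, 74, 76, 78, 100]   # bunched near 25 and 75

print(five_numbers(single_center))  # (0, 25.0, 50.0, 75.0, 100)
print(five_numbers(two_clusters))   # (0, 25.0, 50.0, 75.0, 100)
```

Both lists draw the same box plot. Plotting the raw points shows the gap the box hides.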

For small datasets under 10 points, just show the actual data points. A box plot on 5 values is pointless.

Tools for Creating Box Plots

| Tool | Best For | Learning Curve |
|---|---|---|
| Python (matplotlib/seaborn) | Automating analysis, large datasets | Medium |
| R (ggplot2) | Statistical work, publications | Medium |
| Excel | Quick business charts | Low |
| Google Sheets | Collaborative, free option | Low |
| Tableau | Dashboards, interactive visualizations | Medium |
| Online generators | One-off plots, no install | Very Low |

How to Create a Box Plot (Getting Started)

In Excel

Select your data. Go to Insert → Insert Statistic Chart → Box and Whisker. Excel does the math for you. Format the chart, add a title, done.

If you don't see the Box and Whisker option, you're probably on a version older than Excel 2016, which introduced this chart type. Upgrade, or build one by hand from a stacked column chart.

In Python

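A few lines get you a box plot:

```python
import matplotlib.pyplot as plt

data = [12, 15, 14, 10, 18, 21, 13, 16, 40]  # example values; replace with your own
plt.boxplot(data)  # draws the box, whiskers, and any outliers
plt.show()
```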

That's the basic version. Add labels, multiple datasets, horizontal orientation, and notch options as needed.
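Here is a sketch of some of those options in matplotlib: two datasets side by side, category labels on the x-axis, and notches around the medians. The server response times are hypothetical. The labels are set with `set_xticks`/`set_xticklabels` rather than a `boxplot` keyword, to sidestep naming differences between matplotlib versions. The headless `Agg` backend is an assumption so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical response times in milliseconds for two servers.
server_a = [120, 135, 128, 140, 150, 132, 145, 300]
server_b = [110, 112, 118, 115, 121, 119, 117, 125]

fig, ax = plt.subplots()
# A list of datasets draws one box per dataset, side by side at x = 1, 2.
# notch=True narrows each box around its median; non-overlapping notches
# are rough evidence that two medians differ.
result = ax.boxplot([server_a, server_b], notch=True)
ax.set_xticks([1, 2])
ax.set_xticklabels(["Server A", "Server B"])
ax.set_ylabel("Response time (ms)")
ax.set_title("Response times by server")
fig.savefig("boxplot.png")  # in an interactive session, plt.show() instead
```

For horizontal boxes, `vert=False` is the long-standing option. Newer matplotlib releases are moving to an `orientation` parameter, so check your version's documentation.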

In Google Sheets

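Google Sheets doesn't have a dedicated box plot chart type, so the usual workaround is a candlestick chart. First compute the minimum, Q1, Q3, and maximum (MIN, QUARTILE, and MAX do the job) and put them in columns in that order, next to a label column. Then select that range → Insert → Chart, and in the chart editor change the chart type to Candlestick. It's close but not the same thing: a candlestick has no median line, and its whiskers run to the true min and max.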

Reading a Box Plot: A Real Example

Let's say you're looking at annual salaries at two companies:

Company A: Box from $45k to $65k, median at $52k, whiskers from $38k to $85k

Company B: Box from $48k to $58k, median at $53k, whiskers from $46k to $62k

Company A has a wider salary range, and its upper whisker stretches to $85k, so at least one person is making bank. Company B pays more consistently. The medians are nearly the same ($52k vs. $53k), but the spread tells a different story. This is what box plots reveal that averages hide.
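The comparison comes down to two subtractions per company: Q3 minus Q1 for the IQR, and the top whisker minus the bottom whisker for the overall span. The sketch below runs them on the readings from the example, in thousands of dollars. The dictionary layout and the `spread` helper are hypothetical conveniences.

```python
# Readings from the two salary box plots in the example, in thousands of dollars.
companies = {
    "Company A": {"low": 38, "q1": 45, "median": 52, "q3": 65, "high": 85},
    "Company B": {"low": 46, "q1": 48, "median": 53, "q3": 58, "high": 62},
}

def spread(s):
    """Return (IQR, whisker span) for one company's box plot readings."""
    return s["q3"] - s["q1"], s["high"] - s["low"]

for name, s in companies.items():
    iqr, span = spread(s)
    print(f"{name}: median ${s['median']}k, IQR ${iqr}k, whisker span ${span}k")
# Company A: median $52k, IQR $20k, whisker span $47k
# Company B: median $53k, IQR $10k, whisker span $16k
```

The medians differ by $1k. Company A's IQR is twice Company B's, and its whisker span is almost three times as wide.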

Horizontal vs. Vertical

Doesn't matter for the data. Matters for your labels. Long category names? Use horizontal. Short labels with many categories? Vertical works fine. Flip it if your audience reads better that way.

The Bottom Line

Box and whisker plots are not fancy. They're not trendy. But they show you the truth about your data's spread in a single glance. Learn to read them. Learn to make them. Use them when you need to compare distributions or find outliers wrecking your analysis.

That's all you need. Start using them.