How to Increase Histogram Bins Without Losing Data Points
What Histogram Bins Actually Do
A histogram splits your data into intervals called bins. Each bin counts how many data points fall within its range. The number of bins you choose determines how much detail you see.
Too few bins and you lose resolution. Too many and the signal gets lost in noise. Defaults vary by tool: Matplotlib uses 10 bins and ggplot2 uses 30. A low default like 10 is often not enough for meaningful analysis.
Why Increasing Bins Matters
When you're working with large datasets or data that has fine-grained patterns, 10 bins can hide most of the structure. You might have multiple peaks, outliers, or subtle features that only appear with more bins.
The catch: increasing bins doesn't create new data. It just shows your existing data with finer granularity. If your sample size is small, too many bins will leave many bins empty or with just one count, making the histogram look jagged and misleading.
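A minimal sketch of that point, assuming NumPy and a synthetic normal sample (the seed and sample size are illustrative only): the total count stays equal to the sample size however many bins are requested, while the tallest bin gets shorter as the same points are spread more thinly.

```python
import numpy as np

# Synthetic sample, used purely for illustration
rng = np.random.default_rng(0)
data = rng.normal(size=1000)

totals = {}
for n_bins in (10, 50, 200):
    # With an integer bin count, NumPy spans min(data)..max(data),
    # so every point is counted exactly once.
    counts, edges = np.histogram(data, bins=n_bins)
    totals[n_bins] = int(counts.sum())
    print(f"{n_bins} bins: total = {counts.sum()}, tallest bin = {counts.max()}")
```

More bins change how the same 1,000 points are grouped, not how many of them appear.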
The Math Behind Bin Selection
There's no universal rule, but these formulas help:
- Sturges' Rule: bins = 1 + 3.3 × log10(n), which is approximately 1 + log2(n); works okay for small datasets
- Freedman-Diaconis Rule: bin width = 2 × IQR × n^(-1/3); better for skewed data
- Scott's Rule: bin width = 3.49 × σ × n^(-1/3); approximately minimizes integrated mean squared error for roughly normal data
These are starting points, not gospel. Your data's structure matters more than any formula.
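As a rough illustration, NumPy's `np.histogram_bin_edges` accepts these rules by name ("sturges", "fd", "scott"), so you can compare what each one suggests for the same sample. The synthetic data below is an assumption for the sake of the example. NumPy rounds the bin count up, so the results can differ slightly from a hand calculation.

```python
import numpy as np

# Synthetic sample, used purely for illustration
rng = np.random.default_rng(0)
data = rng.normal(size=1000)

bin_counts = {}
for rule in ("sturges", "fd", "scott"):
    edges = np.histogram_bin_edges(data, bins=rule)
    bin_counts[rule] = len(edges) - 1          # N edges define N - 1 bins
    width = edges[1] - edges[0]
    print(f"{rule}: {bin_counts[rule]} bins, width {width:.3f}")
```

The same rule names can be passed straight to `plt.hist(data, bins="fd")`, which is a convenient way to get a data-driven starting point before adjusting by eye.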
How to Increase Bins in Common Tools
Python with Matplotlib
Use the bins parameter. Pass an integer for the number of bins, or a sequence for custom bin edges.
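import matplotlib.pyplot as plt
import numpy as np

data = np.random.randn(1000)  # 1,000 draws from a standard normal

# Increase bins to 50
plt.hist(data, bins=50)
plt.show()

# Or specify exact bin edges: 51 edges make 50 bins.
# Values outside [-4, 4] fall outside every bin and are not counted.
plt.hist(data, bins=np.linspace(-4, 4, 51))
plt.show()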
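Custom bin edges are the one place where data points can silently drop out: values outside the first and last edge are not counted. The sketch below uses `np.histogram`, which Matplotlib's `hist` relies on for the binning. The three outliers are hypothetical values added for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=1000)
data[:3] = [5.2, -6.1, 7.5]  # hypothetical outliers, added for illustration

# Fixed edges on [-4, 4]: anything outside that range is ignored
narrow_edges = np.linspace(-4, 4, 51)            # 51 edges -> 50 bins
narrow_counts, _ = np.histogram(data, bins=narrow_edges)
dropped = data.size - int(narrow_counts.sum())
print(f"points outside the custom edges: {dropped}")

# Edges that span the observed range keep every point
full_edges = np.linspace(data.min(), data.max(), 51)
full_counts, _ = np.histogram(data, bins=full_edges)
print(f"points counted with full-range edges: {full_counts.sum()}")
```

Comparing `counts.sum()` with `data.size` is a quick check worth running whenever you supply your own edges.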
R with ggplot2
Use bins in geom_histogram or specify binwidth.
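library(ggplot2)

# Example data frame; `values` is the column being plotted
data <- data.frame(values = rnorm(1000))

ggplot(data, aes(x = values)) +
  geom_histogram(bins = 50)  # 50 bins

# Or define bin width instead
ggplot(data, aes(x = values)) +
  geom_histogram(binwidth = 0.1)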
Excel
Select your data and go to Insert > Histogram. Then right-click the horizontal axis, choose Format Axis, and set the number of bins manually.
Excel defaults to automatic binning, which often under-bins. Try 30-50 bins as a starting point for decent resolution.
JavaScript with D3.js
const histogram = d3.bin()
  .thresholds(50); // approximate number of bins; D3 rounds to "nice" boundaries
const bins = histogram(data); // array of bins, each with x0 and x1 bounds
Comparing Bin Selection Methods
| Method | Best For | Downside |
|---|---|---|
| Fixed number of bins | Quick comparisons across datasets | May miss features or over-smooth |
| Fixed bin width | Data with natural scale units | Requires manual width selection |
| Sturges' formula | Small datasets under 1000 points | Underestimates bins for large data |
| Freedman-Diaconis | Skewed distributions | Can produce too many bins |
| Custom bin edges | Data with known thresholds | Requires domain knowledge |
When More Bins Actually Hurts
Increasing bins doesn't mean preserving more data. It means showing your data differently. Here's when you should stop:
- Sample size under 100: you'll get mostly empty bins
- Bins with counts of 0 or 1 dominate the visualization
- The histogram looks like static rather than a distribution
- You're trying to read distribution shape but the noise is overwhelming
The goal is readability, not maximum bin count. There's a point where detail becomes clutter.
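One way to put a number on these warning signs is to measure what share of bins hold zero or one point. The helper below, `sparse_bin_fraction`, is a hypothetical name for this sketch, and how high a share counts as "too sparse" is a judgment call rather than an established rule.

```python
import numpy as np

def sparse_bin_fraction(data, n_bins):
    """Share of bins that contain 0 or 1 data points."""
    counts, _ = np.histogram(data, bins=n_bins)
    return float(np.mean(counts <= 1))

# Small synthetic sample, used purely for illustration
rng = np.random.default_rng(0)
small = rng.normal(size=80)

for n_bins in (10, 30, 100):
    share = sparse_bin_fraction(small, n_bins)
    print(f"{n_bins} bins: {share:.0%} of bins hold at most one point")
```

With 80 points spread over 100 bins, most bins are sparse by construction, which is the "static" look described above.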
Getting Started: Practical Workflow
Here's how to find the right bin count for your data:
- Start with 20-30 bins on any dataset under 10,000 points
- Check bin counts: empty bins or bins with single counts mean you've gone too far
- Look for patterns: multiple peaks, gaps, or tails only visible at higher resolution
- Compare with fewer bins: if the shape changes drastically, your data might be too sparse for that resolution
- Use domain knowledge: if your data has natural groupings, align bins to them
For most real-world data, somewhere between 30 and 100 bins strikes a balance. But always visualize first, trust second.
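The workflow can be partly automated. The sketch below tries candidate bin counts from high to low and keeps the first one where few bins are nearly empty. The function name `pick_bin_count`, the candidate list, and the 10% threshold are all assumptions chosen for illustration, so treat the result as a starting point to check visually.

```python
import numpy as np

def pick_bin_count(data, candidates=(100, 50, 30, 20, 10), max_sparse=0.1):
    """Return the largest candidate where at most `max_sparse` of the bins
    hold 0 or 1 points; fall back to the smallest candidate otherwise."""
    for n_bins in sorted(candidates, reverse=True):
        counts, _ = np.histogram(data, bins=n_bins)
        if np.mean(counts <= 1) <= max_sparse:
            return n_bins
    return min(candidates)

# Synthetic samples, used purely for illustration
rng = np.random.default_rng(0)
small = rng.normal(size=80)
large = rng.uniform(size=10_000)

print("small sample:", pick_bin_count(small))
print("large sample:", pick_bin_count(large))
```

Long-tailed data will tend to get a lower count from this check because the tail bins are sparse, so it is worth combining with the visual comparison described in the list above.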
Bottom Line
You don't "increase bins without losing data points": every point lands in exactly one bin no matter how many bins you use, as long as the bin edges span the full range of the data. What you can do is choose bin counts that reveal the true structure of your data without introducing visual noise.
Start with more bins than you think you need, then reduce until the signal is clear. Most default histogram settings are too coarse. Bump them up and see what you've been missing.