How to Compare Statistical Distributions: Techniques and Methods
Why Comparing Distributions Actually Matters
You have two datasets. They look different on paper, but are they statistically different? That's the question distribution comparison answers. Skip this step and you're just guessing.
Comparing distributions tells you whether your data came from the same source, whether a treatment actually changed something, or whether you're dealing with outliers that need attention. No fluff needed here. This is a practical skill.
Visual Methods: Start Here, Always
Before running any test, look at your data. This isn't optional. Numbers lie; pictures don't (well, they can if you mess up the scales, but you get the point).
Histograms
Overlay two histograms (with some transparency) or place them side by side. You'll immediately see differences in:
- Location (where the bulk of data sits)
- Spread (how wide the distribution is)
- Shape (symmetric, skewed, bimodal)
Use the same bin widths for both. Different bin sizes make comparison useless.
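To make the "same bin widths" advice concrete, here is a minimal sketch that derives one set of bin edges from the pooled data and reuses it for both samples. The two samples are synthetic and the choice of 30 bins is arbitrary; both are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, scale=1.0, size=500)   # hypothetical sample A
b = rng.normal(loc=0.5, scale=1.5, size=300)   # hypothetical sample B

# One set of bin edges computed from the pooled data, so both
# histograms share identical bins.
edges = np.histogram_bin_edges(np.concatenate([a, b]), bins=30)

# density=True scales each histogram to unit area, which keeps the
# comparison fair when the sample sizes differ.
hist_a, _ = np.histogram(a, bins=edges, density=True)
hist_b, _ = np.histogram(b, bins=edges, density=True)
```

If you plot with matplotlib, passing the same `edges` array as `bins=edges` to each `plt.hist` call gives you the aligned overlay.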
Box Plots
Box plots show median, quartiles, and outliers in one glance. When you compare two box plots:
- Do the medians line up?
- Are the boxes the same length (that is, the same interquartile range)?
- Do whisker lengths differ noticeably?
- Are there outliers in one dataset but not the other?
This is your fast first pass. Takes 30 seconds and tells you where to dig deeper.
Density Plots
Density plots smooth out the histogram noise. Overlay two density curves and you can spot differences in shape that histograms might miss. Especially useful when sample sizes differ between datasets.
QQ Plots (Quantile-Quantile)
A QQ plot compares the quantiles of your data against a theoretical distribution or against another dataset. If points fall on the diagonal line, the distributions match. Deviations show exactly where they differ.
For comparing two empirical distributions, use a PP plot or a two-sample QQ plot.
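A two-sample QQ plot is just matched quantiles of one dataset plotted against the other. The sketch below computes those points with NumPy; `qq_points` is a hypothetical helper name, the data is synthetic, and the 1st to 99th percentile grid is an arbitrary choice.

```python
import numpy as np

def qq_points(x, y, n_quantiles=99):
    """Return matched quantiles of x and y for a two-sample QQ plot."""
    probs = np.linspace(0.01, 0.99, n_quantiles)
    return np.quantile(x, probs), np.quantile(y, probs)

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=400)
y = rng.normal(0.0, 2.0, size=250)   # same center, wider spread (assumed data)

qx, qy = qq_points(x, y)

# Fit a straight line through the QQ points. A slope near 1 and an
# intercept near 0 mean the points hug the diagonal; a slope well
# above 1 suggests y is more spread out than x.
slope, intercept = np.polyfit(qx, qy, 1)
```

Because quantiles are computed separately for each sample, this works even when the two datasets have different sizes.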
Statistical Tests for Comparing Distributions
Visual inspection is step one. Tests give you numbers to back up what you saw. Here's what actually works.
Kolmogorov-Smirnov Test (K-S Test)
The K-S test compares two empirical distributions. It measures the maximum vertical distance between their cumulative distribution functions.
What it tells you: Whether the two samples come from the same distribution. That's it.
- Doesn't tell you how they differ, just that they do
- Sensitive to differences in location, scale, and shape, but most sensitive near the center of the distribution and weaker in the tails
- Works for any continuous distribution
Use it when you need a binary answer: same or different.
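Here is a rough sketch of the two-sample K-S test using `scipy.stats.ks_2samp`, followed by a hand computation of the same statistic from the two empirical CDFs. The samples are synthetic and only there to make the example runnable.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
a = rng.normal(0.0, 1.0, size=500)
b = rng.normal(0.5, 1.0, size=500)   # shifted by half a standard deviation

result = stats.ks_2samp(a, b)
ks_stat, ks_p = result.statistic, result.pvalue

# The same statistic by hand: evaluate both empirical CDFs at every
# pooled data point and take the largest vertical gap.
pooled = np.sort(np.concatenate([a, b]))
ecdf_a = np.searchsorted(np.sort(a), pooled, side="right") / a.size
ecdf_b = np.searchsorted(np.sort(b), pooled, side="right") / b.size
d_manual = np.max(np.abs(ecdf_a - ecdf_b))
```

The manual version is only there to show what the statistic means. In practice you would just call `ks_2samp`.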
Anderson-Darling Test
This is the K-S test's stricter cousin. It weights differences in the tails more heavily.
- Better at detecting differences in the tails of distributions
- More powerful than K-S for many common distribution types
- Available for testing against specific distributions (normal, exponential, etc.), and in a k-sample form for comparing samples directly
Use this when the tails matter—finance data, extreme events, that kind of thing.
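For comparing two samples directly, SciPy offers the k-sample variant, `scipy.stats.anderson_ksamp`. The sketch below assumes synthetic data (a normal sample against a heavier-tailed Student's t sample) and reads the decision off the tabulated critical values. SciPy may warn that its approximate p-value is capped or floored, so warnings are silenced here.

```python
import warnings

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
a = rng.normal(0.0, 1.0, size=1000)
b = rng.standard_t(df=3, size=1000)   # heavier tails than the normal (assumed data)

with warnings.catch_warnings():
    # Silence the notice SciPy emits when the approximate p-value
    # falls outside the range it can tabulate.
    warnings.simplefilter("ignore")
    ad = stats.anderson_ksamp([a, b])

# critical_values are listed for significance levels
# 25%, 10%, 5%, 2.5%, 1%, 0.5%, 0.1%; index 2 is the 5% level.
reject_at_5pct = bool(ad.statistic > ad.critical_values[2])
```

Whether this particular pair is rejected depends on the draw, so treat the result as something to inspect rather than a foregone conclusion.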
Mann-Whitney U Test (Wilcoxon Rank-Sum)
This tests whether one distribution is stochastically greater than the other. It's like a t-test but doesn't assume normality.
- Compares ranks of values rather than values themselves
- Tells you if one group tends to have larger values
- Doesn't compare shapes—just stochastic dominance
Good when you want to know if one group is generally higher, regardless of distribution shape.
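A minimal sketch with `scipy.stats.mannwhitneyu`, assuming a recent SciPy where the returned U statistic corresponds to the first sample. The skewed exponential samples are made up for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
a = rng.exponential(scale=1.0, size=200)
b = rng.exponential(scale=1.5, size=200)   # tends to be larger (assumed data)

u_stat, u_p = stats.mannwhitneyu(a, b, alternative="two-sided")

# U divided by the number of (a, b) pairs estimates P(A > B), counting
# ties as one half. A value of 0.5 means neither group tends to be higher.
prob_a_greater = u_stat / (a.size * b.size)
```

Reporting `prob_a_greater` next to the p-value gives readers a feel for how strong the tendency is, not just whether it exists.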
Chi-Square Test
For categorical or binned data, the chi-square test compares observed frequencies against expected frequencies.
- Requires binning continuous data first
- Sensitive to bin width choice
- Works for any distribution shape
Avoid binning if you have enough data for continuous tests. You're throwing away information.
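If you do have to bin, one reasonable approach is to put both samples into the same bins and run a chi-square test of homogeneity on the resulting 2 x k table of counts. This sketch uses `scipy.stats.chi2_contingency`; the synthetic data and the choice of 8 quantile-based bins are assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
a = rng.normal(0.0, 1.0, size=400)
b = rng.normal(0.3, 1.0, size=400)

# Shared bin edges taken from pooled quantiles, so every bin holds a
# similar number of pooled observations and expected counts stay large.
pooled = np.concatenate([a, b])
edges = np.quantile(pooled, np.linspace(0, 1, 9))   # 8 bins
counts_a, _ = np.histogram(a, bins=edges)
counts_b, _ = np.histogram(b, bins=edges)

# Rows are the two samples, columns are the bins.
table = np.vstack([counts_a, counts_b])
chi2, chi2_p, dof, expected = stats.chi2_contingency(table)
```

Try changing the number of bins and watch the p-value move. That sensitivity is exactly why binning is a last resort.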
Kruskal-Wallis Test
Extension of Mann-Whitney for comparing more than two groups. Use this when you have three or more distributions to compare at once.
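A short sketch with `scipy.stats.kruskal`, using three synthetic groups where only the third is shifted:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
g1 = rng.normal(0.0, 1.0, size=100)
g2 = rng.normal(0.0, 1.0, size=100)
g3 = rng.normal(1.0, 1.0, size=100)   # shifted group (assumed data)

# H statistic and p-value for the null that all groups share the
# same distribution.
h_stat, kw_p = stats.kruskal(g1, g2, g3)
```

A small p-value says at least one group differs. It doesn't say which one, so follow up with pairwise comparisons (and a multiple-comparison correction).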
Summary Statistics That Actually Matter
Don't just look at the mean. Mean tells you location. You need more.
Central Tendency
- Mean: Average. Misleading with outliers or skewed data
- Median: 50th percentile. More robust
- Mode: Most frequent value. Useful for multimodal data
Spread
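- Variance/Standard Deviation: Variance is the average squared deviation from the mean; standard deviation is its square root, in the data's original units
- Interquartile Range (IQR): Range between the 25th and 75th percentiles. Robust to outliers
- Range: Max minus min. Tells you about extremes, and is highly sensitive to outliers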
Shape
- Skewness: Positive = right-tailed, Negative = left-tailed. Zero = symmetric
- Kurtosis: How heavy the tails are compared to normal. Higher = more extreme values
Compare these stats side by side. If the means differ but the medians don't, suspect outliers or skew in one dataset. If the variances differ but the means don't, the groups differ in spread rather than location.
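The statistics above fit in one small helper. `summarize` is a hypothetical name, and the tiny dataset with one extreme value is made up to show how the mean and median pull apart.

```python
import numpy as np
from scipy import stats

def summarize(x):
    """Location, spread, and shape statistics for one dataset."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    return {
        "mean": np.mean(x),
        "median": np.median(x),
        "std": np.std(x, ddof=1),                 # sample standard deviation
        "iqr": q3 - q1,
        "skew": stats.skew(x),
        "excess_kurtosis": stats.kurtosis(x),     # 0 for a normal distribution
    }

# One extreme value drags the mean far from the median.
summary = summarize([1, 2, 3, 4, 100])
```

Call `summarize` on each dataset and print the two dictionaries next to each other.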
Comparing Specific Distribution Types
Against the Normal Distribution
- Shapiro-Wilk test: Best power for normality testing. Use this first.
- D'Agostino-Pearson test: Uses skewness and kurtosis. Good for larger samples.
- Lilliefors test: Like K-S but accounts for estimated parameters.
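A quick sketch of the first two tests using `scipy.stats.shapiro` and `scipy.stats.normaltest` (SciPy's D'Agostino-Pearson implementation). The samples are synthetic. Lilliefors is not in SciPy; statsmodels provides it as `statsmodels.stats.diagnostic.lilliefors`, which is left out here to keep the dependencies minimal.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
samples = {
    "normal": rng.normal(size=300),
    "skewed": rng.lognormal(mean=0.0, sigma=1.0, size=300),   # clearly non-normal
}

results = {}
for name, sample in samples.items():
    w_stat, p_shapiro = stats.shapiro(sample)         # Shapiro-Wilk
    k2_stat, p_dagostino = stats.normaltest(sample)   # D'Agostino-Pearson
    results[name] = {"shapiro_p": p_shapiro, "dagostino_p": p_dagostino}
```

A small p-value is evidence against normality. A large one only means the test found nothing, not that the data is proven normal.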
Comparing Two Empirical Distributions
- K-S test for any difference
- Anderson-Darling for weighted tail differences
- Permutation test for exact comparison (computationally expensive)
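A permutation test is simple enough to write by hand. The sketch below is a Monte Carlo version (random shuffles rather than every possible relabeling), so it approximates the exact test. The function name, the default statistic (absolute difference in means), and the 2000 shuffles are all assumptions for illustration.

```python
import numpy as np

def permutation_test(a, b, stat=None, n_perm=2000, seed=0):
    """Monte Carlo permutation test; returns (observed statistic, p-value)."""
    if stat is None:
        stat = lambda x, y: abs(np.mean(x) - np.mean(y))
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a), np.asarray(b)
    observed = stat(a, b)
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        # Shuffle the group labels by permuting the pooled data.
        perm = rng.permutation(pooled)
        if stat(perm[:a.size], perm[a.size:]) >= observed:
            count += 1
    # Adding one to numerator and denominator keeps the p-value above zero.
    return observed, (count + 1) / (n_perm + 1)

rng = np.random.default_rng(8)
a = rng.normal(0.0, 1.0, size=100)
b = rng.normal(1.0, 1.0, size=100)   # shifted by one standard deviation
observed, p_value = permutation_test(a, b)
```

Swap in a different `stat` (difference in medians, ratio of variances, the K-S distance) to test whatever aspect of the distributions you care about.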
Comparing Multiple Groups
- Kruskal-Wallis (non-parametric alternative to ANOVA)
- F-test for comparing the variances of two groups (but watch the assumptions: it is sensitive to non-normality)
- Levene's test for equality of variances—more robust than F-test
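A short sketch of Levene's test with `scipy.stats.levene`. The three synthetic groups share a mean, but the third has three times the standard deviation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
g1 = rng.normal(0.0, 1.0, size=200)
g2 = rng.normal(0.0, 1.0, size=200)
g3 = rng.normal(0.0, 3.0, size=200)   # same mean, much larger spread

# center="median" is the robust (Brown-Forsythe) variant and is
# SciPy's default; it is spelled out here for clarity.
lev_stat, lev_p = stats.levene(g1, g2, g3, center="median")
```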
Quick Reference: Which Test When
| Your Goal | Test to Use | Assumptions |
|---|---|---|
| Any difference between two groups | Kolmogorov-Smirnov | Continuous data |
| Difference in tails | Anderson-Darling | Continuous data |
| One group tends to be higher | Mann-Whitney U | Ordinal data acceptable |
| Test against normal | Shapiro-Wilk | Random sampling, n between 3 and 5000 |
| Compare variances | Levene's Test | Independent samples; robust to non-normality |
| Compare 3+ groups | Kruskal-Wallis | Independent samples |
| Categorical/binned data | Chi-Square | Expected frequencies > 5 |
Getting Started: Step-by-Step
Here's how to actually do this in practice.
Step 1: Plot First
Create histograms or density plots of both datasets. Same axes, same scales. Look for obvious differences in location, spread, and shape.
Step 2: Calculate Summary Stats
Get mean, median, variance, skewness, and kurtosis for both. Write them down side by side.
Step 3: Choose Your Test
Based on what you saw and what you want to know:
- Any difference? → K-S test
- Difference in tails? → Anderson-Darling
- One group higher? → Mann-Whitney
- Against normal? → Shapiro-Wilk
Step 4: Run the Test
Use Python, R, or any stats software. Get the test statistic and p-value.
Step 5: Interpret
P-value below your threshold (usually 0.05) means the distributions are significantly different. But remember: statistical significance isn't practical significance. A tiny difference can be statistically significant with large samples.
Step 6: Quantify the Difference
If they're different, quantify how. Effect size matters. Common measures:
- Cohen's d for location differences
- Ratio of variances for spread differences
- Overlap coefficient for overall similarity
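The three measures above can be sketched in a few lines of NumPy. The function names are hypothetical, Cohen's d uses the usual pooled standard deviation, and the overlap coefficient is estimated from shared-bin histograms, which is one simple option among several (a kernel density estimate is another).

```python
import numpy as np

def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    pooled_var = ((a.size - 1) * a.var(ddof=1) + (b.size - 1) * b.var(ddof=1)) / (
        a.size + b.size - 2
    )
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

def variance_ratio(a, b):
    """Ratio of sample variances; 1.0 means equal spread."""
    return np.var(a, ddof=1) / np.var(b, ddof=1)

def overlap_coefficient(a, b, bins=30):
    """Shared area under two histogram densities: 0 = disjoint, 1 = identical."""
    edges = np.histogram_bin_edges(np.concatenate([a, b]), bins=bins)
    pa, _ = np.histogram(a, bins=edges, density=True)
    pb, _ = np.histogram(b, bins=edges, density=True)
    return float(np.sum(np.minimum(pa, pb) * np.diff(edges)))
```

For example, `cohens_d([2, 3, 4], [1, 2, 3])` is 1.0: the means differ by one pooled standard deviation.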
Common Mistakes to Avoid
- Testing without plotting: You'll miss obvious issues like bimodality or outliers
- Ignoring sample size: Large samples make tiny differences significant. Look at effect size
- Using the wrong test: Non-normal data with a t-test. Just don't.
- Multiple comparisons without correction: Running 20 tests at a 0.05 threshold gives you one false positive on average, even when no real differences exist
- Confusing statistical and practical significance: A 0.01 difference in means can be "significant" with n=100,000
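For the multiple-comparisons problem, here is a minimal sketch of two standard corrections, Bonferroni and Holm, written by hand. The function names and the example p-values are made up for illustration.

```python
import numpy as np

def bonferroni(p_values, alpha=0.05):
    """Reject only p-values at or below alpha divided by the number of tests."""
    p = np.asarray(p_values, dtype=float)
    return p <= alpha / p.size

def holm(p_values, alpha=0.05):
    """Holm step-down procedure: less conservative than Bonferroni."""
    p = np.asarray(p_values, dtype=float)
    reject = np.zeros(p.size, dtype=bool)
    # Walk through p-values from smallest to largest, relaxing the
    # threshold at each step, and stop at the first failure.
    for rank, idx in enumerate(np.argsort(p)):
        if p[idx] <= alpha / (p.size - rank):
            reject[idx] = True
        else:
            break
    return reject

p_values = [0.001, 0.015, 0.02, 0.04]    # hypothetical results of four tests
bonf_reject = bonferroni(p_values)        # only the first survives
holm_reject = holm(p_values)              # all four survive
```

Both control the chance of any false positive across the whole family of tests. Holm never rejects fewer hypotheses than Bonferroni, so it is usually the better default.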
Tools That Do This
- Python: SciPy (scipy.stats), statsmodels. Everything you need, free.
- R: Built-in stats package, nortest for normality tests
- JASP: Free, point-and-click, good for learning
- SPSS: Expensive, but a reasonable choice if your institution already has a license
For most work, Python with SciPy is enough. The documentation is solid and the tests are implemented correctly.
The Bottom Line
Compare distributions by plotting first, then testing. Visual inspection catches things tests miss. Choose your test based on what you actually want to know—not what's convenient. And always report effect sizes alongside p-values.
No single test works for everything. Know what each test is actually testing. The K-S test and Mann-Whitney test answer different questions, even though people often treat them interchangeably. Read the documentation. Check the assumptions.