Creating a Perfect Circle Scatter Plot: Data Visualization Guide
What Is a Circle Scatter Plot?
A circle scatter plot is a scatter plot where each data point is rendered as a circle. The circle's position on the X and Y axes represents two variables, and the circle size typically represents a third variable. This makes it a bubble chart in all but name.
Most people call it a bubble chart when size matters. They call it a circle scatter plot when size is more decorative or uniform. The distinction is blurry, and it doesn't matter what you call it.
When Circle Size Actually Helps
Circle size works when:
- You're visualizing three variables at once (X position, Y position, size)
- The size differences are substantial enough to see without squinting
- You have fewer than 500 data points
- The size variable has a meaningful relationship to the other two
Circle size fails when:
- You have thousands of points and they overlap into a blob
- Size represents a variable that should be on an axis instead
- You're trying to show precise values—circles are bad at precision
The Tools That Actually Work
Skip the fancy stuff unless you need it. Here's what works:
- Matplotlib (Python) — Fast, customizable, free. The go-to for most data work.
- Plotly — Interactive, looks good out of the box, works in Python and R and JavaScript.
- R ggplot2 — Powerful grammar of graphics, but the learning curve is real.
- Tableau — Drag-and-drop simplicity. Costs money but saves time.
- D3.js — Full control if you know JavaScript. Overkill for simple charts.
Comparison: Python Libraries for Circle Scatter Plots
| Library | Ease of Use | Interactivity | Best For |
|---|---|---|---|
| Matplotlib | Medium | None by default | Static reports, publications |
| Plotly | Easy | Built-in | Dashboards, web apps |
| Seaborn | Easy | None | Quick exploratory plots |
| Altair | Easy | Built-in | Declarative, Vega-Lite backend |
How to Create One in Python (Matplotlib)
This is the fastest way to get a working circle scatter plot:
import matplotlib.pyplot as plt
import numpy as np
# Sample data
x = np.random.rand(50)
y = np.random.rand(50)
sizes = np.random.rand(50) * 500 # Size in points squared
plt.figure(figsize=(10, 6))
plt.scatter(x, y, s=sizes, alpha=0.6, c='steelblue', edgecolors='navy')
plt.xlabel('X Variable')
plt.ylabel('Y Variable')
plt.title('Circle Scatter Plot Example')
plt.show()
The s parameter sets each marker's area in points squared. The alpha parameter makes the markers semi-transparent, so overlapping circles stay visible. Without alpha, circles drawn later hide the ones beneath them and dense regions become unreadable.
How to Create One in Plotly (Interactive)
Plotly gives you hover tooltips and zoom without extra work:
import plotly.express as px
import pandas as pd
df = pd.DataFrame({
'x': [1, 2, 3, 4, 5],
'y': [10, 15, 12, 18, 20],
'size': [30, 80, 50, 120, 90]
})
fig = px.scatter(df, x='x', y='y', size='size',
title='Interactive Circle Scatter Plot')
fig.show()
That's it. Plotly handles the scaling, coloring, and interactivity automatically.
Common Mistakes That Ruin Your Plot
1. Circles That Are Too Small
If your reader needs a magnifying glass, your sizes are too small. Scale your sizes so the smallest circle is still visible at normal zoom.
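A minimal sketch of one way to enforce a visible floor, assuming Matplotlib's convention that `s` is an area in points squared. The helper name `scale_sizes` and the defaults `max_area=500` and `min_area=20` are illustrative choices, not values taken from any library:

```python
import numpy as np

def scale_sizes(values, max_area=500.0, min_area=20.0):
    """Map non-negative values to marker areas (points squared).

    Areas are proportional to the values, with the largest value mapped
    to max_area. Anything that would fall below min_area is raised to
    min_area so the smallest circles remain visible.
    """
    values = np.asarray(values, dtype=float)
    if values.size == 0 or values.max() <= 0:
        raise ValueError("values must contain at least one positive number")
    areas = values / values.max() * max_area
    return np.clip(areas, min_area, max_area)

# Hypothetical input: the value 1 would map to an area of 5, so it is raised to 20
example_areas = scale_sizes([1, 10, 50, 100])
```

The result can be passed as `s=` to `plt.scatter`. Clipping breaks strict proportionality for the smallest values, which is the price of keeping them visible.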
2. No Alpha Transparency
Without alpha, circles that sit underneath others are hidden. Use an alpha of 0.4 to 0.7, going lower the more overlap you expect.
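Transparency is not the only lever for overlap. Matplotlib draws scatter points in array order, so a large circle late in the array can still sit on top of small ones. One complementary trick, sketched here with a hypothetical helper named `largest_first`, is to sort the points so the biggest markers are drawn first:

```python
import numpy as np

def largest_first(x, y, sizes):
    """Reorder points so the largest markers come first in the arrays."""
    x, y, sizes = (np.asarray(a) for a in (x, y, sizes))
    order = np.argsort(sizes)[::-1]  # indices from largest to smallest size
    return x[order], y[order], sizes[order]

# Hypothetical data: the 400-area marker moves to the front of each array
xs, ys, ss = largest_first([0.1, 0.2, 0.3], [1.0, 2.0, 3.0], [50, 400, 10])
```

Passing `xs`, `ys`, and `s=ss` to `plt.scatter` together with an alpha value then keeps small circles on top of large ones.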
3. Size Encoding That Doesn't Match the Data
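Keep the range of marker sizes in line with the range of the data. If the size variable runs from 1 to 100, marker sizes that run from 1 to 10,000 exaggerate the differences. The visual weight should reflect the actual magnitude. When the values span a very wide range, square root scaling compresses it, at the cost of strict proportionality: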
sizes = np.sqrt(original_values) * scale_factor
4. Too Many Colors
Use one color with varying alpha, or two or three colors for categories. More than three colors create noise, not information.
5. Trying to Show Exact Values
Circles are not precise. If you need precision, use a bar chart or a table. Circle scatter plots are for patterns and relationships, not exact readings.
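When readers still need a rough sense of magnitude, a size legend helps. The sketch below uses Matplotlib's `legend_elements` method on the object returned by `scatter`. The data, the variable name `area_per_unit`, and the legend title are made up for illustration:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)  # seeded so the example is repeatable
x = rng.random(40)
y = rng.random(40)
values = rng.uniform(10, 100, size=40)  # hypothetical third variable
area_per_unit = 5.0                     # points squared per data unit (assumed)

fig, ax = plt.subplots(figsize=(10, 6))
points = ax.scatter(x, y, s=values * area_per_unit, alpha=0.6,
                    c="steelblue", edgecolors="navy")

# legend_elements builds proxy markers at a few representative sizes;
# func converts marker areas back into data units for the labels.
handles, labels = points.legend_elements(
    prop="sizes", num=4, alpha=0.6, func=lambda s: s / area_per_unit
)
ax.legend(handles, labels, title="Third variable", labelspacing=1.2)
```

Call `plt.show()` or `fig.savefig(...)` to see the result. The legend gives reference sizes only, so exact values still belong in a table.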
Sizing: The Math Behind It
Circle area should represent the third variable, but many people map the value to the radius by mistake. This distorts your data:
- Area scaling — Correct. Area proportional to value.
- Radius scaling — Wrong. Doubling the radius quadruples the area.
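To scale by area correctly, check what the size parameter means in your library. Matplotlib's s is already an area in points squared, so scale the values linearly. Take the square root only when the API expects a radius or diameter:
# Matplotlib: s is an area (points squared), so scale the values linearly
sizes = values * constant
# Radius- or diameter-based APIs: take the square root so area tracks the value
radii = np.sqrt(values) * constant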
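The arithmetic behind the two bullets can be checked in a few lines. This is a small sketch with hypothetical helper names (`circle_area`, `radius_for_value`), not code from any plotting library:

```python
import math

def circle_area(radius):
    """Area of a circle with the given radius."""
    return math.pi * radius ** 2

def radius_for_value(value, area_per_unit=1.0):
    """Radius whose circle area equals value * area_per_unit."""
    return math.sqrt(value * area_per_unit / math.pi)

# Radius scaling: doubling the radius quadruples the area.
radius_ratio = circle_area(2.0) / circle_area(1.0)

# Area scaling: doubling the value doubles the area.
area_ratio = circle_area(radius_for_value(20)) / circle_area(radius_for_value(10))
```

Here `radius_ratio` comes out at 4 and `area_ratio` at 2, up to floating-point rounding. When the plotting API takes an area directly, as Matplotlib's `s` does in points squared, the square root is already built in and the value can be scaled linearly.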
When to Use a Circle Scatter Plot (And When Not To)
Good use cases:
- Comparing GDP, population, and growth rate across countries
- Visualizing product price, quality rating, and sales volume
- Showing customer segments by acquisition cost, lifetime value, and retention
Bad use cases:
- Time series data (use a line chart)
- Categories with no meaningful size variable (use a regular scatter plot)
- More than 500 points (use a heatmap instead)
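For the many-points case, here is a hedged sketch of the heatmap alternative using `np.histogram2d` and `pcolormesh`. The function name `density_heatmap`, the bin count, and the synthetic data are illustrative assumptions:

```python
import matplotlib.pyplot as plt
import numpy as np

def density_heatmap(x, y, bins=40):
    """Bin points on a 2D grid and draw the counts as a heatmap."""
    counts, x_edges, y_edges = np.histogram2d(x, y, bins=bins)
    fig, ax = plt.subplots(figsize=(10, 6))
    # histogram2d puts x along the first axis, so transpose for pcolormesh
    mesh = ax.pcolormesh(x_edges, y_edges, counts.T, cmap="viridis")
    fig.colorbar(mesh, ax=ax, label="Points per bin")
    ax.set_xlabel("X Variable")
    ax.set_ylabel("Y Variable")
    return fig, counts

rng = np.random.default_rng(0)  # seeded so the example is repeatable
x = rng.normal(size=5000)
y = x * 0.5 + rng.normal(scale=0.5, size=5000)
fig, counts = density_heatmap(x, y)
```

Call `plt.show()` or `fig.savefig(...)` to view it. The size variable is lost in this view, so it suits cases where the density of points is the pattern you care about.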
Color Coding in Circle Scatter Plots
You can encode a fourth variable with color. Here's how:
import matplotlib.pyplot as plt
import numpy as np
x = np.random.rand(100)
y = np.random.rand(100)
sizes = np.random.rand(100) * 500
colors = np.random.rand(100) # Fourth variable
plt.scatter(x, y, s=sizes, c=colors, cmap='viridis', alpha=0.7)
plt.colorbar(label='Fourth Variable')
plt.show()
Use cmap to pick a color scheme. Use sequential colormaps (like viridis) for continuous variables. Use categorical colors only for distinct groups.
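For distinct groups, one common pattern is to call `scatter` once per category so each group gets its own color and legend entry. The group labels and palette below are hypothetical:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)  # seeded so the example is repeatable
n = 90
x = rng.random(n)
y = rng.random(n)
sizes = rng.random(n) * 500
groups = rng.choice(["A", "B", "C"], size=n)  # hypothetical category labels

# One fixed color per category, kept to three to avoid visual noise
palette = {"A": "tab:blue", "B": "tab:orange", "C": "tab:green"}

fig, ax = plt.subplots(figsize=(10, 6))
for name, color in palette.items():
    mask = groups == name  # boolean mask selecting this category's points
    ax.scatter(x[mask], y[mask], s=sizes[mask], color=color,
               alpha=0.6, label=name)
ax.legend(title="Group")
```

Each call adds a separate collection to the axes, which is why the legend can list the groups by name.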
Exporting Your Plot
Matplotlib:
plt.savefig('scatter.png', dpi=300, bbox_inches='tight')
plt.savefig('scatter.pdf', bbox_inches='tight') # Vector, better for print
Plotly:
fig.write_html('scatter.html') # Interactive web embed
fig.write_image('scatter.png', width=1200, height=800) # Static image (requires the kaleido package)
The Bottom Line
Circle scatter plots work when you have three variables and fewer than 500 data points. Use alpha to handle overlap. Scale by area, not radius. Keep colors minimal. If your data doesn't fit these constraints, pick a different chart type.
Most people overthink this. Get the data plotted, check if it communicates the pattern clearly, and adjust from there.