Random Number Generators: Are Some Numbers More Common?
What the Heck Is a Random Number Generator Anyway?
A Random Number Generator (RNG) is exactly what it sounds like: a tool that spits out numbers without any predictable pattern. That's the theory, anyway. In practice, things get messy fast.
There are two main types: True Random Number Generators (TRNGs) and Pseudo-Random Number Generators (PRNGs). The difference matters more than most people realize.
True Random Number Generators
TRNGs pull randomness from physical phenomena. Radioactive decay. Thermal noise. Atmospheric static. These things are genuinely unpredictable—even with perfect knowledge of the universe's initial conditions, you couldn't forecast the exact timing of a radioactive atom's decay.
Hardware security modules rely on these, as do some casino and lottery systems. So do serious cryptographic systems.
Pseudo-Random Number Generators
PRNGs are algorithms. They use mathematical formulas to produce sequences that look random. But they're deterministic—run the same seed twice, get the same sequence.
Most programming languages ship with PRNGs. Python's random module. JavaScript's Math.random(). C's rand(). These aren't pulling entropy from radioactive decay. They're crunching numbers.
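To see the determinism for yourself, here is a minimal sketch using Python's random module, whose generator is the Mersenne Twister. Two generator instances created with the same seed (42, an arbitrary choice) produce identical sequences; the helper name first_values is hypothetical.

```python
import random

def first_values(seed, count=5):
    """Return the first `count` integers in 1..100 from a PRNG seeded with `seed`."""
    rng = random.Random(seed)  # independent instance; leaves the global generator alone
    return [rng.randint(1, 100) for _ in range(count)]

run_a = first_values(42)
run_b = first_values(42)
run_c = first_values(43)

print(run_a)
print(run_b)
print(run_a == run_b)  # True: same seed, same sequence
print(run_c)           # a different seed gives a different sequence
```

Nothing about the output is random in the physical sense: anyone who knows the seed can regenerate every value.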
The Real Question: Are Some Numbers Actually More Common?
Here's the uncomfortable truth: it depends entirely on the RNG's quality.
A well-designed, properly tested RNG should produce a uniform distribution. Every number in its range has an equal chance of appearing. Over enough iterations, you'll see roughly the same frequency for each value.
But "roughly" does a lot of heavy lifting there.
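A back-of-envelope way to put a number on "roughly", assuming a perfectly uniform generator and independent draws: if you draw N values from a range of n equally likely outcomes, the count C_k for any single value k follows a binomial distribution (C_k, N and n are our notation, not standard names from any library).

$$
C_k \sim \mathrm{Binomial}\!\left(N, \tfrac{1}{n}\right), \qquad
\mathbb{E}[C_k] = \frac{N}{n}, \qquad
\sigma_{C_k} = \sqrt{N \cdot \frac{1}{n}\left(1 - \frac{1}{n}\right)}
$$

$$
N = 100{,}000,\; n = 6:\qquad \mathbb{E}[C_k] \approx 16{,}667, \qquad
\sigma_{C_k} = \sqrt{100{,}000 \cdot \tfrac{1}{6} \cdot \tfrac{5}{6}} \approx 117.9
$$

So even a perfect generator will routinely miss the expected count by a hundred or so per face (about 0.7%). "Roughly the same" means differences on that scale, not exact ties.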
Why "Bad" RNGs Favor Certain Numbers
Poorly designed or outdated RNGs absolutely have bias. They produce some numbers more frequently than others. This isn't theoretical—it's happened in real systems.
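- Modulo bias: If an RNG generates numbers 0-99 but you want 0-60, reducing with a simple modulo operation (x % 61) introduces bias. The 100 source values don't split evenly into 61 buckets, so each of the numbers 0-38 has two source values mapping to it and appears twice as often as 39-60.
- Weak algorithms: Old RNGs like the Linear Congruential Generator (LCG) with poor parameters can show significant patterns. With a power-of-two modulus, for example, the lowest bits repeat with a very short period.
- Seeding issues: Predictable seeds produce predictable outputs. Some systems still use timestamps as seeds, so anyone who can guess roughly when the seed was taken can simply try every plausible timestamp.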
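The first two failure modes are easy to reproduce. The sketch below first enumerates every output of a perfectly uniform 0-99 source and reduces it with % 61, showing exactly which values get the extra hit, then applies rejection sampling (a standard fix that discards source values in the incomplete final block). The second half runs a textbook LCG. The multiplier 1103515245 and increment 12345 are the constants from the sample rand() in the C standard; the modulus 2**31 is our assumption, and any power of two shows the same effect. Helper names are hypothetical.

```python
from collections import Counter

# --- Modulo bias -----------------------------------------------------------
SOURCE_SIZE = 100   # the source yields 0..99, each exactly once per full cycle
TARGET_SIZE = 61    # we want 0..60

# Reduce every possible source value with %, as a naive implementation would.
naive = Counter(x % TARGET_SIZE for x in range(SOURCE_SIZE))
print(naive[0], naive[38], naive[39], naive[60])  # 2 2 1 1

def reject_then_reduce(x, source_size=SOURCE_SIZE, target_size=TARGET_SIZE):
    """Hypothetical helper: return an unbiased value in 0..target_size-1,
    or None if x falls in the incomplete final block and must be redrawn."""
    limit = source_size - (source_size % target_size)  # largest multiple of target_size
    return x % target_size if x < limit else None

accepted = [v for v in map(reject_then_reduce, range(SOURCE_SIZE)) if v is not None]
fixed = Counter(accepted)
print(min(fixed.values()), max(fixed.values()))  # 1 1: every target value equally likely

# --- Weak LCG low bits -----------------------------------------------------
def lcg(seed, a=1103515245, c=12345, m=2**31):
    """Textbook linear congruential generator: x -> (a*x + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

gen = lcg(seed=1)
low_bits = [next(gen) & 1 for _ in range(16)]
print(low_bits)  # [0, 1, 0, 1, ...]: the lowest bit has period 2
```

Because both the multiplier and the increment are odd and the modulus is even, each step flips the parity of the state, so the lowest bit alternates no matter what seed you pick. This is why the C standard's sample discards the low bits before returning a value.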
Why Good RNGs Don't
Modern cryptographic RNGs and well-tested statistical RNGs pass rigorous tests for uniformity. The Dieharder test suite and TestU01 check for exactly this kind of bias.
If you're using a reputable RNG with proper implementation, the distribution should be statistically indistinguishable from true randomness over reasonable sample sizes.
When "More Common" Numbers Actually Matter
For most use cases—games, simulations, basic statistics—this doesn't matter. You're not going to notice if 7 appears 0.1% more often than 3.
But there are contexts where it absolutely does:
- Cryptography: Biased RNGs have broken encryption. Predictable keys mean broken security.
- Gambling and lotteries: Regulatory bodies audit RNG implementations. Bias means unfair games.
- Scientific simulations: Monte Carlo methods require proper distribution. Biased inputs corrupt results.
- Statistical sampling: Research validity depends on truly random selection.
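A quick way to see how biased inputs corrupt a simulation is the classic Monte Carlo estimate of π: sample points uniformly in the unit square and count the fraction that land inside the quarter circle. This sketch runs it twice with the same underlying generator, once with uniform coordinates and once with a deliberately skewed transform (squaring each coordinate, an arbitrary bias chosen for illustration) that pushes points toward the origin.

```python
import math
import random

def estimate_pi(samples, draw):
    """Monte Carlo estimate of pi: 4 * fraction of points (x, y) with x^2 + y^2 < 1."""
    inside = 0
    for _ in range(samples):
        x, y = draw(), draw()
        if x * x + y * y < 1.0:
            inside += 1
    return 4.0 * inside / samples

rng = random.Random(2024)  # fixed seed so runs are repeatable

def uniform_draw():
    """Uniform on [0, 1)."""
    return rng.random()

def skewed_draw():
    """Still in [0, 1), but squaring piles values up near 0."""
    return rng.random() ** 2

good = estimate_pi(200_000, uniform_draw)
bad = estimate_pi(200_000, skewed_draw)

print(f"uniform inputs: {good:.4f} (error {abs(good - math.pi):.4f})")
print(f"skewed inputs:  {bad:.4f} (error {abs(bad - math.pi):.4f})")
```

With uniform inputs the estimate lands close to π. With the skewed inputs it settles near 3.71 instead, and adding samples only makes that wrong answer more precise.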
Comparing RNG Types
| Type | Source | Predictability | Speed | Typical Use |
|---|---|---|---|---|
| Mersenne Twister | Algorithm | Predictable with seed | Fast | Simulations, games |
| Linear Congruential | Algorithm | Predictable, weak | Very fast | Legacy systems, basic apps |
| Xorshift | Algorithm | Predictable with seed | Fast | Game engines, non-crypto |
| Cryptographically Secure (CSPRNG) | Algorithm + entropy | Computationally infeasible to predict | Slower | Security, cryptography |
| Hardware RNG | Physical noise | Unpredictable | Variable | Casinos, key generation |
| Atmospheric noise API | Natural phenomena | Unpredictable | Network-dependent | Public randomness sources |
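To make the "Algorithm" rows concrete, here is a sketch of Marsaglia's 32-bit xorshift generator with the common 13/17/5 shift triple. Python has no fixed-width integers, so each step masks the state back to 32 bits. The entire state is one 32-bit word and each output is that state, which is why observing a single output lets you predict everything after it.

```python
MASK32 = 0xFFFFFFFF

def xorshift32(seed):
    """Marsaglia's xorshift32 with shifts (13, 17, 5). The seed must be nonzero:
    an all-zero state maps to zero forever."""
    if (seed & MASK32) == 0:
        raise ValueError("xorshift32 needs a nonzero 32-bit seed")
    state = seed & MASK32
    while True:
        state ^= (state << 13) & MASK32
        state ^= state >> 17
        state ^= (state << 5) & MASK32
        yield state

gen = xorshift32(1)
print([next(gen) for _ in range(3)])  # starts with 270369

# Predictability: the output *is* the state, so restarting from one observed
# output reproduces the rest of the stream.
observed = xorshift32(1)
first = next(observed)
clone = xorshift32(first)
print([next(observed) for _ in range(5)] == [next(clone) for _ in range(5)])  # True
```

The Mersenne Twister has far more state (624 words), but the same principle applies: it is not designed to hide that state, so enough consecutive outputs reveal it.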
How to Test If Your RNG Is Biased
You can run basic tests yourself without a statistics PhD. Here's a practical approach:
The Frequency Test
Generate a large sample (100,000+ numbers) and count occurrences of each value. For a six-sided die RNG, you'd expect each number to appear around 16,667 times. If 6 appears 25,000 times, something's wrong.
Python example:
import random
from collections import Counter
# Generate 100,000 simulated dice rolls
NUM_ROLLS = 100_000
rolls = [random.randint(1, 6) for _ in range(NUM_ROLLS)]
counts = Counter(rolls)
expected = NUM_ROLLS / 6  # about 16,666.67 per face
for num in range(1, 7):
    count = counts[num]  # Counter returns 0 for a face that never appeared
    deviation = abs(count - expected) / expected * 100
    print(f"Number {num}: {count} ({deviation:.2f}% deviation)")
If any number deviates by more than 2-3%, investigate your RNG.
The Chi-Square Test
For more rigorous analysis, run a chi-square test. It tells you whether observed frequencies match expected frequencies statistically.
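Most statistics packages include this. A p-value below 0.05 suggests significant bias worth investigating, but treat it as a warning rather than proof: a perfectly good RNG will also fall below 0.05 in about 5% of runs, so repeat the test with fresh samples before blaming the generator.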
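If you'd rather not pull in a statistics package, the test statistic itself is short enough to compute by hand. This sketch (hypothetical helper names, standard library only) sums (observed - expected)^2 / expected over the six faces and compares the result with 11.07, the tabulated 0.05 critical value for 5 degrees of freedom (six faces minus one). If SciPy is available, scipy.stats.chisquare(observed, expected) returns the same statistic along with a p-value.

```python
import random
from collections import Counter

# Critical value of the chi-square distribution with 5 degrees of freedom
# at the 0.05 significance level, taken from standard statistical tables.
CRITICAL_5DF_P05 = 11.0705

def chi_square_statistic(observed, expected):
    """Pearson's chi-square statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def dice_chi_square(rolls, faces=6):
    """Chi-square statistic for a list of die rolls against a uniform expectation."""
    counts = Counter(rolls)
    observed = [counts[face] for face in range(1, faces + 1)]
    expected = [len(rolls) / faces] * faces
    return chi_square_statistic(observed, expected)

rng = random.Random(7)
fair_rolls = [rng.randint(1, 6) for _ in range(60_000)]
# Deliberately loaded die for comparison: 6 is twice as likely as any other face.
loaded_rolls = rng.choices(range(1, 7), weights=[1, 1, 1, 1, 1, 2], k=60_000)

for label, rolls in [("fair", fair_rolls), ("loaded", loaded_rolls)]:
    stat = dice_chi_square(rolls)
    verdict = "suspicious" if stat > CRITICAL_5DF_P05 else "consistent with uniform"
    print(f"{label}: chi-square = {stat:.2f} -> {verdict}")
```

The loaded die lands thousands of units above the critical value, far beyond anything chance produces at this sample size.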
What to Actually Use
Skip the homebrew RNGs. Here are proven choices:
- Crypto applications: Use the secrets module in Python, crypto.getRandomValues() in JavaScript, or OS-level sources like /dev/urandom.
- Games and simulations: Mersenne Twister is fine. It's fast and statistically solid for non-security uses.
- Statistical work: Verify your RNG passes Dieharder or TestU01 before trusting it with research data.
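In Python, the split above looks roughly like this: the secrets module (backed by the operating system's CSPRNG) for anything security-sensitive, and a seeded random.Random instance (Mersenne Twister) for simulations where reproducibility matters more than unpredictability. The token sizes and seed are arbitrary choices for illustration.

```python
import random
import secrets

# Security-sensitive values: draw from the OS CSPRNG via the secrets module.
api_token = secrets.token_hex(16)      # 16 random bytes, hex-encoded (32 characters)
prize_draw = secrets.randbelow(6) + 1  # uniform in 1..6, no modulo bias
session_key = secrets.token_bytes(32)  # 32 raw random bytes

# Simulations and games: a seeded Mersenne Twister is fast and reproducible.
sim_rng = random.Random(12345)
trial = [sim_rng.random() for _ in range(3)]

print(len(api_token), prize_draw, len(session_key))
print(trial)
```

Seeding the simulation generator is a feature here: rerunning with the same seed reproduces the experiment exactly. That same property is what makes it unsuitable for keys or tokens.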
The Bottom Line
Some RNGs absolutely produce certain numbers more often. The ones you should worry about are old, improperly implemented, or designed for non-security purposes but used in security contexts.
Quality modern RNGs don't have this problem—at least not at scales you'd notice. But "quality" is doing heavy lifting again. Know what you're using and why. Don't use a Mersenne Twister for key generation. Don't use rand() for anything that matters.
The bias question has a simple answer: it depends on the RNG. A bad one, yes. A good one, no—at least not within statistical significance for most purposes.