Team Strength Scores
A strength score is a single value that measures a team's strength. Baked into this score are:
- Offensive efficiency (points scored per 100 possessions)
- Defensive efficiency (points allowed per 100 possessions)
- Opponent strength
- Tempo (so fast and slow teams can be compared fairly)
- Injuries
- Performance at home and away
Example: Duke's 2025 Strength Score was +39. A 16-seed might be -7. That 46-point gap tells you Duke is much stronger.
Converting Strength to Win Probability
To simulate games, you need probabilities. I use a logistic function (similar to Elo rating systems):
P(A wins) = 1 / (1 + e^(−(StrengthA − StrengthB) / 11))
The 11 is a scaling factor. Bigger Strength Score differences → more lopsided probabilities:
- 1-seed (+35) vs 16-seed (−6): ~98% for the 1-seed
- 5-seed (+22) vs 12-seed (+18): ~59% for the 5-seed
- Two equal teams: 50% each
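A minimal sketch of this conversion in Python, assuming the logistic form with the scaling factor of 11 described here. The function name `win_probability` and the `scale` parameter are illustrative, not taken from the actual code base.

```python
import math

SCALE = 11.0  # scaling factor quoted in the text


def win_probability(strength_a: float, strength_b: float, scale: float = SCALE) -> float:
    """Probability that team A beats team B under a logistic model.

    Only the difference in Strength Scores matters, so adding the same
    constant to both teams leaves the result unchanged.
    """
    return 1.0 / (1.0 + math.exp(-(strength_a - strength_b) / scale))


# Two equal teams split the probability evenly.
print(win_probability(20.0, 20.0))  # 0.5

# The two sides of a matchup always sum to 1.
p = win_probability(35.0, 22.0)
q = win_probability(22.0, 35.0)
print(round(p + q, 12))  # 1.0
```

A smaller `scale` would make the same Strength Score gap produce more lopsided probabilities. A larger one flattens them toward 50%.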
Simulating a Bracket
Each simulation runs through all 63 games in order. For each game, a random number is generated and compared to the win probability. If the random number is less than the probability, the favorite wins. Otherwise, the game is marked as an upset.
Round-by-Round Process
Round 1 — First Round (32 games)
The first round uses sports betting odds to calculate the win probability. For each of the 32 games:
- Look up the probability for that seed matchup (e.g., 1 vs 16 = 99%)
- Generate a random number between 0 and 1
- If random number < probability, the higher seed wins. Otherwise, upset.
- Winner advances to Round 2
Example: 1-seed vs 16-seed with 99% win probability. Random number = 0.842. Since 0.842 < 0.99, the 1-seed wins and advances.
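The first-round steps can be sketched as follows. This is a hedged illustration: `favorite_wins`, `simulate_game`, and `simulate_first_round` are hypothetical names, and the list of 32 probabilities stands in for whatever lookup table is derived from betting odds. Only the 99% figure for a 1 vs 16 matchup comes from the text.

```python
import random


def favorite_wins(p_favorite: float, draw: float) -> bool:
    """The higher seed wins when the random draw falls below its probability."""
    return draw < p_favorite


def simulate_game(p_favorite: float, rng: random.Random) -> int:
    """Return 0 if the higher seed wins, 1 for an upset."""
    return 0 if favorite_wins(p_favorite, rng.random()) else 1


def simulate_first_round(probabilities, rng):
    """Simulate one game per probability and return the list of outcome bits."""
    return [simulate_game(p, rng) for p in probabilities]


# The worked example from the text: 0.842 < 0.99, so the 1-seed advances.
print(favorite_wins(0.99, 0.842))  # True

# Placeholder input: every game uses the 1-vs-16 probability (assumption).
rng = random.Random(2025)
bits = simulate_first_round([0.99] * 32, rng)
print(len(bits))  # 32
```

Passing the random generator in explicitly keeps each simulation reproducible from its seed, which is convenient when a bracket needs to be regenerated later.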
Rounds 2-6 — Round of 32, 16, 8, 4 and 2 (31 games)
In rounds 2-6, winners face each other. Probabilities are now calculated dynamically from the Strength Scores of the teams that advanced from the previous round.
- Take the Strength Scores of both teams that won in the previous round
- Calculate win probability using the logistic formula: P(A wins) = 1 / (1 + e^(−(StrengthA − StrengthB) / 11))
- Generate a random number and simulate the outcome
- Winner advances to the next round
Example: Auburn (Strength +35) plays Michigan (Strength +22). Difference = 13 points. Probability ≈ 77% for Auburn. A random number then determines the outcome.
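A sketch of how the later rounds might be played out, assuming teams are kept in bracket order so that adjacent entries meet each other. The helper names `play_round` and `play_out` are hypothetical. "Team C" and "Team D" and their strengths are made-up placeholders; only the Auburn and Michigan values come from the example above.

```python
import math
import random


def win_probability(strength_a, strength_b, scale=11.0):
    """Logistic win probability for team A, using the scaling factor of 11."""
    return 1.0 / (1.0 + math.exp(-(strength_a - strength_b) / scale))


def play_round(teams, rng):
    """Pair adjacent (name, strength) entries and return the winners in order."""
    winners = []
    for i in range(0, len(teams), 2):
        team_a, team_b = teams[i], teams[i + 1]
        p_a = win_probability(team_a[1], team_b[1])
        winners.append(team_a if rng.random() < p_a else team_b)
    return winners


def play_out(teams, rng):
    """Play rounds until one team remains; return (champion, games played)."""
    games = 0
    while len(teams) > 1:
        games += len(teams) // 2
        teams = play_round(teams, rng)
    return teams[0], games


# Four-team mini bracket; the last two entries are illustrative placeholders.
field = [("Auburn", 35.0), ("Michigan", 22.0), ("Team C", 18.0), ("Team D", 10.0)]
champion, games_played = play_out(field, random.Random(2025))
print(games_played)  # 3
```

Starting from the 32 first-round winners, the same loop plays 16 + 8 + 4 + 2 + 1 = 31 games, which matches the count for rounds 2-6.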
Why Brackets Differ
Every simulation uses the same probabilities but different random numbers. An 80% favorite still loses 20% of the time. When you run a billion simulations, you get a billion different random sequences, which produce a huge variety of brackets.
Early upsets change everything downstream. If a 12-seed beats a 5-seed in Round 1, that 12-seed faces different opponents (with different Strength Score values) in Round 2, which changes all subsequent probabilities in that region.
Storing a Trillion Brackets
Each bracket is 63 games (win/loss for each matchup), which fits in 63 bits of a 64-bit integer. In Round 1, a bit of 0 means the higher-seeded team won and a bit of 1 means there was an upset.
Storage math: 1 trillion brackets × 8 bytes each = ~8 terabytes of raw data.
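A small sketch of the bit packing and the storage arithmetic. The bit order (game i stored in bit i, lowest bit first) is an assumption for illustration; the text only states that bit positions correspond to specific games.

```python
def pack_bracket(outcomes):
    """Pack 63 outcome bits (0 or 1) into one integer, game i in bit i."""
    if len(outcomes) != 63:
        raise ValueError("a bracket has exactly 63 games")
    value = 0
    for i, bit in enumerate(outcomes):
        value |= (bit & 1) << i
    return value


def unpack_bracket(value):
    """Recover the 63 outcome bits from a packed bracket."""
    return [(value >> i) & 1 for i in range(63)]


# A bracket with no upsets packs to 0; one with an upset only in game 0 packs to 1.
print(pack_bracket([0] * 63))         # 0
print(pack_bracket([1] + [0] * 62))   # 1

# Even an all-upsets bracket fits in an unsigned 64-bit integer.
print(pack_bracket([1] * 63) < 2**64)  # True

# Storage math: one trillion brackets at 8 bytes each.
total_bytes = 10**12 * 8
print(total_bytes // 10**12)  # 8 (terabytes, decimal units)
```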
The Numba JIT compiler runs the simulations across multiple CPU cores and generates millions of brackets per second.
The data is split into 8 GB shard files, which makes it easier to write in parallel and to store in cloud object storage.
Each bracket is stored as a single 64-bit unsigned integer, with each bit position corresponding to a specific game.
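To make the shard layout concrete, here is a hedged sketch of how a bracket's position could be located. It assumes decimal gigabytes (8 × 10^9 bytes per shard), fixed 8-byte records, and little-endian byte order; none of these details are confirmed by the text, and `locate` is a hypothetical helper. It uses only the standard library, not Numba.

```python
import struct

BYTES_PER_BRACKET = 8
SHARD_BYTES = 8 * 10**9                      # assumption: decimal gigabytes
BRACKETS_PER_SHARD = SHARD_BYTES // BYTES_PER_BRACKET
TOTAL_BRACKETS = 10**12


def locate(bracket_index):
    """Return (shard number, byte offset inside that shard) for a bracket."""
    shard, slot = divmod(bracket_index, BRACKETS_PER_SHARD)
    return shard, slot * BYTES_PER_BRACKET


def encode(bracket):
    """Serialize one bracket as an unsigned 64-bit little-endian record."""
    return struct.pack("<Q", bracket)


def decode(record):
    """Inverse of encode()."""
    return struct.unpack("<Q", record)[0]


print(BRACKETS_PER_SHARD)                    # 1000000000
print(TOTAL_BRACKETS // BRACKETS_PER_SHARD)  # 1000 shards
print(locate(10**9 + 5))                     # (1, 40)
```

Fixed-size records mean any bracket can be read with a single seek, without an index file.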
Verification
To prove that the brackets existed before the tournament, a Merkle tree is used.
How It Works
- Split the trillion brackets into chunks (e.g., 8 MB each)
- Hash each chunk with BLAKE3 (produces a 32-byte fingerprint)
- Build a binary tree where each parent = hash(left child + right child)
- Publish the single root hash before the tournament starts
The root hash commits to the entire dataset. You can't change a single bit in any bracket without changing the root hash. To prove a specific bracket exists in the dataset, provide a Merkle proof (~1 KB) that chains from that bracket's chunk up to the published root.
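The commitment scheme can be sketched in a few functions. Two loud assumptions: BLAKE3 is not in the Python standard library, so this sketch substitutes `hashlib.blake2b` with a 32-byte digest as a stand-in, and odd-sized levels are padded by duplicating the last node, which is one common convention rather than a detail confirmed by the text. All function names are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    """32-byte hash. Stand-in for BLAKE3 (assumption): BLAKE2b from the stdlib."""
    return hashlib.blake2b(data, digest_size=32).digest()


def build_levels(chunks):
    """Return all tree levels, leaf hashes first, root level last."""
    levels = [[h(c) for c in chunks]]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2:                 # pad odd levels by repeating the last node
            cur = cur + [cur[-1]]
            levels[-1] = cur
        levels.append([h(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels


def make_proof(levels, index):
    """Collect (sibling hash, sibling_is_left) pairs from the leaf up to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(chunk, proof, root):
    """Recompute the path from a chunk to the root and compare."""
    node = h(chunk)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


chunks = [bytes([i]) * 8 for i in range(5)]   # tiny stand-ins for 8 MB chunks
levels = build_levels(chunks)
root = levels[-1][0]
print(verify(chunks[3], make_proof(levels, 3), root))    # True
print(verify(b"tampered", make_proof(levels, 3), root))  # False
```

With 8 TB of data in 8 MB chunks there are about a million leaves, so a proof holds roughly 20 sibling hashes of 32 bytes each, consistent with the ~1 KB figure.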
Summary
Inputs: Strength Scores for each team
Simulation: A logistic function converts strength differences to probabilities; Monte Carlo sampling draws the outcomes
Storage: 63-bit encoding, sharded files, 8 TB total
Verification: Merkle tree commitment, published root hash
That's it. The system tracks how many of the trillion pre-generated brackets remain perfect as tournament games finish.
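One way the tracking step could work on the packed representation is a masked comparison. This is a sketch under the assumption that actual results are encoded with the same bit layout as the stored brackets; `still_perfect` and `count_perfect` are hypothetical names.

```python
def still_perfect(bracket, results, decided_mask):
    """True if the bracket matches the real results on every decided game.

    decided_mask has a 1 bit for each game that has finished; bits for
    games not yet played are ignored.
    """
    return ((bracket ^ results) & decided_mask) == 0


def count_perfect(brackets, results, decided_mask):
    """Count the brackets that are still perfect so far."""
    return sum(1 for b in brackets if still_perfect(b, results, decided_mask))


# Toy data: games 0 and 1 are finished; game 0 was an upset, game 1 was not.
results = 0b001
decided_mask = 0b011
brackets = [0b000, 0b001, 0b101, 0b010]
print(count_perfect(brackets, results, decided_mask))  # 2
```

XOR marks every game where a bracket disagrees with reality, and the mask discards games that have not been played, so each check costs two bitwise operations per bracket.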