---
title: "I Built a Rating System for My Pickleball League (And Definitely Didn't Cook the Books)"
date: 2026-02-26
draft: true
tags: ["pickleball", "rating systems", "ELO", "statistics"]
categories: ["Projects", "Recreation"]
---

After running my pickleball league with Glicko-2 for over a month, I realized the system had problems. So I did what any reasonable person would do: I threw it out and rebuilt it from scratch with an Elo system.

And yes, I happen to be the biggest beneficiary of the change. Coincidence? Probably. Let me explain the math, and you can be the judge.

## The Problem: Glicko-2 Was Overkill

Glicko-2 is a sophisticated rating system designed for competitive chess. It tracks three values per player:

- **Rating**: Your skill estimate (default: 1500)
- **Rating Deviation**: How *uncertain* the system is about your skill
- **Volatility**: How *consistent* you are

The math involves converting ratings to an internal scale, computing expected scores with a logistic function weighted by each opponent's uncertainty, and solving iteratively for the new volatility. It's clever, but for a casual league of six players, it's like bringing a sports car to a parking lot.

But the real problem was my own addition: a *margin bonus* meant to distinguish close wins from blowouts (winning 11-9 vs. 11-2). The formula?

```
weighted_score = base_score + tanh(margin/11 × 0.3) × (base_score - 0.5)
```

**Translation:** I scaled a fraction by an arbitrary constant (why 0.3? No particular reason), took the hyperbolic tangent, and called it science.

This is what's known as "making stuff up." It had no theoretical basis and was impossible to explain to players.
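
To see what the bonus actually did to a score, here is a minimal Python sketch of the formula above. It assumes `base_score` is 1.0 for a win and 0.0 for a loss, that `margin` is the point difference, and that `margin/11 × 0.3` means `(margin / 11) * 0.3`. The function name is made up for illustration.

```python
import math

def weighted_score(base_score: float, margin: int) -> float:
    """Margin-adjusted score from the old formula (hypothetical helper name)."""
    # Assumption: "margin/11 × 0.3" is read as (margin / 11) * 0.3.
    bonus = math.tanh(margin / 11 * 0.3)
    # The bonus pushes the score away from 0.5: up for a win, down for a loss.
    return base_score + bonus * (base_score - 0.5)

for label, margin in [("11-9", 2), ("11-2", 9)]:
    print(f"{label} win: {weighted_score(1.0, margin):.3f}")
```

Under those assumptions an 11-9 win scores about 1.03 and an 11-2 win about 1.12, so the size of the adjustment is set entirely by the 0.3 constant.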

## The Doubles Problem

The old system calculated team ratings by averaging both partners' ratings. Sounds reasonable, right?

Until you think about it: If you (1400) play with a strong partner (1700) against two 1550s, the system thinks it's an even match. But *you* were carried by a stronger player! Winning that match shouldn't boost your rating as much as winning with a weaker partner.

The system didn't account for partner strength, making it unfair for everyone.
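
The blind spot is easy to reproduce. This is a minimal sketch of the averaging approach as described above, not the league's actual code; `team_rating` is a hypothetical helper.

```python
def team_rating(player_a: float, player_b: float) -> float:
    """Team strength under the old approach: the plain average of both partners."""
    return (player_a + player_b) / 2

you_and_partner = team_rating(1400, 1700)
opponents = team_rating(1550, 1550)

# Both teams average 1550, so the match looks like a coin flip and the
# 1400 player is indistinguishable from the 1700 player.
print(you_and_partner, opponents)
```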

## Enter: Pure Elo

Elo is elegantly simple. Every player has *one number* representing their skill. When two players compete:

1. Calculate the probability that one player beats the other based on rating difference
2. Compare expected performance to actual performance
3. Adjust ratings based on the difference

The key formula is:

```
Expected Win Probability = 1 / (1 + 10^((opponent_rating - your_rating) / 400))
```

If you're 1500 and your opponent is 1500, you should win 50% of the time. If you're 1600 and they're 1500, you should win about 64% of the time. Simple.
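
The formula translates almost directly into code. A minimal sketch, with `expected_score` as an illustrative function name:

```python
def expected_score(your_rating: float, opponent_rating: float) -> float:
    """Elo expected score for a player against a single opponent."""
    return 1 / (1 + 10 ** ((opponent_rating - your_rating) / 400))

print(f"{expected_score(1500, 1500):.2f}")  # 0.50
print(f"{expected_score(1600, 1500):.2f}")  # 0.64
```

The two expected scores in any matchup always sum to 1, so the 1500-rated player in the second example is expected to win about 36% of the time.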

After a match:

```
Rating Change = K × (Actual Performance - Expected Performance)
```

Where `K = 32` (how much weight each match carries), `Expected Performance` is the expected win probability from the formula above, and `Actual Performance` is your *per-point performance*:

```
Actual Performance = Points Scored / Total Points Played
```

Win 11-9? That's 0.55 (55% of points). Win 11-2? That's 0.846 (84.6%). This captures margin of victory, which a binary win/loss score throws away.
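
Putting the two formulas together, a single update can be sketched like this. The function and parameter names are illustrative, and the sketch assumes the expected win probability is used directly as the expected performance.

```python
K = 32  # weight of a single match, as stated above

def expected_score(your_rating: float, opponent_rating: float) -> float:
    return 1 / (1 + 10 ** ((opponent_rating - your_rating) / 400))

def rating_change(your_rating, opponent_rating, points_for, points_against):
    """Rating change for one match, using per-point performance."""
    actual = points_for / (points_for + points_against)
    expected = expected_score(your_rating, opponent_rating)
    return K * (actual - expected)

print(round(rating_change(1500, 1500, 11, 9), 1))  # 1.6
print(round(rating_change(1500, 1500, 11, 2), 1))  # 11.1
```

One consequence of per-point scoring under this formula: a 1600-rated player who beats a 1500-rated player 11-9 scores 0.55 against an expected 0.64, so the update subtracts about 2.9 points despite the win.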

## The Secret Sauce: The Effective Opponent Formula

In doubles, each player's rating is compared against an *effective opponent rating*:

```
Effective Opponent Rating = Opponent1 + Opponent2 - Your Teammate
```

**Why this works:**

If your teammate is strong, the effective opponent rating drops, because your teammate made the match easier. If your teammate is weak, the effective opponent rating rises, because you were undermanned.

Beating 1500-rated opponents with a 1600-rated partner? Effective opponent: 1400. You gain less because your partner carried you.

Beating 1500-rated opponents with a 1400-rated partner? Effective opponent: 1600. You gain more because you did the heavy lifting.

This is *fair*.
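
The two examples above can be checked with a short sketch. The helper names, the 1500 rating for "you", and the 11-5 score are all made up for illustration; only the formulas come from the post.

```python
K = 32

def expected_score(your_rating: float, opponent_rating: float) -> float:
    return 1 / (1 + 10 ** ((opponent_rating - your_rating) / 400))

def effective_opponent(opponent1: float, opponent2: float, teammate: float) -> float:
    """Effective opponent rating for one player in a doubles match."""
    return opponent1 + opponent2 - teammate

def doubles_change(you, teammate, opponent1, opponent2, points_for, points_against):
    """One player's rating change for a doubles match (hypothetical helper)."""
    effective = effective_opponent(opponent1, opponent2, teammate)
    actual = points_for / (points_for + points_against)
    return K * (actual - expected_score(you, effective))

print(effective_opponent(1500, 1500, 1600))  # 1400
print(effective_opponent(1500, 1500, 1400))  # 1600

# Same player (assumed 1500), same 11-5 win, different partner.
print(round(doubles_change(1500, 1600, 1500, 1500, 11, 5), 1))
print(round(doubles_change(1500, 1400, 1500, 1500, 11, 5), 1))
```

With these made-up inputs, the same 11-5 win is worth about 1.5 points alongside the 1600-rated partner and about 10.5 points alongside the 1400-rated partner.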

## The Migration: Before and After

Here's where things get spicy. I replayed all 29 historical matches through the new Elo system:

<table style="width: 100%; border-collapse: collapse; margin: 20px 0;">
<tr style="background-color: #f0f0f0;">
<th style="border: 1px solid #ddd; padding: 10px; text-align: left;">Player</th>
<th style="border: 1px solid #ddd; padding: 10px; text-align: right;">Old Glicko-2</th>
<th style="border: 1px solid #ddd; padding: 10px; text-align: right;">New Elo</th>
<th style="border: 1px solid #ddd; padding: 10px; text-align: right;">Change</th>
<th style="border: 1px solid #ddd; padding: 10px; text-align: right;">Matches</th>
</tr>
<tr>
<td style="border: 1px solid #ddd; padding: 10px;">Andrew Stricklin</td>
<td style="border: 1px solid #ddd; padding: 10px; text-align: right;">1651</td>
<td style="border: 1px solid #ddd; padding: 10px; text-align: right;">1538</td>
<td style="border: 1px solid #ddd; padding: 10px; text-align: right;"><span style="color: #c80000;">−113</span></td>
<td style="border: 1px solid #ddd; padding: 10px; text-align: right;">19</td>
</tr>
<tr style="background-color: #fafafa;">
<td style="border: 1px solid #ddd; padding: 10px;">David Pabst</td>
<td style="border: 1px solid #ddd; padding: 10px; text-align: right;">1562</td>
<td style="border: 1px solid #ddd; padding: 10px; text-align: right;">1522</td>
<td style="border: 1px solid #ddd; padding: 10px; text-align: right;"><span style="color: #c80000;">−40</span></td>
<td style="border: 1px solid #ddd; padding: 10px; text-align: right;">11</td>
</tr>
<tr>
<td style="border: 1px solid #ddd; padding: 10px;">Jacklyn Wyszynski</td>
<td style="border: 1px solid #ddd; padding: 10px; text-align: right;">1557</td>
<td style="border: 1px solid #ddd; padding: 10px; text-align: right;">1514</td>
<td style="border: 1px solid #ddd; padding: 10px; text-align: right;"><span style="color: #c80000;">−43</span></td>
<td style="border: 1px solid #ddd; padding: 10px; text-align: right;">9</td>
</tr>
<tr style="background-color: #fafafa;">
<td style="border: 1px solid #ddd; padding: 10px;">Eliana Crew</td>
<td style="border: 1px solid #ddd; padding: 10px; text-align: right;">1485</td>
<td style="border: 1px solid #ddd; padding: 10px; text-align: right;">1497</td>
<td style="border: 1px solid #ddd; padding: 10px; text-align: right;"><span style="color: #00640a;">+11</span></td>
<td style="border: 1px solid #ddd; padding: 10px; text-align: right;">13</td>
</tr>
<tr>
<td style="border: 1px solid #ddd; padding: 10px;">Krzysztof Radziszeski</td>
<td style="border: 1px solid #ddd; padding: 10px; text-align: right;">1473</td>
<td style="border: 1px solid #ddd; padding: 10px; text-align: right;">1476</td>
<td style="border: 1px solid #ddd; padding: 10px; text-align: right;"><span style="color: #00640a;">+3</span></td>
<td style="border: 1px solid #ddd; padding: 10px; text-align: right;">25</td>
</tr>
<tr style="background-color: #fafafa;">
<td style="border: 1px solid #ddd; padding: 10px;"><strong>Dane Sabo</strong></td>
<td style="border: 1px solid #ddd; padding: 10px; text-align: right;">1290</td>
<td style="border: 1px solid #ddd; padding: 10px; text-align: right;">1449</td>
<td style="border: 1px solid #ddd; padding: 10px; text-align: right;"><strong><span style="color: #00640a;">+159</span></strong></td>
<td style="border: 1px solid #ddd; padding: 10px; text-align: right;">25</td>
</tr>
</table>
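
For anyone curious what a replay involves, here is a rough sketch under stated assumptions, not the code that produced the table. It assumes everyone starts at 1500 (the post does not give the Elo starting rating), that every match is doubles, and that all four players are updated from their pre-match ratings. The match list is made-up placeholder data.

```python
from collections import defaultdict

K = 32
START = 1500  # assumption: starting rating is not stated in the post

def expected_score(rating: float, opponent: float) -> float:
    return 1 / (1 + 10 ** ((opponent - rating) / 400))

def replay(matches):
    """Replay doubles matches in order.

    Each match is (team_a, team_b, score_a, score_b), where each team is a
    pair of player names.
    """
    ratings = defaultdict(lambda: START)
    for team_a, team_b, score_a, score_b in matches:
        total = score_a + score_b
        # Snapshot pre-match ratings so update order does not matter.
        before = {p: ratings[p] for p in team_a + team_b}
        for team, others, score in ((team_a, team_b, score_a),
                                    (team_b, team_a, score_b)):
            for player in team:
                teammate = next(p for p in team if p != player)
                effective = before[others[0]] + before[others[1]] - before[teammate]
                expected = expected_score(before[player], effective)
                ratings[player] += K * (score / total - expected)
    return dict(ratings)

# Placeholder matches, not the league's real results.
matches = [
    (("A", "B"), ("C", "D"), 11, 7),
    (("A", "C"), ("B", "D"), 9, 11),
]
result = replay(matches)
print({name: round(rating, 1) for name, rating in result.items()})
```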

### Observations

**The Rating Spread Compressed**

The old system spread players across 361 rating points (1290 to 1651). The new system compresses them into 89 points (1449 to 1538). This makes sense: we're a recreational group, not chess grandmasters. The new system rates us fairly within a tighter band.
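
The spread figures follow directly from the table. A quick check, with the ratings copied from the table and `spread` as an illustrative helper:

```python
old_glicko = {
    "Andrew Stricklin": 1651, "David Pabst": 1562, "Jacklyn Wyszynski": 1557,
    "Eliana Crew": 1485, "Krzysztof Radziszeski": 1473, "Dane Sabo": 1290,
}
new_elo = {
    "Andrew Stricklin": 1538, "David Pabst": 1522, "Jacklyn Wyszynski": 1514,
    "Eliana Crew": 1497, "Krzysztof Radziszeski": 1476, "Dane Sabo": 1449,
}

def spread(ratings: dict) -> int:
    """Gap between the highest and lowest rating."""
    return max(ratings.values()) - min(ratings.values())

print(spread(old_glicko), spread(new_elo))  # 361 89
```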

**The Winners**

- **Dane Sabo**: +159 points. The old system penalized him for losses with weaker partners. The effective opponent formula gives credit for "carrying." (Purely coincidental that I benefit from my own math.)
- **Eliana Crew**: +11 points
- **Krzysztof Radziszeski**: +3 points

**The Losers**

- **Andrew Stricklin**: −113 points. Still ranked #1, but the old system over-credited wins with strong partners.
- **Jacklyn Wyszynski**: −43 points
- **David Pabst**: −40 points

## A Note on Conflicts of Interest

You may notice that the system designer (me) is also the biggest beneficiary of the new ratings, gaining a convenient 159 points.

I want to assure you this is *purely coincidental* and the result of *rigorous mathematical analysis*, not at all influenced by the fact that I was tired of being ranked last.

The new formulas are based on *sound theoretical principles* that just *happen* to conclude I was being unfairly penalized all along.

*Trust the math.* 😉

## Why This System Works

**For a small league:**

- Simple to understand (one rating per player)
- Fair to individual skill (per-point scoring)
- Respects partnership (effective opponent formula)
- Transparent (you can calculate rating changes yourself)
- Fast convergence (5-10 matches to stabilize a rating)

**The bottom line:** Your rating now reflects your true skill more accurately than before. Even if it means Dane finally looks respectable.