---
title: "I Built a Rating System for My Pickleball League (And Definitely Didn't Cook the Books)"
date: 2026-02-26
draft: true
tags: ["pickleball", "rating systems", "ELO", "statistics"]
categories: ["Projects", "Recreation"]
---

After running my pickleball league with Glicko-2 for over a month, I realized the system had problems. So I did what any reasonable person would do: I threw it out and rebuilt it from scratch with an Elo system.

And yes, I happen to be the biggest beneficiary of the change. Coincidence? Probably. Let me explain the math, and you can be the judge.

## The Problem: Glicko-2 Was Overkill

Glicko-2 is a sophisticated rating system designed for competitive chess. It tracks three values per player:

- **Rating**: Your skill estimate (default: 1500)
- **Rating Deviation**: How *uncertain* the system is about your skill
- **Volatility**: How *consistent* you are

The math involves converting to different scales, computing probabilities with logistic functions, and solving iteratively for new volatility. It's clever, but for a casual league of six players, it's like bringing a sports car to a parking lot.

But the real problem was this: I added a *margin bonus* to account for wins by different margins (winning 11-9 vs 11-2). The formula?

```
weighted_score = base_score + tanh(margin/11 × 0.3) × (base_score - 0.5)
```

**Translation:** I took the hyperbolic tangent of a fraction scaled by an arbitrary constant (why 0.3? No particular reason), and called it science. This is what's known as "making stuff up." It had no theoretical basis and was impossible to explain to players.

## The Doubles Problem

The old system calculated team ratings by averaging both partners' ratings. Sounds reasonable, right? Until you think about it:

If you (1400) play with a strong partner (1700) against two 1550s, the system thinks it's an even match. But *you* were carried by a stronger player!
Winning that match shouldn't boost your rating as much as winning with a weaker partner. The system didn't account for partner strength, making it unfair for everyone.

## Enter: Pure Elo

Elo is elegantly simple. Every player has *one number* representing their skill. When two players compete:

1. Calculate the probability that one player beats the other based on rating difference
2. Compare expected performance to actual performance
3. Adjust ratings based on the difference

The key formula is:

```
Expected Win Probability = 1 / (1 + 10^((opponent_rating - your_rating) / 400))
```

If you're 1500 and your opponent is 1500, you should win 50% of the time. If you're 1600 and they're 1500, you should win about 64% of the time. Simple.

After a match:

```
Rating Change = K × (Actual Performance - Expected Performance)
```

Here `K = 32` (how much weight each match carries) and `Actual Performance` is your *per-point performance*:

```
Actual Performance = Points Scored / Total Points Played
```

Win 11-9? That's 0.55 (55% of points). Win 11-2? That's 0.846 (84.6%). This captures the margin of victory, which a binary win/loss score ignores.

## The Secret Sauce: The Effective Opponent Formula

In doubles, we use:

```
Effective Opponent Rating = Opponent1 + Opponent2 - Your Teammate
```

**Why this works:** If your teammate is strong, the effective opponent rating drops, because your teammate made the match easier. If your teammate is weak, the effective opponent rating rises, because you were undermanned.

Beating 1500-rated opponents with a 1600-rated partner? Effective opponent: 1400. You gain less because your partner carried you.

Beating 1500-rated opponents with a 1400-rated partner? Effective opponent: 1600. You gain more because you did the heavy lifting.

This is *fair*.

## The Migration: Before and After

Here's where things get spicy. I replayed all 29 historical matches through the new Elo system:

| Player | Old Glicko-2 | New Elo | Change | Matches |
|---|---|---|---|---|
| Andrew Stricklin | 1651 | 1538 | −113 | 19 |
| David Pabst | 1562 | 1522 | −40 | 11 |
| Jacklyn Wyszynski | 1557 | 1514 | −43 | 9 |
| Eliana Crew | 1485 | 1497 | +11 | 13 |
| Krzysztof Radziszeski | 1473 | 1476 | +3 | 25 |
| Dane Sabo | 1290 | 1449 | +159 | 25 |

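Replaying a match history through the new system amounts to applying the update rule to each match in chronological order. The sketch below is a minimal Python illustration of that loop, not the league's actual code: the names `expected` and `replay`, the match tuple layout, the 1500 starting rating, and the choice to compute all four changes from pre-match ratings are assumptions made for the example. It handles doubles matches only.

```python
from collections import defaultdict

K = 32          # weight per match, as stated in the post
START = 1500.0  # assumed starting rating for every player


def expected(rating, opponent):
    """Expected performance given a rating and an opponent rating."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400))


def replay(matches, k=K, start=START):
    """Replay doubles matches in order and return the final ratings.

    Each match is (team_a, team_b, points_a, points_b), where each team
    is a pair of player names. This layout is hypothetical.
    """
    ratings = defaultdict(lambda: start)
    for team_a, team_b, pts_a, pts_b in matches:
        total = pts_a + pts_b
        deltas = {}
        for team, opps, pts in ((team_a, team_b, pts_a), (team_b, team_a, pts_b)):
            opp_sum = ratings[opps[0]] + ratings[opps[1]]
            actual = pts / total  # per-point performance
            for player, partner in (team, team[::-1]):
                # Effective opponent = Opponent1 + Opponent2 - teammate
                effective = opp_sum - ratings[partner]
                deltas[player] = k * (actual - expected(ratings[player], effective))
        # Apply all changes only after every delta is computed (an assumption).
        for player, delta in deltas.items():
            ratings[player] += delta
    return dict(ratings)
```

For instance, with four hypothetical players who all start at 1500, `replay([(("A", "B"), ("C", "D"), 11, 9)])` moves each winner up 1.6 points and each loser down 1.6, since 32 × (0.55 − 0.50) = 1.6.
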
### Observations

**The Rating Spread Compressed**

The old system spread players across 361 rating points. The new system compresses them into 89 points. This makes sense: we're a recreational group, not chess grandmasters. The new system rates us fairly within a tighter band.

**The Winners**

- **Dane Sabo**: +159 points. The old system penalized him for losses with weaker partners. The effective opponent formula gives credit for "carrying." (Purely coincidental that I benefit from my own math.)
- **Eliana Crew**: +11 points
- **Krzysztof Radziszeski**: +3 points

**The Losers**

- **Andrew Stricklin**: −113 points. Still ranked #1, but the old system over-credited wins with strong partners.
- **Jacklyn Wyszynski**: −43 points
- **David Pabst**: −40 points

## A Note on Conflicts of Interest

You may notice that the system designer (me) is also the biggest beneficiary of the new ratings, gaining a convenient 159 points. I want to assure you this is *purely coincidental* and the result of *rigorous mathematical analysis*, not at all influenced by the fact that I was tired of being ranked last.

The new formulas are based on *sound theoretical principles* that just *happen* to conclude I was being unfairly penalized all along.

*Trust the math.* 😉

## Why This System Works

**For a small league:**

- Simple to understand (one rating per player)
- Fair to individual skill (per-point scoring)
- Respects partnership (effective opponent formula)
- Transparent (you can calculate rating changes yourself)
- Fast convergence (5-10 matches to stabilize a rating)

**The bottom line:** Your rating now reflects your true skill more accurately than before. Even if it means Dane finally looks respectable.