\documentclass[12pt,a4paper]{article}
\usepackage[margin=1in]{geometry}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{amsthm}
\usepackage{graphicx}
\usepackage{xcolor}
\usepackage{booktabs}
\usepackage{array}
\usepackage{multirow}
\usepackage{hyperref}
\usepackage{tikz}
\usepackage{pgfplots}
% Theorem styles
\theoremstyle{definition}
\newtheorem{definition}{Definition}
\newtheorem{example}{Example}
\newtheorem*{tldr}{\textbf{TL;DR}}
% Custom colors
\definecolor{attention}{RGB}{200,0,0}
\definecolor{success}{RGB}{0,100,0}
\definecolor{info}{RGB}{0,0,150}
% Title
\title{\textbf{How Bad Am I, Actually?} \\[0.5em]
{\Large Building a Pickleball Rating System That Doesn't Lie} \\[0.2em]
{\normalsize (Now With 100\% Less Volatility and 100\% More Accountability)}}
\author{Split (Implementation) \and Dane Sabo (System Design)}
\date{February 2026}
\begin{document}
\maketitle
% TL;DR BOX
\begin{center}
\fbox{%
\parbox{0.9\textwidth}{%
\vspace{0.3cm}
\textbf{\Large TL;DR: What This System Does}
\vspace{0.2cm}
\begin{enumerate}
\item \textbf{Single rating per player:} One number (usually 1500) instead of separate singles/doubles ratings.
\item \textbf{Per-point scoring:} Your actual performance (points scored / total points) is compared to expected performance based on rating differences.
\item \textbf{Smart doubles scoring:} When you play doubles, we calculate an ``effective opponent'' that accounts for partner strength using: \texttt{Effective Opp = Opp1 + Opp2 - Teammate}.
\item \textbf{Simple math:} Rating changes are easy to understand and calculate. No volatility, no rating deviation—just you vs. opponents.
\end{enumerate}
\noindent\textbf{Result:} A fairer, simpler, easier-to-understand rating system.
\vspace{0.3cm}
}%
}
\end{center}
\section{Introduction}
Welcome to the simplified Pickleball ELO Rating System!
After running our league with Glicko-2 for over a month, we realized the system had some problems. This document explains what was wrong with the old system, why we changed it, and how the new system works.
\textit{(And yes, the system designer happens to be the biggest beneficiary of the new rating calculations. Coincidence? Probably. But we'll let you be the judge.)}
\section{The Old System: What Went Wrong}
\subsection{Glicko-2: A Brief Overview}
The original system used \textbf{Glicko-2}, a rating system developed by Mark Glickman for chess. Unlike basic ELO (one number), Glicko-2 tracks \textit{three} values per player:
\begin{enumerate}
\item \textbf{Rating ($r$)}: Your skill estimate (like ELO, default 1500)
\item \textbf{Rating Deviation ($RD$)}: How \textit{uncertain} the system is about your rating. High RD = ``we're not sure about this player yet.'' Low RD = ``we're confident.''
\item \textbf{Volatility ($\sigma$)}: How \textit{consistent} you are. High volatility = your performance varies wildly.
\end{enumerate}
The math behind Glicko-2 involves converting ratings to a different scale, computing expected outcomes with a $g(\phi)$ function involving $\pi$, iteratively solving for new volatility using numerical methods, and... look, it's a lot. Most players had no idea why their rating changed the way it did.
\subsection{The Arbitrary Margin Bonus (The ``Tanh Shit'')}
Here's where things got sketchy. Standard Glicko-2 treats every win the same—whether you win 11-0 or 11-9. We wanted margin of victory to matter, so the old system added a \textit{margin bonus}:
\begin{equation}
\text{weighted\_score} = \text{base\_score} + \tanh\left(\frac{\text{margin}}{11} \times 0.3\right) \times (\text{base\_score} - 0.5)
\end{equation}
\textbf{Translation}: We took the hyperbolic tangent of a fraction involving the point margin, multiplied by an arbitrary constant (0.3), and added it to your win/loss.
\textbf{Why 0.3?} No particular reason. It ``felt right.''
\textbf{Why $\tanh$?} It squishes values between -1 and 1, which... seemed useful?
This is what's known in the business as ``making stuff up.'' It worked, sort of, but it had no theoretical basis and was impossible to explain to players.
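For the curious, the old margin bonus is easy to reproduce. A minimal sketch (the function name is ours, not from the original codebase):

```python
import math

def weighted_score(base_score: float, margin: int) -> float:
    """Old system's margin bonus: nudge the 0/1 win-loss result by a tanh
    of the point margin, scaled by the arbitrary 0.3 constant."""
    bonus = math.tanh(margin / 11 * 0.3) * (base_score - 0.5)
    return base_score + bonus
```

Note that a big win pushes the score \textit{above} 1.0 and a big loss pushes it \textit{below} 0.0, which is part of why the bonus was hard to reason about.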
\subsection{The Doubles Problem}
The old system calculated team ratings by simply \textit{averaging} both partners:
\begin{equation}
R_{\text{team}} = \frac{R_{\text{player1}} + R_{\text{player2}}}{2}
\end{equation}
This seems reasonable until you think about it. If you (1400) play with a strong partner (1700) against two 1550s:
\begin{itemize}
\item Your team average: 1550
\item Their team average: 1550
\item The system thinks it's an even match!
\end{itemize}
But \textit{you} played against opponents rated 1550, while being ``carried'' by a 1700 partner. Winning this match shouldn't boost your rating as much as if you'd won with a weaker partner. The old system didn't account for this.
\subsection{The RD Distribution Bug}
When distributing rating changes between doubles partners, the old system gave \textit{more} change to players with \textit{lower} RD (more certain ratings). This is backwards.
If we're uncertain about your rating (high RD), the system should update it \textit{more aggressively} to converge faster. Instead, we were doing the opposite—penalizing uncertain players by updating them slowly.
\subsection{Summary: Why We Changed}
\begin{enumerate}
\item Glicko-2 was over-engineered for a recreational league
\item The margin bonus was arbitrary (``the tanh shit'')
\item Doubles averaging ignored partner strength effects
\item The RD distribution was literally backwards
\item Nobody understood why their rating changed
\end{enumerate}
Time for something simpler.
\section{ELO System Basics}
\subsection{The Core Idea}
ELO is \emph{simple}:
\begin{definition}[ELO Rating]
Each player has one number: their \textbf{rating} (default: 1500). This represents their expected performance.
\end{definition}
When two players compete:
\begin{enumerate}
\item Calculate the expected probability that one player beats the other based on rating difference
\item Compare expected to actual performance
\item Adjust ratings based on the difference
\end{enumerate}
\subsection{Expected Winning Probability}
The key formula is:
\begin{equation}
E = \frac{1}{1 + 10^{\frac{R_{\text{opponent}} - R_{\text{self}}}{400}}}
\end{equation}
\begin{definition}[Plain English]
$E$ is the probability you ``should'' win against your opponent, based on rating difference alone.
\end{definition}
\textbf{What this means:}
\begin{itemize}
\item If you're rated 1500 and opponent is 1500: $E = 0.5$ (50-50 matchup)
\item If you're rated 1600 and opponent is 1500: $E \approx 0.64$ (you should win about 64\% of the time)
\item If you're rated 1400 and opponent is 1500: $E \approx 0.36$ (you should win about 36\% of the time)
\end{itemize}
The formula uses powers of 10 by chess-Elo convention, and the 400 in the denominator is a scaling factor: a 400-point rating gap corresponds to 10:1 expected odds.
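The expected-score formula is a one-liner in code. A minimal sketch (function name is ours):

```python
def expected_score(r_self: float, r_opp: float) -> float:
    """Win probability implied by the rating gap (logistic curve, base 10)."""
    return 1.0 / (1.0 + 10 ** ((r_opp - r_self) / 400.0))
```

Sanity check: equal ratings give 0.5, and the two players' expectations always sum to 1.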
\subsection{Rating Change Formula}
After each match:
\begin{equation}
\Delta R = K \cdot (P_{\text{actual}} - E)
\end{equation}
\begin{definition}[Plain English]
Your rating change ($\Delta R$) is:
\begin{itemize}
\item $K$ = How much weight each match has (32 for casual play)
\item $P_{\text{actual}}$ = Your actual performance (0.0 to 1.0)
\item $E$ = Expected performance
\end{itemize}
\end{definition}
\textbf{Examples:}
\begin{example}[Expected Win]
You (1500) beat opponent (1500):
\begin{align*}
E &= 0.5 \\
P_{\text{actual}} &= 1.0 \text{ (you won)} \\
\Delta R &= 32 \cdot (1.0 - 0.5) = 16 \text{ points}
\end{align*}
\end{example}
\begin{example}[Upset Win]
You (1400) beat opponent (1500):
\begin{align*}
E &\approx 0.36 \\
P_{\text{actual}} &= 1.0 \\
\Delta R &= 32 \cdot (1.0 - 0.36) \approx 20.5 \text{ points}
\end{align*}
You gain more because you won an upset!
\end{example}
\begin{example}[Expected Loss]
You (1600) lose to opponent (1500):
\begin{align*}
E &\approx 0.64 \\
P_{\text{actual}} &= 0.0 \text{ (you lost)} \\
\Delta R &= 32 \cdot (0.0 - 0.64) \approx -20.5 \text{ points}
\end{align*}
You lose more because it was an upset loss!
\end{example}
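The three worked examples above can be reproduced with a short sketch (function names are ours):

```python
def expected_score(r_self: float, r_opp: float) -> float:
    """Win probability implied by the rating gap."""
    return 1.0 / (1.0 + 10 ** ((r_opp - r_self) / 400.0))

def rating_change(r_self: float, r_opp: float, p_actual: float, k: int = 32) -> float:
    """Delta-R = K * (actual performance - expected performance)."""
    return k * (p_actual - expected_score(r_self, r_opp))
```

Plugging in the numbers from the examples: an even win gives +16, the upset win about +20.5, and the upset loss about $-20.5$.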
\section{Pickleball-Specific Innovations}
\subsection{Per-Point Performance Scoring}
In pickleball, matches are played to 11 points (win by 2). An 11--9 match is very different from an 11--2 match, even though both are wins.
Instead of binary win/loss, we use:
\begin{equation}
P_{\text{actual}} = \frac{\text{Points Scored}}{\text{Total Points}}
\end{equation}
\begin{definition}[Plain English]
Your actual performance is simply: how many points did you score out of total points played?
\end{definition}
\textbf{Examples:}
\begin{itemize}
\item 11-9 win: $P = 11/20 = 0.55$ (55\% of points)
\item 11-2 win: $P = 11/13 = 0.846$ (84.6\% of points)
\item 5-11 loss: $P = 5/16 = 0.3125$ (31.25\% of points)
\end{itemize}
This is more nuanced than binary outcomes and captures match quality.
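The per-point performance score is just a ratio. A minimal sketch (function name is ours):

```python
def point_share(points_for: int, points_against: int) -> float:
    """Actual performance: fraction of all points played that you scored."""
    return points_for / (points_for + points_against)
```

This reproduces the examples above: 11--9 gives 0.55, 11--2 gives 0.846, and a 5--11 loss gives 0.3125.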
\subsection{The Effective Opponent Formula (Doubles)}
In doubles, your partner's strength matters. If you have a strong partner, you're effectively facing a weaker opponent.
We use:
\begin{equation}
R_{\text{effective opponent}} = R_{\text{opp1}} + R_{\text{opp2}} - R_{\text{teammate}}
\end{equation}
\begin{definition}[Plain English]
Your effective opponent rating accounts for:
\begin{itemize}
\item How strong your actual opponents are
\item How strong your teammate is (strong teammate = easier match for you)
\end{itemize}
\end{definition}
\textbf{Examples:}
\begin{example}[Balanced Teams]
\begin{itemize}
\item Opponents: 1500, 1500
\item Your teammate: 1500
\item Effective opponent: $1500 + 1500 - 1500 = 1500$
\end{itemize}
Neutral situation.
\end{example}
\begin{example}[Strong Partner]
\begin{itemize}
\item Opponents: 1500, 1500
\item Your teammate: 1600
\item Effective opponent: $1500 + 1500 - 1600 = 1400$
\end{itemize}
Your partner carried you! The system treats the match as easier (lower effective opponent).
\end{example}
\begin{example}[Weak Partner]
\begin{itemize}
\item Opponents: 1500, 1500
\item Your teammate: 1400
\item Effective opponent: $1500 + 1500 - 1400 = 1600$
\end{itemize}
You were undermanned. The system treats the match as harder (higher effective opponent).
\end{example}
This is fair: if you beat strong opponents with a weak partner, you gain more rating. If you barely beat weaker opponents with help, you gain less.
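The effective-opponent calculation is a single line of arithmetic. A minimal sketch (function name is ours):

```python
def effective_opponent(r_opp1: float, r_opp2: float, r_teammate: float) -> float:
    """Doubles: a stronger teammate lowers the rating you are credited with facing."""
    return r_opp1 + r_opp2 - r_teammate
```

This matches the three examples above: a balanced lineup stays at 1500, a 1600 partner drops your effective opponent to 1400, and a 1400 partner raises it to 1600.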
\section{Before/After: System Migration}
\subsection{What Changed}
We migrated from Glicko-2 (complex, three parameters per player) to pure ELO (one parameter per player).
Key differences:
\begin{table}[h]
\centering
\begin{tabular}{|l|c|c|}
\hline
\textbf{Feature} & \textbf{Glicko-2} & \textbf{Pure ELO} \\
\hline
Parameters per player & 3 (rating, RD, volatility) & 1 (rating only) \\
Complexity & High & Low \\
Transparency & Medium & High \\
Per-point scoring & Yes & Yes \\
Effective opponent (doubles) & Team average & Opp1+Opp2-Teammate \\
\hline
\end{tabular}
\end{table}
\subsection{Migration Data: Old vs New Ratings}
We replayed all 29 historical matches through the new ELO system to see how ratings changed. Here's the comparison:
\begin{table}[h]
\centering
\begin{tabular}{|l|r|r|r|r|}
\hline
\textbf{Player} & \textbf{Old Glicko Avg} & \textbf{New ELO} & \textbf{Change} & \textbf{Matches} \\
\hline
Andrew Stricklin & 1651 & 1538 & \textcolor{attention}{-113} & 19 \\
David Pabst & 1562 & 1522 & \textcolor{attention}{-40} & 11 \\
Jacklyn Wyszynski & 1557 & 1514 & \textcolor{attention}{-43} & 9 \\
Eliana Crew & 1485 & 1497 & \textcolor{success}{+11} & 13 \\
Krzysztof Radziszeski & 1473 & 1476 & \textcolor{success}{+3} & 25 \\
Dane Sabo & 1290 & 1449 & \textcolor{success}{+159} & 25 \\
\hline
\end{tabular}
\caption{Rating comparison after replaying all matches through the new system}
\end{table}
\textbf{Key observations:}
\begin{itemize}
\item \textbf{Rating spread compressed:} Old system had 361 points between top and bottom; new system has only 89 points. This makes sense—we're a recreational group, not pros.
\item \textbf{Biggest winner:} Dane (+159 points). The old system was penalizing him for losses with weaker partners. The new effective opponent formula gives credit for ``carrying.''
\item \textbf{Biggest loser:} Andrew (-113 points). Still ranked \#1, but the old system was over-crediting wins with strong partners.
\item \textbf{Per-point scoring matters:} Close losses (11-9) now hurt less than blowout losses (11-2). This rewards competitive play even in defeat.
\end{itemize}
The new system rates players more fairly, especially in doubles where partner strength varies.
\vspace{0.5em}
\begin{center}
\fbox{%
\parbox{0.85\textwidth}{%
\textbf{A Note on Conflicts of Interest} \\[0.3em]
\small
The astute reader may notice that the system designer (Dane) is also the biggest beneficiary of the new rating calculations, gaining a convenient 159 points.
We want to assure you that this is \textit{purely coincidental} and the result of \textit{rigorous mathematical analysis}, not at all influenced by the fact that Dane was tired of being ranked last.
The new formulas are based on \textit{sound theoretical principles} that just \textit{happen} to conclude that Dane was being unfairly penalized all along. Any resemblance to cooking the books is entirely accidental.
\textit{Trust the math.}
}%
}
\end{center}
\section{Implementation Notes}
\subsection{K-Factor}
We use $K = 32$, which is standard for casual chess. This means:
\begin{itemize}
\item Each match typically changes your rating by 10--20 points
\item It takes 5--10 matches to change rating by 100 points
\item Reasonable for recreational play
\end{itemize}
Alternatives: $K = 48$ (more volatile, faster changes) or $K = 16$ (slower, more stable).
\subsection{Starting Rating}
All new players start at 1500. This is arbitrary but standard in ELO systems.
\subsection{Minimum Rating}
Ratings never go below 1. This prevents the system from producing absurd values.
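Putting the implementation notes together, a full single-player update is only a few lines. A minimal sketch (names are ours) combining the $K$-factor, per-point scoring, and the rating floor:

```python
K = 32          # match weight (see K-Factor above)
MIN_RATING = 1  # floor; ratings never drop below this

def expected_score(r_self: float, r_opp: float) -> float:
    return 1.0 / (1.0 + 10 ** ((r_opp - r_self) / 400.0))

def update_rating(r_self: float, r_opp: float,
                  points_for: int, points_against: int, k: int = K) -> float:
    """One full update: per-point performance vs. expectation, clamped at the floor."""
    p_actual = points_for / (points_for + points_against)
    return max(MIN_RATING, r_self + k * (p_actual - expected_score(r_self, r_opp)))
```

For doubles, you would pass the effective-opponent rating as \texttt{r\_opp}.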
\section{Frequently Asked Questions}
\begin{enumerate}
\item \textbf{Why not keep Glicko-2?}
\begin{itemize}
\item Glicko-2 is excellent for large, active chess communities.
\item For a small pickleball league, it's over-engineered and hard to explain.
\item Pure ELO is simpler and still fair.
\end{itemize}
\item \textbf{How do I know if my rating is accurate?}
\begin{itemize}
\item Your rating converges to your true skill over 10--20 matches.
\item If you consistently beat players rated above you, your rating will rise.
\item If you lose to players rated below you, your rating will drop.
\end{itemize}
\item \textbf{Why does my doubles rating matter in singles?}
\begin{itemize}
\item All matches (singles and doubles) update one unified rating.
\item Your true skill is roughly the same in both formats.
\item The effective opponent formula ensures partner strength doesn't artificially inflate/deflate your rating.
\end{itemize}
\item \textbf{Can I lose rating for a win?}
\begin{itemize}
\item Rarely, yes. Because $P_{\text{actual}}$ is your point share rather than a binary 1/0, a narrow win can cost a few points when your point share falls below your expected score.
\item Worst case: you're rated 1600 and beat a 1500 opponent 11--9. Then $P_{\text{actual}} = 0.55$ but $E \approx 0.64$, so $\Delta R = 32 \cdot (0.55 - 0.64) \approx -2.9$.
\item Against much stronger opponents, a win always gains rating: at 1400 vs.\ 2000, $E \approx 0.03$, and any win gives $P_{\text{actual}} > 0.5$.
\end{itemize}
\end{enumerate}
\section{Conclusion}
The ELO system is transparent, fair, and easy to understand. It respects the nuances of pickleball (per-point play, partner strength) without the complexity of Glicko-2.
Your rating now reflects your true skill more accurately than ever.
\end{document}