From 2533b589a035a1a6238387a96e56e735fe579a00 Mon Sep 17 00:00:00 2001
From: Split
Date: Thu, 26 Feb 2026 12:04:03 -0500
Subject: [PATCH] Add detailed old system explanation, tanh critique, and conflict of interest disclaimer

---
 docs/rating-system-v3-elo.tex | 88 ++++++++++++++++++++++++++++++++---
 1 file changed, 82 insertions(+), 6 deletions(-)

diff --git a/docs/rating-system-v3-elo.tex b/docs/rating-system-v3-elo.tex
index 10aab29..bb9f6ba 100644
--- a/docs/rating-system-v3-elo.tex
+++ b/docs/rating-system-v3-elo.tex
@@ -60,15 +60,74 @@
 
 Welcome to the simplified Pickleball ELO Rating System!
 
-After running our league with Glicko-2 for over a month, we realized:
+After running our league with Glicko-2 for over a month, we realized the system had some problems. This document explains what was wrong with the old system, why we changed it, and how the new system works.
+
+\textit{(And yes, the system designer happens to be the biggest beneficiary of the new rating calculations. Coincidence? Probably. But we'll let you be the judge.)}
+
+\section{The Old System: What Went Wrong}
+
+\subsection{Glicko-2: A Brief Overview}
+
+The original system used \textbf{Glicko-2}, a rating system developed by Mark Glickman for chess. Unlike basic ELO (one number), Glicko-2 tracks \textit{three} values per player:
+
 \begin{enumerate}
-\item Glicko-2 was overkill for our small recreational league
-\item Many players didn't understand how rating changes worked
-\item We didn't need rating deviation or volatility tracking
-\item A simple, transparent ELO system would be easier to maintain and explain
+\item \textbf{Rating ($r$)}: Your skill estimate (like ELO, default 1500)
+\item \textbf{Rating Deviation ($RD$)}: How \textit{uncertain} the system is about your rating. High RD = ``we're not sure about this player yet.'' Low RD = ``we're confident.''
+\item \textbf{Volatility ($\sigma$)}: How \textit{consistent} you are. High volatility = your performance varies wildly.
 \end{enumerate}
 
-This document explains pure ELO and the improvements we made to handle pickleball's unique challenges (especially doubles with different partner strengths).
+The math behind Glicko-2 involves converting ratings to a different scale, computing expected outcomes with a $g(\phi)$ function involving $\pi$, iteratively solving for new volatility using numerical methods, and... look, it's a lot. Most players had no idea why their rating changed the way it did.
+
+\subsection{The Arbitrary Margin Bonus (The ``Tanh Shit'')}
+
+Here's where things got sketchy. Standard Glicko-2 treats every win the same—whether you win 11-0 or 11-9. We wanted margin of victory to matter, so the old system added a \textit{margin bonus}:
+
+\begin{equation}
+\text{weighted\_score} = \text{base\_score} + \tanh\left(\frac{\text{margin}}{11} \times 0.3\right) \times (\text{base\_score} - 0.5)
+\end{equation}
+
+\textbf{Translation}: We took the hyperbolic tangent of a fraction involving the point margin, multiplied by an arbitrary constant (0.3), and added it to your win/loss.
+
+\textbf{Why 0.3?} No particular reason. It ``felt right.''
+
+\textbf{Why $\tanh$?} It squishes values between -1 and 1, which... seemed useful?
+
+This is what's known in the business as ``making stuff up.'' It worked, sort of, but it had no theoretical basis and was impossible to explain to players.
+
+\subsection{The Doubles Problem}
+
+The old system calculated team ratings by simply \textit{averaging} both partners:
+
+\begin{equation}
+R_{\text{team}} = \frac{R_{\text{player1}} + R_{\text{player2}}}{2}
+\end{equation}
+
+This seems reasonable until you think about it. If you (1400) play with a strong partner (1700) against two 1550s:
+\begin{itemize}
+\item Your team average: 1550
+\item Their team average: 1550
+\item The system thinks it's an even match!
+\end{itemize}
+
+But \textit{you} played against opponents rated 1550, while being ``carried'' by a 1700 partner. Winning this match shouldn't boost your rating as much as if you'd won with a weaker partner. The old system didn't account for this.
+
+\subsection{The RD Distribution Bug}
+
+When distributing rating changes between doubles partners, the old system gave \textit{more} change to players with \textit{lower} RD (more certain ratings). This is backwards.
+
+If we're uncertain about your rating (high RD), the system should update it \textit{more aggressively} to converge faster. Instead, we were doing the opposite—penalizing uncertain players by updating them slowly.
+
+\subsection{Summary: Why We Changed}
+
+\begin{enumerate}
+\item Glicko-2 was over-engineered for a recreational league
+\item The margin bonus was arbitrary (``the tanh shit'')
+\item Doubles averaging ignored partner strength effects
+\item The RD distribution was literally backwards
+\item Nobody understood why their rating changed
+\end{enumerate}
+
+Time for something simpler.
 
 \section{ELO System Basics}
 
@@ -284,6 +343,23 @@ Dane Sabo & 1290 & 1449 & \textcolor{success}{+159} & 25 \\
 
 The new system rates players more fairly, especially in doubles where partner strength varies.
 
+\vspace{0.5em}
+\begin{center}
+\fbox{%
+\parbox{0.85\textwidth}{%
+\textbf{A Note on Conflicts of Interest} \\[0.3em]
+\small
+The astute reader may notice that the system designer (Dane) is also the biggest beneficiary of the new rating calculations, gaining a convenient 159 points.
+
+We want to assure you that this is \textit{purely coincidental} and the result of \textit{rigorous mathematical analysis}, not at all influenced by the fact that Dane was tired of being ranked last.
+
+The new formulas are based on \textit{sound theoretical principles} that just \textit{happen} to conclude that Dane was being unfairly penalized all along. Any resemblance to cooking the books is entirely accidental.
+
+\textit{Trust the math.}
+}%
+}
+\end{center}
+
 \section{Implementation Notes}
 
 \subsection{K-Factor}