Add detailed old system explanation, tanh critique, and conflict of interest disclaimer

2026-02-26 12:04:03 -05:00 · 2026-02-26 12:04:03 -05:00 · 2533b589a0
commit 2533b589a0
parent 858636018b
1 changed files with 82 additions and 6 deletions
--- a/docs/rating-system-v3-elo.tex
+++ b/docs/rating-system-v3-elo.tex
@@ -60,15 +60,74 @@
 Welcome to the simplified Pickleball ELO Rating System!
-After running our league with Glicko-2 for over a month, we realized:
+After running our league with Glicko-2 for over a month, we realized the system had some problems. This document explains what was wrong with the old system, why we changed it, and how the new system works.
 \textit{(And yes, the system designer happens to be the biggest beneficiary of the new rating calculations. Coincidence? Probably. But we'll let you be the judge.)}
 \section{The Old System: What Went Wrong}
 \subsection{Glicko-2: A Brief Overview}
 The original system used \textbf{Glicko-2}, a rating system developed by Mark Glickman for chess. Unlike basic ELO (one number), Glicko-2 tracks \textit{three} values per player:
 \begin{enumerate}
-\item Glicko-2 was overkill for our small recreational league
+\item \textbf{Rating ($r$)}: Your skill estimate (like ELO, default 1500)
-\item Many players didn't understand how rating changes worked
+\item \textbf{Rating Deviation ($RD$)}: How \textit{uncertain} the system is about your rating. High RD = ``we're not sure about this player yet.'' Low RD = ``we're confident.''
-\item We didn't need rating deviation or volatility tracking
+\item \textbf{Volatility ($\sigma$)}: How \textit{consistent} you are. High volatility = your performance varies wildly.
 \item A simple, transparent ELO system would be easier to maintain and explain
 \end{enumerate}
-This document explains pure ELO and the improvements we made to handle pickleball's unique challenges (especially doubles with different partner strengths).
+The math behind Glicko-2 involves converting ratings to a different scale, computing expected outcomes with a $g(\phi)$ function involving $\pi$, iteratively solving for new volatility using numerical methods, and... look, it's a lot. Most players had no idea why their rating changed the way it did.
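 For the curious, the core of that machinery, as given in Glickman's published description of Glicko-2, looks roughly like this. The opponent subscript $j$ is our notation for this sketch, and the volatility iteration is left out entirely:
 \[ \mu = \frac{r - 1500}{173.7178}, \qquad \phi = \frac{RD}{173.7178}, \qquad g(\phi) = \frac{1}{\sqrt{1 + 3\phi^2/\pi^2}} \]
 \[ E = \frac{1}{1 + \exp\bigl(-g(\phi_j)\,(\mu - \mu_j)\bigr)} \]
 Here $E$ is your expected score against opponent $j$; the more uncertain the opponent's rating (large $\phi_j$), the more $g$ flattens the prediction toward 0.5.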
 \subsection{The Arbitrary Margin Bonus (The ``Tanh Shit'')}
 Here's where things got sketchy. Standard Glicko-2 treats every win the same, whether you win 11--0 or 11--9. We wanted margin of victory to matter, so the old system added a \textit{margin bonus}:
 \begin{equation}
 \text{weighted\_score} = \text{base\_score} + \tanh\left(\frac{\text{margin}}{11} \times 0.3\right) \times (\text{base\_score} - 0.5)
 \end{equation}
 \textbf{Translation}: We divided the point margin by 11, scaled it by an arbitrary constant (0.3), took the hyperbolic tangent, multiplied the result by how far your score sat from a draw ($\text{base\_score} - 0.5$), and added that to your win/loss.
 \textbf{Why 0.3?} No particular reason. It ``felt right.''
 \textbf{Why $\tanh$?} It squishes values between $-1$ and $1$, which... seemed useful?
 This is what's known in the business as ``making stuff up.'' It worked, sort of, but it had no theoretical basis and was impossible to explain to players.
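 To see what the formula actually produces, here is a worked example. It assumes $\text{base\_score}$ is 1 for a win and 0 for a loss, and that $\text{margin}$ is the point difference; neither is spelled out above, so treat the numbers as illustrative.
 \[ \text{11--0 win:}\quad 1 + \tanh\left(\frac{11}{11} \times 0.3\right) \times (1 - 0.5) \approx 1 + 0.291 \times 0.5 \approx 1.146 \]
 \[ \text{11--9 win:}\quad 1 + \tanh\left(\frac{2}{11} \times 0.3\right) \times (1 - 0.5) \approx 1 + 0.054 \times 0.5 \approx 1.027 \]
 Under those assumptions a blowout win scores above 1 (and a blowout loss, by the same arithmetic, about $-0.146$), which is outside the usual 0-to-1 range of a game score.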
 \subsection{The Doubles Problem}
 The old system calculated team ratings by simply \textit{averaging} both partners:
 \begin{equation}
 R_{\text{team}} = \frac{R_{\text{player1}} + R_{\text{player2}}}{2}
 \end{equation}
 This seems reasonable until you think about it. If you (1400) play with a strong partner (1700) against two 1550s:
 \begin{itemize}
 \item Your team average: 1550
 \item Their team average: 1550
 \item The system thinks it's an even match!
 \end{itemize}
 But \textit{you} played against opponents rated 1550, while being ``carried'' by a 1700 partner. Winning this match shouldn't boost your rating as much as if you'd won with a weaker partner. The old system didn't account for this.
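 A quick calculation makes the gap concrete. It uses the standard ELO expected-score formula purely as an illustration; the old system's Glicko-2 expectation has a different form, but it also returns 0.5 whenever the two ratings are equal.
 \[ E_{\text{team}} = \frac{1}{1 + 10^{(1550 - 1550)/400}} = 0.5, \qquad E_{\text{you alone}} = \frac{1}{1 + 10^{(1550 - 1400)/400}} \approx 0.30 \]
 The system scores the match as a coin flip, even though a 1400 player facing a 1550 opponent one-on-one would be expected to win only about 30\% of the time.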
 \subsection{The RD Distribution Bug}
 When distributing rating changes between doubles partners, the old system gave \textit{more} change to players with \textit{lower} RD (more certain ratings). This is backwards.
 If we're uncertain about your rating (high RD), the system should update it \textit{more aggressively} to converge faster. Instead, we were doing the opposite—penalizing uncertain players by updating them slowly.
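 As a purely hypothetical illustration (the weighting below is invented for this example; it is not the old code and not necessarily what the new system does), a split that points the right way could weight each partner by their share of the team's uncertainty:
 \[ w_i = \frac{RD_i}{RD_1 + RD_2} \]
 With $RD_1 = 50$ and $RD_2 = 150$, the uncertain partner would absorb $150/200 = 75\%$ of the team's rating change and the established partner $25\%$. The old system leaned the other way, handing the larger share to the low-RD partner.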
 \subsection{Summary: Why We Changed}
 \begin{enumerate}
 \item Glicko-2 was over-engineered for a recreational league
 \item The margin bonus was arbitrary (``the tanh shit'')
 \item Doubles averaging ignored partner strength effects
 \item The RD distribution was literally backwards
 \item Nobody understood why their rating changed
 \end{enumerate}
 Time for something simpler.
 \section{ELO System Basics}
@@ -284,6 +343,23 @@ Dane Sabo & 1290 & 1449 & \textcolor{success}{+159} & 25 \\
 The new system rates players more fairly, especially in doubles where partner strength varies.
 \vspace{0.5em}
 \begin{center}
 \fbox{%
 \parbox{0.85\textwidth}{%
 \textbf{A Note on Conflicts of Interest} \\[0.3em]
 \small
 The astute reader may notice that the system designer (Dane) is also the biggest beneficiary of the new rating calculations, gaining a convenient 159 points. 
 We want to assure you that this is \textit{purely coincidental} and the result of \textit{rigorous mathematical analysis}, not at all influenced by the fact that Dane was tired of being ranked last.
 The new formulas are based on \textit{sound theoretical principles} that just \textit{happen} to conclude that Dane was being unfairly penalized all along. Any resemblance to cooking the books is entirely accidental.
 \textit{Trust the math.}
 }%
 }
 \end{center}
 \section{Implementation Notes}
 \subsection{K-Factor}