Add detailed old system explanation, tanh critique, and conflict of interest disclaimer

2026-02-26 12:04:03 -05:00 · 2026-02-26 12:04:03 -05:00 · 2533b589a0
commit 2533b589a0
parent 858636018b
1 changed files with 82 additions and 6 deletions
--- a/docs/rating-system-v3-elo.tex
+++ b/docs/rating-system-v3-elo.tex
@@ -60,15 +60,74 @@

 Welcome to the simplified Pickleball ELO Rating System!

-After running our league with Glicko-2 for over a month, we realized:
+After running our league with Glicko-2 for over a month, we realized the system had some problems. This document explains what was wrong with the old system, why we changed it, and how the new system works.
+
+\textit{(And yes, the system designer happens to be the biggest beneficiary of the new rating calculations. Coincidence? Probably. But we'll let you be the judge.)}
+
+\section{The Old System: What Went Wrong}
+
+\subsection{Glicko-2: A Brief Overview}
+
+The original system used \textbf{Glicko-2}, a rating system developed by Mark Glickman for chess. Unlike basic ELO (one number), Glicko-2 tracks \textit{three} values per player:
+
 \begin{enumerate}
-\item Glicko-2 was overkill for our small recreational league
-\item Many players didn't understand how rating changes worked
-\item We didn't need rating deviation or volatility tracking
-\item A simple, transparent ELO system would be easier to maintain and explain
+\item \textbf{Rating ($r$)}: Your skill estimate (like ELO, default 1500)
+\item \textbf{Rating Deviation ($RD$)}: How \textit{uncertain} the system is about your rating. High RD = ``we're not sure about this player yet.'' Low RD = ``we're confident.''
+\item \textbf{Volatility ($\sigma$)}: How \textit{erratic} your results are. High volatility = your performance varies wildly from match to match.
 \end{enumerate}

-This document explains pure ELO and the improvements we made to handle pickleball's unique challenges (especially doubles with different partner strengths).
+The math behind Glicko-2 involves converting ratings to a different scale, computing expected outcomes with a $g(\phi)$ function involving $\pi$, iteratively solving for new volatility using numerical methods, and... look, it's a lot. Most players had no idea why their rating changed the way it did.
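+
+For the curious, here is roughly what those pieces look like in Glickman's published description of Glicko-2 (shown for reference only; this is not copied from our old code). Ratings and deviations are first moved to an internal scale, $\mu = (r - 1500)/173.7178$ and $\phi = RD/173.7178$, and the expected score against opponent $j$ is then:
+
+\begin{equation}
+g(\phi_j) = \frac{1}{\sqrt{1 + 3\phi_j^2/\pi^2}}, \qquad E = \frac{1}{1 + \exp\left(-g(\phi_j)\,(\mu - \mu_j)\right)}
+\end{equation}
+
+And that is before the volatility update, which has no closed form and has to be solved iteratively.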
+
+\subsection{The Arbitrary Margin Bonus (The ``Tanh Shit'')}
+
+Here's where things got sketchy. Standard Glicko-2 treats every win the same, whether you win 11--0 or 11--9. We wanted margin of victory to matter, so the old system added a \textit{margin bonus}:
+
+\begin{equation}
+\text{weighted\_score} = \text{base\_score} + \tanh\left(\frac{\text{margin}}{11} \times 0.3\right) \times (\text{base\_score} - 0.5)
+\end{equation}
+
+\textbf{Translation}: We scaled the point margin by $1/11$ and an arbitrary constant (0.3), took the hyperbolic tangent of that, and used the result to push your win/loss score further away from 0.5.
+
+\textbf{Why 0.3?} No particular reason. It ``felt right.''
+
+\textbf{Why $\tanh$?} It squishes values between $-1$ and $1$, which... seemed useful?
+
+This is what's known in the business as ``making stuff up.'' It worked, sort of, but it had no theoretical basis and was impossible to explain to players.
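+
+To get a feel for the size of the bonus, plug two winning scores ($\text{base\_score} = 1$) into the formula above. The numbers are rounded, and the two scorelines are just illustrative picks:
+
+\begin{itemize}
+\item Win 11--9 (margin 2): $\tanh(2/11 \times 0.3) \approx 0.054$, so $\text{weighted\_score} \approx 1 + 0.054 \times 0.5 \approx 1.027$
+\item Win 11--0 (margin 11): $\tanh(11/11 \times 0.3) \approx 0.291$, so $\text{weighted\_score} \approx 1 + 0.291 \times 0.5 \approx 1.146$
+\end{itemize}
+
+Note that both results are above 1, and by symmetry a loss lands below 0, even though a game score is normally read as a value between 0 and 1.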
+
+\subsection{The Doubles Problem}
+
+The old system calculated team ratings by simply \textit{averaging} both partners:
+
+\begin{equation}
+R_{\text{team}} = \frac{R_{\text{player1}} + R_{\text{player2}}}{2}
+\end{equation}
+
+This seems reasonable until you think about it. If you (1400) play with a strong partner (1700) against two 1550s:
+\begin{itemize}
+\item Your team average: 1550
+\item Their team average: 1550
+\item The system thinks it's an even match!
+\end{itemize}
+
+But \textit{you} played against opponents rated 1550, while being ``carried'' by a 1700 partner. Winning this match shouldn't boost your rating as much as if you'd won with a weaker partner. The old system didn't account for this.
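+
+A quick way to see the blind spot is to note that very different partnerships collapse to the same team rating. The second and third pairings below are hypothetical, chosen only to illustrate the point:
+
+\begin{equation}
+\frac{1400 + 1700}{2} = \frac{1550 + 1550}{2} = \frac{1250 + 1850}{2} = 1550
+\end{equation}
+
+Under averaging, the 1400, the 1550, and the 1250 all look like they faced the same even match, so each would be treated as if the win were equally hard to earn.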
+
+\subsection{The RD Distribution Bug}
+
+When distributing rating changes between doubles partners, the old system gave \textit{more} change to players with \textit{lower} RD (more certain ratings). This is backwards.
+
+If we're uncertain about your rating (high RD), the system should update it \textit{more aggressively} to converge faster. Instead, we were doing the opposite: penalizing uncertain players by updating them slowly.
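+
+As a sketch of the difference, suppose the team's total change $\Delta$ is split by a weight $w_i$ for each partner. The formulas below are an assumed illustration, not the old code's exact arithmetic, and the RD values are made up:
+
+\begin{equation}
+w_i^{\text{sensible}} = \frac{RD_i}{RD_1 + RD_2}, \qquad w_i^{\text{backwards}} = \frac{1/RD_i}{1/RD_1 + 1/RD_2}
+\end{equation}
+
+With a new player at $RD_1 = 350$ and an established partner at $RD_2 = 50$, the sensible split gives the new player $350/400 = 87.5\%$ of the change, while the backwards split gives them only $12.5\%$.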
+
+\subsection{Summary: Why We Changed}
+
+\begin{enumerate}
+\item Glicko-2 was over-engineered for a recreational league
+\item The margin bonus was arbitrary (``the tanh shit'')
+\item Doubles averaging ignored partner strength effects
+\item The RD distribution was literally backwards
+\item Nobody understood why their rating changed
+\end{enumerate}
+
+Time for something simpler.

 \section{ELO System Basics}

@@ -284,6 +343,23 @@ Dane Sabo & 1290 & 1449 & \textcolor{success}{+159} & 25 \\

 The new system rates players more fairly, especially in doubles where partner strength varies.

+\vspace{0.5em}
+\begin{center}
+\fbox{%
+\parbox{0.85\textwidth}{%
+\textbf{A Note on Conflicts of Interest} \\[0.3em]
+\small
+The astute reader may notice that the system designer (Dane) is also the biggest beneficiary of the new rating calculations, gaining a convenient 159 points. 
+
+We want to assure you that this is \textit{purely coincidental} and the result of \textit{rigorous mathematical analysis}, not at all influenced by the fact that Dane was tired of being ranked last.
+
+The new formulas are based on \textit{sound theoretical principles} that just \textit{happen} to conclude that Dane was being unfairly penalized all along. Any resemblance to cooking the books is entirely accidental.
+
+\textit{Trust the math.}
+}%
+}
+\end{center}
+
 \section{Implementation Notes}

 \subsection{K-Factor}