Add detailed old system explanation, tanh critique, and conflict of interest disclaimer

Split 2026-02-26 12:04:03 -05:00
parent 858636018b
commit 2533b589a0


@ -60,15 +60,74 @@
Welcome to the simplified Pickleball ELO Rating System!
After running our league with Glicko-2 for over a month, we realized the system had some problems. This document explains what was wrong with the old system, why we changed it, and how the new system works.
\textit{(And yes, the system designer happens to be the biggest beneficiary of the new rating calculations. Coincidence? Probably. But we'll let you be the judge.)}
\section{The Old System: What Went Wrong}
\subsection{Glicko-2: A Brief Overview}
The original system used \textbf{Glicko-2}, a rating system developed by Mark Glickman for chess. Unlike basic ELO (one number), Glicko-2 tracks \textit{three} values per player:
\begin{enumerate}
\item \textbf{Rating ($r$)}: Your skill estimate (like ELO, default 1500)
\item \textbf{Rating Deviation ($RD$)}: How \textit{uncertain} the system is about your rating. High RD = ``we're not sure about this player yet.'' Low RD = ``we're confident.''
\item \textbf{Volatility ($\sigma$)}: How \textit{consistent} you are. High volatility = your performance varies wildly.
\end{enumerate}
The math behind Glicko-2 involves converting ratings to a different scale, computing expected outcomes with a $g(\phi)$ function involving $\pi$, iteratively solving for new volatility using numerical methods, and... look, it's a lot. Most players had no idea why their rating changed the way it did.
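For the curious, here is roughly what the scale conversion and the $g(\phi)$ function look like in Glickman's published description of Glicko-2. This is only a sketch of the first steps, not the league's actual code, and the opponent subscript $j$ is notation added here for illustration.
\begin{equation}
\mu = \frac{r - 1500}{173.7178}, \qquad \phi = \frac{RD}{173.7178}
\end{equation}
\begin{equation}
g(\phi) = \frac{1}{\sqrt{1 + 3\phi^2/\pi^2}}, \qquad E = \frac{1}{1 + \exp\left(-g(\phi_j)\,(\mu - \mu_j)\right)}
\end{equation}
The volatility update then needs an iterative root-finding step on top of this, which is where most explanations lose their audience.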
\subsection{The Arbitrary Margin Bonus (The ``Tanh Shit'')}
Here's where things got sketchy. Standard Glicko-2 treats every win the same—whether you win 11-0 or 11-9. We wanted margin of victory to matter, so the old system added a \textit{margin bonus}:
\begin{equation}
\text{weighted\_score} = \text{base\_score} + \tanh\left(\frac{\text{margin}}{11} \times 0.3\right) \times (\text{base\_score} - 0.5)
\end{equation}
\textbf{Translation}: We divided the point margin by 11, multiplied it by an arbitrary constant (0.3), took the hyperbolic tangent, and used the result to push your win/loss score further away from 0.5.
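To make the formula concrete, here is a worked example. It assumes $\text{base\_score} = 1$ for a win and that the margin is the point difference in a game to 11; neither is spelled out above, so treat the numbers as illustrative.
\begin{equation}
\text{11--0 win:}\quad 1 + \tanh\left(\frac{11}{11} \times 0.3\right) \times (1 - 0.5) \approx 1 + 0.291 \times 0.5 \approx 1.146
\end{equation}
\begin{equation}
\text{11--9 win:}\quad 1 + \tanh\left(\frac{2}{11} \times 0.3\right) \times (1 - 0.5) \approx 1 + 0.054 \times 0.5 \approx 1.027
\end{equation}
Under these assumptions, a blowout counts as roughly 1.15 wins and a squeaker as roughly 1.03, so the weighted score can exceed 1.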
\textbf{Why 0.3?} No particular reason. It ``felt right.''
\textbf{Why $\tanh$?} It squishes values between $-1$ and $1$, which... seemed useful?
This is what's known in the business as ``making stuff up.'' It worked, sort of, but it had no theoretical basis and was impossible to explain to players.
\subsection{The Doubles Problem}
The old system calculated team ratings by simply \textit{averaging} both partners:
\begin{equation}
R_{\text{team}} = \frac{R_{\text{player1}} + R_{\text{player2}}}{2}
\end{equation}
This seems reasonable until you think about it. If you (1400) play with a strong partner (1700) against two 1550s:
\begin{itemize}
\item Your team average: 1550
\item Their team average: 1550
\item The system thinks it's an even match!
\end{itemize}
But \textit{you} played against opponents rated 1550, while being ``carried'' by a 1700 partner. Winning this match shouldn't boost your rating as much as if you'd won with a weaker partner. The old system didn't account for this.
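In numbers, using the standard Elo expected-score formula as a stand-in (an assumption for illustration; the old system's Glicko-2 expectation also depends on RD, but it likewise comes out to 0.5 when the two team ratings are equal):
\begin{equation}
R_{\text{team}} = \frac{1400 + 1700}{2} = 1550, \qquad E = \frac{1}{1 + 10^{(1550 - 1550)/400}} = 0.5
\end{equation}
So the match is scored as a coin flip, no matter how lopsided the partnership is.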
\subsection{The RD Distribution Bug}
When distributing rating changes between doubles partners, the old system gave \textit{more} change to players with \textit{lower} RD (more certain ratings). This is backwards.
If we're uncertain about your rating (high RD), the system should update it \textit{more aggressively} to converge faster. Instead, we were doing the opposite—penalizing uncertain players by updating them slowly.
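As a sketch of the intended behavior, one hypothetical way to split a team's rating change $\Delta_{\text{team}}$ is in proportion to each partner's RD. The weights $w_i$ and this exact formula are assumptions for illustration, not the old system's code.
\begin{equation}
w_i = \frac{RD_i}{RD_1 + RD_2}, \qquad \Delta_i = w_i \, \Delta_{\text{team}}
\end{equation}
With $RD_1 = 350$ (a new player) and $RD_2 = 50$ (an established one), this gives $w_1 = 0.875$ and $w_2 = 0.125$, so the uncertain player absorbs most of the change. The old behavior described above amounts to giving the larger share to the low-RD player instead.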
\subsection{Summary: Why We Changed}
\begin{enumerate}
\item Glicko-2 was over-engineered for a recreational league
\item The margin bonus was arbitrary (``the tanh shit'')
\item Doubles averaging ignored partner strength effects
\item The RD distribution was literally backwards
\item Nobody understood why their rating changed
\end{enumerate}
Time for something simpler.
\section{ELO System Basics}
@ -284,6 +343,23 @@ Dane Sabo & 1290 & 1449 & \textcolor{success}{+159} & 25 \\
The new system rates players more fairly, especially in doubles where partner strength varies.
\vspace{0.5em}
\begin{center}
\fbox{%
\parbox{0.85\textwidth}{%
\textbf{A Note on Conflicts of Interest} \\[0.3em]
\small
The astute reader may notice that the system designer (Dane) is also the biggest beneficiary of the new rating calculations, gaining a convenient 159 points.
We want to assure you that this is \textit{purely coincidental} and the result of \textit{rigorous mathematical analysis}, not at all influenced by the fact that Dane was tired of being ranked last.
The new formulas are based on \textit{sound theoretical principles} that just \textit{happen} to conclude that Dane was being unfairly penalized all along. Any resemblance to cooking the books is entirely accidental.
\textit{Trust the math.}
}%
}
\end{center}
\section{Implementation Notes}
\subsection{K-Factor}