Add detailed old system explanation, tanh critique, and conflict of interest disclaimer
parent 858636018b
commit 2533b589a0
@@ -60,15 +60,74 @@
|
||||
|
||||
Welcome to the simplified Pickleball ELO Rating System!
|
||||
|
||||
-After running our league with Glicko-2 for over a month, we realized:
|
||||
After running our league with Glicko-2 for over a month, we realized the system had some problems. This document explains what was wrong with the old system, why we changed it, and how the new system works.
|
||||
|
||||
\textit{(And yes, the system designer happens to be the biggest beneficiary of the new rating calculations. Coincidence? Probably. But we'll let you be the judge.)}
|
||||
|
||||
\section{The Old System: What Went Wrong}
|
||||
|
||||
\subsection{Glicko-2: A Brief Overview}
|
||||
|
||||
The original system used \textbf{Glicko-2}, a rating system developed by Mark Glickman for chess. Unlike basic ELO (one number), Glicko-2 tracks \textit{three} values per player:
|
||||
|
||||
\begin{enumerate}
|
||||
-\item Glicko-2 was overkill for our small recreational league
|
||||
-\item Many players didn't understand how rating changes worked
|
||||
-\item We didn't need rating deviation or volatility tracking
|
||||
-\item A simple, transparent ELO system would be easier to maintain and explain
|
||||
\item \textbf{Rating ($r$)}: Your skill estimate (like ELO, default 1500)
|
||||
\item \textbf{Rating Deviation ($RD$)}: How \textit{uncertain} the system is about your rating. High RD = ``we're not sure about this player yet.'' Low RD = ``we're confident.''
|
||||
\item \textbf{Volatility ($\sigma$)}: How \textit{consistent} you are. High volatility = your performance varies wildly.
|
||||
\end{enumerate}
|
||||
|
||||
-This document explains pure ELO and the improvements we made to handle pickleball's unique challenges (especially doubles with different partner strengths).
|
||||
The math behind Glicko-2 involves converting ratings to a different scale, computing expected outcomes with a $g(\phi)$ function involving $\pi$, iteratively solving for new volatility using numerical methods, and... look, it's a lot. Most players had no idea why their rating changed the way it did.
|
||||
|
||||
\subsection{The Arbitrary Margin Bonus (The ``Tanh Shit'')}
|
||||
|
||||
Here's where things got sketchy. Standard Glicko-2 treats every win the same, whether you win 11--0 or 11--9. We wanted margin of victory to matter, so the old system added a \textit{margin bonus}:
|
||||
|
||||
\begin{equation}
|
||||
\text{weighted\_score} = \text{base\_score} + \tanh\left(\frac{\text{margin}}{11} \times 0.3\right) \times (\text{base\_score} - 0.5)
|
||||
\end{equation}
|
||||
|
||||
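\textbf{Translation}: We scaled the point margin by $1/11$ and by an arbitrary constant (0.3), took the hyperbolic tangent of the result, multiplied that by how far your win/loss score sits from a draw ($\text{base\_score} - 0.5$), and added it to your win/loss score.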
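
To get a feel for the size of the bonus, here is a worked example. It assumes that $\text{base\_score}$ is 1 for a win and 0 for a loss, and that $\text{margin}$ is the winner's points minus the loser's points. Neither convention is spelled out here, so treat both as assumptions.

\[
\text{11--0 win:}\quad 1 + \tanh\left(\frac{11}{11} \times 0.3\right) \times (1 - 0.5) \approx 1 + 0.291 \times 0.5 \approx 1.146
\]
\[
\text{11--9 win:}\quad 1 + \tanh\left(\frac{2}{11} \times 0.3\right) \times (1 - 0.5) \approx 1 + 0.054 \times 0.5 \approx 1.027
\]

Under those assumptions, a shutout was worth roughly 0.12 more than a two-point squeaker, and the weighted score could land above 1 (or below 0 for the loser), outside the range a win/loss score normally lives in.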
|
||||
|
||||
\textbf{Why 0.3?} No particular reason. It ``felt right.''
|
||||
|
||||
\textbf{Why $\tanh$?} It squishes values between $-1$ and $1$, which... seemed useful?
|
||||
|
||||
This is what's known in the business as ``making stuff up.'' It worked, sort of, but it had no theoretical basis and was impossible to explain to players.
|
||||
|
||||
\subsection{The Doubles Problem}
|
||||
|
||||
The old system calculated team ratings by simply \textit{averaging} both partners:
|
||||
|
||||
\begin{equation}
|
||||
R_{\text{team}} = \frac{R_{\text{player1}} + R_{\text{player2}}}{2}
|
||||
\end{equation}
|
||||
|
||||
This seems reasonable until you think about it. If you (1400) play with a strong partner (1700) against two 1550s:
|
||||
\begin{itemize}
|
||||
\item Your team average: 1550
|
||||
\item Their team average: 1550
|
||||
\item The system thinks it's an even match!
|
||||
\end{itemize}
|
||||
|
||||
But \textit{you} played against opponents rated 1550, while being ``carried'' by a 1700 partner. Winning this match shouldn't boost your rating as much as if you'd won with a weaker partner. The old system didn't account for this.
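
To see how much the average hides, here is a rough worked example. It borrows the standard ELO expected-score formula $E = 1 / (1 + 10^{(R_{\text{opp}} - R)/400})$ purely as a yardstick, where $R_{\text{opp}}$ is our own label for the opposing rating. The old Glicko-2 code used its own expectation function, so these numbers are illustrative and not what the old system actually computed.

\[
E_{\text{team}} = \frac{1}{1 + 10^{(1550 - 1550)/400}} = 0.5
\]
\[
E_{\text{you alone}} = \frac{1}{1 + 10^{(1550 - 1400)/400}} \approx 0.30,
\qquad
E_{\text{partner alone}} = \frac{1}{1 + 10^{(1550 - 1700)/400}} \approx 0.70
\]

Averaging collapses a 0.30 player and a 0.70 player into a single 0.5 team, so both partners are treated as though they personally faced an even match.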
|
||||
|
||||
\subsection{The RD Distribution Bug}
|
||||
|
||||
When distributing rating changes between doubles partners, the old system gave \textit{more} change to players with \textit{lower} RD (more certain ratings). This is backwards.
|
||||
|
||||
If we're uncertain about your rating (high RD), the system should update it \textit{more aggressively} to converge faster. Instead, we were doing the opposite: penalizing uncertain players by updating them slowly.
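
As a purely hypothetical illustration (the old code's exact weighting is not reproduced in this document), suppose a team's total change $\Delta$ were split in proportion to each partner's RD, with partner A at $RD_A = 50$ and partner B at $RD_B = 150$:

\[
\Delta_A = \Delta \times \frac{50}{50 + 150} = 0.25\,\Delta,
\qquad
\Delta_B = \Delta \times \frac{150}{50 + 150} = 0.75\,\Delta
\]

That is the direction the split should lean: the uncertain player (B) absorbs most of the change. The old system leaned the other way and handed the larger share to A.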
|
||||
|
||||
\subsection{Summary: Why We Changed}
|
||||
|
||||
\begin{enumerate}
|
||||
\item Glicko-2 was over-engineered for a recreational league
|
||||
\item The margin bonus was arbitrary (``the tanh shit'')
|
||||
\item Doubles averaging ignored partner strength effects
|
||||
\item The RD distribution was literally backwards
|
||||
\item Nobody understood why their rating changed
|
||||
\end{enumerate}
|
||||
|
||||
Time for something simpler.
|
||||
|
||||
\section{ELO System Basics}
|
||||
|
||||
@@ -284,6 +343,23 @@ Dane Sabo & 1290 & 1449 & \textcolor{success}{+159} & 25 \\
|
||||
|
||||
The new system rates players more fairly, especially in doubles where partner strength varies.
|
||||
|
||||
\vspace{0.5em}
|
||||
\begin{center}
|
||||
\fbox{%
|
||||
\parbox{0.85\textwidth}{%
|
||||
\textbf{A Note on Conflicts of Interest} \\[0.3em]
|
||||
\small
|
||||
The astute reader may notice that the system designer (Dane) is also the biggest beneficiary of the new rating calculations, gaining a convenient 159 points.
|
||||
|
||||
We want to assure you that this is \textit{purely coincidental} and the result of \textit{rigorous mathematical analysis}, not at all influenced by the fact that Dane was tired of being ranked last.
|
||||
|
||||
The new formulas are based on \textit{sound theoretical principles} that just \textit{happen} to conclude that Dane was being unfairly penalized all along. Any resemblance to cooking the books is entirely accidental.
|
||||
|
||||
\textit{Trust the math.}
|
||||
}%
|
||||
}
|
||||
\end{center}
|
||||
|
||||
\section{Implementation Notes}
|
||||
|
||||
\subsection{K-Factor}