
\documentclass[12pt,a4paper]{article}

\usepackage[margin=1in]{geometry}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{amsthm}
\usepackage{graphicx}
\usepackage{xcolor}
\usepackage{booktabs}
\usepackage{array}
\usepackage{multirow}
\usepackage{hyperref}
\usepackage{tikz}
\usepackage{pgfplots}

% Theorem styles
\theoremstyle{definition}
\newtheorem{definition}{Definition}
\newtheorem{example}{Example}
\newtheorem*{tldr}{\textbf{TL;DR}}

% Custom colors
\definecolor{attention}{RGB}{200,0,0}
\definecolor{success}{RGB}{0,100,0}
\definecolor{info}{RGB}{0,0,150}

% Title
\title{\textbf{How Bad Am I, Actually?} \\[0.5em]
  {\Large Building a Pickleball Rating System That Doesn't Lie} \\[0.2em]
  {\normalsize (Now With 100\% Less Volatility and 100\% More Accountability)}}
\author{Split (Implementation) \and Dane Sabo (System Design)}
\date{February 2026}

\begin{document}

\maketitle

% TL;DR BOX
\begin{center}
\fbox{%
  \parbox{0.9\textwidth}{%
    \vspace{0.3cm}
    \textbf{\Large TL;DR: What This System Does}
    \vspace{0.2cm}

    \begin{enumerate}
      \item \textbf{Single rating per player:} One number (everyone starts at 1500) instead of separate singles/doubles ratings.
      \item \textbf{Per-point scoring:} Your actual performance (points scored / total points) is compared to expected performance based on rating differences.
      \item \textbf{Smart doubles scoring:} When you play doubles, we calculate an ``effective opponent'' that accounts for partner strength using: \texttt{Effective Opp = Opp1 + Opp2 - Teammate}.
      \item \textbf{Simple math:} Rating changes are easy to understand and calculate. No volatility and no rating deviation: just you versus your opponents.
    \end{enumerate}

    \noindent\textbf{Result:} A fairer, simpler, easier-to-understand rating system.
    \vspace{0.3cm}
  }%
}
\end{center}

\section{Introduction}

Welcome to the simplified Pickleball ELO Rating System!

After running our league with Glicko-2 for over a month, we realized:
\begin{enumerate}
\item Glicko-2 was overkill for our small recreational league
\item Many players didn't understand how rating changes worked
\item We didn't need rating deviation or volatility tracking
\item A simple, transparent ELO system would be easier to maintain and explain
\end{enumerate}

This document explains pure ELO and the improvements we made to handle pickleball's unique challenges (especially doubles with different partner strengths).

\section{ELO System Basics}

\subsection{The Core Idea}

ELO is \emph{simple}:

\begin{definition}[ELO Rating]
Each player has one number: their \textbf{rating} (default: 1500). It is an estimate of skill; what you are expected to score against a particular opponent follows from the difference between your two ratings.
\end{definition}

When two players compete:
\begin{enumerate}
\item Calculate the expected probability that one player beats the other based on rating difference
\item Compare expected to actual performance
\item Adjust ratings based on the difference
\end{enumerate}

\subsection{Expected Winning Probability}

The key formula is:

\begin{equation}
E = \frac{1}{1 + 10^{\frac{R_{\text{opponent}} - R_{\text{self}}}{400}}}
\end{equation}

\begin{definition}[Plain English]
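$E$ is the score you ``should'' get against your opponent, based on rating difference alone. With win/loss scoring it is your probability of winning; with the per-point scoring described later, it is the share of points you are expected to win.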
\end{definition}

\textbf{What this means:}
\begin{itemize}
\item If you're rated 1500 and opponent is 1500: $E = 0.5$ (50-50 matchup)
\item If you're rated 1600 and opponent is 1500: $E \approx 0.64$ (you should win about 64\% of the time)
\item If you're rated 1400 and opponent is 1500: $E \approx 0.36$ (you should win about 36\% of the time)
\end{itemize}

The formula uses $10^x$ (powers of 10) because that is traditional in chess ELO. The 400 in the denominator is a scaling factor: a 400-point rating gap gives the stronger player 10-to-1 odds, i.e.\ $E = 10/11 \approx 0.91$.
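The formula translates directly into code. The following is a minimal Python sketch for illustration only, not the league's implementation; the function name \texttt{expected\_score} is a hypothetical choice.

\begin{verbatim}
def expected_score(r_self: float, r_opponent: float) -> float:
    """Expected score E against one opponent (hypothetical helper).

    E = 1 / (1 + 10 ** ((R_opponent - R_self) / 400))
    """
    return 1.0 / (1.0 + 10.0 ** ((r_opponent - r_self) / 400.0))


if __name__ == "__main__":
    # Reproduces the three matchups listed above.
    for rating in (1500, 1600, 1400):
        print(rating, round(expected_score(rating, 1500), 2))
    # prints: 1500 0.5 / 1600 0.64 / 1400 0.36 (one per line)
\end{verbatim}

Because $E_{\text{you}} + E_{\text{opponent}} = 1$ for any pair of ratings, the two players' expectations always add up to one full match.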

\subsection{Rating Change Formula}

After each match:

\begin{equation}
\Delta R = K \cdot (P_{\text{actual}} - E)
\end{equation}

\begin{definition}[Plain English]
Your rating change ($\Delta R$) depends on:
\begin{itemize}
\item $K$ = How much weight each match has (32 for casual play)
\item $P_{\text{actual}}$ = Your actual performance (0.0 to 1.0)
\item $E$ = Expected performance
\end{itemize}
\end{definition}

\textbf{Examples:}

\begin{example}[Expected Win]
You (1500) beat opponent (1500):
\begin{align*}
E &= 0.5 \\
P_{\text{actual}} &= 1.0 \text{ (you won)} \\
\Delta R &= 32 \cdot (1.0 - 0.5) = 16 \text{ points}
\end{align*}
\end{example}

\begin{example}[Upset Win]
You (1400) beat opponent (1500):
\begin{align*}
E &\approx 0.36 \\
P_{\text{actual}} &= 1.0 \\
\Delta R &= 32 \cdot (1.0 - 0.36) \approx 20.5 \text{ points}
\end{align*}
You gain more because you won an upset!
\end{example}

\begin{example}[Expected Loss]
You (1600) lose to opponent (1500):
\begin{align*}
E &\approx 0.64 \\
P_{\text{actual}} &= 0.0 \text{ (you lost)} \\
\Delta R &= 32 \cdot (0.0 - 0.64) \approx -20.5 \text{ points}
\end{align*}
You lose more because it was an upset loss!
\end{example}
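Putting the two formulas together gives the whole update. This Python sketch, with hypothetical names (\texttt{rating\_change}, \texttt{K}) and binary results as in the examples above, reproduces all three numbers:

\begin{verbatim}
K = 32  # weight of each match, as used in this document


def expected_score(r_self, r_opponent):
    """E = 1 / (1 + 10 ** ((R_opponent - R_self) / 400))."""
    return 1.0 / (1.0 + 10.0 ** ((r_opponent - r_self) / 400.0))


def rating_change(r_self, r_opponent, p_actual, k=K):
    """Delta R = K * (P_actual - E), with p_actual in [0.0, 1.0]."""
    return k * (p_actual - expected_score(r_self, r_opponent))


if __name__ == "__main__":
    print(round(rating_change(1500, 1500, 1.0), 1))  # 16.0
    print(round(rating_change(1400, 1500, 1.0), 1))  # 20.5
    print(round(rating_change(1600, 1500, 0.0), 1))  # -20.5
\end{verbatim}

In a singles match your opponent's change is the exact negative of yours, $K((1 - P_{\text{actual}}) - (1 - E)) = -K(P_{\text{actual}} - E)$, so rating points are transferred rather than created.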

\section{Pickleball-Specific Innovations}

\subsection{Per-Point Performance Scoring}

In pickleball, matches are scored to 11 (win by 2). An 11-9 match is very different from an 11-2 match, even though both are wins.

Instead of binary win/loss, we use:

\begin{equation}
P_{\text{actual}} = \frac{\text{Points Scored}}{\text{Total Points}}
\end{equation}

\begin{definition}[Plain English]
Your actual performance is simply: how many points did you score out of total points played?
\end{definition}

\textbf{Examples:}
\begin{itemize}
\item 11-9 win: $P = 11/20 = 0.55$ (55\% of points)
\item 11-2 win: $P = 11/13 = 0.846$ (84.6\% of points)
\item 5-11 loss: $P = 5/16 = 0.3125$ (31.25\% of points)
\end{itemize}

This is more nuanced than a binary outcome: it captures the margin of a win or loss, not just who won.
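Per-point scoring only changes how $P_{\text{actual}}$ is computed; the update formula stays the same. A minimal Python sketch under that reading (the helper names are hypothetical):

\begin{verbatim}
def expected_score(r_self, r_opponent):
    """E = 1 / (1 + 10 ** ((R_opponent - R_self) / 400))."""
    return 1.0 / (1.0 + 10.0 ** ((r_opponent - r_self) / 400.0))


def point_share(points_for, points_against):
    """P_actual = points scored / total points played."""
    total = points_for + points_against
    if total <= 0:
        raise ValueError("no points were played")
    return points_for / total


def per_point_change(r_self, r_opponent, points_for, points_against,
                     k=32):
    """Delta R = K * (P_actual - E) using the point share."""
    p_actual = point_share(points_for, points_against)
    return k * (p_actual - expected_score(r_self, r_opponent))


if __name__ == "__main__":
    print(point_share(11, 9), point_share(5, 11))  # 0.55 0.3125
    # Two 1500-rated players: a close win moves far less than a rout.
    print(round(per_point_change(1500, 1500, 11, 9), 1))  # 1.6
    print(round(per_point_change(1500, 1500, 11, 2), 1))  # 11.1
\end{verbatim}

Compare these with the binary examples: a win between equally rated players is worth 16 points under win/loss scoring, but only 1.6 points when it is an 11-9 match.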

\subsection{The Effective Opponent Formula (Doubles)}

In doubles, your partner's strength matters. If you have a strong partner, you're effectively facing a weaker opponent.

We use:

\begin{equation}
R_{\text{effective opponent}} = R_{\text{opp1}} + R_{\text{opp2}} - R_{\text{teammate}}
\end{equation}

\begin{definition}[Plain English]
Your effective opponent rating accounts for:
\begin{itemize}
\item How strong your actual opponents are
\item How strong your teammate is (strong teammate = easier match for you)
\end{itemize}
\end{definition}

\textbf{Examples:}

\begin{example}[Balanced Teams]
\begin{itemize}
\item Opponents: 1500, 1500
\item Your teammate: 1500
\item Effective opponent: $1500 + 1500 - 1500 = 1500$
\end{itemize}
Neutral situation.
\end{example}

\begin{example}[Strong Partner]
\begin{itemize}
\item Opponents: 1500, 1500
\item Your teammate: 1600
\item Effective opponent: $1500 + 1500 - 1600 = 1400$
\end{itemize}
Your partner carried you! The system treats the match as easier (lower effective opponent).
\end{example}

\begin{example}[Weak Partner]
\begin{itemize}
\item Opponents: 1500, 1500
\item Your teammate: 1400
\item Effective opponent: $1500 + 1500 - 1400 = 1600$
\end{itemize}
You were undermanned. The system treats the match as harder (higher effective opponent).
\end{example}

This is fair: if you beat strong opponents with a weak partner, you gain more rating. If you barely beat weaker opponents with a strong partner's help, you gain less.
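In code, the effective opponent simply takes the place of $R_{\text{opponent}}$ in the singles formulas. The sketch below assumes each player on a team is updated individually, using their own effective opponent and the team's shared point share; the function names are hypothetical.

\begin{verbatim}
def expected_score(r_self, r_opponent):
    """E = 1 / (1 + 10 ** ((R_opponent - R_self) / 400))."""
    return 1.0 / (1.0 + 10.0 ** ((r_opponent - r_self) / 400.0))


def effective_opponent(opp1, opp2, teammate):
    """R_eff = R_opp1 + R_opp2 - R_teammate."""
    return opp1 + opp2 - teammate


def doubles_change(r_self, teammate, opp1, opp2,
                   points_for, points_against, k=32):
    """Per-point rating change for one player in a doubles match."""
    r_eff = effective_opponent(opp1, opp2, teammate)
    p_actual = points_for / (points_for + points_against)
    return k * (p_actual - expected_score(r_self, r_eff))


if __name__ == "__main__":
    # The three examples above: prints 1500, 1400, 1600.
    for mate in (1500, 1600, 1400):
        print(effective_opponent(1500, 1500, mate))
    # A 1500 player wins 11-9 against two 1500-rated opponents:
    even = doubles_change(1500, 1500, 1500, 1500, 11, 9)
    weak = doubles_change(1500, 1400, 1500, 1500, 11, 9)
    print(round(even, 1), round(weak, 1))  # 1.6 6.1
\end{verbatim}

The same 11-9 win is worth about 1.6 points with an equally rated partner and about 6.1 points with a 1400 partner, because the weaker partner raises the effective opponent to 1600.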

\section{Before/After: System Migration}

\subsection{What Changed}

We migrated from Glicko-2 (complex, three parameters per player) to pure ELO (one parameter per player).

Key differences:

\begin{table}[h]
\centering
\begin{tabular}{|l|c|c|}
\hline
\textbf{Feature} & \textbf{Glicko-2} & \textbf{Pure ELO} \\
\hline
Parameters per player & 3 (rating, RD, volatility) & 1 (rating only) \\
Complexity & High & Low \\
Transparency & Medium & High \\
Per-point scoring & Yes & Yes \\
Effective opponent (doubles) & Weighted avg & Opp1+Opp2-Teammate \\
\hline
\end{tabular}
\end{table}

\subsection{Migration Data: Old vs New Ratings}

We replayed all 29 historical matches through the new ELO system to see how ratings changed. Here's the comparison:

\begin{table}[h]
\centering
\begin{tabular}{|l|r|r|r|r|}
\hline
\textbf{Player} & \textbf{Old Glicko Avg} & \textbf{New ELO} & \textbf{Change} & \textbf{Matches} \\
\hline
Andrew Stricklin & 1651 & 1538 & \textcolor{attention}{-113} & 19 \\
David Pabst & 1562 & 1522 & \textcolor{attention}{-40} & 11 \\
Jacklyn Wyszynski & 1557 & 1514 & \textcolor{attention}{-43} & 9 \\
Eliana Crew & 1485 & 1497 & \textcolor{success}{+11} & 13 \\
Krzysztof Radziszeski & 1473 & 1476 & \textcolor{success}{+3} & 25 \\
Dane Sabo & 1290 & 1449 & \textcolor{success}{+159} & 25 \\
\hline
\end{tabular}
\caption{Rating comparison after replaying all matches through the new system}
\end{table}

\textbf{Key observations:}
\begin{itemize}
\item \textbf{Rating spread compressed:} The old system had 361 points between top and bottom; the new system has only 89. This makes sense: we're a recreational group, not pros.
\item \textbf{Biggest winner:} Dane (+159 points). The old system was penalizing him for losses with weaker partners. The new effective opponent formula gives credit for ``carrying.''
\item \textbf{Biggest loser:} Andrew (-113 points). Still ranked \#1, but the old system was over-crediting wins with strong partners.
\item \textbf{Per-point scoring matters:} Close losses (9-11) hurt less than blowout losses (2-11). This rewards competitive play even in defeat.
\end{itemize}

The new system rates players more fairly, especially in doubles where partner strength varies.

\section{Implementation Notes}

\subsection{K-Factor}

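We use $K = 32$, a common choice for casual chess. This means:
\begin{itemize}
\item A single match always moves your rating by less than 32 points, since $|P_{\text{actual}} - E| < 1$
\item With per-point scoring, typical changes are much smaller: an 11-9 match between equally rated players moves each rating by $32 \cdot (0.55 - 0.5) = 1.6$ points, and an 11-2 rout by $32 \cdot (0.846 - 0.5) \approx 11$ points
\item Reasonable for recreational play
\end{itemize}

Alternatives: $K = 48$ (more volatile, faster changes) or $K = 16$ (slower, more stable).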
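Because $\Delta R$ is linear in $K$, switching K-factors rescales every change by the same ratio. A short Python sketch (hypothetical helper names) for an 11-2 win between two 1500-rated players:

\begin{verbatim}
def expected_score(r_self, r_opponent):
    """E = 1 / (1 + 10 ** ((R_opponent - R_self) / 400))."""
    return 1.0 / (1.0 + 10.0 ** ((r_opponent - r_self) / 400.0))


def rating_change(r_self, r_opponent, p_actual, k):
    """Delta R = K * (P_actual - E)."""
    return k * (p_actual - expected_score(r_self, r_opponent))


if __name__ == "__main__":
    p_actual = 11 / 13  # an 11-2 win
    for k in (16, 32, 48):
        print(k, round(rating_change(1500, 1500, p_actual, k), 1))
    # prints: 16 5.5 / 32 11.1 / 48 16.6 (one per line)
\end{verbatim}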

\subsection{Starting Rating}

All new players start at 1500. This is arbitrary but standard in ELO systems.

\subsection{Minimum Rating}

Ratings never go below 1. This floor prevents the system from producing zero or negative ratings.
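The starting rating and the floor fit together as follows. This is a singles-only Python sketch with hypothetical names (\texttt{START\_RATING}, \texttt{MIN\_RATING}, \texttt{apply\_singles}); a doubles version would compute each player's change against their effective opponent instead.

\begin{verbatim}
from collections import defaultdict

START_RATING = 1500.0  # every new player starts here
MIN_RATING = 1.0       # ratings never go below this floor
K = 32


def expected_score(r_self, r_opponent):
    """E = 1 / (1 + 10 ** ((R_opponent - R_self) / 400))."""
    return 1.0 / (1.0 + 10.0 ** ((r_opponent - r_self) / 400.0))


def apply_singles(ratings, player, opponent,
                  points_for, points_against, k=K):
    """Update both ratings in place from one singles match."""
    p_actual = points_for / (points_for + points_against)
    # Both players' changes are computed from pre-match ratings.
    e = expected_score(ratings[player], ratings[opponent])
    delta = k * (p_actual - e)
    ratings[player] = max(MIN_RATING, ratings[player] + delta)
    ratings[opponent] = max(MIN_RATING, ratings[opponent] - delta)


if __name__ == "__main__":
    # Unknown players are created at START_RATING on first lookup.
    ratings = defaultdict(lambda: START_RATING)
    apply_singles(ratings, "player_a", "player_b", 11, 9)
    print({name: round(r, 1) for name, r in ratings.items()})
    # prints: {'player_a': 1501.6, 'player_b': 1498.4}
\end{verbatim}

A replay like the one in the migration section could then call such an update once per match, in chronological order, starting from an empty table.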

\section{Frequently Asked Questions}

\begin{enumerate}
\item \textbf{Why not keep Glicko-2?}
  \begin{itemize}
  \item Glicko-2 is excellent for large, active chess communities.
  \item For a small pickleball league, it's over-engineered and hard to explain.
  \item Pure ELO is simpler and still fair.
  \end{itemize}

\item \textbf{How do I know if my rating is accurate?}
  \begin{itemize}
  \item Your rating should settle near your true skill after roughly 10--20 matches.
  \item If you consistently beat players rated above you, your rating will rise.
  \item If you lose to players rated below you, your rating will drop.
  \end{itemize}

\item \textbf{Why does my doubles rating matter in singles?}
  \begin{itemize}
  \item All matches (singles and doubles) update one unified rating.
  \item Your true skill is roughly the same in both formats.
  \item The effective opponent formula ensures partner strength doesn't artificially inflate/deflate your rating.
  \end{itemize}

\item \textbf{Can I lose rating for a win?}
  \begin{itemize}
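  \item Only if you were favored. A winner always scores more than half the points, so $P_{\text{actual}} > 0.5$. If you are rated at or below your (effective) opponent, $E \le 0.5$ and a win always gains rating: at 1400 against 2000, any win helps.
  \item If you are favored, a narrow win can cost a little. At 1600 against 1500, $E \approx 0.64$; winning 11-9 gives $P_{\text{actual}} = 0.55$, so $\Delta R = 32 \cdot (0.55 - 0.64) \approx -2.9$ points.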
  \end{itemize}

\end{enumerate}

\section{Conclusion}

The ELO system is transparent, fair, and easy to understand. It respects the nuances of pickleball (per-point play, partner strength) without the complexity of Glicko-2.

Your rating now reflects your true skill more accurately than ever.

\end{document}