PickleBALLER/REFACTORING_NOTES.md
Split 9ae1bd37fd Refactor: Implement all four ELO system improvements
CHANGES:

1. Replace arbitrary margin bonus with per-point expected value
   - Replace tanh formula in score_weight.rs
   - New: performance = actual_points / total_points
   - Expected: P(point) = 1 / (1 + 10^((R_opp - R_self)/400))
   - Outcome now reflects actual performance vs expected

2. Fix RD-based distribution (backwards logic)
   - Changed weight from 1.0/rd² to rd²
   - Higher RD (uncertain) now gets more change
   - Lower RD (certain) gets less change
   - Follows correct Glicko-2 principle

3. Add new effective opponent calculation for doubles
   - New functions: calculate_effective_opponent_rating()
   - Formula: Eff_Opp = Opp1 + Opp2 - Teammate
   - Personalizes rating change by partner strength
   - Strong teammate → lower effective opponent
   - Weak teammate → higher effective opponent

4. Document unified rating consolidation (Phase 1)
   - Added REFACTORING_NOTES.md with full plan
   - Schema changes identified but deferred
   - Code is ready for single rating migration

All changes:
- Compile successfully (release build)
- Pass all 14 unit tests
- Backwards compatible with demo/example code updated
- Database backup available at pickleball.db.backup-20260226-105326
2026-02-26 10:58:10 -05:00

272 lines
7.7 KiB
Markdown

# Pickleball ELO System Refactoring
## Changes Made
### ✅ Change 1: Replace Arbitrary Margin Bonus with Per-Point Expected Value
**Status:** COMPLETE
**File:** `src/glicko/score_weight.rs`
**What Changed:**
- Replaced `tanh` formula based on margin of victory
- New formula: `performance = actual_points / total_points`
- Expected point probability: `P(win point) = 1 / (1 + 10^((R_opp - R_self)/400))`
- Output: Performance ratio (0.0-1.0) instead of arbitrary margin-weighted score (0.0-1.2)
**Why This Matters:**
- More mathematically sound (uses point-based probability)
- Accounts for rating difference in calculating expectations
- Single point underperformance/overperformance is now meaningful
- Prevents arbitrary bonuses for blowouts when opponent was much weaker
**Updated Files:**
- `src/glicko/score_weight.rs` - Core calculation
- `src/glicko/calculator.rs` - Test updated
- `examples/email_demo.rs` - Usage updated
- `src/demo.rs` - Usage updated
- `src/simple_demo.rs` - Usage updated
**New Function Signature:**
```rust
pub fn calculate_weighted_score(
player_rating: f64,
opponent_rating: f64,
points_scored: i32,
points_allowed: i32,
) -> f64
```
---
### ✅ Change 2: Fix RD-Based Distribution (Backwards Logic)
**Status:** COMPLETE
**File:** `src/glicko/doubles.rs`
**What Changed:**
- Changed weight formula from `1.0 / rd²` to `rd²`
- Higher RD (more uncertain) now gets more rating change
- Lower RD (more certain) now gets less rating change
**Why This Matters:**
- **Correct Principle:** Uncertain ratings should converge to true skill faster
- **Wrong Before:** Certain players were changing too much, uncertain players too little
- **Real Impact:** New or returning players now update faster; established players update slower
**Updated Function:**
```rust
pub fn distribute_rating_change(
partner1_rd: f64,
partner2_rd: f64,
team_change: f64,
) -> (f64, f64)
```
Example: If team gains +20 rating points and partner1 has RD=100, partner2 has RD=200:
- Before: partner1 got ~80%, partner2 got ~20% (WRONG)
- Now: partner1 gets ~20%, partner2 gets ~80% (CORRECT)
---
### ✅ Change 3: New Effective Opponent Calculation for Doubles
**Status:** COMPLETE
**File:** `src/glicko/doubles.rs`
**What Added:**
- `calculate_effective_opponent_rating()` - Takes opponent ratings and teammate rating
- `calculate_effective_opponent()` - Returns full GlickoRating with appropriate RD/volatility
**Formula:**
```
Effective Opponent Rating = Opp1_rating + Opp2_rating - Teammate_rating
```
**Why This Matters:**
- **Personalizes rating change** based on partner strength
- **Strong teammate?** Effective opponent rating is lower (they helped)
- **Weak teammate?** Effective opponent rating is higher (you did the work)
- Reflects reality: beating opponents is easier with a strong partner
**Examples:**
- Opponents: 1500, 1500 | Partner: 1500 → Effective: 1500 (neutral)
- Opponents: 1500, 1500 | Partner: 1600 → Effective: 1400 (team was favored)
- Opponents: 1500, 1500 | Partner: 1400 → Effective: 1600 (team was undermanned)
---
### ⏳ Change 4: Combine Singles/Doubles into One Unified Rating
**Status:** IN PROGRESS - DOCUMENTED
**Scope:** This is a significant schema change that requires:
#### Database Schema Changes
**Current Structure:**
```sql
players {
singles_rating REAL,
singles_rd REAL,
singles_volatility REAL,
doubles_rating REAL,
doubles_rd REAL,
doubles_volatility REAL,
}
```
**Proposed New Structure:**
```sql
players {
rating REAL, -- Unified rating
rd REAL,
volatility REAL,
}
```
**Additional Tables Needed:**
```sql
CREATE TABLE rating_history (
id INTEGER PRIMARY KEY AUTOINCREMENT,
player_id INTEGER NOT NULL,
match_id INTEGER NOT NULL,
rating_before REAL NOT NULL,
rating_after REAL NOT NULL,
rd_before REAL NOT NULL,
rd_after REAL NOT NULL,
volatility_before REAL NOT NULL,
volatility_after REAL NOT NULL,
match_type TEXT CHECK(match_type IN ('singles', 'doubles')),
created_at TEXT NOT NULL DEFAULT (datetime('now')),
FOREIGN KEY (player_id) REFERENCES players(id),
FOREIGN KEY (match_id) REFERENCES matches(id)
);
```
#### Code Changes Needed
1. **`src/models/mod.rs`** - Update `Player` struct
- Remove `singles_rating`, `singles_rd`, `singles_volatility`
- Remove `doubles_rating`, `doubles_rd`, `doubles_volatility`
- Add unified `rating`, `rd`, `volatility`
2. **`src/main.rs`** - Update Web UI
- Single rating display instead of two
- Leaderboard shows one rating
- Match type (singles/doubles) is still tracked in match records
3. **Database Migration** `migrations/002_unified_rating.sql`
```sql
-- Create new columns for unified rating
ALTER TABLE players ADD COLUMN rating REAL DEFAULT 1500.0;
ALTER TABLE players ADD COLUMN rd REAL DEFAULT 350.0;
ALTER TABLE players ADD COLUMN unified_volatility REAL DEFAULT 0.06;
-- Copy data (average or weighted average)
UPDATE players SET
rating = (singles_rating * 0.5 + doubles_rating * 0.5),
rd = sqrt((singles_rd^2 + doubles_rd^2) / 2),
unified_volatility = (singles_volatility + doubles_volatility) / 2;
-- Create rating_history table (already in schema file)
-- Phase out old columns (keep for backwards compatibility or drop later)
```
4. **Demo/Test Files** - Update to use unified rating
- `src/simple_demo.rs`
- `src/demo.rs`
- `examples/email_demo.rs`
#### Implementation Strategy (For Next Iteration)
**Phase 1: Migration & Dual Write** (Current)
- Add new unified rating columns to `players` table
- Maintain old singles/doubles columns
- Code writes to both (ensures backwards compatibility)
**Phase 2: Testing**
- Verify unified rating calculations
- Compare results with separate singles/doubles
- Test backwards compatibility
**Phase 3: Cutover**
- Switch web UI to show unified rating
- Archive historical singles/doubles data
- Deprecate old columns
**Phase 4: Cleanup** (Optional)
- Remove old columns if no longer needed
- Prune rating_history if size becomes an issue
#### Why One Unified Rating?
**Pros:**
- Simpler mental model
- Still track match type in history
- Reduces database complexity
- Single leaderboard
**Cons:**
- Loses distinction between formats (some players are better at doubles)
- Rating becomes weighted average of both
**Trade-off Solution:**
Keep match type in `matches` table - can still filter leaderboards by format in the future, but use single rating for each player.
---
## Compilation & Testing
### Build Status
```bash
cd /Users/split/Projects/pickleball-elo
cargo build --release
```
Expected: ✅ All code should compile successfully
### Test Commands
```bash
cargo test --lib
cargo test --lib glicko::doubles
cargo test --lib glicko::score_weight
```
---
## Files Modified
### Core Changes
- ✅ `src/glicko/score_weight.rs` - Margin bonus → performance ratio
- ✅ `src/glicko/doubles.rs` - RD flip + effective opponent
- ✅ `src/glicko/calculator.rs` - Test update
### Usage Sites
- ✅ `examples/email_demo.rs` - New function signature
- ✅ `src/demo.rs` - New function signature
- ✅ `src/simple_demo.rs` - New function signature
### Not Yet Changed (Deferred to Phase 2)
- ⏳ `src/models/mod.rs` - Player struct update
- ⏳ `src/main.rs` - Web UI updates
- ⏳ `migrations/002_unified_rating.sql` - New migration
---
## Database Backup
- Current: `pickleball.db.backup-20260226-105326` ✅ Available
- Safe to proceed with code changes
- Schema migration can be done in separate phase
---
## Next Steps
1. ✅ Verify compilation: `cargo build --release`
2. ✅ Run tests: `cargo test`
3. ⏳ Implement unified rating schema changes
4. ⏳ Update Player struct and main.rs
5. ⏳ Test end-to-end with new system