# Firefly rebuild runbook

One-time migration: wipe the CSV-era transactions and rebuild from FITID-stable QFX so every transaction has a permanent dedup key and a clean account taxonomy. Read this before running anything in this folder.

## Why a rebuild (not in-place cleanup)

Firefly history is young (everything ~Aug 2025+, ~950 txns, minimal manual data). Old CSV imports left ~343 fragmented junk expense accounts and no stable external_ids. A clean rebuild keyed on QFX `FITID` is a better foundation than reassigning junk in place. Decided 2026-05-17.

## Hard prerequisites (do not skip)

1. **Firefly DB backup.** Destructive, no undo. Do not run the wipe until a DB dump/snapshot exists.
2. **Exports** (in `../EXPORTS/`, gitignored): Apple/PNC/Costco QFX, Aug 2025 -> now, FITID on 100% of rows. Schwab/Coinbase/Cash (~35 txns) are CSV-only/manual, handled separately.

## Reconciliation (the trust gate)

Per account: `opening_balance = QFX_ledger - sum(all that account's lines)`. Classification (transfer vs expense) never changes an account's own balance, so `opening + sum == ledger` must hold to the cent before trusting the wipe. Verified: PNC opening $6,866.10, Apple -$4,498.79, Costco -$2,541.57 (all tie). `rebuild_dryrun.py` recomputes this; re-run after any change.

## Classification rules (PNC = the hub)

- **Transfers** -- ALWAYS owned by the PNC leg: PNC's posting date and PNC's FITID are authoritative, the card/brokerage counterpart line is paired by amount (+/- a few days) and dropped. Every transfer lives under PNC, one consistent date, never double-counted. Pairs (codified in the skill's `references/transfers.md`):
  - APPLECARD GSBANK -> Apple Credit Card
  - CITI AUTOPAY -> Costco Visa Card
  - SCHWAB MONEYLINK -> Schwab Stocks/Savings (disambiguate by amount)
  - ATM WITHDRAWAL -> Cash
  - CARVANA PAYOUT -> Illiquid Assets
  - big ATM DEPOSIT -> Coverdell
  - CAPITAL ONE -> Capital One (closed)
- **Income/expense**: Pitt salary -> Wages; Duquesne Light -> Utilities: Electric; Compeer -> Rent; etc.
- **Don't Know**: Venmo/CashApp/Zelle ("poker"), unrecallable checks, unknown ATM deposits -> the `Don't Know` account, review later. Never guessed.
- **Special accounts**: `Illiquid Assets` (cars; sale = transfer in), `Don't Know` (catch-all). See the skill's memory / taxonomy notes.

## Investment accounts

Do NOT transaction-import Schwab/Roth/Coverdell/Coinbase (noise, and assets != currency). Model as monthly-valued: opening balance + external MoneyLink transfers (from the PNC side) + one monthly valuation adjustment booked to `Investment Appreciation` / `Investment: Interest`. Dane supplies the current value at import; delta = the adjustment. Savings<->Stocks journals are transfers.

## Execution order

1. `python rebuild_dryrun.py` -> confirm all accounts still reconcile.
2. Build the full normalized dataset (PNC + Apple + Costco, transfers typed, payments paired/deduped, opening balances set).
3. Drive review via the skill's browser workflow (`references/review-workflow.md`): `--review-html`, resolve the ~190 tail merchants in-situ (search-then-ask, <80% => ask), Export `decisions.json`.
4. **Confirm DB backup exists.**
5. Wipe transactions, prune empty junk expense accounts.
6. `--decisions decisions.json --post`. Reconcile final balances against the derived figures above.

## Files here

- `rebuild_pnc.py` -- PNC classifier + reconciliation (read-only)
- `rebuild_dryrun.py` -- consolidated per-account reconciliation (read-only)
- `pnc_classified.json` -- PNC classification output
- `merchant_clusters.{json,md}` -- cluster proposal (taxonomy bootstrap)
- `mock_firefly.py` -- stdlib mock used for skill eval/testing
- `*review_preview*.html` -- review-UI previews on real data

Nothing here writes to Firefly except the final `--post` in step 6.
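
## Appendix: reconciliation check (illustrative sketch)

For readers who want the trust gate from "Reconciliation (the trust gate)" in concrete form, here is a minimal sketch of the arithmetic. It is not the code in `rebuild_dryrun.py`; the function names `derive_opening_balance` and `reconciles` are hypothetical, and the figures are made up for illustration, not real account data. It assumes amounts are signed (money in positive, money out negative) and uses `Decimal` so that "to the cent" means exact equality.

```python
from decimal import Decimal


def derive_opening_balance(ledger_balance, amounts):
    """opening_balance = QFX_ledger - sum(all that account's lines)."""
    return ledger_balance - sum(amounts, Decimal("0"))


def reconciles(opening, amounts, ledger_balance):
    """True only if opening + sum == ledger, exact to the cent."""
    return opening + sum(amounts, Decimal("0")) == ledger_balance


# Hypothetical figures, NOT taken from any real export.
ledger = Decimal("1500.00")
lines = [Decimal("-40.25"), Decimal("2000.00"), Decimal("-709.75")]

opening = derive_opening_balance(ledger, lines)
print("derived opening balance:", opening)
print("full set reconciles:", reconciles(opening, lines, ledger))

# Dropping (or duplicating) a line breaks the tie. Relabelling a line as
# transfer vs expense does not, because labels never enter the sum.
print("with a line missing:", reconciles(opening, lines[:-1], ledger))
```

Exact `Decimal` comparison is a deliberate choice in this sketch: float sums can drift by fractions of a cent and would need a tolerance, which weakens a check meant to hold to the cent.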