finances/migration/README.md
Dane Sabo 26fb19ca9a Migration runbook + rebuild tooling; 10 PNC/income/Don't Know rules
- migration/README.md: cold-start rebuild runbook (reconciliation gate,
  classification rules, transfer pairing, investment policy, execution order)
- migration/build_rebuild_dataset.py: consolidated 3-QFX builder with PNC-
  owned transfers, counterpart pairing & drop, per-account reconciliation
- migration/rebuild_clusters.{json,md}: clustering proposal for the rebuild
- migration/rebuild_review.html: read-only browser review for the 1017-txn
  rebuild plan (transfers under PNC, category fixes baked in)
- migration/{pnc_review,review_preview_mixed}.html: earlier UI previews
- merchant_map.json: add 10 settled deterministic rules (Duquesne Light,
  Pitt Salary, Interest Payment, IRS, Pitt Tuition, Daily Cash Adjustment,
  ATM Surcharge/Yardi/Venmo/Zelle->Don't Know) so the skill stops flagging
  pre-classified PNC lines as UNMATCHED
2026-05-25 18:54:50 -04:00


# Firefly rebuild runbook
One-time migration: wipe the CSV-era transactions and rebuild from
FITID-stable QFX so every transaction has a permanent dedup key and a clean
account taxonomy. Read this before running anything in this folder.
## Why a rebuild (not in-place cleanup)
Firefly history is young (everything ~Aug 2025+, ~950 txns, minimal manual
data). Old CSV imports left ~343 fragmented junk expense accounts and no
stable external_ids. A clean rebuild keyed on QFX `FITID` is a better
foundation than reassigning junk in place. Decided 2026-05-17.
## Hard prerequisites (do not skip)
1. **Firefly DB backup.** The wipe is destructive and has no undo. Do not run
it until a DB dump/snapshot exists.
2. **Exports** (in `../EXPORTS/`, gitignored): Apple/PNC/Costco QFX, Aug 2025
-> now, FITID on 100% of rows. Schwab/Coinbase/Cash (~35 txns) are
CSV-only/manual, handled separately.
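
The "FITID on 100% of rows" claim can be spot-checked before trusting an export. A minimal sketch, assuming the QFX files use the usual OFX `<STMTTRN>` transaction blocks; `fitid_coverage` is a hypothetical helper, not one of the scripts in this folder:

```python
import re

def fitid_coverage(qfx_text: str) -> tuple[int, int]:
    """Return (transactions with a non-empty FITID, total transactions)."""
    # Each transaction starts at <STMTTRN>; stop at the next one, at the end
    # of the transaction list, or at end of text (closing tags may be absent).
    blocks = re.findall(
        r"<STMTTRN>(.*?)(?=<STMTTRN>|</BANKTRANLIST>|\Z)",
        qfx_text,
        flags=re.S | re.I,
    )
    # A FITID counts only if some value follows the tag.
    with_fitid = sum(
        1 for b in blocks if re.search(r"<FITID>\s*[^<\s]", b, flags=re.I)
    )
    return with_fitid, len(blocks)

# Made-up sample: two transactions, only the first carries a FITID.
sample = (
    "<BANKTRANLIST>"
    "<STMTTRN><TRNTYPE>DEBIT<FITID>abc123<TRNAMT>-5.00</STMTTRN>"
    "<STMTTRN><TRNTYPE>CREDIT<TRNAMT>10.00</STMTTRN>"
    "</BANKTRANLIST>"
)
covered, total = fitid_coverage(sample)
```

An export is usable as a dedup source only when `covered == total`.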
## Reconciliation (the trust gate)
Per account: `opening_balance = QFX_ledger - sum(all that account's lines)`.
Classification (transfer vs expense) never changes an account's own balance,
so `opening + sum == ledger` must hold to the cent before trusting the wipe.
Verified: PNC opening $6,866.10, Apple -$4,498.79, Costco -$2,541.57 (all
tie). `rebuild_dryrun.py` recomputes this; re-run after any change.
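
The identity above can be sketched in a few lines. This is an illustration only, not the contents of `rebuild_dryrun.py`; the function names and the figures are made up, and `Decimal` is used so the comparison is exact to the cent:

```python
from decimal import Decimal

ZERO = Decimal("0")

def derive_opening(ledger_balance: Decimal, amounts: list[Decimal]) -> Decimal:
    """opening_balance = QFX_ledger - sum(all that account's lines)."""
    return ledger_balance - sum(amounts, ZERO)

def reconciles(opening: Decimal, amounts: list[Decimal], ledger_balance: Decimal) -> bool:
    """The gate: opening + sum == ledger, exactly."""
    return opening + sum(amounts, ZERO) == ledger_balance

# Hypothetical account: three signed lines and a reported ledger balance.
lines = [Decimal("-42.17"), Decimal("1500.00"), Decimal("-300.00")]
ledger = Decimal("2157.83")

opening = derive_opening(ledger, lines)  # Decimal("1000.00")
ok = reconciles(opening, lines, ledger)  # True
# Losing a line after the opening balance is fixed breaks the tie:
broken = reconciles(opening, lines[:-1], ledger)  # False
```

Relabelling a line as transfer or expense does not change its amount, which is why the check is independent of classification.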
## Classification rules (PNC = the hub)
- **Transfers** -- ALWAYS owned by the PNC leg: PNC's posting date and PNC's
FITID are authoritative; the card/brokerage counterpart line is paired by
amount (+/- a few days) and dropped. Every transfer therefore lives under PNC
with one consistent date and is never double-counted. Pairs: APPLECARD GSBANK -> Apple
Credit Card; CITI AUTOPAY -> Costco Visa Card; SCHWAB MONEYLINK -> Schwab
Stocks/Savings (disambiguate by amount); ATM WITHDRAWAL -> Cash; CARVANA
PAYOUT -> Illiquid Assets; big ATM DEPOSIT -> Coverdell; CAPITAL ONE ->
Capital One (closed). Codified in the skill's `references/transfers.md`.
- **Income/expense**: Pitt salary -> Wages; Duquesne Light -> Utilities:
Electric; Compeer -> Rent; etc.
- **Don't Know**: Venmo/CashApp/Zelle ("poker"), unrecallable checks, unknown
ATM deposits -> the `Don't Know` account, review later. Never guessed.
- **Special accounts**: `Illiquid Assets` (cars; sale = transfer in),
`Don't Know` (catch-all). See the skill's memory / taxonomy notes.
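
The transfer rule (PNC leg kept, counterpart paired by amount within a few days and dropped) could look roughly like the sketch below. The `Line` type, the 3-day window, and matching on absolute amount are assumptions for illustration; the real logic lives in the tooling and the skill's `references/transfers.md`:

```python
from dataclasses import dataclass
from datetime import date, timedelta
from decimal import Decimal

@dataclass(frozen=True)
class Line:
    fitid: str
    posted: date
    amount: Decimal

def pair_and_drop(pnc_transfers, counterpart_lines, window_days=3):
    """Pair each PNC transfer with one counterpart line; return (pairs, kept).

    `kept` is the counterpart account's lines with paired lines removed, so
    each transfer survives only as its PNC leg.
    """
    window = timedelta(days=window_days)
    kept = list(counterpart_lines)
    pairs = []
    for pnc in pnc_transfers:
        candidates = [
            c for c in kept
            if abs(c.amount) == abs(pnc.amount)
            and abs(c.posted - pnc.posted) <= window
        ]
        if not candidates:
            continue  # unpaired: leave for manual review
        # Prefer the counterpart closest in date to the PNC posting date.
        match = min(candidates, key=lambda c: abs(c.posted - pnc.posted))
        kept.remove(match)
        pairs.append((pnc, match))
    return pairs, kept

# Made-up data: one PNC card payment, one matching card line, one purchase.
pnc = [Line("P1", date(2025, 9, 3), Decimal("-500.00"))]
card = [
    Line("A1", date(2025, 9, 4), Decimal("500.00")),
    Line("A2", date(2025, 9, 5), Decimal("-23.10")),
]
pairs, kept = pair_and_drop(pnc, card)
```

Here the payment `P1` pairs with `A1`, which is dropped, and the purchase `A2` stays on the card.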
## Investment accounts
Do NOT transaction-import Schwab/Roth/Coverdell/Coinbase (noise, and assets
!= currency). Model as monthly-valued: opening balance + external MoneyLink
transfers (from the PNC side) + one monthly valuation adjustment booked to
`Investment Appreciation` / `Investment: Interest`. Dane supplies the current
value at import; delta = the adjustment. Savings<->Stocks journals are
transfers.
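
The monthly adjustment is the gap between the supplied current value and what the books already show. A minimal sketch with made-up figures; `valuation_adjustment` is a hypothetical name, and treating prior adjustments as part of the book balance is an assumption:

```python
from decimal import Decimal

ZERO = Decimal("0")

def valuation_adjustment(
    opening: Decimal,
    transfers: list[Decimal],
    prior_adjustments: list[Decimal],
    current_value: Decimal,
) -> Decimal:
    """delta = current value - (opening + net transfers + earlier adjustments)."""
    book_balance = opening + sum(transfers, ZERO) + sum(prior_adjustments, ZERO)
    return current_value - book_balance

# Hypothetical month: 10,000.00 opening, one 500.00 transfer in,
# 120.00 already booked as appreciation, statement value 10,750.00.
adj = valuation_adjustment(
    Decimal("10000.00"),
    [Decimal("500.00")],
    [Decimal("120.00")],
    Decimal("10750.00"),
)  # Decimal("130.00")
```

A negative result is a loss for the month and is booked the same way with the opposite sign.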
## Execution order
1. `python rebuild_dryrun.py` -> confirm all accounts still reconcile.
2. Build the full normalized dataset (PNC + Apple + Costco, transfers typed,
payments paired/deduped, opening balances set).
3. Drive review via the skill's browser workflow
(`references/review-workflow.md`): run with `--review-html`, resolve the ~190
tail merchants in situ (search-then-ask, <80% => ask), then export
`decisions.json`.
4. **Confirm DB backup exists.**
5. Wipe transactions, prune empty junk expense accounts.
6. Post with `--decisions decisions.json --post`, then reconcile final
balances against the derived figures above.
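
Step 4 can be turned into a mechanical guard that runs before the wipe. A minimal sketch, assuming the backup is a single dump file; the function name, the path, and the 24-hour freshness limit are all hypothetical:

```python
import time
from pathlib import Path

def backup_looks_usable(dump_path: Path, max_age_hours: float = 24.0) -> bool:
    """True only if the dump exists, is non-empty, and was written recently."""
    if not dump_path.is_file():
        return False
    stat = dump_path.stat()
    age_hours = (time.time() - stat.st_mtime) / 3600
    return stat.st_size > 0 and age_hours <= max_age_hours

# Example use before step 5 (path is a placeholder):
# if not backup_looks_usable(Path("firefly_dump.sql")):
#     raise SystemExit("No fresh DB backup found; refusing to wipe.")
```

This only checks that a recent, non-empty file is present. It does not prove the dump restores, so a test restore is still worth doing once.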
## Files here
- `rebuild_pnc.py` -- PNC classifier + reconciliation (read-only)
- `rebuild_dryrun.py` -- consolidated per-account reconciliation (read-only)
- `pnc_classified.json` -- PNC classification output
- `merchant_clusters.{json,md}` -- cluster proposal (taxonomy bootstrap)
- `mock_firefly.py` -- stdlib mock used for skill eval/testing
- `*review_preview*.html` -- review-UI previews on real data
Nothing here writes to Firefly except the final `--post` in step 6.