- migration/README.md: cold-start rebuild runbook (reconciliation gate,
classification rules, transfer pairing, investment policy, execution order)
- migration/build_rebuild_dataset.py: consolidated 3-QFX builder with PNC-
owned transfers, counterpart pairing & drop, per-account reconciliation
- migration/rebuild_clusters.{json,md}: clustering proposal for the rebuild
- migration/rebuild_review.html: read-only browser review for the 1017-txn
rebuild plan (transfers under PNC, category fixes baked in)
- migration/{pnc_review,review_preview_mixed}.html: earlier UI previews
- merchant_map.json: add 10 settled deterministic rules (Duquesne Light,
Pitt Salary, Interest Payment, IRS, Pitt Tuition, Daily Cash Adjustment,
ATM Surcharge/Yardi/Venmo/Zelle->Don't Know) so the skill stops flagging
pre-classified PNC lines as UNMATCHED
# Firefly rebuild runbook

One-time migration: wipe the CSV-era transactions and rebuild from
FITID-stable QFX so every transaction has a permanent dedup key and a clean
account taxonomy. Read this before running anything in this folder.

## Why a rebuild (not in-place cleanup)

Firefly history is young (everything ~Aug 2025+, ~950 txns, minimal manual
data). Old CSV imports left ~343 fragmented junk expense accounts and no
stable external_ids. A clean rebuild keyed on QFX `FITID` is a better
foundation than reassigning junk in place. Decided 2026-05-17.

## Hard prerequisites (do not skip)

1. **Firefly DB backup.** The wipe is destructive and has no undo. Do not run
   it until a DB dump/snapshot exists.
2. **Exports** (in `../EXPORTS/`, gitignored): Apple/PNC/Costco QFX, Aug 2025
   -> now, FITID on 100% of rows. Schwab/Coinbase/Cash (~35 txns) are
   CSV-only/manual, handled separately.

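One way to spot-check the "FITID on 100% of rows" claim is to count transaction blocks and FITIDs in each export. The sketch below is illustrative and not part of this folder: `fitid_coverage` is a hypothetical helper, and it assumes the exports are OFX 1.x-style QFX in which every transaction is wrapped in `<STMTTRN>...</STMTTRN>`.

```python
import re

# Assumption: each transaction is a <STMTTRN>...</STMTTRN> aggregate and its
# dedup key is a <FITID> element inside it (the closing </FITID> is optional
# in SGML-style QFX, so the value is read up to the next tag or line break).
TXN_RE = re.compile(r"<STMTTRN>(.*?)</STMTTRN>", re.DOTALL | re.IGNORECASE)
FITID_RE = re.compile(r"<FITID>\s*([^<\r\n]+)", re.IGNORECASE)


def fitid_coverage(qfx_text: str) -> tuple[int, int, int]:
    """Return (transactions, transactions with a FITID, duplicate FITIDs)."""
    blocks = TXN_RE.findall(qfx_text)
    ids = []
    for block in blocks:
        match = FITID_RE.search(block)
        if match:
            ids.append(match.group(1).strip())
    duplicates = len(ids) - len(set(ids))
    return len(blocks), len(ids), duplicates
```

For an export read into a string, full coverage would show up as the first two numbers being equal and the third being 0.
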
## Reconciliation (the trust gate)

Per account: `opening_balance = QFX_ledger - sum(all that account's lines)`.
Classification (transfer vs expense) never changes an account's own balance,
so `opening + sum == ledger` must hold to the cent before trusting the wipe.
Verified: PNC opening $6,866.10, Apple -$4,498.79, Costco -$2,541.57 (all
tie). `rebuild_dryrun.py` recomputes this; re-run after any change.

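The identity above can be checked with exact decimal arithmetic. This is a minimal sketch, not the code in `rebuild_dryrun.py`: the function names are hypothetical, and amounts are assumed to be signed from the account's own point of view.

```python
from decimal import Decimal


def opening_balance(ledger: Decimal, amounts: list[Decimal]) -> Decimal:
    """opening_balance = QFX_ledger - sum(all that account's lines)."""
    return ledger - sum(amounts, Decimal("0"))


def reconciles(opening: Decimal, amounts: list[Decimal], ledger: Decimal) -> bool:
    """True only when opening + sum == ledger exactly, i.e. to the cent."""
    return opening + sum(amounts, Decimal("0")) == ledger
```

`Decimal` is used instead of `float` so that a comparison "to the cent" is not thrown off by binary rounding.
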
## Classification rules (PNC = the hub)

- **Transfers** -- ALWAYS owned by the PNC leg: PNC's posting date and PNC's
  FITID are authoritative; the card/brokerage counterpart line is paired by
  amount (posting dates within a few days of each other) and dropped. Every
  transfer lives under PNC, with one consistent date, never double-counted.
  Pairs: APPLECARD GSBANK -> Apple Credit Card; CITI AUTOPAY -> Costco Visa
  Card; SCHWAB MONEYLINK -> Schwab Stocks/Savings (disambiguate by amount);
  ATM WITHDRAWAL -> Cash; CARVANA PAYOUT -> Illiquid Assets; big ATM DEPOSIT
  -> Coverdell; CAPITAL ONE -> Capital One (closed). Codified in the skill's
  `references/transfers.md`.
- **Income/expense**: Pitt salary -> Wages; Duquesne Light -> Utilities:
  Electric; Compeer -> Rent; etc.
- **Don't Know**: Venmo/CashApp/Zelle ("poker"), unrecallable checks, unknown
  ATM deposits -> the `Don't Know` account, review later. Never guessed.
- **Special accounts**: `Illiquid Assets` (cars; sale = transfer in),
  `Don't Know` (catch-all). See the skill's memory / taxonomy notes.

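The pair-and-drop rule for transfers could look roughly like the following. This is a sketch under stated assumptions, not the logic in `build_rebuild_dataset.py`: the `Line` type, the `pair_and_drop` name, the 4-day window standing in for "a few days", and the convention that the counterpart carries the opposite sign are all illustrative.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Line:
    account: str     # e.g. "PNC" or "Apple Credit Card"
    fitid: str
    posted: date
    amount: Decimal  # signed, from that account's point of view


def pair_and_drop(pnc_transfers, counterpart_lines, max_days=4):
    """Pair each PNC transfer leg with one counterpart line and drop it.

    Returns (pairs, remaining): pairs maps the PNC FITID to the dropped
    counterpart line; remaining holds the counterpart lines that were kept.
    """
    remaining = list(counterpart_lines)
    pairs = {}
    for leg in pnc_transfers:
        # Assumed sign convention: -500 leaving PNC appears as +500 elsewhere.
        candidates = [
            c for c in remaining
            if c.amount == -leg.amount
            and abs((c.posted - leg.posted).days) <= max_days
        ]
        if not candidates:
            continue  # unpaired leg: leave it for manual review
        best = min(candidates, key=lambda c: abs((c.posted - leg.posted).days))
        pairs[leg.fitid] = best
        remaining.remove(best)  # dropped, so the transfer is counted once
    return pairs, remaining
```

Keying the result on the PNC FITID mirrors the rule that the PNC leg owns the transfer; the counterpart is kept only as evidence of the pairing.
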
## Investment accounts

Do NOT transaction-import Schwab/Roth/Coverdell/Coinbase (noise, and assets
!= currency). Model as monthly-valued: opening balance + external MoneyLink
transfers (from the PNC side) + one monthly valuation adjustment booked to
`Investment Appreciation` / `Investment: Interest`. Dane supplies the current
value at import; delta = the adjustment. Savings<->Stocks journals are
transfers.

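The monthly adjustment is simple arithmetic: the supplied current value minus what the books already show. A minimal sketch, assuming a hypothetical `valuation_adjustment` helper and signed transfer amounts from the investment account's point of view:

```python
from decimal import Decimal


def valuation_adjustment(opening, transfers, prior_adjustments, current_value):
    """Delta that moves the book balance to the supplied current value."""
    zero = Decimal("0")
    book_balance = opening + sum(transfers, zero) + sum(prior_adjustments, zero)
    return current_value - book_balance
```

A positive result would be booked as appreciation or interest; a negative one is a write-down for the month.
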
## Execution order

1. `python rebuild_dryrun.py` -> confirm all accounts still reconcile.
2. Build the full normalized dataset (PNC + Apple + Costco, transfers typed,
   payments paired/deduped, opening balances set).
3. Drive review via the skill's browser workflow
   (`references/review-workflow.md`): `--review-html`, resolve the ~190 tail
   merchants in-situ (search-then-ask, <80% => ask), Export `decisions.json`.
4. **Confirm DB backup exists.**
5. Wipe transactions, prune empty junk expense accounts.
6. `--decisions decisions.json --post`. Reconcile final balances against the
   derived figures above.

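Step 4 can be backed by a small guard that refuses to continue without a plausible dump on disk. This is only a sketch: `backup_looks_usable` is a hypothetical helper, the 24-hour freshness window is an assumption, and a file that exists and is non-empty is not proof that it restores cleanly.

```python
import time
from pathlib import Path


def backup_looks_usable(dump_path: Path, max_age_hours: float = 24.0) -> bool:
    """True if the dump exists, is non-empty, and was modified recently."""
    if not dump_path.is_file():
        return False
    stat = dump_path.stat()
    age_hours = (time.time() - stat.st_mtime) / 3600
    return stat.st_size > 0 and age_hours <= max_age_hours
```

Calling this with the path of the dump just before the wipe, and aborting when it returns `False`, keeps the destructive step behind an explicit check.
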
## Files here

- `rebuild_pnc.py` -- PNC classifier + reconciliation (read-only)
- `rebuild_dryrun.py` -- consolidated per-account reconciliation (read-only)
- `pnc_classified.json` -- PNC classification output
- `merchant_clusters.{json,md}` -- cluster proposal (taxonomy bootstrap)
- `mock_firefly.py` -- stdlib mock used for skill eval/testing
- `*review_preview*.html` -- review-UI previews on real data

Nothing here writes to Firefly except the final `--post` in step 6.