- 22 UNMATCHED + 2 REVIEW resolutions from the 2026-05-01..05-25 overlap import were folded back via `apply_decisions`/`upsert_rule`: the Charlotte bachelor-trip vendors (Tot Hill Farm, Lexington BBQ, Wooden Robot Brewing, Grace O'Malley, Buddy's Place, Stop and Go, Yamazaru, Publix, Krispy Kreme, Charlotte Motor Speedway), Pittsburgh first-timers (Utrecht/Blick's, Westinghouse Food Court, Halal Guys, Five Guys, Firestone, Leatherman, PRT via Masabi), plus Patrick Murphy (rent) and Emerson (wages).
- `migration/test_overlap_review.html`: the rendered review doc from the test (PNC+Apple+Costco 2026-05-01..05-25; 57 dedup-skip, 44 actionable).
Firefly rebuild runbook
One-time migration: wipe the CSV-era transactions and rebuild from FITID-stable QFX so every transaction has a permanent dedup key and a clean account taxonomy. Read this before running anything in this folder.
Why a rebuild (not in-place cleanup)
Firefly history is young (everything ~Aug 2025+, ~950 txns, minimal manual
data). Old CSV imports left ~343 fragmented junk expense accounts and no
stable `external_id`s. A clean rebuild keyed on QFX FITID is a better
foundation than reassigning the junk accounts in place. Decided 2026-05-17.
Hard prerequisites (do not skip)
- Firefly DB backup. The rebuild is destructive with no undo: do not run the wipe until a DB dump/snapshot exists.
- Exports (in `../EXPORTS/`, gitignored): Apple/PNC/Costco QFX, Aug 2025 -> now, FITID on 100% of rows. Schwab/Coinbase/Cash (~35 txns) are CSV-only/manual and handled separately.
Reconciliation (the trust gate)
Per account: opening_balance = QFX_ledger - sum(all that account's lines).
Classification (transfer vs expense) never changes an account's own balance,
so opening + sum == ledger must hold to the cent before trusting the wipe.
Verified: PNC opening $6,866.10, Apple -$4,498.79, Costco -$2,541.57 (all
tie). rebuild_dryrun.py recomputes this; re-run after any change.
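The identity above can be checked mechanically. A minimal sketch, assuming each account's QFX ledger balance and its transaction amounts are already parsed into signed `Decimal`s; `derive_opening` and `reconciles` are hypothetical helpers, not necessarily how `rebuild_dryrun.py` does it.

```python
from decimal import Decimal


def derive_opening(ledger: Decimal, amounts: list[Decimal]) -> Decimal:
    """Opening balance implied by the QFX ledger: ledger minus the sum of all lines."""
    return ledger - sum(amounts, Decimal("0"))


def reconciles(opening: Decimal, amounts: list[Decimal], ledger: Decimal) -> bool:
    """True when opening + sum(lines) equals the ledger to the cent."""
    total = opening + sum(amounts, Decimal("0"))
    return total.quantize(Decimal("0.01")) == ledger.quantize(Decimal("0.01"))


if __name__ == "__main__":
    # Hypothetical figures for illustration only, not real account data.
    lines = [Decimal("-45.10"), Decimal("1200.00"), Decimal("-310.25")]
    ledger = Decimal("2000.00")
    opening = derive_opening(ledger, lines)
    print(opening)  # 1155.35
    print(reconciles(opening, lines, ledger))  # True
```

Because reclassifying a line (transfer vs expense) never changes its amount on this account, `reconciles` should keep returning `True` after any classification change; a `False` means a line was added, dropped, or altered.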
Classification rules (PNC = the hub)
- Transfers -- ALWAYS owned by the PNC leg: PNC's posting date and PNC's
FITID are authoritative; the card/brokerage counterpart line is paired by
amount (+/- a few days) and dropped. Every transfer lives under PNC with one
consistent date and is never double-counted. Pairs: APPLECARD GSBANK -> Apple
Credit Card; CITI AUTOPAY -> Costco Visa Card; SCHWAB MONEYLINK -> Schwab
Stocks/Savings (disambiguate by amount); ATM WITHDRAWAL -> Cash; CARVANA
PAYOUT -> Illiquid Assets; big ATM DEPOSIT -> Coverdell; CAPITAL ONE ->
Capital One (closed). Codified in the skill's `references/transfers.md`.
- Income/expense: Pitt salary -> Wages; Duquesne Light -> Utilities: Electric; Compeer -> Rent; etc.
- Don't Know: Venmo/CashApp/Zelle ("poker"), unrecallable checks, unknown ATM deposits -> the Don't Know account, reviewed later. Never guessed.
- Special accounts: Illiquid Assets (cars; a sale = transfer in), Don't Know (catch-all). See the skill's memory / taxonomy notes.
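A minimal sketch of the PNC-owns-the-transfer pairing rule, assuming every line is already parsed into a signed `Decimal` amount and that the card/brokerage side records the payment with the opposite sign of the PNC leg. `Line`, `pair_transfers`, and the 3-day default window (standing in for "a few days") are illustrative assumptions, not the real classifier in `rebuild_pnc.py`.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from decimal import Decimal


@dataclass(frozen=True)
class Line:
    account: str
    posted: date
    amount: Decimal  # signed from this account's own point of view (assumption)
    fitid: str


def pair_transfers(pnc_legs, counterpart_lines, window_days=3):
    """Pair each PNC transfer leg with the nearest-dated counterpart line of
    opposite sign posted within `window_days`.

    Every PNC leg is kept as the authoritative transfer (its date and FITID win).
    Returns (dropped, orphans): matched counterpart lines to drop, and PNC legs
    that found no counterpart at all.
    """
    window = timedelta(days=window_days)
    unused = list(counterpart_lines)
    dropped, orphans = [], []
    for leg in sorted(pnc_legs, key=lambda t: t.posted):
        candidates = [
            c for c in unused
            if c.amount == -leg.amount and abs(c.posted - leg.posted) <= window
        ]
        if not candidates:
            orphans.append(leg)
            continue
        # Prefer the closest posting date; each counterpart is consumed once.
        match = min(candidates, key=lambda c: abs(c.posted - leg.posted))
        unused.remove(match)
        dropped.append(match)
    return dropped, orphans
```

Orphans are PNC legs whose counterpart predates the card's QFX export; their amounts need separate handling in the destination account's opening balance rather than being posted twice.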
Investment accounts
Do NOT transaction-import Schwab/Roth/Coverdell/Coinbase (noise, and assets
!= currency). Model as monthly-valued: opening balance + external MoneyLink
transfers (from the PNC side) + one monthly valuation adjustment booked to
Investment Appreciation / Investment: Interest. Dane supplies the current
value at import; the adjustment is the delta between that value and the
account's current book balance. Savings<->Stocks journals are transfers.
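The monthly adjustment is the gap between the value Dane reports and the account's book balance. A minimal sketch, assuming the opening balance, the MoneyLink transfers (signed from the investment account's side), and earlier adjustments are available as `Decimal`s; `valuation_adjustment` is a hypothetical helper.

```python
from decimal import Decimal


def valuation_adjustment(reported_value, opening, transfers, prior_adjustments=()):
    """Adjustment that brings the book balance up (or down) to `reported_value`.

    book balance = opening + external transfers + earlier valuation adjustments;
    a positive result is a gain, a negative one a loss.
    """
    zero = Decimal("0")
    book = opening + sum(transfers, zero) + sum(prior_adjustments, zero)
    return reported_value - book


if __name__ == "__main__":
    # Hypothetical figures, not real account values.
    delta = valuation_adjustment(
        reported_value=Decimal("10800.00"),
        opening=Decimal("10000.00"),
        transfers=[Decimal("500.00"), Decimal("-200.00")],
        prior_adjustments=[Decimal("150.00")],
    )
    print(delta)  # 350.00
```

Because transfers are already in the book balance, the adjustment only captures market movement and never re-counts money moved in from PNC.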
Execution order
1. `python rebuild_dryrun.py` -> confirm all accounts still reconcile.
2. Build the full normalized dataset (PNC + Apple + Costco, transfers typed, payments paired/deduped, opening balances set).
3. Drive review via the skill's browser workflow (`references/review-workflow.md`): run with `--review-html`, resolve the ~190 tail merchants in-situ (search-then-ask, <80% => ask), then export `decisions.json`.
4. Confirm the DB backup exists.
5. Wipe transactions, prune empty junk expense accounts.
6. Run the import with `--decisions decisions.json --post`. Reconcile final balances against the derived figures above.
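The "search-then-ask, <80% => ask" rule in step 3 amounts to a confidence gate. A hedged sketch, assuming the search step yields a proposed account plus a 0-1 confidence per merchant; `Proposal`, `triage`, and the flat merchant -> account mapping are hypothetical and say nothing about the actual `decisions.json` schema.

```python
from dataclasses import dataclass
from typing import Optional

ASK_THRESHOLD = 0.80  # "<80% => ask"


@dataclass
class Proposal:
    merchant: str
    account: Optional[str]  # best-guess account from the search step, if any
    confidence: float  # 0.0-1.0


def triage(proposals):
    """Return (decisions, to_ask): confident proposals become merchant -> account
    decisions; everything else is queued for a human answer."""
    decisions, to_ask = {}, []
    for p in proposals:
        if p.account is not None and p.confidence >= ASK_THRESHOLD:
            decisions[p.merchant] = p.account
        else:
            to_ask.append(p.merchant)
    return decisions, to_ask
```

A proposal with no account at all is always asked about, regardless of its score.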
Files here
- `rebuild_pnc.py` -- PNC classifier + reconciliation (read-only)
- `rebuild_dryrun.py` -- consolidated per-account reconciliation (read-only)
- `pnc_classified.json` -- PNC classification output
- `merchant_clusters.{json,md}` -- cluster proposal (taxonomy bootstrap)
- `mock_firefly.py` -- stdlib mock used for skill eval/testing
- `*review_preview*.html` -- review-UI previews on real data
Nothing here writes to Firefly except the final --post in step 6.
Lessons from the first rebuild (2026-05-20)
Captured here so a second rebuild doesn't re-discover them.
- Orphan paired transfers: the PNC->Apple payment from 2025-08-01 has no
Apple-side line (Apple's QFX starts 08-02). Its effect was already in
Apple's derived opening, so posting the transfer, which also credited Apple,
double-counted $3,218. Fix: `build_rebuild_dataset.py` now subtracts orphan transfer amounts from the destination card's opening. See `references/transfers.md` in the skill.
- Asset accounts require `account_role` on POST /accounts. `defaultAsset` works universally.
- Budgets do not auto-create. If rebuilding from scratch, recreate Needs / Wants / Savings via the UI or POST before the import.
- Wiping via the UI leaves stale revenue accounts / categories (only transaction-referenced asset accounts go). Prune manually if you want a truly clean slate.
- Strip cached `account_id` from `merchant_map.json` before any rebuild: pre-wipe ids are invalid post-wipe. The skill no longer caches to the map (in-memory only), but old maps may still carry stale ids.
- Background Python with `nohup ... &` can lose stdout to buffering. Use `python -u` for the import step. The first rebuild's log was empty because Python buffered everything, and we mistook it for "ran but did nothing."
- `error_if_duplicate_hash` is now off: Firefly's content-hash dedup was too eager (it rejected legitimately distinct rows with the same date+amount+description, like two parking sessions at the same garage). The `external_id` precheck is the only dedup.
- Wipe by deleting transactions, not by deleting accounts. Otherwise you end up with stale ids referenced by the `merchant_map` cache.
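The `external_id` precheck can be sketched as a set-membership filter. This assumes you already hold the set of `external_id`s currently in Firefly (how that set is fetched is left out) and that each candidate's `external_id` is its QFX FITID; `Txn` and `to_post` are hypothetical names. Unlike a date+amount+description hash, two same-day parking charges with different FITIDs both survive.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Txn:
    external_id: str  # QFX FITID (assumption)
    posted: date
    amount: Decimal
    description: str


def to_post(candidates, existing_external_ids):
    """Keep only candidates whose external_id is not already in Firefly.

    Rows that share date+amount+description are all kept as long as their
    external_ids differ; only an exact external_id repeat is skipped.
    """
    seen = set(existing_external_ids)
    out = []
    for t in candidates:
        if t.external_id in seen:
            continue
        seen.add(t.external_id)  # also guards against repeats within this batch
        out.append(t)
    return out
```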