Migration: rebuild battle-test learnings + opening-balance orphan fix

- build_rebuild_dataset.py: subtract orphan paired-transfer amounts from
  destination card's derived opening; html.unescape descriptions.
- merchant_map.json: +110 auto-tail rules from rebuild long-tail, +20
  recurring rules + 135 auto-cluster acceptances; stripped all cached
  account_ids; Rock Auto -> Z(Mizumi) review:true; Duquesne Light ->
  Utilities; categories stripped from _auto_tail rules per user policy.
- migration/README.md: 'Lessons from the first rebuild' section.
- migration/rebuild_clusters.{json,md}: clustering proposal artifact.
Dane Sabo 2026-05-25 21:05:38 -04:00
parent 26fb19ca9a
commit e446c4097a
5 changed files with 1876 additions and 57 deletions


@ -76,3 +76,32 @@ transfers.
- `*review_preview*.html` -- review-UI previews on real data
Nothing here writes to Firefly except the final `--post` in step 6.
## Lessons from the first rebuild (2026-05-20)
Captured here so a second rebuild doesn't re-discover them.
- **Orphan paired transfers**: the PNC->Apple payment from 2025-08-01 has no
Apple-side line (Apple's QFX starts 08-02). Its effect was already in
Apple's derived opening, so also posting the transfer credited Apple twice
and double-counted $3,218. Fix: `build_rebuild_dataset.py` now subtracts
orphan transfer amounts from the destination card's opening. See
`references/transfers.md` in the skill.
- **Asset accounts require `account_role`** on POST /accounts. `defaultAsset`
works universally.
- **Budgets do not auto-create.** If wiping to scratch, recreate Needs /
Wants / Savings via UI or POST before the import.
- **Wipe via UI leaves stale revenue accounts / categories** (only asset
accounts referenced by transactions are removed). Prune manually if you want
a truly clean slate.
- **Strip cached `account_id` from `merchant_map.json` before any rebuild.**
Pre-wipe ids are invalid post-wipe. The skill no longer caches to the map
(in-memory only) but old maps may still carry stale ids.
- **Background Python with `nohup ... &` can lose stdout to buffering.** Use
`python -u` for the import step. The first rebuild's log was empty because
Python buffered everything and we mistook it for "ran but did nothing."
- **`error_if_duplicate_hash` is now off**: Firefly's content-hash dedup
was too eager (it rejected legitimately distinct rows with the same date,
amount, and description, such as two parking sessions at the same garage).
The `external_id` precheck is the only dedup.
- **Wipe by deleting transactions, not by deleting accounts.** Otherwise you
end up with stale ids referenced by merchant_map cache.
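
The `account_id` lesson above can be sketched as a small pre-rebuild cleanup. This is a minimal sketch, not the skill's actual code: it assumes cached ids live under a key literally named `account_id` somewhere in the JSON, and both the helper name `strip_account_ids` and the sample rule shape are hypothetical.

```python
import json

def strip_account_ids(node):
    """Recursively drop every 'account_id' key from a parsed JSON structure."""
    if isinstance(node, dict):
        return {k: strip_account_ids(v) for k, v in node.items() if k != "account_id"}
    if isinstance(node, list):
        return [strip_account_ids(v) for v in node]
    return node

# Hypothetical map shape, for illustration only; the real layout may differ.
sample = {"rules": [{"match": "DUQUESNE LIGHT", "category": "Utilities", "account_id": 41}]}
cleaned = strip_account_ids(sample)
print(json.dumps(cleaned, indent=2))
```

Walking the whole structure rather than one known path means stale ids are removed wherever they were cached, at the cost of also removing any unrelated key that happens to share the name.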


@ -13,7 +13,7 @@ Costco, with:
Nothing is posted. Output feeds `firefly_import.py --emit-plan/--review-html`.
"""
-import re, json, hashlib, sys
+import re, json, hashlib, sys, html
from collections import Counter
D = "/Users/danesabo/Documents/Finances/EXPORTS/-MAY172026"
@ -35,7 +35,7 @@ def parse(path):
    for b in blocks:
        out.append({"date": g(b, "DTPOSTED")[:8], "amt": float(g(b, "TRNAMT")),
                    "ttype": g(b, "TRNTYPE").upper(),
-                   "desc": (g(b, "NAME") + " " + g(b, "MEMO")).strip(),
+                   "desc": html.unescape((g(b, "NAME") + " " + g(b, "MEMO")).strip()),
                    "fitid": g(b, "FITID")})
    return ledger, out
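
The effect of the `html.unescape` change can be seen on a single description. The string below is made up for illustration; it assumes the export carries HTML entities in its NAME/MEMO fields, which is what the change implies.

```python
import html

# Made-up description as it might appear in an export with escaped entities.
raw = "AT&amp;T PAYMENT &#39;AUTOPAY&#39;"

# html.unescape converts named and numeric character references to text.
clean = html.unescape(raw)
print(clean)
```

Unescaping before merchant matching lets a rule written against the plain merchant name match an escaped description as well.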
@ -100,6 +100,24 @@ for acct, (path, tag) in SRC.items():
rec["type"] = "withdrawal" if amt < 0 else "deposit"
records.append(rec)
# --- Orphan adjustment: a PNC->Apple/Costco payment whose date predates the
# card QFX window has its card-side effect already baked into the card's
# DERIVED opening (because opening = ledger - sum_kept_card_lines, and the
# orphan never appeared on the card side). If we ALSO post the PNC->card
# transfer in the rebuild, the card account gets credited twice. So subtract
# orphan transfer amounts from the card opening.
APPLE_WINDOW_START = "2025-08-02"
COSTCO_WINDOW_START = "2025-08-02"
for r in records:
    if r.get("type") == "transfer" and r["asset_account"] == "PNC Checking":
        dest = r.get("destination_account")
        if dest == "Apple Credit Card" and r["date"] < APPLE_WINDOW_START:
            recon["Apple Credit Card"]["opening"] -= float(r["amount"])
            recon["Apple Credit Card"]["opening"] = round(recon["Apple Credit Card"]["opening"], 2)
        elif dest == "Costco Visa Card" and r["date"] < COSTCO_WINDOW_START:
            recon["Costco Visa Card"]["opening"] -= float(r["amount"])
            recon["Costco Visa Card"]["opening"] = round(recon["Costco Visa Card"]["opening"], 2)
print("=== RECONCILIATION (must all tie) ===")
ok = True
for a, r in recon.items():
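
The orphan adjustment in this hunk is easiest to check with numbers. The following is a minimal sketch, assuming the derivation stated in the comment (opening = ledger balance minus the sum of kept card lines) and that a payment raises the card account's balance. Only the $3,218 figure comes from the README; the ledger balance and card lines are invented, and `derived_opening` is a hypothetical helper, not a function in `build_rebuild_dataset.py`.

```python
def derived_opening(ledger_balance, kept_lines):
    """Opening balance implied by the ledger and the lines inside the QFX window."""
    return round(ledger_balance - sum(kept_lines), 2)

ledger = -500.00                  # invented card ledger balance
card_lines = [-120.00, -80.00]    # invented purchases inside the card's QFX window
orphan = 3218.00                  # PNC->Apple payment dated before the window

opening = derived_opening(ledger, card_lines)

# Posting the orphan transfer on top of the unadjusted opening overshoots the ledger.
unadjusted_end = round(opening + sum(card_lines) + orphan, 2)

# Subtracting the orphan from the opening makes the rebuilt balance tie out again.
adjusted_opening = round(opening - orphan, 2)
adjusted_end = round(adjusted_opening + sum(card_lines) + orphan, 2)

print("overshoot without fix:", round(unadjusted_end - ledger, 2))
print("ties with fix:", adjusted_end == ledger)
```

The overshoot equals the orphan amount exactly, which is why subtracting it from the opening is sufficient and no other line needs to change.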


@ -0,0 +1,86 @@
PDF file added (ReportLab-generated; title: Sam's Bachelor Party - Settle Up; author: Dane Sabo; created 2026-05-25). Raw object and stream data omitted.