Migration runbook + rebuild tooling; 10 PNC/income/Don't Know rules

- migration/README.md: cold-start rebuild runbook (reconciliation gate,
  classification rules, transfer pairing, investment policy, execution order)
- migration/build_rebuild_dataset.py: consolidated 3-QFX builder with PNC-
  owned transfers, counterpart pairing & drop, per-account reconciliation
- migration/rebuild_clusters.{json,md}: clustering proposal for the rebuild
- migration/rebuild_review.html: read-only browser review for the 1017-txn
  rebuild plan (transfers under PNC, category fixes baked in)
- migration/{pnc_review,review_preview_mixed}.html: earlier UI previews
- merchant_map.json: add 10 settled deterministic rules (Duquesne Light,
  Pitt Salary, Interest Payment, IRS, Pitt Tuition, Daily Cash Adjustment,
  ATM Surcharge/Yardi/Venmo+Cash App+Zelle/ATM Deposit->Don't Know) so the
  skill stops flagging pre-classified PNC lines as UNMATCHED; pin account_ids
  on existing rules
Dane Sabo 2026-05-25 18:54:50 -04:00
parent cc21c48b52
commit 26fb19ca9a
8 changed files with 8019 additions and 46 deletions


@ -5,50 +5,353 @@
"daily_cash_adjustment": "'DAILY CASH ADJUSTMENT' => Apple Card Daily Cash; the ADJUSTMENT is NEGATIVE cashback income (clawback on a return). Sign follows amount; revenue acct 'Apple Card Cashback'."
},
"rules": [
{
"match": "DUQUESNE LIGHT",
"account_name": "Duquesne Light",
"category": "Utilities: Electric",
"budget": "Needs",
"type": "withdrawal"
},
{
"match": "UNIV PITTSBURGH (PAYROLL|SALARY)",
"regex": true,
"account_name": "Pitt Salary",
"category": "Wages",
"type": "deposit"
},
{
"match": "INTEREST PAYMENT",
"account_name": "Interest Income",
"category": "Investment: Interest",
"type": "deposit"
},
{
"match": "IRS TREAS 310",
"account_name": "IRS Refund",
"category": "Taxes",
"type": "deposit"
},
{
"match": "PITT TUITION",
"account_name": "University of Pittsburgh",
"category": "Education",
"type": "withdrawal"
},
{
"match": "DAILY CASH ADJUSTMENT",
"account_name": "Apple Card Cashback",
"category": "Other",
"type": "deposit",
"review": true
},
{
"match": "ATM SURCHARGE REIMB",
"account_name": "Don't Know",
"type": "deposit"
},
{
"match": "YARDI PENNY TEST",
"account_name": "Don't Know",
"type": "deposit",
"review": true
},
{
"match": "VENMO CASHOUT|CASH APP|ZEL FROM",
"regex": true,
"account_name": "Don't Know",
"type": "deposit",
"review": true
},
{
"match": "ATM DEPOSIT",
"account_name": "Don't Know",
"type": "deposit",
"review": true
},
{
"match": "AMZN|AMAZON",
"regex": true,
"account_name": "Amazon",
"review": true,
"type": "withdrawal",
"account_id": 720
},
{
"match": "EBAY",
"account_name": "eBay",
"review": true,
"type": "withdrawal",
"account_id": 622
},
{
"match": "SVDP",
"account_name": "St. Vincent de Paul",
"review": true,
"type": "withdrawal"
},
{
"match": "WINE AND SPIRITS",
"account_name": "Fine Wine & Good Spirits",
"review": true,
"type": "withdrawal"
},
{
"match": "COMCAST|XFINITY",
"regex": true,
"account_name": "Comcast / Xfinity",
"review": true,
"type": "withdrawal",
"account_id": 585
},
{
"match": "LIBERTY MUTUAL",
"account_name": "Liberty Mutual",
"review": true,
"type": "withdrawal",
"account_id": 871
},
{
"match": "JEGS",
"account_name": "JEGS",
"review": true,
"type": "withdrawal",
"account_id": 887
},
{
"match": "APEX RACE PARTS",
"account_name": "Apex Race Parts",
"review": true,
"type": "withdrawal",
"account_id": 594
},
{
"match": "ADVANCE AUTO",
"account_name": "Advance Auto Parts",
"review": true,
"type": "withdrawal"
},
{
"match": "SUBARU OF SOUTH HILLS",
"account_name": "Subaru of South Hills",
"review": true,
"type": "withdrawal",
"account_id": 555
},
{
"match": "ALLEGHENY ARMS",
"account_name": "Allegheny Arms",
"category": "Recreation: Firearms",
"review": true,
"type": "withdrawal",
"account_id": 657
},
{
"match": "WILLI S SKI|WILLIS SKI",
"regex": true,
"account_name": "Willi's Ski Shop",
"category": "Recreation: Snowboarding",
"review": true,
"type": "withdrawal"
},
{
"match": "CASTLE SHANNON SHOP",
"account_name": "Shop 'n Save",
"category": "Groceries",
"budget": "Needs",
"type": "withdrawal",
"account_id": 572
},
{
"match": "MARKET DISTRICT|GIANT EAGLE",
"regex": true,
"account_name": "Giant Eagle",
"category": "Groceries",
"budget": "Needs",
"type": "withdrawal",
"account_id": 592
},
{
"match": "KUHNS",
"account_name": "Kuhn's Market",
"category": "Groceries",
"budget": "Needs",
"type": "withdrawal",
"account_id": 563
},
{
"match": "COMPEER",
"account_name": "Compeer Investments",
"category": "Rent",
"budget": "Needs",
"type": "withdrawal"
},
{
"match": "UPMC STUDENT INSURANCE",
"account_name": "UPMC Student Insurance",
"category": "Medical",
"budget": "Needs",
"type": "withdrawal",
"account_id": 612
},
{
"match": "SPIEGEL FREEDMAN",
"account_name": "Spiegel Freedman Psychological Associates",
"category": "Medical",
"budget": "Needs",
"type": "withdrawal"
},
{
"match": "APPLE.COM",
"account_name": "Apple",
"category": "Subscriptions",
"budget": "Wants",
"type": "withdrawal"
},
{
"match": "CLAUDE.AI",
"account_name": "Claude.ai",
"category": "Subscriptions",
"budget": "Wants",
"type": "withdrawal"
},
{
"match": "OPENAI",
"account_name": "OpenAI",
"category": "Subscriptions",
"budget": "Wants",
"type": "withdrawal",
"account_id": 576
},
{
"match": "YOUTUBE TV",
"account_name": "YouTube TV",
"category": "Subscriptions",
"budget": "Wants",
"type": "withdrawal"
},
{
"match": "PEACOCK",
"account_name": "Peacock",
"category": "Subscriptions",
"budget": "Wants",
"type": "withdrawal",
"account_id": 607
},
{
"match": "MCDONALD",
"account_name": "McDonald's",
"category": "Restaurants",
"budget": "Wants",
"type": "withdrawal",
"account_id": 580
},
{
"match": "PAMELA.?.?S? ?DINER|PAMELA'SDINER",
"regex": true,
"account_name": "Pamela's Diner",
"category": "Restaurants",
"budget": "Wants",
"type": "withdrawal"
},
{
"match": "PRIMANTI BROS",
"account_name": "Primanti Bros",
"category": "Restaurants",
"budget": "Wants",
"type": "withdrawal",
"account_id": 747
},
{
"match": "MINEO'S|MINEOS",
"regex": true,
"account_name": "Mineo's Pizza",
"category": "Restaurants",
"budget": "Wants",
"type": "withdrawal"
},
{
"match": "DAVE AND ANDY|DAVE & ANDY",
"regex": true,
"account_name": "Dave & Andy's",
"category": "Restaurants",
"budget": "Wants",
"type": "withdrawal",
"account_id": 769
},
{
"match": "STARBUCKS",
"account_name": "Starbucks",
"category": "Coffee",
"budget": "Wants",
"type": "withdrawal",
"account_id": 611
},
{
"match": "TAZZA D|ENRICO'S TAZZA",
"regex": true,
"account_name": "Tazza D'Oro",
"category": "Coffee",
"budget": "Wants",
"type": "withdrawal"
},
{
"match": "LA GOURMANDINE",
"account_name": "La Gourmandine",
"category": "Coffee",
"budget": "Wants",
"type": "withdrawal",
"account_id": 595
},
{
"match": "NEEDLE &amp; BEAN|NEEDLE & BEAN|NEEDLE AND BEAN",
"regex": true,
"account_name": "Needle & Bean",
"category": "Coffee",
"budget": "Wants",
"type": "withdrawal",
"account_id": 660
},
{
"match": "SHEETZ",
"account_name": "Sheetz",
"category": "Auto: Fuel",
"type": "withdrawal",
"account_id": 566
},
{
"match": "BP#9604786|UKANI BRO",
"regex": true,
"account_name": "BP",
"category": "Auto: Fuel",
"type": "withdrawal"
},
{
"match": "24 7 TRAVEL ST",
"account_name": "24/7 Travel Store",
"category": "Auto: Fuel",
"type": "withdrawal"
},
{
"match": "PITT PARKING",
"account_name": "Pitt Parking",
"category": "Auto: Parking",
"type": "withdrawal",
"account_id": 870
},
{
"match": "T2\\* MT LEBANON",
"regex": true,
"account_name": "Mt Lebanon Parking",
"category": "Auto: Parking",
"type": "withdrawal"
},
{
"match": "GLOSS\\* JAYME|XCEPTIONAL STYLE",
"regex": true,
"account_name": "Xceptional Style",
"category": "Personal Care",
"budget": "Wants",
"type": "withdrawal"
}
]
}
}
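These rules are consumed by `firefly_import.py`, and its matcher is not part of this diff. The sketch below shows how such a rule lookup could work. It assumes three things: the first matching rule wins, matching is case-insensitive, and a plain substring test is used unless `"regex": true` is set (in which case it uses `re.search`). `match_rule` and the four-rule subset are hypothetical.

```python
import re

# Hypothetical subset of merchant_map.json "rules", copied from the diff above.
RULES = [
    {"match": "DUQUESNE LIGHT", "account_name": "Duquesne Light",
     "category": "Utilities: Electric", "budget": "Needs", "type": "withdrawal"},
    {"match": "UNIV PITTSBURGH (PAYROLL|SALARY)", "regex": True,
     "account_name": "Pitt Salary", "category": "Wages", "type": "deposit"},
    {"match": "VENMO CASHOUT|CASH APP|ZEL FROM", "regex": True,
     "account_name": "Don't Know", "type": "deposit", "review": True},
    {"match": "APPLE.COM", "account_name": "Apple",
     "category": "Subscriptions", "budget": "Wants", "type": "withdrawal"},
]


def match_rule(description, rules=RULES):
    """Return the first rule whose pattern matches `description`, else None.

    Assumed semantics (not confirmed by firefly_import.py): case-insensitive,
    plain substring test unless "regex" is true, then re.search.
    """
    text = description.upper()
    for rule in rules:
        pattern = rule["match"]
        if rule.get("regex"):
            if re.search(pattern, text, re.IGNORECASE):
                return rule
        elif pattern.upper() in text:  # literal: "." is a real dot here
            return rule
    return None


rule = match_rule("DUQUESNE LIGHT PAYMENT ACH DEBIT DUQUESNE LIGHT PAYMENT ACH DEBIT xxxx")
print(rule["account_name"], rule["category"])  # Duquesne Light Utilities: Electric
```

Under these assumptions the `regex` flag matters. Without `"regex": true`, the `APPLE.COM` rule only matches a real dot. As a regex, its `.` would match any character.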

migration/README.md

@ -0,0 +1,78 @@
# Firefly rebuild runbook
One-time migration: wipe the CSV-era transactions and rebuild from
FITID-stable QFX so every transaction has a permanent dedup key and a clean
account taxonomy. Read this before running anything in this folder.
## Why a rebuild (not in-place cleanup)
Firefly history is young (everything ~Aug 2025+, ~950 txns, minimal manual
data). Old CSV imports left ~343 fragmented junk expense accounts and no
stable external_ids. A clean rebuild keyed on QFX `FITID` is a better
foundation than reassigning junk in place. Decided 2026-05-17.
## Hard prerequisites (do not skip)
1. **Firefly DB backup.** Destructive, no undo. Do not run the wipe until a
DB dump/snapshot exists.
2. **Exports** (in `../EXPORTS/`, gitignored): Apple/PNC/Costco QFX, Aug 2025
-> now, FITID on 100% of rows. Schwab/Coinbase/Cash (~35 txns) are
CSV-only/manual, handled separately.
## Reconciliation (the trust gate)
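Per account: `opening_balance = QFX_ledger - sum(all that account's lines)`.
On the raw lines `opening + sum == ledger` therefore holds by construction;
the gate is that it still holds to the cent after the rebuild. Classification
(transfer vs expense) never changes an account's own balance, and each
dropped card-side counterpart is replaced by its PNC-owned transfer leg, so
`opening + sum(rebuilt lines) == ledger` must hold before trusting the wipe.
Verified: PNC opening $6,866.10, Apple -$4,498.79, Costco -$2,541.57 (all
tie). `rebuild_dryrun.py` recomputes this; re-run after any change.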
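Here is a worked example of the gate. The card amounts are hypothetical, and `Decimal` is used so that "to the cent" is exact. The opening balance is derived from the raw lines. The check then runs on the rebuilt lines twice: once with the PNC-owned transfer leg that replaces the dropped payment, and once without it. `opening_balance` and `ties` are illustrative helpers, not functions from this folder.

```python
from decimal import Decimal


def opening_balance(ledger, amounts):
    """Runbook formula: opening = QFX ledger - sum(all of the account's lines)."""
    return ledger - sum(amounts, Decimal("0"))


def ties(opening, rebuilt_amounts, ledger):
    """True if opening + sum(rebuilt lines for the account) equals the ledger exactly."""
    return opening + sum(rebuilt_amounts, Decimal("0")) == ledger


# Hypothetical card account: two purchases and one payment (+50.00), ledger -60.00.
ledger = Decimal("-60.00")
raw = [Decimal("-40.00"), Decimal("-70.00"), Decimal("50.00")]
opening = opening_balance(ledger, raw)          # -60.00 - (-60.00) = 0.00

kept = [Decimal("-40.00"), Decimal("-70.00")]   # card-side payment dropped
pnc_leg = [Decimal("50.00")]                    # PNC-owned transfer into the card
print(opening, ties(opening, kept + pnc_leg, ledger), ties(opening, kept, ledger))
# 0.00 True False
```

If the PNC leg is missing, the account is off by exactly the dropped payment. That is the silent loss this gate exists to catch.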
## Classification rules (PNC = the hub)
- **Transfers** -- ALWAYS owned by the PNC leg: PNC's posting date and PNC's
FITID are authoritative; the card/brokerage counterpart line is paired by
amount (+/- 6 days in `build_rebuild_dataset.py`) and dropped. Every transfer
lives under PNC with one consistent date and is never double-counted. Pairs:
APPLECARD GSBANK -> Apple Credit Card; CITI AUTOPAY -> Costco Visa Card;
SCHWAB MONEYLINK -> Schwab Stocks/Savings (disambiguated by amount); ATM
WITHDRAWAL -> Cash; CARVANA PAYOUT -> Illiquid Assets; ATM DEPOSIT over
$10,000 -> Coverdell; CAPITAL ONE -> Capital One (closed). Codified in the
skill's `references/transfers.md`.
- **Income/expense**: Pitt salary -> Wages; Duquesne Light -> Utilities:
Electric; Compeer -> Rent; etc.
- **Don't Know**: Venmo/CashApp/Zelle ("poker"), unrecallable checks, unknown
ATM deposits -> the `Don't Know` account, review later. Never guessed.
- **Special accounts**: `Illiquid Assets` (cars; sale = transfer in),
`Don't Know` (catch-all). See the skill's memory / taxonomy notes.
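The pairing rule can be sketched as a greedy match on absolute amount within a date window. This is an illustrative sketch, not the skill's code. `pair_transfers`, the (date, cents) tuples and the 6-day default (taken from `build_rebuild_dataset.py`) are assumptions.

```python
from datetime import date


def pair_transfers(pnc_legs, card_lines, window_days=6):
    """Greedily match each card-side line to an unused PNC transfer leg with the
    same absolute amount, posted within +/- window_days.

    pnc_legs / card_lines: lists of (date, amount_in_cents) tuples.
    Returns (pairs, unpaired); pairs are (card_index, pnc_index).
    """
    used = set()
    pairs, unpaired = [], []
    for ci, (cdate, camt) in enumerate(card_lines):
        for pi, (pdate, pamt) in enumerate(pnc_legs):
            if pi in used:
                continue
            if abs(pamt) == abs(camt) and abs((pdate - cdate).days) <= window_days:
                used.add(pi)
                pairs.append((ci, pi))
                break
        else:  # no PNC leg found for this card line
            unpaired.append(ci)
    return pairs, unpaired


# Hypothetical data: the second card payment posts 19 days after its PNC leg.
pnc = [(date(2025, 9, 2), -125000), (date(2025, 10, 1), -98012)]
card = [(date(2025, 9, 4), 125000), (date(2025, 10, 20), 98012)]
pairs, unpaired = pair_transfers(pnc, card)
print(pairs, unpaired)  # [(0, 0)] [1]
```

An unpaired card line means its PNC leg is missing or falls outside the window. Such a line must not be dropped, or that account's reconciliation can no longer tie.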
## Investment accounts
Do NOT transaction-import Schwab/Roth/Coverdell/Coinbase (noise, and assets
!= currency). Model as monthly-valued: opening balance + external MoneyLink
transfers (from the PNC side) + one monthly valuation adjustment booked to
`Investment Appreciation` / `Investment: Interest`. Dane supplies the current
value at import; the adjustment is that value minus the book balance
(opening + transfers + earlier adjustments). Savings<->Stocks journals are
transfers.
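The monthly valuation adjustment is a plain difference. The sketch below is minimal. `valuation_adjustment` and its argument shapes are hypothetical, and it does not model splitting the delta between `Investment Appreciation` and `Investment: Interest`.

```python
from decimal import Decimal


def valuation_adjustment(book_balance, transfers_in, reported_value):
    """Amount to book as this month's valuation adjustment.

    book_balance:   balance before this month (opening + earlier transfers/adjustments)
    transfers_in:   this month's MoneyLink transfers (+ into the account, - out)
    reported_value: the current value Dane supplies at import
    Positive result = appreciation, negative = loss.
    """
    expected = book_balance + sum(transfers_in, Decimal("0"))
    return reported_value - expected


# Hypothetical month: 10,000.00 on the books, 500.00 transferred in, now worth 10,725.40.
adj = valuation_adjustment(Decimal("10000.00"), [Decimal("500.00")], Decimal("10725.40"))
print(adj)  # 225.40
```

The transfers are subtracted before the delta is taken. This keeps money moved in from PNC from being counted as investment income.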
## Execution order
1. `python rebuild_dryrun.py` -> confirm all accounts still reconcile.
2. Build the full normalized dataset with `build_rebuild_dataset.py` (PNC +
Apple + Costco, transfers typed, payments paired/deduped, opening balances
set).
3. Drive review via the skill's browser workflow
(`references/review-workflow.md`): `--review-html`, resolve the ~190 tail
merchants in-situ (search-then-ask, <80% => ask), Export `decisions.json`.
4. **Confirm DB backup exists.**
5. Wipe transactions, prune empty junk expense accounts.
6. `--decisions decisions.json --post`. Reconcile final balances against the
derived figures above.
## Files here
- `rebuild_pnc.py` -- PNC classifier + reconciliation (read-only)
- `rebuild_dryrun.py` -- consolidated per-account reconciliation (read-only)
- `pnc_classified.json` -- PNC classification output
- `merchant_clusters.{json,md}` -- cluster proposal (taxonomy bootstrap)
- `mock_firefly.py` -- stdlib mock used for skill eval/testing
- `*review_preview*.html` -- review-UI previews on real data
Nothing here writes to Firefly except the final `--post` in step 6.

migration/build_rebuild_dataset.py

@ -0,0 +1,116 @@
"""Build the full rebuild dataset from the 3 QFX (READ-ONLY).
Emits one normalized.json (the skill's schema) for ALL of PNC + Apple +
Costco, with:
- transfers OWNED BY THE PNC LEG (PNC date + FITID authoritative); the
  Apple PAYMENT lines and Costco positive AUTOPAY lines are the
  counterparts and are DROPPED once paired with a PNC transfer leg
  (same account and amount, +/- 6 days). An unpaired counterpart aborts.
- PNC classified per the runbook (income / expense / Don't Know / special).
- Apple/Costco: negative = withdrawal (merchant), positive = deposit
  (refund). merchant_map matching is left to firefly_import.py downstream.
- per-account reconciliation: opening + net of its kept records (PNC-owned
  transfer legs included) must == QFX ledger, else abort (no silent data
  loss).
Nothing is posted. Output (/tmp/rebuild_normalized.json) feeds
`firefly_import.py --emit-plan/--review-html`.
"""
import re, json, hashlib, sys
from collections import Counter
from datetime import date

D = "/Users/danesabo/Documents/Finances/EXPORTS/-MAY172026"
SRC = {
    "PNC Checking": (f"{D}/PNC7552Aug012025-May152025.QFX", "pnc"),
    "Apple Credit Card": (f"{D}/Apple Card Transactions Aug 01 2025 - May 17 2026.qfx", "apple"),
    "Costco Visa Card": (f"{D}/CitiCostcoCard Aug012025-May172025.QFX", "costco"),
}
PAIR_DAYS = 6  # counterpart <-> PNC leg pairing window (+/- days)


def parse(path):
    with open(path, encoding="latin-1", errors="replace") as f:
        t = f.read()
    m = re.search(r"<LEDGERBAL>.*?<BALAMT>([^<\r\n]*)", t, re.S | re.I)
    if not m:
        sys.exit(f"ABORT: no <LEDGERBAL><BALAMT> in {path}")
    ledger = float(m.group(1))
    blocks = re.findall(r"<STMTTRN>(.*?)(?=<STMTTRN>|</BANKTRANLIST>)", t, re.S | re.I)

    def g(b, k):
        mm = re.search(rf"<{k}>([^<\r\n]*)", b, re.I)
        return mm.group(1).strip() if mm else ""

    out = []
    for b in blocks:
        out.append({"date": g(b, "DTPOSTED")[:8], "amt": float(g(b, "TRNAMT")),
                    "ttype": g(b, "TRNTYPE").upper(),
                    "desc": (g(b, "NAME") + " " + g(b, "MEMO")).strip(),
                    "fitid": g(b, "FITID")})
    return ledger, out


def iso(d):  # YYYYMMDD -> YYYY-MM-DD
    return f"{d[:4]}-{d[4:6]}-{d[6:8]}" if len(d) >= 8 else d


# ---- PNC classification (runbook) ---------------------------------------
def classify_pnc(desc, amt):
    d = desc.upper()
    if "APPLECARD GSBANK PAYMENT" in d: return ("transfer", "Apple Credit Card")
    if "CITI AUTOPAY PAYMENT" in d: return ("transfer", "Costco Visa Card")
    if "SCHWAB BROKERAGE MONEYLINK" in d:
        # amount disambiguation per the Schwab JSONs
        return ("transfer", "Schwab Savings" if abs(amt) in (5000.0, 3550.0)
                else "Schwab Stocks")
    if "ATM WITHDRAWAL" in d: return ("transfer", "Cash")
    if "CARVANA PAYOUT" in d: return ("transfer", "Illiquid Assets")
    if "ATM DEPOSIT" in d and abs(amt) > 10000: return ("transfer", "Coverdell")
    if "CAPITAL ONE TRANSFER" in d: return ("transfer", "Capital One")
    if "UNIV PITTSBURGH" in d and ("PAYROLL" in d or "SALARY" in d):
        return ("deposit", "Pitt Salary")
    if "INTEREST PAYMENT" in d: return ("deposit", "Interest Income")
    if "IRS TREAS 310" in d: return ("deposit", "IRS Refund")
    if "DUQUESNE LIGHT" in d: return ("withdrawal", "Duquesne Light")
    if "COMPEER" in d: return ("withdrawal", "Compeer Investments")
    if "PITT TUITION" in d: return ("withdrawal", "University of Pittsburgh")
    if any(k in d for k in ("VENMO CASHOUT", "CASH APP", "ZEL FROM", "ATM SURCHARGE", "YARDI")):
        return ("dontknow", "Don't Know")
    return ("raw", None)  # leave to merchant_map / review downstream


records, recon, dropped, unpaired = [], {}, Counter(), []
paired = set()  # indices into `records` of PNC transfer legs already claimed


def pair_with_pnc(card, t):
    """Claim the unpaired PNC transfer leg that owns card line `t`:
    same card account, same amount, posted within +/- PAIR_DAYS days."""
    when, amount = date.fromisoformat(iso(t["date"])), f"{abs(t['amt']):.2f}"
    for i, r in enumerate(records):
        if (i not in paired and r["type"] == "transfer" and r["amount"] == amount
                and card in (r["asset_account"], r.get("destination_account"))
                and abs((date.fromisoformat(r["date"]) - when).days) <= PAIR_DAYS):
            paired.add(i)
            return True
    return False


# SRC lists PNC first, so its transfer legs exist before the cards are paired.
for acct, (path, tag) in SRC.items():
    ledger, txns = parse(path)
    s = round(sum(t["amt"] for t in txns), 2)
    recon[acct] = {"ledger": ledger, "sum": s, "opening": round(ledger - s, 2)}
    for t in txns:
        amt, d = t["amt"], t["desc"]
        ext = f"{tag}:{t['fitid'] or hashlib.sha1((iso(t['date'])+d+str(amt)).encode()).hexdigest()[:16]}"
        if acct == "Apple Credit Card" and t["ttype"] == "PAYMENT":
            key = "apple_payment"
        elif acct == "Costco Visa Card" and amt > 0 and "AUTOPAY" in d.upper():
            key = "costco_autopay"
        else:
            key = None
        if key:  # counterpart: drop it only once its PNC leg is found
            if pair_with_pnc(acct, t):
                dropped[f"{key}(paired->PNC)"] += 1
            else:
                unpaired.append((acct, iso(t["date"]), amt, d))
            continue
        rec = {"date": iso(t["date"]), "amount": f"{abs(amt):.2f}",
               "description": d, "asset_account": acct, "source_tag": tag,
               "source_txn_id": t["fitid"] or None, "currency_code": "USD"}
        if acct == "PNC Checking":
            kind, target = classify_pnc(d, amt)
            if kind == "transfer":
                rec["type"] = "transfer"
                if amt < 0:  # PNC -> target
                    rec["destination_account"] = target
                else:        # target -> PNC
                    rec["asset_account"] = target
                    rec["destination_account"] = "PNC Checking"
            elif kind in ("deposit", "withdrawal"):
                rec["type"] = kind
                rec["_canonical"] = target
            elif kind == "dontknow":
                rec["type"] = "withdrawal" if amt < 0 else "deposit"
                rec["_canonical"] = "Don't Know"
            else:
                rec["type"] = "withdrawal" if amt < 0 else "deposit"
        else:
            rec["type"] = "withdrawal" if amt < 0 else "deposit"
        records.append(rec)

# Re-derive each account's balance from the KEPT records; a transfer moves
# `amount` out of asset_account and into destination_account.
net = Counter()
for r in records:
    a = float(r["amount"])
    if r["type"] == "withdrawal":
        net[r["asset_account"]] -= a
    elif r["type"] == "deposit":
        net[r["asset_account"]] += a
    else:
        net[r["asset_account"]] -= a
        net[r["destination_account"]] += a

print("=== RECONCILIATION (must all tie) ===")
ok = not unpaired
for a, r in recon.items():
    r["kept_sum"] = round(net[a], 2)
    # tolerance below half a cent, so a one-cent gap cannot pass as float noise
    r["ties"] = abs(r["opening"] + r["kept_sum"] - r["ledger"]) < 0.005
    flag = "OK" if r["ties"] else "*** MISMATCH ***"
    ok &= r["ties"]
    print(f" {a:20} ledger {r['ledger']:>11,.2f} Σkept {r['kept_sum']:>11,.2f} "
          f"opening {r['opening']:>11,.2f} {flag}")
print("dropped (paired counterparts):", dict(dropped))
for u in unpaired:
    print(" *** UNPAIRED counterpart (no matching PNC leg):", u)
print(f"normalized records: {len(records)}")
if not ok:
    print("ABORT: a reconciliation does not tie or a counterpart is unpaired.", file=sys.stderr)
    sys.exit(1)
with open("/tmp/rebuild_normalized.json", "w") as f:
    json.dump(records, f, indent=1)
with open("/tmp/rebuild_recon.json", "w") as f:
    json.dump(recon, f, indent=1)
print("wrote /tmp/rebuild_normalized.json")
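`parse()` uses regexes rather than a real OFX parser. The self-contained sketch below runs the same extraction patterns over a tiny made-up SGML sample; all values are hypothetical. It shows the shape the patterns assume: one value per tag, ending at a newline or the next `<`. Entities such as `&amp;` are not decoded. That is consistent with descriptions like `SQ *NEEDLE &amp; BEAN` appearing verbatim in the cluster report. `field` and `parse_text` are illustrative names.

```python
import re

# Hypothetical two-transaction OFX 1.x (SGML) excerpt.
SAMPLE = """OFXHEADER:100
<OFX>
<BANKTRANLIST>
<STMTTRN>
<TRNTYPE>DEBIT
<DTPOSTED>20250902120000.000
<TRNAMT>-1250.00
<FITID>202509020001
<NAME>APPLECARD GSBANK PAYMENT
<MEMO>ACH WEB
</STMTTRN>
<STMTTRN>
<TRNTYPE>CREDIT
<DTPOSTED>20250915120000.000
<TRNAMT>2400.00
<FITID>202509150002
<NAME>UNIV PITTSBURGH SALARY
<MEMO>ACH CREDIT
</STMTTRN>
</BANKTRANLIST>
<LEDGERBAL>
<BALAMT>6000.00
<DTASOF>20250930
</LEDGERBAL>
</OFX>
"""


def field(block, tag):
    """Value after <tag>, up to the next '<' or line break (same pattern as g())."""
    m = re.search(rf"<{tag}>([^<\r\n]*)", block, re.I)
    return m.group(1).strip() if m else ""


def parse_text(t):
    ledger = float(re.search(r"<LEDGERBAL>.*?<BALAMT>([^<\r\n]*)", t, re.S | re.I).group(1))
    blocks = re.findall(r"<STMTTRN>(.*?)(?=<STMTTRN>|</BANKTRANLIST>)", t, re.S | re.I)
    return ledger, [{"date": field(b, "DTPOSTED")[:8],
                     "amt": float(field(b, "TRNAMT")),
                     "ttype": field(b, "TRNTYPE").upper(),
                     "desc": (field(b, "NAME") + " " + field(b, "MEMO")).strip(),
                     "fitid": field(b, "FITID")} for b in blocks]


ledger, txns = parse_text(SAMPLE)
print(ledger, len(txns), txns[0]["desc"])  # 6000.0 2 APPLECARD GSBANK PAYMENT ACH WEB
```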

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large Load Diff

migration/rebuild_clusters.md

@ -0,0 +1,192 @@
# Merchant cluster proposal
- 386 clusters from 372 accounts + 1017 statement txns
- **142** auto-proposable (>=0.80, clean canonical)
- **244** NEED DANE (ambiguous / junky canonical / new merchant)
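The split in the header can be reproduced with a threshold test. This sketch rests on stated assumptions:

- `weight` is treated as volume. In every NEEDS DANE entry it equals accts + stmt (for example, 28 + 47 = 75 for Amazon).
- "Clean canonical" is modeled as a hypothetical boolean `clean`, because the generator's actual test is not shown.
- `split_clusters` and the sample flags are hypothetical.

```python
AUTO_THRESHOLD = 0.80  # ">=0.80" from the report header


def split_clusters(clusters):
    """Return (auto_proposable, needs_dane), each sorted by weight, highest first."""
    auto, ask = [], []
    for c in clusters:
        # Hypothetical: `clean` stands in for the generator's "clean canonical" test.
        (auto if c["conf"] >= AUTO_THRESHOLD and c["clean"] else ask).append(c)
    return (sorted(auto, key=lambda c: c["weight"], reverse=True),
            sorted(ask, key=lambda c: c["weight"], reverse=True))


# Conf/weight values from this report; the `clean` flags are assumed.
clusters = [
    {"name": "Sheetz", "conf": 1.0, "weight": 43, "clean": True},
    {"name": "McDonald's", "conf": 0.78, "weight": 37, "clean": True},
    {"name": "Castle Shannon Shop", "conf": 1.0, "weight": 30, "clean": False},
]
auto, ask = split_clusters(clusters)
print([c["name"] for c in auto], [c["name"] for c in ask])
```

This is why a conf 1.0 cluster such as `Castle Shannon Shop` can still land under NEEDS DANE: the confidence alone does not qualify it.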
## NEEDS DANE (top 40 by volume)
_For each: what is the real merchant? You can type a name; it becomes a permanent rule._
- **?** (conf 0.57, weight 75, 28 accts, 47 stmt) guess=`Amazon`
  - desc: `AMAZON MARK* B00SF6VV0410 TERRY`
  - desc: `AMAZON.COM*9R3UC0N93 440 TERRY A`
  - desc: `AMAZON.COM*N428X9Q71 440 TERRY A`
  - desc: `AMAZON MARK* B008Z3VV0410 TERRY`
  - desc: `AMAZON MARK* B03Y156K1410 TERRY`
  - desc: `AMAZON MARK* B204T9M31410 TERRY`
  - accts: Amazon, Amazon Mark* B008z3vv0, Amazon Mark* B00sf6vv0, Amazon Mark* B00sf6vv0410 Terry Avenue North Seattle 98109 Wa Usa (return), Amazon Mark* B03y156k1, Amazon Mark* B204t9m31
- **?** (conf 0.4, weight 56, 0 accts, 56 stmt) guess=`University Of Pittsburgh|Pitt Parking Pay Stati127 North`
  - desc: `PITT PARKING PAY STATI127 NORTH`
- **?** (conf 0.78, weight 37, 7 accts, 30 stmt) guess=`McDonald's`
  - desc: `MCDONALDS 1862 3708 FORBES AVE P`
  - desc: `MCDONALDS 1102 225 MOUNT LEBANON`
  - desc: `MCDONALD'S F1862 3708 FORBES AVE`
  - desc: `MCDONALD'S F1102 225 MT LEBANON`
  - desc: `MCDONALDS 5834 2518 W LIBERTY RD`
  - desc: `MCDONALD'S F27387 1412 B MAIN ST`
  - accts: McDonald's, Mcdonald's F1102, Mcdonald's F1862, Mcdonald's F27387, Mcdonalds 1862, Mcdonalds 33234
- **?** (conf 1.0, weight 30, 0 accts, 30 stmt) guess=`Castle Shannon Shop`
  - desc: `CASTLE SHANNON SHOP' 799 CASTLE`
- **?** (conf 0.71, weight 30, 2 accts, 28 stmt) guess=`Market District`
  - desc: `MARKET DISTRICT #0014 7000 OXFOR`
  - desc: `MARKET DISTRICT #0047 100 SETTLE`
  - accts: Market District, Market District Supermarket
- **?** (conf 0.4, weight 18, 0 accts, 18 stmt) guess=`Apple Com Bill One Apple`
  - desc: `APPLE.COM/BILL ONE APPLE PARK WA`
  - desc: `APPLE.COM/US ONE APPLE PARK WAY`
  - desc: `APPLE.COM/BILL ONE APPLE PARK CU`
- **?** (conf 0.47, weight 18, 8 accts, 10 stmt) guess=`Compeer`
  - desc: `COMPEER-COMP-CP WEB PMTS ACH WEB COMPEER-COMP-CP WEB PMTS ACH WEB C5R6`
  - desc: `COMPEER-COMP-CP WEB PMTS ACH WEB COMPEER-COMP-CP WEB PMTS ACH WEB MD64`
  - desc: `COMPEER-COMP-CP WEB PMTS ACH WEB COMPEER-COMP-CP WEB PMTS ACH WEB 3Y6Q`
  - desc: `COMPEER-COMP-CP WEB PMTS ACH WEB COMPEER-COMP-CP WEB PMTS ACH WEB R34S`
  - desc: `COMPEER-COMP-CP WEB PMTS ACH WEB COMPEER-COMP-CP WEB PMTS ACH WEB D9FZ`
  - desc: `COMPEER-COMP-CP WEB PMTS ACH WEB COMPEER-COMP-CP WEB PMTS ACH WEB F394`
  - accts: COMPEER-COMP-CP WEB PMTS ACH WEB 3Y6QDL, COMPEER-COMP-CP WEB PMTS ACH WEB 7Y648K, COMPEER-COMP-CP WEB PMTS ACH WEB D9FZ0L, COMPEER-COMP-CP WEB PMTS ACH WEB F394TK, COMPEER-COMP-CP WEB PMTS ACH WEB JS0NNK, COMPEER-COMP-CP WEB PMTS ACH WEB K7TDFK
- **?** (conf 0.4, weight 18, 0 accts, 18 stmt) guess=`Sq *La Gourmandine Oak116 Meyran`
  - desc: `SQ *LA GOURMANDINE OAK116 MEYRAN`
- **?** (conf 1.0, weight 17, 0 accts, 17 stmt) guess=`Kuhns Banksville`
  - desc: `KUHNS BANKSVILLE 3125 BANKSVILLE`
- **?** (conf 0.75, weight 13, 4 accts, 9 stmt) guess=`Starbucks`
  - desc: `STARBUCKS STORE 27117 4022 FIFTH`
  - desc: `STARBUCKS 27117 4022 5TH AVE PIT`
  - desc: `STARBUCKS 8007827282 2401 UTAH A`
  - accts: Starbucks, Starbucks 27117, Starbucks 8007827282, Starbucks Store 27117
- **?** (conf 0.4, weight 11, 0 accts, 11 stmt) guess=`Claude Ai Subscription548 Market`
  - desc: `CLAUDE.AI SUBSCRIPTION548 MARKET`
- **?** (conf 0.61, weight 11, 2 accts, 9 stmt) guess=`Duquesne Light`
  - desc: `DUQUESNE LIGHT PAYMENT ACH DEBIT DUQUESNE LIGHT PAYMENT ACH DEBIT xxxx`
  - accts: DUQUESNE LIGHT PAYMENT ACH DEBIT xxxxxx5333, Duquesne Light
- **?** (conf 0.4, weight 11, 1 accts, 10 stmt) guess=`T2`
  - desc: `T2* MT LEBANON PA 8900 KEYSTONE`
  - accts: T2* Mt Lebanon Pa
- **?** (conf 1.0, weight 10, 1 accts, 9 stmt) guess=`Comcast / Xfinity`
  - desc: `COMCAST / XFINITY 15 SUMMIT PARK`
  - accts: Comcast / Xfinity
- **?** (conf 1.0, weight 10, 0 accts, 10 stmt) guess=`Interest Payment Interest Payment`
  - desc: `INTEREST PAYMENT INTEREST PAYMENT`
- **?** (conf 0.4, weight 10, 0 accts, 10 stmt) guess=`Upmc Student Insurance600 Grant`
  - desc: `UPMC STUDENT INSURANCE600 GRANT`
- **?** (conf 0.4, weight 9, 0 accts, 9 stmt) guess=`Applecard Gsbank Payment Ach Web`
  - desc: `APPLECARD GSBANK PAYMENT ACH WEB APPLECARD GSBANK PAYMENT ACH WEB-RECU`
  - desc: `APPLECARD GSBANK PAYMENT ACH WEB APPLECARD GSBANK PAYMENT ACH WEB xxxx`
- **?** (conf 0.4, weight 9, 0 accts, 9 stmt) guess=`Citi Autopay Payment Ach Web`
  - desc: `CITI AUTOPAY PAYMENT ACH WEB-REC CITI AUTOPAY PAYMENT ACH WEB-RECUR xx`
  - desc: `CITI AUTOPAY PAYMENT ACH WEB-REC CITI AUTOPAY PAYMENT ACH WEB-RECUR xx`
  - desc: `CITI AUTOPAY PAYMENT ACH WEB-REC CITI AUTOPAY PAYMENT ACH WEB-RECUR xx`
  - desc: `CITI AUTOPAY PAYMENT ACH WEB-REC CITI AUTOPAY PAYMENT ACH WEB-RECUR xx`
  - desc: `CITI AUTOPAY PAYMENT ACH WEB-REC CITI AUTOPAY PAYMENT ACH WEB-RECUR xx`
  - desc: `CITI AUTOPAY PAYMENT ACH WEB-REC CITI AUTOPAY PAYMENT ACH WEB-RECUR xx`
- **?** (conf 1.0, weight 9, 0 accts, 9 stmt) guess=`Daily Cash Adjustment`
  - desc: `DAILY CASH ADJUSTMENT`
- **?** (conf 1.0, weight 7, 0 accts, 7 stmt) guess=`Ebay O`
  - desc: `EBAY O*07-14287-66191 2535 NORTH`
  - desc: `EBAY O*07-14287-66190 2535 NORTH`
  - desc: `EBAY O*07-14287-66189 2535 NORTH`
  - desc: `EBAY O*07-14287-66188 2535 NORTH`
  - desc: `EBAY O*07-14287-66187 2535 NORTH`
  - desc: `EBAY O*07-14287-66186 2535 NORTH`
- **?** (conf 0.9, weight 7, 1 accts, 6 stmt) guess=`Needle & Bean`
  - desc: `SQ *NEEDLE &amp; BEAN 320 CASTLE`
  - accts: Needle & Bean
- **?** (conf 0.4, weight 7, 0 accts, 7 stmt) guess=`University Of Pittsburgh|Univ Pittsburgh Salary Ach Credi`
  - desc: `UNIV PITTSBURGH SALARY ACH CREDI UNIV PITTSBURGH SALARY ACH CREDIT xx0`
- **?** (conf 1.0, weight 7, 0 accts, 7 stmt) guess=`Youtube Tv`
  - desc: `GOOGLE *YOUTUBE TV 1600 AMPHITHE`
- **?** (conf 0.62, weight 6, 2 accts, 4 stmt) guess=`Liberty Mutual`
  - desc: `LIBERTY MUTUAL 175 BERKELEY ST 8`
  - desc: `LIBERTY MUTUAL ATTN: COURTNEY MU`
  - accts: Liberty Mutual
- **?** (conf 0.53, weight 6, 2 accts, 4 stmt) guess=`Openai`
  - desc: `OPENAI *CHATGPT SUBSCR548 MARKET`
  - desc: `OPENAI 1455 3RD STREET SAN FRANC`
  - accts: Openai, Openai *chatgpt Subscr
- **?** (conf 0.4, weight 6, 0 accts, 6 stmt) guess=`Spo P&Amp Gspamelasdiner3703 F`
  - desc: `SPO*P&amp;G'SPAMELA'SDINER3703 F`
- **?** (conf 1.0, weight 6, 2 accts, 4 stmt) guess=`Svdp Castle Shannon`
  - desc: `SVDP CASTLE SHANNON 3423 LIBRARY`
  - accts: SVDP Castle Shannon, Svdp Castle Shannon
- **?** (conf 0.4, weight 5, 0 accts, 5 stmt) guess=`Bp 9604786Ukani Broqps2900 Banks`
  - desc: `BP#9604786UKANI BROQPS2900 BANKS`
- **?** (conf 0.4, weight 5, 2 accts, 3 stmt) guess=`Capital One Transfer Ach Web`
  - desc: `CAPITAL ONE TRANSFER ACH WEB RT0 CAPITAL ONE TRANSFER ACH WEB RT0D854F`
  - desc: `CAPITAL ONE TRANSFER ACH WEB PAY CAPITAL ONE TRANSFER ACH WEB PAYMENT `
  - desc: `CAPITAL ONE TRANSFER ACH WEB PAY CAPITAL ONE TRANSFER ACH WEB PAYMENT `
  - accts: CAPITAL ONE TRANSFER ACH WEB PAYMENT RT04E16C0EA8E68, CAPITAL ONE TRANSFER ACH WEB PAYMENT RT097FE1F911EB7
- **?** (conf 0.69, weight 5, 1 accts, 4 stmt) guess=`Peacock`
  - desc: `PEACOCK 75AE1 PREMIUM 30 ROCKEFE`
  - desc: `PEACOCK 81D06 PREMIUM 30 ROCKEFE`
  - desc: `PEACOCK EF701 PREMIUM 30 ROCKEFE`
  - desc: `PEACOCK X6258 PREMIUM 30 ROCKEFE`
  - accts: Peacock
- **?** (conf 0.4, weight 5, 0 accts, 5 stmt) guess=`Spiegel Freedman Psych105 Braunl`
  - desc: `SPIEGEL FREEDMAN PSYCH105 BRAUNL`
- **?** (conf 0.4, weight 5, 0 accts, 5 stmt) guess=`University Of Pittsburgh|Rnk Pittsburgh P3610 Forbe`
  - desc: `TST*RNK PITTSBURGH - P3610 FORBE`
- **?** (conf 1.0, weight 5, 1 accts, 4 stmt) guess=`Www Costco Com`
  - desc: `WWW COSTCO COM 800-955-2292`
  - accts: WWW COSTCO COM 800-955-2292 WA
- **?** (conf 0.4, weight 4, 0 accts, 4 stmt) guess=`Dave And Andy S Ho207`
  - desc: `SQ *DAVE AND ANDY S HO207 ATWOOD`
- **?** (conf 0.4, weight 4, 0 accts, 4 stmt) guess=`Enricos Tazza Do125 Lytton`
  - desc: `SQ *ENRICO'S TAZZA D'O125 LYTTON`
- **?** (conf 0.4, weight 4, 0 accts, 4 stmt) guess=`Hofbrauhaus Pittsburgh2705 S Wat`
  - desc: `HOFBRAUHAUS PITTSBURGH2705 S WAT`
- **?** (conf 0.65, weight 4, 1 accts, 3 stmt) guess=`Luis Benitez`
  - desc: `ZEL FROM Luis Benitez ZEL FROM Luis Benitez`
  - accts: Luis Benitez
- **?** (conf 0.4, weight 4, 2 accts, 2 stmt) guess=`Pitt Tuition Pittpaymnt Ach Web`
  - desc: `PITT TUITION PITTPAYMNT ACH WEB PITT TUITION PITTPAYMNT ACH WEB OPUxxx`
  - desc: `PITT TUITION PITTPAYMNT ACH WEB PITT TUITION PITTPAYMNT ACH WEB OPUxxx`
  - accts: PITT TUITION PITTPAYMNT ACH WEB OPUxxxx0412, PITT TUITION PITTPAYMNT ACH WEB OPUxxxx9683
- **?** (conf 0.4, weight 4, 0 accts, 4 stmt) guess=`Schwab Brokerage Moneylink Ach W`
  - desc: `SCHWAB BROKERAGE MONEYLINK ACH C SCHWAB BROKERAGE MONEYLINK ACH CREDIT`
  - desc: `SCHWAB BROKERAGE MONEYLINK ACH C SCHWAB BROKERAGE MONEYLINK ACH CREDIT`
  - desc: `SCHWAB BROKERAGE MONEYLINK ACH D SCHWAB BROKERAGE MONEYLINK ACH DEBIT `
  - desc: `SCHWAB BROKERAGE MONEYLINK ACH W SCHWAB BROKERAGE MONEYLINK ACH WEB-RE`
- **?** (conf 1.0, weight 4, 1 accts, 3 stmt) guess=`Subaru Of South Hills`
  - desc: `SUBARU OF SOUTH HILLS 3260 WASHI`
  - accts: Subaru Of South Hills
## AUTO-PROPOSABLE (top 40 by volume)
- `GomobilePGH` (conf 1.0, weight 49, merges 4 accts) ids=[865, 642, 559, 781]
- `Sheetz` (conf 1.0, weight 43, merges 7 accts) ids=[566, 744, 739, 567, 774, 794, 738]
- `Autozone` (conf 1.0, weight 27, merges 6 accts) ids=[593, 812, 724, 714, 591, 806]
- `Sunoco` (conf 1.0, weight 27, merges 6 accts) ids=[599, 638, 827, 767, 820, 715]
- `Costco Whse` (conf 1.0, weight 22, merges 2 accts) ids=[842, 836]
- `Harbor Freight Tools` (conf 0.95, weight 18, merges 3 accts) ids=[878, 569, 737]
- `Petco` (conf 1.0, weight 15, merges 4 accts) ids=[546, 729, 797, 633]
- `Chick-fil-A` (conf 1.0, weight 14, merges 5 accts) ids=[630, 810, 832, 712, 702]
- `Costco Gas` (conf 1.0, weight 14, merges 2 accts) ids=[840, 837]
- `D J*wsj` (conf 1.0, weight 10, merges 1 accts) ids=[553]
- `Rockauto` (conf 0.94, weight 10, merges 1 accts) ids=[557]
- `University Club` (conf 0.86, weight 10, merges 2 accts) ids=[867, 637]
- `Chikn Oakland` (conf 1.0, weight 9, merges 1 accts) ids=[558]
- `Raising Cane's` (conf 1.0, weight 9, merges 3 accts) ids=[868, 561, 828]
- `Barnes & Noble` (conf 0.9, weight 7, merges 3 accts) ids=[603, 817, 658]
- `Lowe's` (conf 1.0, weight 7, merges 1 accts) ids=[673]
- `PMUSA` (conf 1.0, weight 7, merges 2 accts) ids=[885, 614]
- `Home Depot` (conf 0.83, weight 6, merges 1 accts) ids=[722]
- `REI` (conf 1.0, weight 6, merges 2 accts) ids=[684, 682]
- `Target` (conf 1.0, weight 6, merges 2 accts) ids=[605, 731]
- `The Saloon Of` (conf 0.82, weight 6, merges 2 accts) ids=[847, 801]
- `Best Buy` (conf 1.0, weight 5, merges 2 accts) ids=[751, 740]
- `Check` (conf 1.0, weight 5, merges 1 accts) ids=[524]
- `Expedia` (conf 1.0, weight 5, merges 2 accts) ids=[717, 711]
- `Michaels Stores` (conf 1.0, weight 5, merges 2 accts) ids=[587, 664]
- `Rita's` (conf 1.0, weight 5, merges 1 accts) ids=[882]
- `Als Corner` (conf 1.0, weight 4, merges 1 accts) ids=[762]
- `CVS Pharmacy` (conf 1.0, weight 4, merges 2 accts) ids=[783, 816]
- `Dunkin` (conf 1.0, weight 4, merges 2 accts) ids=[655, 846]
- `Five Guys` (conf 1.0, weight 4, merges 1 accts) ids=[723]
- `Redhawk Coffee` (conf 1.0, weight 4, merges 1 accts) ids=[721]
- `Sportsmans Warehouse` (conf 1.0, weight 4, merges 1 accts) ids=[568]
- `Taco Bell` (conf 1.0, weight 4, merges 2 accts) ids=[686, 691]
- `TNT Pizza` (conf 1.0, weight 4, merges 1 accts) ids=[624]
- `Act Cntyalleghenyprk` (conf 1.0, weight 3, merges 1 accts) ids=[776]
- `Butterjoint` (conf 1.0, weight 3, merges 1 accts) ids=[608]
- `Ctlp*csc Serviceworks` (conf 1.0, weight 3, merges 1 accts) ids=[650]
- `Fiori's Pizzaria` (conf 0.91, weight 3, merges 1 accts) ids=[551]
- `Get Go` (conf 1.0, weight 3, merges 1 accts) ids=[718]
- `Giant Eagle` (conf 1.0, weight 3, merges 1 accts) ids=[592]

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long