finances/migration/rebuild_clusters.md
Dane Sabo 26fb19ca9a Migration runbook + rebuild tooling; 10 PNC/income/Don't Know rules
- migration/README.md: cold-start rebuild runbook (reconciliation gate,
  classification rules, transfer pairing, investment policy, execution order)
- migration/build_rebuild_dataset.py: consolidated 3-QFX builder with PNC-
  owned transfers, counterpart pairing & drop, per-account reconciliation
- migration/rebuild_clusters.{json,md}: clustering proposal for the rebuild
- migration/rebuild_review.html: read-only browser review for the 1017-txn
  rebuild plan (transfers under PNC, category fixes baked in)
- migration/{pnc_review,review_preview_mixed}.html: earlier UI previews
- merchant_map.json: add 10 settled deterministic rules (Duquesne Light,
  Pitt Salary, Interest Payment, IRS, Pitt Tuition, Daily Cash Adjustment,
  ATM Surcharge/Yardi/Venmo/Zelle->Don't Know) so the skill stops flagging
  pre-classified PNC lines as UNMATCHED
2026-05-25 18:54:50 -04:00


# Merchant cluster proposal
- 386 clusters from 372 accounts + 1017 statement txns
- **142** auto-proposable (>=0.80, clean canonical)
- **244** NEEDS DANE (ambiguous / junky canonical / new merchant)
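
The split above can be sketched as a small triage function. This is a minimal sketch, not the rebuild tooling itself: the report does not say how "clean canonical" or "new merchant" are decided, so both are hypothetical boolean flags supplied by the caller, and `Cluster`, `is_auto_proposable`, and `triage` are illustrative names. The `weight` property assumes weight = accts + stmt, which holds for every NEEDS DANE entry listed in this report.

```python
from dataclasses import dataclass

AUTO_THRESHOLD = 0.80  # the ">=0.80" cutoff quoted in the summary


@dataclass
class Cluster:
    guess: str             # proposed canonical merchant name
    conf: float            # clustering confidence, 0.0 to 1.0
    accts: int             # existing accounts in the cluster
    stmt: int              # statement transactions in the cluster
    clean_canonical: bool  # hypothetical flag: the name looks usable as-is
    new_merchant: bool     # hypothetical flag: no settled rule or account yet

    @property
    def weight(self) -> int:
        # Assumption: weight is accts + stmt (matches the listed entries).
        return self.accts + self.stmt


def is_auto_proposable(c: Cluster) -> bool:
    """Apply the summary's rule: confidence >= 0.80 and a clean canonical."""
    return c.conf >= AUTO_THRESHOLD and c.clean_canonical and not c.new_merchant


def triage(clusters):
    """Split clusters into (auto, review), each sorted by volume, highest first."""
    auto = [c for c in clusters if is_auto_proposable(c)]
    review = [c for c in clusters if not is_auto_proposable(c)]
    auto.sort(key=lambda c: c.weight, reverse=True)
    review.sort(key=lambda c: c.weight, reverse=True)
    return auto, review


# Numbers for the first two come from this report; the flags are illustrative
# guesses, and "Example Merchant" is entirely made up.
sample = [
    Cluster("Castle Shannon Shop", 1.0, 0, 30, clean_canonical=True, new_merchant=True),
    Cluster("Amazon", 0.57, 28, 47, clean_canonical=True, new_merchant=False),
    Cluster("Example Merchant", 0.95, 3, 12, clean_canonical=True, new_merchant=False),
]
auto, review = triage(sample)
```

Keeping the flags explicit makes it visible that a cluster with conf 1.0 can still land in the review pile, as several entries below do.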
## NEEDS DANE (top 40 by volume)
_For each: what is the real merchant? You can type a name; it becomes a permanent rule._
- **?** (conf 0.57, weight 75, 28 accts, 47 stmt) guess=`Amazon`
  - desc: `AMAZON MARK* B00SF6VV0410 TERRY`
  - desc: `AMAZON.COM*9R3UC0N93 440 TERRY A`
  - desc: `AMAZON.COM*N428X9Q71 440 TERRY A`
  - desc: `AMAZON MARK* B008Z3VV0410 TERRY`
  - desc: `AMAZON MARK* B03Y156K1410 TERRY`
  - desc: `AMAZON MARK* B204T9M31410 TERRY`
  - accts: Amazon, Amazon Mark* B008z3vv0, Amazon Mark* B00sf6vv0, Amazon Mark* B00sf6vv0410 Terry Avenue North Seattle 98109 Wa Usa (return), Amazon Mark* B03y156k1, Amazon Mark* B204t9m31
- **?** (conf 0.4, weight 56, 0 accts, 56 stmt) guess=`University Of Pittsburgh|Pitt Parking Pay Stati127 North`
  - desc: `PITT PARKING PAY STATI127 NORTH`
- **?** (conf 0.78, weight 37, 7 accts, 30 stmt) guess=`McDonald's`
  - desc: `MCDONALDS 1862 3708 FORBES AVE P`
  - desc: `MCDONALDS 1102 225 MOUNT LEBANON`
  - desc: `MCDONALD'S F1862 3708 FORBES AVE`
  - desc: `MCDONALD'S F1102 225 MT LEBANON`
  - desc: `MCDONALDS 5834 2518 W LIBERTY RD`
  - desc: `MCDONALD'S F27387 1412 B MAIN ST`
  - accts: McDonald's, Mcdonald's F1102, Mcdonald's F1862, Mcdonald's F27387, Mcdonalds 1862, Mcdonalds 33234
- **?** (conf 1.0, weight 30, 0 accts, 30 stmt) guess=`Castle Shannon Shop`
  - desc: `CASTLE SHANNON SHOP' 799 CASTLE`
- **?** (conf 0.71, weight 30, 2 accts, 28 stmt) guess=`Market District`
  - desc: `MARKET DISTRICT #0014 7000 OXFOR`
  - desc: `MARKET DISTRICT #0047 100 SETTLE`
  - accts: Market District, Market District Supermarket
- **?** (conf 0.4, weight 18, 0 accts, 18 stmt) guess=`Apple Com Bill One Apple`
  - desc: `APPLE.COM/BILL ONE APPLE PARK WA`
  - desc: `APPLE.COM/US ONE APPLE PARK WAY`
  - desc: `APPLE.COM/BILL ONE APPLE PARK CU`
- **?** (conf 0.47, weight 18, 8 accts, 10 stmt) guess=`Compeer`
  - desc: `COMPEER-COMP-CP WEB PMTS ACH WEB COMPEER-COMP-CP WEB PMTS ACH WEB C5R6`
  - desc: `COMPEER-COMP-CP WEB PMTS ACH WEB COMPEER-COMP-CP WEB PMTS ACH WEB MD64`
  - desc: `COMPEER-COMP-CP WEB PMTS ACH WEB COMPEER-COMP-CP WEB PMTS ACH WEB 3Y6Q`
  - desc: `COMPEER-COMP-CP WEB PMTS ACH WEB COMPEER-COMP-CP WEB PMTS ACH WEB R34S`
  - desc: `COMPEER-COMP-CP WEB PMTS ACH WEB COMPEER-COMP-CP WEB PMTS ACH WEB D9FZ`
  - desc: `COMPEER-COMP-CP WEB PMTS ACH WEB COMPEER-COMP-CP WEB PMTS ACH WEB F394`
  - accts: COMPEER-COMP-CP WEB PMTS ACH WEB 3Y6QDL, COMPEER-COMP-CP WEB PMTS ACH WEB 7Y648K, COMPEER-COMP-CP WEB PMTS ACH WEB D9FZ0L, COMPEER-COMP-CP WEB PMTS ACH WEB F394TK, COMPEER-COMP-CP WEB PMTS ACH WEB JS0NNK, COMPEER-COMP-CP WEB PMTS ACH WEB K7TDFK
- **?** (conf 0.4, weight 18, 0 accts, 18 stmt) guess=`Sq *La Gourmandine Oak116 Meyran`
  - desc: `SQ *LA GOURMANDINE OAK116 MEYRAN`
- **?** (conf 1.0, weight 17, 0 accts, 17 stmt) guess=`Kuhns Banksville`
  - desc: `KUHNS BANKSVILLE 3125 BANKSVILLE`
- **?** (conf 0.75, weight 13, 4 accts, 9 stmt) guess=`Starbucks`
  - desc: `STARBUCKS STORE 27117 4022 FIFTH`
  - desc: `STARBUCKS 27117 4022 5TH AVE PIT`
  - desc: `STARBUCKS 8007827282 2401 UTAH A`
  - accts: Starbucks, Starbucks 27117, Starbucks 8007827282, Starbucks Store 27117
- **?** (conf 0.4, weight 11, 0 accts, 11 stmt) guess=`Claude Ai Subscription548 Market`
  - desc: `CLAUDE.AI SUBSCRIPTION548 MARKET`
- **?** (conf 0.61, weight 11, 2 accts, 9 stmt) guess=`Duquesne Light`
  - desc: `DUQUESNE LIGHT PAYMENT ACH DEBIT DUQUESNE LIGHT PAYMENT ACH DEBIT xxxx`
  - accts: DUQUESNE LIGHT PAYMENT ACH DEBIT xxxxxx5333, Duquesne Light
- **?** (conf 0.4, weight 11, 1 accts, 10 stmt) guess=`T2`
  - desc: `T2* MT LEBANON PA 8900 KEYSTONE`
  - accts: T2* Mt Lebanon Pa
- **?** (conf 1.0, weight 10, 1 accts, 9 stmt) guess=`Comcast / Xfinity`
  - desc: `COMCAST / XFINITY 15 SUMMIT PARK`
  - accts: Comcast / Xfinity
- **?** (conf 1.0, weight 10, 0 accts, 10 stmt) guess=`Interest Payment Interest Payment`
  - desc: `INTEREST PAYMENT INTEREST PAYMENT`
- **?** (conf 0.4, weight 10, 0 accts, 10 stmt) guess=`Upmc Student Insurance600 Grant`
  - desc: `UPMC STUDENT INSURANCE600 GRANT`
- **?** (conf 0.4, weight 9, 0 accts, 9 stmt) guess=`Applecard Gsbank Payment Ach Web`
  - desc: `APPLECARD GSBANK PAYMENT ACH WEB APPLECARD GSBANK PAYMENT ACH WEB-RECU`
  - desc: `APPLECARD GSBANK PAYMENT ACH WEB APPLECARD GSBANK PAYMENT ACH WEB xxxx`
- **?** (conf 0.4, weight 9, 0 accts, 9 stmt) guess=`Citi Autopay Payment Ach Web`
  - desc: `CITI AUTOPAY PAYMENT ACH WEB-REC CITI AUTOPAY PAYMENT ACH WEB-RECUR xx`
  - desc: `CITI AUTOPAY PAYMENT ACH WEB-REC CITI AUTOPAY PAYMENT ACH WEB-RECUR xx`
  - desc: `CITI AUTOPAY PAYMENT ACH WEB-REC CITI AUTOPAY PAYMENT ACH WEB-RECUR xx`
  - desc: `CITI AUTOPAY PAYMENT ACH WEB-REC CITI AUTOPAY PAYMENT ACH WEB-RECUR xx`
  - desc: `CITI AUTOPAY PAYMENT ACH WEB-REC CITI AUTOPAY PAYMENT ACH WEB-RECUR xx`
  - desc: `CITI AUTOPAY PAYMENT ACH WEB-REC CITI AUTOPAY PAYMENT ACH WEB-RECUR xx`
- **?** (conf 1.0, weight 9, 0 accts, 9 stmt) guess=`Daily Cash Adjustment`
  - desc: `DAILY CASH ADJUSTMENT`
- **?** (conf 1.0, weight 7, 0 accts, 7 stmt) guess=`Ebay O`
  - desc: `EBAY O*07-14287-66191 2535 NORTH`
  - desc: `EBAY O*07-14287-66190 2535 NORTH`
  - desc: `EBAY O*07-14287-66189 2535 NORTH`
  - desc: `EBAY O*07-14287-66188 2535 NORTH`
  - desc: `EBAY O*07-14287-66187 2535 NORTH`
  - desc: `EBAY O*07-14287-66186 2535 NORTH`
- **?** (conf 0.9, weight 7, 1 accts, 6 stmt) guess=`Needle & Bean`
  - desc: `SQ *NEEDLE & BEAN 320 CASTLE`
  - accts: Needle & Bean
- **?** (conf 0.4, weight 7, 0 accts, 7 stmt) guess=`University Of Pittsburgh|Univ Pittsburgh Salary Ach Credi`
  - desc: `UNIV PITTSBURGH SALARY ACH CREDI UNIV PITTSBURGH SALARY ACH CREDIT xx0`
- **?** (conf 1.0, weight 7, 0 accts, 7 stmt) guess=`Youtube Tv`
  - desc: `GOOGLE *YOUTUBE TV 1600 AMPHITHE`
- **?** (conf 0.62, weight 6, 2 accts, 4 stmt) guess=`Liberty Mutual`
  - desc: `LIBERTY MUTUAL 175 BERKELEY ST 8`
  - desc: `LIBERTY MUTUAL ATTN: COURTNEY MU`
  - accts: Liberty Mutual
- **?** (conf 0.53, weight 6, 2 accts, 4 stmt) guess=`Openai`
  - desc: `OPENAI *CHATGPT SUBSCR548 MARKET`
  - desc: `OPENAI 1455 3RD STREET SAN FRANC`
  - accts: Openai, Openai *chatgpt Subscr
- **?** (conf 0.4, weight 6, 0 accts, 6 stmt) guess=`Spo P&Amp Gspamelasdiner3703 F`
  - desc: `SPO*P&G'SPAMELA'SDINER3703 F`
- **?** (conf 1.0, weight 6, 2 accts, 4 stmt) guess=`Svdp Castle Shannon`
  - desc: `SVDP CASTLE SHANNON 3423 LIBRARY`
  - accts: SVDP Castle Shannon, Svdp Castle Shannon
- **?** (conf 0.4, weight 5, 0 accts, 5 stmt) guess=`Bp 9604786Ukani Broqps2900 Banks`
  - desc: `BP#9604786UKANI BROQPS2900 BANKS`
- **?** (conf 0.4, weight 5, 2 accts, 3 stmt) guess=`Capital One Transfer Ach Web`
  - desc: `CAPITAL ONE TRANSFER ACH WEB RT0 CAPITAL ONE TRANSFER ACH WEB RT0D854F`
  - desc: `CAPITAL ONE TRANSFER ACH WEB PAY CAPITAL ONE TRANSFER ACH WEB PAYMENT `
  - desc: `CAPITAL ONE TRANSFER ACH WEB PAY CAPITAL ONE TRANSFER ACH WEB PAYMENT `
  - accts: CAPITAL ONE TRANSFER ACH WEB PAYMENT RT04E16C0EA8E68, CAPITAL ONE TRANSFER ACH WEB PAYMENT RT097FE1F911EB7
- **?** (conf 0.69, weight 5, 1 accts, 4 stmt) guess=`Peacock`
  - desc: `PEACOCK 75AE1 PREMIUM 30 ROCKEFE`
  - desc: `PEACOCK 81D06 PREMIUM 30 ROCKEFE`
  - desc: `PEACOCK EF701 PREMIUM 30 ROCKEFE`
  - desc: `PEACOCK X6258 PREMIUM 30 ROCKEFE`
  - accts: Peacock
- **?** (conf 0.4, weight 5, 0 accts, 5 stmt) guess=`Spiegel Freedman Psych105 Braunl`
  - desc: `SPIEGEL FREEDMAN PSYCH105 BRAUNL`
- **?** (conf 0.4, weight 5, 0 accts, 5 stmt) guess=`University Of Pittsburgh|Rnk Pittsburgh P3610 Forbe`
  - desc: `TST*RNK PITTSBURGH - P3610 FORBE`
- **?** (conf 1.0, weight 5, 1 accts, 4 stmt) guess=`Www Costco Com`
  - desc: `WWW COSTCO COM 800-955-2292`
  - accts: WWW COSTCO COM 800-955-2292 WA
- **?** (conf 0.4, weight 4, 0 accts, 4 stmt) guess=`Dave And Andy S Ho207`
  - desc: `SQ *DAVE AND ANDY S HO207 ATWOOD`
- **?** (conf 0.4, weight 4, 0 accts, 4 stmt) guess=`Enricos Tazza Do125 Lytton`
  - desc: `SQ *ENRICO'S TAZZA D'O125 LYTTON`
- **?** (conf 0.4, weight 4, 0 accts, 4 stmt) guess=`Hofbrauhaus Pittsburgh2705 S Wat`
  - desc: `HOFBRAUHAUS PITTSBURGH2705 S WAT`
- **?** (conf 0.65, weight 4, 1 accts, 3 stmt) guess=`Luis Benitez`
  - desc: `ZEL FROM Luis Benitez ZEL FROM Luis Benitez`
  - accts: Luis Benitez
- **?** (conf 0.4, weight 4, 2 accts, 2 stmt) guess=`Pitt Tuition Pittpaymnt Ach Web`
  - desc: `PITT TUITION PITTPAYMNT ACH WEB PITT TUITION PITTPAYMNT ACH WEB OPUxxx`
  - desc: `PITT TUITION PITTPAYMNT ACH WEB PITT TUITION PITTPAYMNT ACH WEB OPUxxx`
  - accts: PITT TUITION PITTPAYMNT ACH WEB OPUxxxx0412, PITT TUITION PITTPAYMNT ACH WEB OPUxxxx9683
- **?** (conf 0.4, weight 4, 0 accts, 4 stmt) guess=`Schwab Brokerage Moneylink Ach W`
  - desc: `SCHWAB BROKERAGE MONEYLINK ACH C SCHWAB BROKERAGE MONEYLINK ACH CREDIT`
  - desc: `SCHWAB BROKERAGE MONEYLINK ACH C SCHWAB BROKERAGE MONEYLINK ACH CREDIT`
  - desc: `SCHWAB BROKERAGE MONEYLINK ACH D SCHWAB BROKERAGE MONEYLINK ACH DEBIT `
  - desc: `SCHWAB BROKERAGE MONEYLINK ACH W SCHWAB BROKERAGE MONEYLINK ACH WEB-RE`
- **?** (conf 1.0, weight 4, 1 accts, 3 stmt) guess=`Subaru Of South Hills`
  - desc: `SUBARU OF SOUTH HILLS 3260 WASHI`
  - accts: Subaru Of South Hills
## AUTO-PROPOSABLE (top 40 by volume)
- `GomobilePGH` (conf 1.0, weight 49, merges 4 accts) ids=[865, 642, 559, 781]
- `Sheetz` (conf 1.0, weight 43, merges 7 accts) ids=[566, 744, 739, 567, 774, 794, 738]
- `Autozone` (conf 1.0, weight 27, merges 6 accts) ids=[593, 812, 724, 714, 591, 806]
- `Sunoco` (conf 1.0, weight 27, merges 6 accts) ids=[599, 638, 827, 767, 820, 715]
- `Costco Whse` (conf 1.0, weight 22, merges 2 accts) ids=[842, 836]
- `Harbor Freight Tools` (conf 0.95, weight 18, merges 3 accts) ids=[878, 569, 737]
- `Petco` (conf 1.0, weight 15, merges 4 accts) ids=[546, 729, 797, 633]
- `Chick-fil-A` (conf 1.0, weight 14, merges 5 accts) ids=[630, 810, 832, 712, 702]
- `Costco Gas` (conf 1.0, weight 14, merges 2 accts) ids=[840, 837]
- `D J*wsj` (conf 1.0, weight 10, merges 1 accts) ids=[553]
- `Rockauto` (conf 0.94, weight 10, merges 1 accts) ids=[557]
- `University Club` (conf 0.86, weight 10, merges 2 accts) ids=[867, 637]
- `Chikn Oakland` (conf 1.0, weight 9, merges 1 accts) ids=[558]
- `Raising Cane's` (conf 1.0, weight 9, merges 3 accts) ids=[868, 561, 828]
- `Barnes & Noble` (conf 0.9, weight 7, merges 3 accts) ids=[603, 817, 658]
- `Lowe's` (conf 1.0, weight 7, merges 1 accts) ids=[673]
- `PMUSA` (conf 1.0, weight 7, merges 2 accts) ids=[885, 614]
- `Home Depot` (conf 0.83, weight 6, merges 1 accts) ids=[722]
- `REI` (conf 1.0, weight 6, merges 2 accts) ids=[684, 682]
- `Target` (conf 1.0, weight 6, merges 2 accts) ids=[605, 731]
- `The Saloon Of` (conf 0.82, weight 6, merges 2 accts) ids=[847, 801]
- `Best Buy` (conf 1.0, weight 5, merges 2 accts) ids=[751, 740]
- `Check` (conf 1.0, weight 5, merges 1 accts) ids=[524]
- `Expedia` (conf 1.0, weight 5, merges 2 accts) ids=[717, 711]
- `Michaels Stores` (conf 1.0, weight 5, merges 2 accts) ids=[587, 664]
- `Rita's` (conf 1.0, weight 5, merges 1 accts) ids=[882]
- `Als Corner` (conf 1.0, weight 4, merges 1 accts) ids=[762]
- `CVS Pharmacy` (conf 1.0, weight 4, merges 2 accts) ids=[783, 816]
- `Dunkin` (conf 1.0, weight 4, merges 2 accts) ids=[655, 846]
- `Five Guys` (conf 1.0, weight 4, merges 1 accts) ids=[723]
- `Redhawk Coffee` (conf 1.0, weight 4, merges 1 accts) ids=[721]
- `Sportsmans Warehouse` (conf 1.0, weight 4, merges 1 accts) ids=[568]
- `Taco Bell` (conf 1.0, weight 4, merges 2 accts) ids=[686, 691]
- `TNT Pizza` (conf 1.0, weight 4, merges 1 accts) ids=[624]
- `Act Cntyalleghenyprk` (conf 1.0, weight 3, merges 1 accts) ids=[776]
- `Butterjoint` (conf 1.0, weight 3, merges 1 accts) ids=[608]
- `Ctlp*csc Serviceworks` (conf 1.0, weight 3, merges 1 accts) ids=[650]
- `Fiori's Pizzaria` (conf 0.91, weight 3, merges 1 accts) ids=[551]
- `Get Go` (conf 1.0, weight 3, merges 1 accts) ids=[718]
- `Giant Eagle` (conf 1.0, weight 3, merges 1 accts) ids=[592]
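
The auto-proposable entries above share one line shape, so they can be read back into records before any merge is applied, for example to confirm that `merges N accts` agrees with the number of ids. A minimal sketch, assuming the line format stays exactly as printed here; `parse_auto_line` and `ids_match_merges` are hypothetical helpers, not part of the migration tooling.

```python
import re

# Matches one AUTO-PROPOSABLE bullet as printed in this report.
AUTO_LINE = re.compile(
    r"^- `(?P<name>[^`]+)` "
    r"\(conf (?P<conf>[\d.]+), weight (?P<weight>\d+), "
    r"merges (?P<merges>\d+) accts\) "
    r"ids=\[(?P<ids>[\d, ]*)\]$"
)


def parse_auto_line(line):
    """Return a dict for one auto-proposable bullet, or None if it does not match."""
    m = AUTO_LINE.match(line.strip())
    if m is None:
        return None
    ids = [int(part) for part in m["ids"].split(",") if part.strip()]
    return {
        "name": m["name"],
        "conf": float(m["conf"]),
        "weight": int(m["weight"]),
        "merges": int(m["merges"]),
        "ids": ids,
    }


def ids_match_merges(record):
    """Sanity check: the stated merge count equals the number of listed ids."""
    return record["merges"] == len(record["ids"])


rec = parse_auto_line(
    "- `Sheetz` (conf 1.0, weight 43, merges 7 accts) "
    "ids=[566, 744, 739, 567, 774, 794, 738]"
)
```

A line that fails to parse returns `None` rather than raising, so a caller can collect and report malformed entries instead of stopping at the first one.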