Building a Tax Robot
“In this world nothing is certain except death and taxes.” — Benjamin Franklin
I was working at Quantium, processing millions of bank transactions daily, when my accountant wanted $300 to do what looked like… basic arithmetic.
“Why the fuck would I pay a tax agent when all they’re doing is being a glorified calculator?”
Famous last words from an engineer about to spend six years building, abandoning, and resurrecting a tax automation system.
The Original Sin
The logic seemed bulletproof: I work with transaction data professionally. Tax deductions are just categorization and multiplication. Build once, use forever.
So I built it. Python pipeline: ingest CSVs → categorize transactions → calculate deductions.
First year was magical. Then 2020 happened. And 2021. And 2022.
Each year: dust off the code, find something broken, hack a fix, promise to clean up “next time.” A textbook technical debt spiral. By 2023, I’d given up. The classic engineer trap: solving a $300 problem with $3,000 worth of development time.
The Resurrection
Fast forward to 2025. I’m engaged. Joint finances. The $300 problem became $600.
The code was rough: inconsistent names, hard-coded paths, and an error handling strategy that amounted to “# hope this doesn't break”. But the architecture was sound.
What Actually Worked
The core design turned out to be surprisingly robust:
Rule-based categorization: rules/ folder with {category}.txt files. Keywords match transactions. Simple, explainable, auditable.
Persistent learning: “UBER” → transport, “BUNNINGS” → home office supplies, “STEAM” → questionable business expenses.
Transparent logic: Every deduction traces to specific transactions. Want to know why I claimed $2,847? Here’s the exact transaction list.
Multi-bank support: ANZ, CBA, Beem, Wise - different CSV formats, normalized to a common schema.
Transfer detection: Same transaction appearing twice (debit from A, credit to B) gets excluded. No double-claiming.
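To make the keyword rules and the transfer exclusion concrete, here is a minimal sketch rather than the actual pipeline code. The `Txn` fields, the one-keyword-per-line file layout, and the "same day, opposite amounts, different accounts" matching rule are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class Txn:
    # Hypothetical common schema that every bank's CSV is normalized into.
    account: str
    day: date
    description: str
    amount: float  # negative = money out, positive = money in


def load_rules(rules_dir: Path) -> dict:
    """Read rules/{category}.txt, assuming one keyword per line."""
    rules = {}
    for path in sorted(rules_dir.glob("*.txt")):
        keywords = [line.strip().upper() for line in path.read_text().splitlines()]
        rules[path.stem] = [k for k in keywords if k]  # skip blank lines
    return rules


def categorize(description: str, rules: dict) -> Optional[str]:
    """Return the first category with a keyword found in the description."""
    text = description.upper()
    for category, keywords in rules.items():
        if any(k in text for k in keywords):
            return category
    return None  # unknown: left for manual labeling


def drop_transfers(txns: list) -> list:
    """Exclude debit/credit pairs: same day, opposite amounts, different accounts."""
    excluded = set()
    for i, a in enumerate(txns):
        if i in excluded or a.amount == 0:
            continue
        for j in range(i + 1, len(txns)):
            b = txns[j]
            if j in excluded:
                continue
            if (a.day == b.day and a.account != b.account
                    and abs(a.amount + b.amount) < 0.005):
                excluded.update((i, j))  # drop both legs of the transfer
                break
    return [t for k, t in enumerate(txns) if k not in excluded]
```

Because a rule is just a keyword in a text file, "why was this categorized as transport?" always has a one-line answer. A real version would probably need a wider date window for transfers that settle a day or two apart.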
The Numbers Don’t Lie
2023 results:
- Tyson: 653 transactions, 87% coverage
- Janice: 96% coverage
- Processing time: Under 10 seconds
- Audit ready: Complete transaction trail
Handles sophisticated scenarios: capital gains optimization, multi-person households, international transactions, duplicate detection.
Deduction logic examples:
home_office = 1.0 * home_office + 0.5 * home_stores + 0.4 * online_retail
work_related_car = 0.4 * vehicle + 0.3 * transport
The percentages reflect the share of each category that is a legitimate business expense, and they are applied consistently.
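In the formulas above, the left-hand side is the deduction and the right-hand side is the total spend per category. A minimal sketch of that weighted sum, using the weights from the post and made-up category totals (the `WEIGHTS` table and the `deductions` function are hypothetical names):

```python
# Deduction -> {spending category: claimable share}. Weights are from the post.
WEIGHTS = {
    "home_office": {"home_office": 1.0, "home_stores": 0.5, "online_retail": 0.4},
    "work_related_car": {"vehicle": 0.4, "transport": 0.3},
}


def deductions(category_totals: dict) -> dict:
    """Apply each weight to its category total and sum per deduction."""
    return {
        name: round(sum(w * category_totals.get(cat, 0.0) for cat, w in parts.items()), 2)
        for name, parts in WEIGHTS.items()
    }


# Illustrative totals only, not real figures.
totals = {"home_office": 200.0, "home_stores": 100.0, "online_retail": 50.0,
          "vehicle": 1000.0, "transport": 200.0}
result = deductions(totals)
# home_office: 200 + 50 + 20 = 270; work_related_car: 400 + 60 = 460
```

Keeping the weights in one table means a percentage changes in exactly one place, which is what "applied consistently" needs.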
The Real Win
The smartest decision: easy labeling. Unknown transactions don’t fail - they tell you what’s missing and suggest categories.
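Here is a rough sketch of what that labeling loop could look like. It assumes "coverage" means the share of transactions matched by a rule, and it uses fuzzy matching from the standard library's `difflib` to suggest a category, which is my stand-in and not necessarily how the real system does it. Function names are hypothetical.

```python
from collections import Counter
from difflib import get_close_matches
from typing import Optional


def suggest_category(description: str, rules: dict, cutoff: float = 0.6) -> Optional[str]:
    """Suggest the category whose keyword is closest to any word in the description."""
    keyword_to_category = {kw: cat for cat, kws in rules.items() for kw in kws}
    for word in description.upper().split():
        match = get_close_matches(word, list(keyword_to_category), n=1, cutoff=cutoff)
        if match:
            return keyword_to_category[match[0]]
    return None


def label_report(descriptions: list, rules: dict):
    """Return (coverage, todo) where todo lists unknowns with a suggested category."""
    keywords = [kw for kws in rules.values() for kw in kws]
    unknown = Counter()
    matched = 0
    for desc in descriptions:
        text = desc.upper()
        if any(kw in text for kw in keywords):
            matched += 1
        else:
            unknown[text] += 1  # count repeats so frequent unknowns surface first
    coverage = matched / len(descriptions) if descriptions else 1.0
    todo = [(text, count, suggest_category(text, rules))
            for text, count in unknown.most_common()]
    return coverage, todo


rules = {"transport": ["UBER"], "home_office": ["BUNNINGS"]}
coverage, todo = label_report(
    ["UBER TRIP", "BUNNINGS 123", "BUNINGS WAREHOUSE", "ZZZZ"], rules
)
```

Nothing unknown raises an error. It lands on a to-do list, sorted by how often it appears, and one new line in a rules file clears it for good.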
This became our household financial analytics platform. Query spending patterns, track trends, optimize cash flow. Personal CFO that never forgets and never makes arithmetic mistakes.
Definitely spent more time building this than paying an accountant. But now I have something that scales and gets better every year.
The tax robot works. I never have to think about taxes again. Kinda.
The system correctly identified that my Steam purchases were 80% legitimate software development tools and 20% questionable leisure activities. If the ATO audits my gaming habits, I have the receipts.