Design Decisions
These are the decisions that aren't visible in the code but that determined everything about how it was written. Each one was a genuine choice between two real options, made under constraint and with a specific outcome in mind.
This is the layer of a project that separates an implementer from a product thinker. Anyone can list a tech stack. The interesting question is why.
Should the primary interface be a terminal or a dashboard?
Terminal interface over dashboard
Dashboards optimise for passive observation — they're built for people who want to see what's happening. A command interface optimises for action — it's built for people who need to change what's happening. Enterprise supply chain modellers who actually resolve solver failures aren't browsing charts. They're diagnosing constraints, proposing fixes, and issuing approvals. The terminal collapses the distance between seeing the problem and fixing it. Every Action Card is rendered in the same stream where the command was issued — context and action are co-located, not split across panels.
A traditional dashboard (Figma-designed, React + chart library) was the obvious choice. A chat interface (conversational AI layer over a web app) was another option. Both were explored as thought experiments before any code was written. The dashboard felt passive — too much emphasis on visibility, not enough on resolution. The chat interface felt unconstrained — too open-ended, not enough affordance for structured operational decisions.
The terminal is unfamiliar to non-technical users. Discoverability requires intentional design (hence the slash-command palette). There's a learning curve that a dashboard wouldn't have. The bet is that the users who actually need this tool will find the terminal more productive once they know the commands — and that the slash palette bridges the gap for everyone else.
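The slash-command palette mentioned above can be sketched as a small registry with prefix matching. This is a minimal illustration, not code from the project — the command names, decorator, and handler signature are all assumptions.

```python
# Hypothetical sketch of a slash-command registry: commands register
# themselves once, and the palette prefix-matches as the user types.
from dataclasses import dataclass
from typing import Callable

@dataclass
class SlashCommand:
    name: str                      # e.g. "/diagnose"
    summary: str                   # one-line help shown in the palette
    handler: Callable[[str], str]  # takes the argument string, returns output

REGISTRY: dict[str, SlashCommand] = {}

def register(name: str, summary: str):
    """Decorator that adds a handler to the palette's registry."""
    def wrap(fn):
        REGISTRY[name] = SlashCommand(name, summary, fn)
        return fn
    return wrap

@register("/diagnose", "Explain why the last solver run failed")
def diagnose(args: str) -> str:
    return f"diagnosing: {args or 'last run'}"

def palette(prefix: str) -> list[str]:
    """Prefix-match command names so users discover them as they type."""
    return sorted(c.name for c in REGISTRY.values() if c.name.startswith(prefix))
```

The point of the registry is that discoverability is data, not documentation: the palette is generated from the same structure that dispatches the commands, so the two can never drift apart.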
How should external data sources connect to the solver?
Manifest-driven ingestion over direct API integration
Direct API integration is fast to build and brittle in production. Every time a source system changes a field name, adds a column, or switches format, the integration silently breaks. The manifest pattern — where an AI infers the schema mapping, a human reviews it, and only the confirmed contract touches the database — makes every integration explicit, auditable, and correctable. The Schema Inferrer (Claude Sonnet) does the tedious column-mapping work. The human reviews and approves. The confirmed manifest becomes the record of what was agreed between the source system and the solver. This isn't just better engineering — it's better compliance. Every data load is documented before it happens.
Direct CSV-to-DB loading without schema inference (fast but fragile). A visual field-mapping UI (familiar but slow). REST API adapters per source system (scalable but requires maintenance per integration). The manifest approach borrows from ETL platform conventions but makes AI-assisted inference first-class rather than a bolt-on.
The manifest step adds latency to the first import (2–4 seconds for schema inference). It requires a confirmed manifest before the solver will run on new data — which is deliberate friction. The trade-off is that every integration is documented and every data load is preceded by a human decision. In an enterprise context, that friction is a feature.
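The manifest as a contract can be made concrete with a short sketch. The field names, the draft/confirmed statuses, and the loader below are illustrative assumptions — only the rule they encode comes from the text: nothing touches the database until a human has confirmed the mapping.

```python
# Illustrative shape of a manifest: the AI-inferred column mappings are
# held in "draft" until a human confirms them, and the loader refuses
# to run against anything unconfirmed.
from dataclasses import dataclass

@dataclass
class ColumnMapping:
    source_column: str  # header as it appears in the incoming file
    target_field: str   # canonical solver field it maps to
    dtype: str          # expected type, checked at load time

@dataclass
class Manifest:
    source: str
    mappings: list[ColumnMapping]
    status: str = "draft"  # draft -> confirmed; only confirmed manifests load

    def confirm(self) -> None:
        """Record the human approval that turns inference into contract."""
        self.status = "confirmed"

def load_rows(manifest: Manifest, rows: list[dict]) -> list[dict]:
    """Apply the confirmed contract; unknown columns are dropped, and an
    unconfirmed manifest never reaches the database."""
    if manifest.status != "confirmed":
        raise PermissionError("manifest not confirmed by a human reviewer")
    rename = {m.source_column: m.target_field for m in manifest.mappings}
    return [{rename[k]: v for k, v in r.items() if k in rename} for r in rows]
```

Because the confirmed manifest is a plain data object, it doubles as the audit record: serialise it and you have the documented agreement between source system and solver.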
Should the AI fix problems automatically, or wait for human approval?
Human-in-the-loop approval at every DB mutation
The most capable AI systems in enterprise workflows fail not because they're wrong, but because they're unaccountable. Autopilot AI can be right 95% of the time and still be unusable in regulated environments — because no one can explain who approved the change when the auditor asks. TX-1's design principle is simple: the agent proposes, the human decides. The Strategist reasons, dry-runs, and surfaces a card. The card shows the exact SQL that will run, the rationale behind it, and the expected outcome. Nothing touches the database until an explicit Approve signal is received from the user. The audit log records every decision with user_approved = true. The agent is genuinely useful — it does the diagnosis, validation, and proposal work that would otherwise take an hour. But accountability stays with the human.
Fully autonomous mode (agent executes immediately on high-confidence fixes). Suggested-only mode (agent surfaces insights but never proposes SQL). A confidence threshold model (autopilot below X% risk, approval above). All were considered. The threshold model is technically elegant but creates a false sense of security — the cases where you most need a human are exactly the cases where confidence scores are ambiguous.
The approval gate adds friction to the happy path. A fully automated system would be faster. The counter-argument is that in enterprise operations, 'faster' without accountability isn't a feature — it's a liability. The circuit breaker (halts after 3 failed fix attempts) handles the edge case where the agent loops without converging.
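The propose/approve gate and the circuit breaker can be sketched together. The 3-attempt limit and the `user_approved` audit field follow the text; the class shape, callbacks, and in-memory log are assumptions for illustration.

```python
# Minimal sketch of the approval gate: the agent proposes SQL, a human
# callback decides, only approved SQL executes, every decision is logged,
# and repeated failed fixes trip a circuit breaker.
from typing import Callable

MAX_FAILED_ATTEMPTS = 3  # circuit-breaker threshold from the design

class CircuitOpen(RuntimeError):
    pass

class ApprovalGate:
    def __init__(self) -> None:
        self.audit_log: list[dict] = []
        self.failed_attempts = 0

    def submit(self, proposed_sql: str, rationale: str,
               approve: Callable[[str, str], bool],
               execute: Callable[[str], bool]) -> bool:
        if self.failed_attempts >= MAX_FAILED_ATTEMPTS:
            raise CircuitOpen("agent halted after repeated failed fixes")
        if not approve(proposed_sql, rationale):
            # Human said no: nothing runs, but the decision is still recorded.
            self.audit_log.append({"sql": proposed_sql, "user_approved": False})
            return False
        ok = execute(proposed_sql)  # only approved SQL mutates the DB
        self.audit_log.append({"sql": proposed_sql, "user_approved": True})
        if not ok:
            self.failed_attempts += 1  # converging loops trip the breaker
        return ok
```

Note that the breaker counts failed *executions*, not rejections: a human saying no is the system working as intended, while three approved fixes that fail to land suggest the agent is looping without converging.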
Should the system run locally or in the cloud?
Local execution via Tauri desktop app
Supply chain data is operationally sensitive. Inventory positions, demand forecasts, and capacity constraints are exactly the kind of data that enterprises are cautious about routing through third-party cloud infrastructure. Running the solver, the LLM calls, and the database locally means data stays on the machine unless the user explicitly exports it. Tauri (Rust + WebView) gives you a native desktop binary with a web-based UI — the best of both worlds. The sidecar pattern (FastAPI process spawned by Tauri) means the Python agent stack runs as a first-class citizen without requiring the user to manage a server. The result is a tool that feels like a desktop app, deploys like a desktop app, and keeps data on-device by default.
A web app deployed to Vercel (lowest friction for distribution, highest friction for data governance). An Electron app (heavier binary, weaker security model than Tauri's Rust core). A pure CLI tool (no UI layer, harder to surface Action Cards). Tauri was chosen over Electron specifically for the smaller binary size and Rust-based security boundaries.
Local execution means no multi-user collaboration out of the box. Updates require a binary release cycle rather than a server deploy. The LLM calls (Anthropic API) do leave the machine — this is a known and documented exception, and the system is designed so that only the specific rows involved in a constraint violation are sent to the model, never the full database.
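The data-minimisation rule for LLM calls — only the rows implicated in a violation leave the machine — can be shown in a few lines. The violation shape and field names here are assumptions; the rule itself is from the text.

```python
# Sketch of payload minimisation: serialise only the rows named by the
# constraint violation into the LLM prompt, never the full table.
import json

def build_llm_payload(violation: dict, table: list[dict]) -> str:
    """Select the minimal slice of data before any API call is made."""
    implicated = set(violation["row_ids"])
    rows = [r for r in table if r["id"] in implicated]
    return json.dumps({
        "constraint": violation["constraint"],
        "rows": rows,                    # the only rows that leave the machine
        "row_count_total": len(table),   # scale context without content
    })
```

Keeping the filter in one function also makes the exception auditable: there is a single choke point where on-device data becomes an API payload.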
One general-purpose agent, or specialised agents for each task?
Specialised agents with fixed roles over a general-purpose agent
A single general-purpose LLM handling classification, diagnosis, and proposal would be simpler to build and harder to trust. Specialisation enforces separation of concerns at the model level — the Dispatcher (Haiku) only classifies intent, the Inspector (Haiku) only validates data, the Strategist (Sonnet) only proposes fixes. Each agent has a narrow job and a narrow failure mode. When something goes wrong, you know which agent failed and why. The model choice follows the role: Haiku for fast, cheap, high-volume classification tasks; Sonnet for reasoning tasks where quality matters more than speed. The Archivist (ChromaDB + Haiku) adds institutional memory without requiring the Strategist to carry the full fix history in its context window.
A single Sonnet agent handling everything (simpler routing, higher cost, harder to debug). A fully orchestrated multi-agent swarm (more flexible, harder to reason about). LangGraph was chosen specifically because it makes the agent graph explicit and inspectable — you can see the topology, the routing conditions, and the state at each node.
Specialisation adds complexity to the routing layer. Adding a new intent requires updating the Dispatcher prompt, the graph routing, and the node implementation — three places instead of one. The payoff is that each agent's behaviour is bounded and testable in isolation.
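The fixed-role routing can be illustrated without LangGraph. The project makes the graph explicit via LangGraph; the plain-Python sketch below only shows the underlying idea — each intent maps to exactly one narrow agent, and unknown intents fail loudly rather than falling through to a general-purpose handler. The intent names and agent stubs are assumptions.

```python
# Hand-rolled sketch of fixed-role dispatch: one intent, one agent,
# no fallthrough. The real system expresses this as a LangGraph graph.
from typing import Callable

def inspector(payload: str) -> str:
    return f"validated: {payload}"         # Haiku-class: cheap, high-volume checks

def strategist(payload: str) -> str:
    return f"proposed fix for: {payload}"  # Sonnet-class: reasoning and proposals

ROUTES: dict[str, Callable[[str], str]] = {
    "validate_data": inspector,
    "fix_violation": strategist,
}

def dispatch(intent: str, payload: str) -> str:
    """Route a classified intent to its single responsible agent.
    Unknown intents raise instead of silently hitting a default agent,
    which keeps each failure mode attributable to one role."""
    if intent not in ROUTES:
        raise ValueError(f"no agent registered for intent {intent!r}")
    return ROUTES[intent](payload)
```

The routing table is the cost the section describes: adding an intent means touching the table, the classifier, and the handler. The payoff is that `ROUTES` is also the complete, inspectable topology.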
These decisions were made over the course of the build — some at the start, some revised mid-way. The architecture decisions document tracks the full history.