Design Decisions
These are the decisions that aren't visible in the code but that determined everything about how it was written. Each one was a genuine choice between two real options, made under constraint and with a specific outcome in mind.
This is the layer of a project that separates an implementer from a product thinker. Anyone can list a tech stack. The interesting question is why.
Should the primary interface be a terminal or a dashboard?
Terminal interface over dashboard
Dashboards optimise for passive observation — they're built for people who want to see what's happening. A command interface optimises for action — it's built for people who need to change what's happening. Enterprise supply chain modellers who actually resolve solver failures aren't browsing charts. They're diagnosing constraints, proposing fixes, and issuing approvals. The terminal collapses the distance between seeing the problem and fixing it. Every Action Card is rendered in the same stream where the command was issued — context and action are co-located, not split across panels.
A traditional dashboard (Figma-designed, React + chart library) was the obvious choice. A chat interface (conversational AI layer over a web app) was another option. Both were explored as thought experiments before any code was written. The dashboard felt passive — too much emphasis on visibility, not enough on resolution. The chat interface felt unconstrained — too open-ended, not enough affordance for structured operational decisions.
The terminal is unfamiliar to non-technical users. Discoverability requires intentional design (hence the slash-command palette). There's a learning curve that a dashboard wouldn't have. The bet is that the users who actually need this tool will find the terminal more productive once they know the commands — and that the slash palette bridges the gap for everyone else.
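The slash-command palette mentioned above can be sketched as a small registry with prefix matching. This is a minimal illustration, not code from the project — the command names, decorator, and handler signature are all assumptions.

```python
# Hypothetical sketch of a slash-command registry: commands register
# themselves once, and the palette prefix-matches as the user types.
from dataclasses import dataclass
from typing import Callable

@dataclass
class SlashCommand:
    name: str                      # e.g. "/diagnose"
    summary: str                   # one-line help shown in the palette
    handler: Callable[[str], str]  # takes the argument string, returns output

REGISTRY: dict[str, SlashCommand] = {}

def register(name: str, summary: str):
    """Decorator that adds a handler to the palette's registry."""
    def wrap(fn):
        REGISTRY[name] = SlashCommand(name, summary, fn)
        return fn
    return wrap

@register("/diagnose", "Explain why the last solver run failed")
def diagnose(args: str) -> str:
    return f"diagnosing: {args or 'last run'}"

def palette(prefix: str) -> list[str]:
    """Prefix-match command names so users discover them as they type."""
    return sorted(c.name for c in REGISTRY.values() if c.name.startswith(prefix))
```

The point of the registry is that discoverability is data, not documentation: the palette is generated from the same structure that dispatches the commands, so the two can never drift apart.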
How should external data sources connect to the solver?
Manifest-driven ingestion over direct API integration
Direct API integration is fast to build and brittle in production. Every time a source system changes a field name, adds a column, or switches format, the integration silently breaks. The manifest pattern — where an AI infers the schema mapping, a human reviews it, and only the confirmed contract touches the database — makes every integration explicit, auditable, and correctable. The Schema Inferrer (Claude Sonnet) does the tedious column-mapping work. The human reviews and approves. The confirmed manifest becomes the record of what was agreed between the source system and the solver. This isn't just better engineering — it's better compliance. Every data load is documented before it happens.
Direct CSV-to-DB loading without schema inference (fast but fragile). A visual field-mapping UI (familiar but slow). REST API adapters per source system (scalable but requires maintenance per integration). The manifest approach borrows from ETL platform conventions but makes AI-assisted inference first-class rather than a bolt-on.
The manifest step adds latency to the first import (2–4 seconds for schema inference). It requires a confirmed manifest before the solver will run on new data — which is deliberate friction. The trade-off is that every integration is documented and every data load is preceded by a human decision. In an enterprise context, that friction is a feature.
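The manifest as a contract can be made concrete with a short sketch. The field names, the draft/confirmed statuses, and the loader below are illustrative assumptions — only the rule they encode comes from the text: nothing touches the database until a human has confirmed the mapping.

```python
# Illustrative shape of a manifest: the AI-inferred column mappings are
# held in "draft" until a human confirms them, and the loader refuses
# to run against anything unconfirmed.
from dataclasses import dataclass

@dataclass
class ColumnMapping:
    source_column: str  # header as it appears in the incoming file
    target_field: str   # canonical solver field it maps to
    dtype: str          # expected type, checked at load time

@dataclass
class Manifest:
    source: str
    mappings: list[ColumnMapping]
    status: str = "draft"  # draft -> confirmed; only confirmed manifests load

    def confirm(self) -> None:
        """Record the human approval that turns inference into contract."""
        self.status = "confirmed"

def load_rows(manifest: Manifest, rows: list[dict]) -> list[dict]:
    """Apply the confirmed contract; unknown columns are dropped, and an
    unconfirmed manifest never reaches the database."""
    if manifest.status != "confirmed":
        raise PermissionError("manifest not confirmed by a human reviewer")
    rename = {m.source_column: m.target_field for m in manifest.mappings}
    return [{rename[k]: v for k, v in r.items() if k in rename} for r in rows]
```

Because the confirmed manifest is a plain data object, it doubles as the audit record: serialise it and you have the documented agreement between source system and solver.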
Should the AI fix problems automatically, or wait for human approval?
Human-in-the-loop approval at every DB mutation
The most capable AI systems in enterprise workflows fail not because they're wrong, but because they're unaccountable. Autopilot AI can be right 95% of the time and still be unusable in regulated environments — because no one can explain who approved the change when the auditor asks. TX-1's design principle is simple: the agent proposes, the human decides. The Strategist reasons, dry-runs, and surfaces a card. The card shows the exact SQL that will run, the rationale behind it, and the expected outcome. Nothing touches the database until an explicit Approve signal is received from the user. The audit log records every decision with user_approved = true. The agent is genuinely useful — it does the diagnosis, validation, and proposal work that would otherwise take an hour. But accountability stays with the human.
Fully autonomous mode (agent executes immediately on high-confidence fixes). Suggested-only mode (agent surfaces insights but never proposes SQL). A confidence threshold model (autopilot below X% risk, approval above). All were considered. The threshold model is technically elegant but creates a false sense of security — the cases where you most need a human are exactly the cases where confidence scores are ambiguous.
The approval gate adds friction to the happy path. A fully automated system would be faster. The counter-argument is that in enterprise operations, 'faster' without accountability isn't a feature — it's a liability. The circuit breaker (halts after 3 failed fix attempts) handles the edge case where the agent loops without converging.
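The propose/approve gate and the circuit breaker can be sketched together. The 3-attempt limit and the `user_approved` audit field follow the text; the class shape, callbacks, and in-memory log are assumptions for illustration.

```python
# Minimal sketch of the approval gate: the agent proposes SQL, a human
# callback decides, only approved SQL executes, every decision is logged,
# and repeated failed fixes trip a circuit breaker.
from typing import Callable

MAX_FAILED_ATTEMPTS = 3  # circuit-breaker threshold from the design

class CircuitOpen(RuntimeError):
    pass

class ApprovalGate:
    def __init__(self) -> None:
        self.audit_log: list[dict] = []
        self.failed_attempts = 0

    def submit(self, proposed_sql: str, rationale: str,
               approve: Callable[[str, str], bool],
               execute: Callable[[str], bool]) -> bool:
        if self.failed_attempts >= MAX_FAILED_ATTEMPTS:
            raise CircuitOpen("agent halted after repeated failed fixes")
        if not approve(proposed_sql, rationale):
            # Human said no: nothing runs, but the decision is still recorded.
            self.audit_log.append({"sql": proposed_sql, "user_approved": False})
            return False
        ok = execute(proposed_sql)  # only approved SQL mutates the DB
        self.audit_log.append({"sql": proposed_sql, "user_approved": True})
        if not ok:
            self.failed_attempts += 1  # converging loops trip the breaker
        return ok
```

Note that the breaker counts failed *executions*, not rejections: a human saying no is the system working as intended, while three approved fixes that fail to land suggest the agent is looping without converging.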
Should the system run locally or in the cloud?
Local execution via Tauri desktop app
Supply chain data is operationally sensitive. Inventory positions, demand forecasts, and capacity constraints are exactly the kind of data that enterprises are cautious about routing through third-party cloud infrastructure. Running the solver, the LLM calls, and the database locally means data stays on the machine unless the user explicitly exports it. Tauri (Rust + WebView) gives you a native desktop binary with a web-based UI — the best of both worlds. The sidecar pattern (FastAPI process spawned by Tauri) means the Python agent stack runs as a first-class citizen without requiring the user to manage a server. The result is a tool that feels like a desktop app, deploys like a desktop app, and keeps data on-device by default.
A web app deployed to Vercel (lowest friction for distribution, highest friction for data governance). An Electron app (heavier binary, weaker security model than Tauri's Rust core). A pure CLI tool (no UI layer, harder to surface Action Cards). Tauri was chosen over Electron specifically for the smaller binary size and Rust-based security boundaries.
Local execution means no multi-user collaboration out of the box. Updates require a binary release cycle rather than a server deploy. The LLM calls (Anthropic API) do leave the machine — this is a known and documented exception, and the system is designed so that only the specific rows involved in a constraint violation are sent to the model, never the full database.
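The data-minimisation rule for LLM calls — only the rows implicated in a violation leave the machine — can be shown in a few lines. The violation shape and field names here are assumptions; the rule itself is from the text.

```python
# Sketch of payload minimisation: serialise only the rows named by the
# constraint violation into the LLM prompt, never the full table.
import json

def build_llm_payload(violation: dict, table: list[dict]) -> str:
    """Select the minimal slice of data before any API call is made."""
    implicated = set(violation["row_ids"])
    rows = [r for r in table if r["id"] in implicated]
    return json.dumps({
        "constraint": violation["constraint"],
        "rows": rows,                    # the only rows that leave the machine
        "row_count_total": len(table),   # scale context without content
    })
```

Keeping the filter in one function also makes the exception auditable: there is a single choke point where on-device data becomes an API payload.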
One general-purpose agent, or specialised agents for each task?
Specialised agents with fixed roles over a general-purpose agent
A single general-purpose LLM handling classification, diagnosis, and proposal would be simpler to build and harder to trust. Specialisation enforces separation of concerns at the model level — the Dispatcher (Haiku) only classifies intent, the Inspector (Haiku) only validates data, the Strategist (Sonnet) only proposes fixes. Each agent has a narrow job and a narrow failure mode. When something goes wrong, you know which agent failed and why. The model choice follows the role: Haiku for fast, cheap, high-volume classification tasks; Sonnet for reasoning tasks where quality matters more than speed. The Archivist (ChromaDB + Haiku) adds institutional memory without requiring the Strategist to carry the full fix history in its context window.
A single Sonnet agent handling everything (simpler routing, higher cost, harder to debug). A fully orchestrated multi-agent swarm (more flexible, harder to reason about). LangGraph was chosen specifically because it makes the agent graph explicit and inspectable — you can see the topology, the routing conditions, and the state at each node.
Specialisation adds complexity to the routing layer. Adding a new intent requires updating the Dispatcher prompt, the graph routing, and the node implementation — three places instead of one. The payoff is that each agent's behaviour is bounded and testable in isolation.
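The fixed-role routing can be illustrated without LangGraph. The project makes the graph explicit via LangGraph; the plain-Python sketch below only shows the underlying idea — each intent maps to exactly one narrow agent, and unknown intents fail loudly rather than falling through to a general-purpose handler. The intent names and agent stubs are assumptions.

```python
# Hand-rolled sketch of fixed-role dispatch: one intent, one agent,
# no fallthrough. The real system expresses this as a LangGraph graph.
from typing import Callable

def inspector(payload: str) -> str:
    return f"validated: {payload}"         # Haiku-class: cheap, high-volume checks

def strategist(payload: str) -> str:
    return f"proposed fix for: {payload}"  # Sonnet-class: reasoning and proposals

ROUTES: dict[str, Callable[[str], str]] = {
    "validate_data": inspector,
    "fix_violation": strategist,
}

def dispatch(intent: str, payload: str) -> str:
    """Route a classified intent to its single responsible agent.
    Unknown intents raise instead of silently hitting a default agent,
    which keeps each failure mode attributable to one role."""
    if intent not in ROUTES:
        raise ValueError(f"no agent registered for intent {intent!r}")
    return ROUTES[intent](payload)
```

The routing table is the cost the section describes: adding an intent means touching the table, the classifier, and the handler. The payoff is that `ROUTES` is also the complete, inspectable topology.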
These decisions were made over the course of the build — some at the start, some revised mid-way. The architecture decisions document tracks the full history.