01The question

An engineer asks
the question the code can't answer.

Mid-change in an unfamiliar service, the real question isn't what the code does — it's why it's shaped this way, and what was already ruled out. Same query, whether a human types it or an agent calls search_decisions over MCP.
ask lema · lemahq/lema
>Why did we choose an MCP-first architecture?

02The answer, with the why

The why — and the why-not — cited.

lema returns a synthesized answer plus the claims it rests on, each typed and sourced to a real decision: what was chosen, the constraints that forced it, what was rejected, and the consequences. Every inline citation jumps to its source.
app.lema.dev/ask
Why did we choose an MCP-first architecture?⌘↵
lema answerlema

lema is MCP-first: the open-source server is the portability layer that serves a handful of relevant decisions to an agent just-in-time, over one tool contract that is identical whether it runs locally or against hosted retrieval 1ADRADR-0016One MCP server is the single surface agents call — all writes route through it — with a mixed model strategy (Claude via Anthropic, Gemini/embeddings via Vertex) underneath.locatordocs/adr/0016. The shape was forced by where lema runs — npx lema-mcp on the engineer’s own machine, no account, the user’s own agent doing the reasoning 2ADRADR-0027The free wedge is npx lema-mcp on the user’s machine — no account, no GCP — and the tool contract is identical to hosted, so graduating swaps an endpoint, not a workflow.locatordocs/adr/0027 — and by the model gap on GCP, which is why writes route through the MCP server with a mixed Anthropic + Vertex strategy underneath 1ADRADR-0016One MCP server is the single surface agents call — all writes route through it — with a mixed model strategy (Claude via Anthropic, Gemini/embeddings via Vertex) underneath.locatordocs/adr/0016. Bundling inference into lema was ruled out: on the serve path the agent is the customer’s, not ours 3ADRADR-0027Folding inference into lema was rejected for the serve path: the agent reasoning over the atoms is the customer’s own (Claude Code, Cursor), so lema returns atoms, not answers.locatordocs/adr/0027. The payoff is measured — tight, sourced atoms instead of whole ADRs, ~71× fewer tokens per query 4ADRADR-0040Agents get a tight, cited slice of the decision graph per call — measured at ~71× fewer tokens than reading the source ADRs — instead of grepping the repo blind.locatordocs/adr/0040.

4 sources · lema
1ADR-0016chosen
One MCP server is the single surface agents call — all writes route through it — with a mixed model strategy (Claude via Anthropic, Gemini/embeddings via Vertex) underneath.
lema
2ADR-0027constraint
The free wedge is npx lema-mcp on the user’s machine — no account, no GCP — and the tool contract is identical to hosted, so graduating swaps an endpoint, not a workflow.
lema
3ADR-0027rejected
Folding inference into lema was rejected for the serve path: the agent reasoning over the atoms is the customer’s own (Claude Code, Cursor), so lema returns atoms, not answers.
lema
4ADR-0040consequence
Agents get a tight, cited slice of the decision graph per call — measured at ~71× fewer tokens than reading the source ADRs — instead of grepping the repo blind.
lema

03How we got here

How did we get from A to L.

Open the decision and lema maps its lineage — supersedes, depends-on, related-to — as a decision graph, beside the decision body itself. The reasoning isn't a flat note; it is a navigable record of how the architecture actually arrived where it is.

Decision graph

Agent-generated graph of the workspace’s decisions and how they relate — supersedes, depends-on, related-to, and rejected alternatives. Click “Generate graph” to map lema’s decision graph.
Accepted documentrev r0

§ Decision

lema is MCP-first: the open-source server serves decisions just-in-time over one neutral tool contract, identical whether it runs locally (npx lema-mcp, no account) or against hosted retrieval over the full atom layer.

§ Alternatives considered

Bundling inference and every surface into the web app was rejected — it strands non-GitHub forges, couples retrieval to a single runtime, and puts lema in the loop on the serve path where the agent is already the customer’s own.

§ Consequences

Agents on any runtime (Claude Code, Cursor, anything that speaks MCP) get the relevant few decisions at write time — measured at ~71× fewer tokens than reading the source ADRs.

04Served to your agent

The same knowledge,
just-in-time over MCP.

Everything above is one surface on the decision graph. The other is the open-source MCP server: the identical answer your agent reads at write time, as tight, sourced atoms instead of whole ADRs.
~71×
fewer tokens / query

The same answer your agent gets over MCP costs ~211 atom-tokens instead of the ~31.9k tokens of the source ADRs it cites — a ~71× average across our 12162× goldset.

range 12162×·20-query goldset · 39 ADRs · 390 claims·measured (ADR-0040)

Honest about the number — the baseline is the full bodies of the cited ADRs, “what you'd read to get this.” It's favorable because lema's own ADRs are unusually large; on smaller decisions the ratio shrinks. The durable claim isn't the exact multiple — it's the order of magnitude: hundreds of focused atom-tokens where the source is tens of thousands.

$npx lema-mcp --repo github.com/yourorg/yourrepo