01The question
An engineer asks
the question the code can't answer.
search_decisions over MCP.02The answer, with the why
The why — and the why-not — cited.
lema is MCP-first: the open-source server is the portability layer that serves a handful of relevant decisions to an agent just-in-time, over one tool contract that is identical whether it runs locally or against hosted retrieval 1ADRADR-0016One MCP server is the single surface agents call — all writes route through it — with a mixed model strategy (Claude via Anthropic, Gemini/embeddings via Vertex) underneath.locatordocs/adr/0016. The shape was forced by where lema runs — npx lema-mcp on the engineer’s own machine, no account, the user’s own agent doing the reasoning 2ADRADR-0027The free wedge is npx lema-mcp on the user’s machine — no account, no GCP — and the tool contract is identical to hosted, so graduating swaps an endpoint, not a workflow.locatordocs/adr/0027 — and by the model gap on GCP, which is why writes route through the MCP server with a mixed Anthropic + Vertex strategy underneath 1ADRADR-0016One MCP server is the single surface agents call — all writes route through it — with a mixed model strategy (Claude via Anthropic, Gemini/embeddings via Vertex) underneath.locatordocs/adr/0016. Bundling inference into lema was ruled out: on the serve path the agent is the customer’s, not ours 3ADRADR-0027Folding inference into lema was rejected for the serve path: the agent reasoning over the atoms is the customer’s own (Claude Code, Cursor), so lema returns atoms, not answers.locatordocs/adr/0027. The payoff is measured — tight, sourced atoms instead of whole ADRs, ~71× fewer tokens per query 4ADRADR-0040Agents get a tight, cited slice of the decision graph per call — measured at ~71× fewer tokens than reading the source ADRs — instead of grepping the repo blind.locatordocs/adr/0040.
03How we got here
How did we get from A to L.
Decision graph
§ Decision
lema is MCP-first: the open-source server serves decisions just-in-time over one neutral tool contract, identical whether it runs locally (npx lema-mcp, no account) or against hosted retrieval over the full atom layer.
§ Alternatives considered
Bundling inference and every surface into the web app was rejected — it strands non-GitHub forges, couples retrieval to a single runtime, and puts lema in the loop on the serve path where the agent is already the customer’s own.
§ Consequences
Agents on any runtime (Claude Code, Cursor, anything that speaks MCP) get the relevant few decisions at write time — measured at ~71× fewer tokens than reading the source ADRs.
04Served to your agent
The same knowledge,
just-in-time over MCP.
The same answer your agent gets over MCP costs ~211 atom-tokens instead of the ~31.9k tokens of the source ADRs it cites — a ~71× average across our 12–162× goldset.
Honest about the number — the baseline is the full bodies of the cited ADRs, “what you'd read to get this.” It's favorable because lema's own ADRs are unusually large; on smaller decisions the ratio shrinks. The durable claim isn't the exact multiple — it's the order of magnitude: hundreds of focused atom-tokens where the source is tens of thousands.