
Why Legal AI Needs Bounded Memory, Not Bigger Prompts

April 2, 2026 · 5 min read · AI & Technology


One of the laziest ideas in legal AI is that better systems will come from larger prompts.

If the model needs more context, just give it more. Add more documents, more history, more background, more examples, more instructions, more prior work. Maybe then it will understand the matter.

That instinct is wrong often enough to be a product smell.

The better direction is bounded memory.

Bigger prompts are not the same thing as better systems

When legal AI systems struggle, the first reaction is usually to increase the amount of material sent into the model:

  • more matter documents
  • more historical examples
  • more style instructions
  • more prior conversation
  • more extracted facts

That can help in narrow situations. As a product strategy, it breaks down quickly.

Large prompts create four predictable problems:

  • token cost rises fast
  • latency gets worse
  • irrelevant context dilutes the actual task
  • the system becomes less legible because no one can tell what drove the output

Beyond technical inefficiency, legal work turns this into a trust problem.

If the system answers a question or drafts a document using a giant pile of mixed context, the lawyer cannot easily see what was loaded, what was relevant, and what should have stayed out.

The better model is bounded memory

Bounded memory means the system stores context persistently outside the model and loads only what a specific task needs.

The principle sounds obvious. Many AI products are still not built that way.

Instead of treating the model prompt as the primary memory surface, the system should have:

  • persistent records
  • scoped retrieval rules
  • typed facts where appropriate
  • task-specific context assembly
  • visible boundaries around what reaches a given run

The practical goal is simple:

The system should know what not to load.

That boundary makes memory useful.
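To make the idea concrete, here is a minimal sketch of task-specific context assembly. Everything in it is an assumption made for illustration: the `Record` type, the `TASK_SCOPE` table, the task names, and the `max_records` cap are hypothetical, not a description of any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    record_id: str
    matter_id: str
    kind: str  # hypothetical kinds: "fact", "document", "style_note"
    text: str


# Hypothetical scoping rules: which record kinds each task is allowed to load.
TASK_SCOPE = {
    "draft_engagement_letter": {"fact", "style_note"},
    "summarize_deposition": {"document"},
}


def assemble_context(store, matter_id, task, max_records=5):
    """Split the store into records loaded for this run and records kept out."""
    allowed = TASK_SCOPE[task]
    loaded, excluded = [], []
    for rec in store:
        in_scope = rec.matter_id == matter_id and rec.kind in allowed
        if in_scope and len(loaded) < max_records:
            loaded.append(rec)
        else:
            excluded.append(rec)
    return loaded, excluded


store = [
    Record("r1", "matter-A", "fact", "Client is Acme LLC."),
    Record("r2", "matter-A", "document", "Deposition transcript ..."),
    Record("r3", "matter-B", "fact", "Fact from an unrelated matter."),
    Record("r4", "matter-A", "style_note", "Use plain English."),
]

loaded, excluded = assemble_context(store, "matter-A", "draft_engagement_letter")
print([r.record_id for r in loaded])    # records that reach this run
print([r.record_id for r in excluded])  # records that stay out
```

The point of returning the excluded list alongside the loaded one is that the boundary itself becomes something a reviewer can inspect, rather than an invisible side effect of prompt construction.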

Recent research is moving in the same direction

The broader multi-agent systems literature points the same way. Three recent papers illustrate it.

Governed Memory

Governed Memory describes a production architecture for multi-agent workflows with entity-scoped isolation, dual memory, and progressive context delivery.

The branding is secondary. The mechanism does the work:

  • persistent memory outside the model
  • scoped retrieval before semantic search
  • selective context delivery per step

The paper reports a 50.3% token reduction across a five-step workflow, with some steps seeing 86–90% savings when the system sends only the delta instead of reloading everything each time.

Legal AI buyers should understand results like these, keeping in mind that they are the paper's own reported figures. Token accounting is not interesting in the abstract. It matters because lower context waste usually means lower cost, lower latency, and a clearer execution path.
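Sending only the delta can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the workflow steps, the record names, and the `deliver_delta` helper are all hypothetical, and the sketch assumes each later step can still see what earlier steps were sent. The counts it prints are artifacts of the toy data and say nothing about the paper's measurements.

```python
def deliver_delta(step_needs, already_sent):
    """Return only the record ids this step needs that were not sent earlier."""
    delta = [rid for rid in step_needs if rid not in already_sent]
    already_sent.update(delta)
    return delta


# Hypothetical workflow: each step lists the record ids it needs.
workflow = [
    ("intake", ["parties", "jurisdiction"]),
    ("issue_spotting", ["parties", "jurisdiction", "contract_terms"]),
    ("drafting", ["parties", "contract_terms", "style_notes"]),
]

sent = set()
full_reload_count = 0  # records sent if every step reloads everything it needs
delta_count = 0        # records sent if each step receives only what is new

for step, needs in workflow:
    delta = deliver_delta(needs, sent)
    full_reload_count += len(needs)
    delta_count += len(delta)
    print(step, "->", delta)

print(full_reload_count, delta_count)
```

In this toy run, full reloads would send eight records in total while delta delivery sends four. The mechanism, not the ratio, is what carries over to real systems.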

Source:

Facts as First Class Objects

Facts as First Class Objects pushes the same idea in a different direction.

Its argument is that facts should not only survive as embedded text in a large prompt. They should be addressable, persistent objects.

Legal AI should not have to rediscover the same core matter facts from scratch in every run. Some context should be retrieved as facts, not re-explained as prose.
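A minimal sketch of what an addressable fact might look like follows. The `Fact` fields, the `FactStore` class, and the example values are assumptions chosen for illustration. The paper's own data model may differ.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fact:
    matter_id: str
    key: str         # stable address, e.g. "governing_law"
    value: str
    source_doc: str  # provenance: the document the fact was taken from
    as_of: date


class FactStore:
    """Hypothetical in-memory store keyed by (matter_id, key)."""

    def __init__(self):
        self._facts = {}

    def put(self, fact):
        self._facts[(fact.matter_id, fact.key)] = fact

    def get(self, matter_id, key):
        # Lookups are scoped to one matter; other matters never match.
        return self._facts.get((matter_id, key))


facts = FactStore()
facts.put(Fact("matter-A", "governing_law", "New York", "msa_v3.pdf", date(2026, 1, 15)))

f = facts.get("matter-A", "governing_law")
print(f.value, f.source_doc)
print(facts.get("matter-B", "governing_law"))  # no such fact in this matter
```

Because the fact carries its own source and date, a run can retrieve it by address and cite where it came from, instead of re-deriving it from a pile of prose.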

Source:

Multi-Agent Memory from a Computer Architecture Perspective

This paper reframes the problem correctly: multi-agent memory is a systems problem, not a prompt problem.

Its framing of agent memory as a hierarchy of I/O, cache, and memory layers is especially useful.

The practical lesson is simple: context assembly should be treated like a cache layer with scope, freshness, and consistency rules, not as a giant string builder attached to a model call.
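As a rough sketch of that lesson, the class below treats assembled context as cache entries with a scope, a freshness limit, and a version check against the source of record. The `ContextCache` name, the time-to-live approach, and the version counter are assumptions for illustration, not the paper's design. Time is passed in explicitly so the behavior is easy to follow.

```python
class ContextCache:
    """Toy context cache keyed by (scope, name) with TTL and version checks."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._entries = {}

    def put(self, scope, name, value, version, now):
        self._entries[(scope, name)] = (value, version, now)

    def get(self, scope, name, current_version, now):
        entry = self._entries.get((scope, name))
        if entry is None:
            return None  # scope rule: nothing cached under this scope and name
        value, version, stored_at = entry
        if now - stored_at > self.ttl:
            return None  # freshness rule: entry is too old to reuse
        if version != current_version:
            return None  # consistency rule: the underlying record has changed
        return value


cache = ContextCache(ttl_seconds=60)
cache.put("matter-A", "party_summary", "Acme LLC v. Beta Corp", version=3, now=0)

fresh = cache.get("matter-A", "party_summary", current_version=3, now=30)
stale = cache.get("matter-A", "party_summary", current_version=3, now=120)
changed = cache.get("matter-A", "party_summary", current_version=4, now=30)
other_scope = cache.get("matter-B", "party_summary", current_version=3, now=30)
print(fresh, stale, changed, other_scope)
```

Only the first lookup returns the cached summary. The other three fall through to `None`, which in a real system would trigger a fresh, scoped retrieval rather than reuse of stale or out-of-scope context.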

Source:

Why legal work raises the stakes

In legal AI, bigger prompts are not just inefficient. They are dangerous in a very specific way.

Legal work is sensitive to:

  • confidentiality
  • privilege
  • provenance
  • review boundaries
  • task specificity

A system that responds to complexity by shoving more documents and more matter history into the next prompt is not just spending more. It increases the surface area for confusion and overexposure.

Legal AI should care about bounded retrieval for that reason.

The right system should be able to answer questions like:

  • what information was loaded for this run?
  • why was it loaded?
  • what remained outside the run?
  • what persisted after the run?

Those are memory-boundary questions, not model-quality questions.
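One way a system could answer those four questions is to write a manifest for every run. The sketch below is a hypothetical shape for such a record: the `RunManifest` class, its field names, and the example record ids are assumptions, not an existing standard.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RunManifest:
    run_id: str
    task: str
    loaded: dict = field(default_factory=dict)     # record id -> why it was loaded
    excluded: dict = field(default_factory=dict)   # record id -> why it stayed out
    persisted: list = field(default_factory=list)  # record ids written after the run

    def load(self, record_id, reason):
        self.loaded[record_id] = reason

    def exclude(self, record_id, reason):
        self.excluded[record_id] = reason

    def persist(self, record_id):
        self.persisted.append(record_id)


manifest = RunManifest(run_id="run-001", task="draft_engagement_letter")
manifest.load("fact:client_name", "required by the letter template")
manifest.exclude("doc:privileged_memo", "privileged; not needed for this task")
manifest.persist("draft:engagement_letter_v1")

# A reviewer-readable record of what reached the run and what did not.
print(json.dumps(asdict(manifest), indent=2))
```

A supervising attorney reading this output can see what was loaded, why, what stayed outside the run, and what the run left behind, without having to reverse-engineer a prompt.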

Bigger prompts also create weaker supervision

Lawyers and legal ops teams need to supervise the system, not just admire its output.

That gets harder as prompt construction gets broader and less disciplined.

If every run includes:

  • the full matter history
  • prior documents
  • old examples
  • style notes
  • whatever else seemed plausibly relevant

then review becomes harder, not easier.

The supervising attorney cannot easily tell what drove the output. The system starts to feel smart in a vague way instead of trustworthy in a specific way.

That pushes legal AI in the wrong direction.

What buyers should be probing instead

Many legal AI buyers still get pulled toward context-window bragging rights.

Context-window size is a weak proxy for system quality.

The more revealing issues are:

  • how the product stores context outside the model
  • how it decides what to load
  • how it avoids loading irrelevant or sensitive material unnecessarily
  • how visible the boundary is around each run

The real win is not maximum context capacity. The win is context discipline.

The direction of travel

The strongest systems are moving toward:

  • scoped persistent memory
  • typed facts where useful
  • explicit context assembly
  • progressive delivery instead of repeated full reloads
  • clearer separation between persistent records and per-run context

Legal work needs exactly those properties.

The future of legal AI is not a bigger prompt window full of client documents.

The better path is a system that knows what belongs in memory, what belongs in the current run, and what should stay out.
