One of the laziest ideas in legal AI is that better systems will come from larger prompts.
If the model needs more context, just give it more. Add more documents, more history, more background, more examples, more instructions, more prior work. Maybe then it will understand the matter.
That instinct is wrong often enough to be a product smell.
The better direction is bounded memory.
Bigger prompts are not the same thing as better systems
When legal AI systems struggle, the first reaction is usually to increase the amount of material sent into the model:
- more matter documents
- more historical examples
- more style instructions
- more prior conversation
- more extracted facts
That can help in narrow situations. As a system design strategy, it breaks down quickly.
Large prompts create four predictable problems:
- token cost rises fast
- latency gets worse
- irrelevant context dilutes the actual task
- the system becomes less legible because nobody can tell what really mattered
This is not just a technical inefficiency. In legal work, it becomes a trust problem.
If the system answers a question or drafts a document using a giant pile of mixed context, the lawyer cannot easily see what was loaded, what was relevant, and what should have stayed out.
The better model is bounded memory
Bounded memory means the system stores context persistently outside the model and loads only what a specific task needs.
That sounds obvious. It is still not how many AI products are built.
Instead of treating the model prompt as the primary memory surface, the system should have:
- persistent records
- scoped retrieval rules
- typed facts where appropriate
- task-specific context assembly
- visible boundaries around what reaches a given run
The practical goal is simple:
the system should know what not to load
That is what makes memory useful.
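The assembly discipline described above can be sketched as a minimal context assembler. All names here (the `Record` fields, the scoping rules) are illustrative assumptions, not a reference to any particular product:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    """A persistent record stored outside the model."""
    record_id: str
    matter_id: str
    kind: str    # e.g. "fact", "document", "style_note"
    text: str

def assemble_context(store, matter_id, allowed_kinds, limit):
    """Load only what a specific task needs: scope first, then cap.

    Scope filtering happens before any relevance ranking, so records
    from other matters or irrelevant kinds can never reach the prompt.
    """
    in_scope = [r for r in store
                if r.matter_id == matter_id and r.kind in allowed_kinds]
    return in_scope[:limit]  # explicit boundary on what reaches the run

store = [
    Record("r1", "matter-A", "fact", "Closing date is 1 March."),
    Record("r2", "matter-A", "style_note", "Use defined terms."),
    Record("r3", "matter-B", "fact", "Unrelated matter."),
]

# A drafting task asks only for facts on matter-A:
loaded = assemble_context(store, "matter-A", {"fact"}, limit=10)
print([r.record_id for r in loaded])  # -> ['r1']
```

The point of the sketch is the ordering: the scope check is structural and runs before anything resembling search, which is what "knowing what not to load" looks like in code.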
Recent research is moving in the same direction
This is not just a legal-tech opinion. The broader multi-agent systems literature is moving the same way.
Governed Memory
Governed Memory describes a production architecture for multi-agent workflows with entity-scoped isolation, dual memory, and progressive context delivery.
The part that matters here is not the branding. It is the mechanism:
- persistent memory outside the model
- scoped retrieval before semantic search
- selective context delivery per step
The paper reports a 50.3% token reduction across a five-step workflow, with some steps seeing 86–90% savings when the system sends only the delta instead of reloading everything each time.
That is exactly the kind of result that should matter to legal AI buyers. Not because they care about token accounting in the abstract, but because lower context waste usually means lower cost, lower latency, and a clearer execution path.
Source:
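The delta mechanism is easy to illustrate with a toy workflow. This is not the paper's system, just the core idea: each step receives only context it has not already been given, instead of a full reload every time.

```python
def deliver_deltas(steps):
    """Progressive context delivery: each step gets only the records
    it has not already seen earlier in the workflow."""
    delivered = set()
    per_step = []
    for needed in steps:
        delta = [r for r in needed if r not in delivered]
        delivered.update(delta)
        per_step.append(delta)
    return per_step

# Five steps whose nominal context needs overlap heavily:
steps = [
    ["facts", "parties"],
    ["facts", "parties", "draft_v1"],
    ["facts", "draft_v1", "comments"],
    ["facts", "comments", "draft_v2"],
    ["facts", "draft_v2"],
]
full_reload = sum(len(s) for s in steps)                 # 13 items sent
delta_only = sum(len(d) for d in deliver_deltas(steps))  # 5 items sent
print(full_reload, delta_only)  # -> 13 5
```

Even in this toy, the savings compound toward the end of the workflow: the final step receives nothing new at all, because everything it needs was already delivered.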
Facts as First Class Objects
Facts as First Class Objects pushes the same idea in a different direction.
Its argument is that facts should not only survive as embedded text in a large prompt. They should be addressable, persistent objects.
That matters for legal AI because a system should not have to rediscover the same core matter facts from scratch in every run. Some context should be retrieved as facts, not re-explained as prose.
Source:
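A minimal sketch of the idea: a fact with a stable address and a provenance pointer, retrievable by key rather than rediscovered from prose. The schema here is a hypothetical illustration, not the paper's:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fact:
    """An addressable, persistent fact rather than text buried in a prompt."""
    key: str          # stable address, e.g. "matter-A/closing_date"
    value: str
    source_doc: str   # provenance pointer back to the originating document

facts = {
    f.key: f for f in [
        Fact("matter-A/closing_date", "2025-03-01", "SPA-draft-4.docx"),
        Fact("matter-A/governing_law", "England and Wales", "SPA-draft-4.docx"),
    ]
}

# A later run retrieves the fact by address instead of re-extracting it:
fact = facts["matter-A/closing_date"]
print(fact.value, fact.source_doc)  # -> 2025-03-01 SPA-draft-4.docx
```

Because the fact carries its source, a reviewer can check where it came from without replaying the extraction.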
Multi-Agent Memory from a Computer Architecture Perspective
This paper reframes the problem correctly: multi-agent memory is a systems problem, not a prompt problem.
Its framing of context as an I/O, cache, and memory hierarchy is especially useful.
The practical lesson is simple: context assembly should be treated like a cache layer with scope, freshness, and consistency rules, not as a giant string builder attached to a model call.
Source:
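Treating context assembly as a cache layer can be sketched directly. This is a toy under assumed rules (a per-entry scope and a freshness deadline), not an implementation of the paper's hierarchy:

```python
import time

class ContextCache:
    """Context assembly as a cache layer: entries carry a scope and a
    freshness deadline, and reads check both before returning anything."""

    def __init__(self):
        self._entries = {}  # key -> (scope, value, expires_at)

    def put(self, key, scope, value, ttl_seconds):
        self._entries[key] = (scope, value, time.time() + ttl_seconds)

    def get(self, key, scope):
        entry = self._entries.get(key)
        if entry is None:
            return None
        entry_scope, value, expires_at = entry
        if entry_scope != scope:      # scope rule: wrong matter, no read
            return None
        if time.time() > expires_at:  # freshness rule: stale, no read
            del self._entries[key]    # consistency rule: evict stale entry
            return None
        return value

cache = ContextCache()
cache.put("parties", scope="matter-A", value=["Acme", "Birch"], ttl_seconds=60)
print(cache.get("parties", scope="matter-A"))  # -> ['Acme', 'Birch']
print(cache.get("parties", scope="matter-B"))  # -> None
```

The contrast with a "giant string builder" is that a miss here is a policy decision, not an accident: the caller learns that the data was out of scope or stale, rather than silently receiving it anyway.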
Why this matters more in legal work
In legal AI, bigger prompts are not just inefficient. They are dangerous in a very specific way.
Legal work is sensitive to:
- confidentiality
- privilege
- provenance
- review boundaries
- task specificity
A system that responds to complexity by shoving more documents and more matter history into the next prompt is not just spending more. It is increasing the surface area for confusion and overexposure.
That is why legal AI should care so much about bounded retrieval.
The right system should be able to answer questions like:
- what information was loaded for this run?
- why was it loaded?
- what remained outside the run?
- what persisted after the run?
Those are memory-boundary questions, not model-quality questions.
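The four questions above map naturally onto an auditable per-run record. A minimal sketch, with hypothetical field and record names:

```python
from dataclasses import dataclass, field

@dataclass
class RunManifest:
    """An auditable record of one run's memory boundary."""
    run_id: str
    loaded: dict = field(default_factory=dict)    # record_id -> why it was loaded
    excluded: list = field(default_factory=list)  # considered, but kept out
    persisted: list = field(default_factory=list) # written back after the run

    def load(self, record_id, reason):
        self.loaded[record_id] = reason

manifest = RunManifest(run_id="run-17")
manifest.load("fact:closing_date", reason="needed by drafting step")
manifest.excluded.append("doc:unrelated-advice-memo")
manifest.persisted.append("fact:new_signature_date")

# Each boundary question reads off a field:
print(manifest.loaded)     # what was loaded for this run, and why
print(manifest.excluded)   # what remained outside the run
print(manifest.persisted)  # what persisted after the run
```

A supervising attorney reviewing the manifest sees the boundary itself, not just the output, which is what makes the supervision specific rather than vague.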
Bigger prompts also create weaker supervision
Lawyers and legal ops teams need to supervise the system, not just admire its output.
That gets harder as prompt construction gets broader and less disciplined.
If every run includes:
- the full matter history
- prior documents
- old examples
- style notes
- whatever else seemed plausibly relevant
then review becomes harder, not easier.
The supervising attorney cannot easily tell what drove the output. The system starts to feel smart in a vague way instead of trustworthy in a specific way.
That is the wrong direction for legal AI.
What buyers should be probing instead
Many legal AI buyers still get pulled toward context-window bragging rights.
That is a weak proxy for system quality.
The more revealing issues are:
- how the product stores context outside the model
- how it decides what to load
- how it avoids loading irrelevant or sensitive material unnecessarily
- how visible the boundary is around each run
The real win is not maximum context capacity. It is context discipline.
The direction of travel
The strongest systems are moving toward:
- scoped persistent memory
- typed facts where useful
- explicit context assembly
- progressive delivery instead of repeated full reloads
- clearer separation between persistent records and per-run context
That is good news for legal AI, because legal work needs exactly those properties.
The future of legal AI is not a bigger prompt window full of client documents.
It is a system that knows what belongs in memory, what belongs in the current run, and what should stay out.