A legal tech investor I respect said on a podcast that large language models "legitimately understand human language and understand reasoning and can perform conceptual reasoning."
He was wrong, and legal AI products built around that mistake will fail in predictable ways.
Not because the tools are unimpressive. They are impressive. Not because they are useless. They are useful enough to reshape whole professions. The problem is that a lot of legal AI commentary now treats the model as a reasoning agent instead of as probabilistic machinery trained on human-produced artifacts.
That error changes the products people build, the policies they write, and the professional-responsibility judgments they make.
What LLMs actually do
Large language models generate likely token sequences conditioned on context. A model takes the text and other inputs you provide, computes a probability distribution over possible next tokens, samples from that distribution, appends the result, and repeats.
Scaled to hundreds of billions of parameters and trained over massive corpora, that mechanism produces output that often feels like reasoning. It can explain, summarize, translate, draft, compare, code, and argue in ways that are genuinely useful.
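The loop described above can be sketched in a few lines. This is a toy illustration, not a real model: `model_logits_fn` is a hypothetical stand-in for the trained network, which in practice returns a score for every token in a large vocabulary.

```python
import math
import random

def sample_next_token(logits, temperature=1.0):
    """Turn raw scores into a probability distribution (softmax),
    then sample one token id from it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    r = random.random()
    cum = 0.0
    for token_id, e in enumerate(exps):
        cum += e / total
        if r < cum:
            return token_id
    return len(exps) - 1

def generate(model_logits_fn, context, max_new_tokens=10):
    """The autoregressive loop: score the context, sample a token,
    append it, repeat. Nothing else happens."""
    tokens = list(context)
    for _ in range(max_new_tokens):
        tokens.append(sample_next_token(model_logits_fn(tokens)))
    return tokens
```

Everything the system produces comes out of that loop. There is no separate reasoning step; whatever looks like reasoning is a property of the distribution the training corpus induced.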
But the feel of reasoning and the mechanism of reasoning are not the same thing.
One is an observable property of output. The other is a structural property of the system. If you confuse them, you start making claims the architecture does not support.
The tell
The tell came later in the same podcast segment.
The investor asked why we have not trained LLMs on dolphin sounds yet, given how powerful the models have become.
The question itself reveals the misunderstanding.
If the system were reasoning in the way the public discourse often implies, the lack of a large paired corpus would not be the bottleneck in the same way. You could give the system known facts about dolphin biology, social behavior, and observed vocalization patterns, then ask it to reason from those facts.
But training is the bottleneck because current LLMs learn from corpora. To model dolphin communication meaningfully, the system would need data that pairs dolphin vocalizations with something that encodes semantic meaning. That data does not exist at the scale these systems need.
That limitation is not a moral failure of the model. It is a clue about what kind of system it is.
Lanier's correction
Jaron Lanier's most useful contribution to the AI discourse is his insistence that "there is no AI."
The machinery is not weak. The apparent intelligence is derivative. These systems reflect patterns in human-produced artifacts back at us at a scale and speed that can feel uncanny. The reflection is powerful. It is also a reflection, not a self-originating source of judgment.
That correction relocates the cognition.
The model architecture and training process shape the output. So do the inference system, retrieval layer, and tools. But the system's apparent intelligence is still downstream of human language, human work, human examples, human documents, human code, and human reasoning captured in the training and retrieval material.
When you place cognition in the model itself, you start believing the model can do things the surrounding corpus, tools, retrieval layer, and verification architecture cannot support.
Legal AI built on that mistake will not hold.
Why legal AI gets this wrong
The reasoning-versus-pattern-completion error produces a predictable cascade of bad legal AI claims.
The first is the "just tell the AI what you want" claim. That assumes the system reasons from your specification to a correct implementation. It does not. It pattern-completes from your specification toward a plausible implementation. If the specification is underdetermined, which it usually is when the user does not understand the system being built, the output can look correct while failing under load, under concurrency, under security review, or under a fact pattern the builder did not test.
The second is the "LLMs can practice law" claim. Legal reasoning is scoped by statute, precedent, procedure, facts, forum, timing, client objectives, and professional obligation. A model trained on legal texts can produce output that tracks legal reasoning because the corpus contains examples of legal reasoning. That does not make the model a lawyer. It makes it a useful drafting and analysis machine that must still be bounded by human legal judgment.
The third is the hallucination misunderstanding. Hallucinations are not a bolt-on bug in an otherwise reasoning system. They are a natural consequence of probabilistic generation when the context does not constrain the output to something specifically true. Retrieval, verification, grounding, and review architecture can reduce the risk materially. Model branding cannot make the problem disappear.
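One minimal form of that verification architecture is a check that every source the model cites actually came back from the retrieval layer. This is a sketch under assumptions, not any vendor's implementation; the function name and inputs are hypothetical.

```python
def verify_citations(draft_citations, retrieved_ids):
    """Split the citations in a model's draft into grounded
    (present in the retrieval results) and unsupported
    (never returned by retrieval, so possibly hallucinated).

    An unsupported citation is not automatically wrong, but it
    cannot be relied on without independent human checking."""
    retrieved = set(retrieved_ids)
    grounded = [c for c in draft_citations if c in retrieved]
    unsupported = [c for c in draft_citations if c not in retrieved]
    return grounded, unsupported
```

A check like this does not make the model truthful. It constrains reliance: nothing unsupported moves toward legal effect without review.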
Legal AI cannot be evaluated by output polish alone.
ABA 512 points to the same architecture
ABA Formal Opinion 512 does not use the language of token prediction or pattern completion. It does not need to.
The opinion keeps professional responsibility where it belongs: with the lawyer. Competence, confidentiality, candor, supervision, and reasonable fees do not transfer to a model because the model produced fluent output.
The practical implication follows from the mechanism. The tool can generate. The lawyer must judge.
The same point runs through the broader architecture argument in ABA 512 and Heppner together: legal AI systems need review boundaries, provenance, scoped retrieval, and approval states because the output itself is not enough.
What mature legal AI asks instead
Once you stop treating the model as the locus of reasoning, the evaluation questions change.
The useful questions are not only:
- which model does it use?
- how big is the context window?
- how impressive is the demo?
The better questions are:
- what corpus and records can the system retrieve from?
- how is retrieval scoped?
- what is excluded from the run?
- what provenance is attached to the answer?
- what verification layer sits between output and reliance?
- where does draft become approved work?
- what happens when the system is wrong?
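The first four questions above can be made concrete with a sketch of scoped retrieval that attaches provenance to every result. The shape is illustrative and the field names are assumptions, not a real product's schema.

```python
from dataclasses import dataclass

@dataclass
class Passage:
    doc_id: str     # provenance: which document this came from
    matter_id: str  # scope: which client matter it belongs to
    text: str

def scoped_retrieve(corpus, query_terms, matter_id):
    """Return only passages from the permitted matter, each tagged
    with the document it came from. Everything outside the matter
    is excluded from the run, not merely deprioritized."""
    hits = []
    for p in corpus:
        if p.matter_id != matter_id:
            continue  # hard scope boundary
        if any(t.lower() in p.text.lower() for t in query_terms):
            hits.append({"text": p.text, "source": p.doc_id})
    return hits
```

The point of the sketch is the boundary: what the system can retrieve from, and what is excluded, is an explicit property of the code, not a hope about model behavior.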
Those are product-architecture questions. They are also professional-responsibility questions.
A system that produces output and moves it directly toward legal effect is acting as if pattern completion were reasoning. A system that produces output, shows what it used, keeps state bounded, and routes the work through review is treating the model as what it is: powerful machinery that can prepare work for human judgment.
The broader mistake
The same error shows up outside legal.
The "anyone can code" claim treats directing an LLM as equivalent to engineering. It is not. Specification is the hard part of engineering. The LLM can produce code that looks correct. Whether it is actually correct depends on judgment the model does not have.
The "professionals will be replaced" claim treats pattern completion as a substitute for professional judgment. It is not. Pattern completion compresses the repeatable parts of professional work. It does not remove the need for the person who knows when the plausible answer is wrong.
The "AGI is near" discourse often treats better benchmark performance as evidence of general reasoning. Sometimes it may be evidence of meaningful capability improvement. Often it is evidence of better performance on distributions the system is now better at handling. Those are not the same claim.
Legal should be especially careful here because the costs of overclaiming do not land on the people making the podcast predictions. They land on clients, consumers, and firms that relied on systems whose architecture did not match the rhetoric.
Where this leaves the work
The machinery is impressive. The compression is real. The new capability should be used.
But if you treat the model as a reasoning agent, you build the wrong legal AI product. You overtrust output. You underbuild retrieval. You skip provenance. You treat review as a disclaimer instead of a workflow boundary.
If you treat the model as probabilistic machinery over human-produced corpora, you build differently. You build scoped retrieval, persistent records, verification layers, approval gates, audit trails, and review states. You use the machinery where pattern completion is useful and enforce human judgment where legal work requires it.
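The approval gate in that list can be sketched as a small state machine, assuming a minimal draft-to-approved lifecycle; the class and state names are illustrative, not a specification.

```python
class ReviewError(Exception):
    pass

class WorkProduct:
    """Model output enters as a draft and cannot reach 'approved'
    without passing through human review. Every transition is
    recorded, giving a simple audit trail."""

    def __init__(self, text):
        self.text = text
        self.state = "draft"
        self.audit = ["created as draft"]

    def mark_reviewed(self, reviewer):
        if self.state != "draft":
            raise ReviewError("only drafts can be reviewed")
        self.state = "reviewed"
        self.audit.append(f"reviewed by {reviewer}")

    def approve(self, approver):
        if self.state != "reviewed":
            raise ReviewError("approval requires prior review")
        self.state = "approved"
        self.audit.append(f"approved by {approver}")
```

The gate enforces in code what a disclaimer only asserts: output is not work product until a person has judged it.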
That distinction is not academic.
It is the difference between legal AI that looks impressive and legal AI that can hold up in practice.