
Why Review Boundaries Matter More Than Model Choice

April 2, 2026

The legal AI market still obsesses over the wrong comparison.

People ask:

  • Claude or GPT?
  • hosted or local?
  • fine-tuned or not?
  • how many tokens?

Those questions are not meaningless. But they are not the first question a law firm should ask.

The first question should be simpler:

Where is the review boundary?

If a legal AI system cannot enforce a real boundary between draft output and legal effect, the rest of the stack matters less than vendors want buyers to believe.

Model choice matters less than system boundaries

A better model can improve drafting quality. It can improve recall, reasoning, and fluency. All of that matters.

But legal risk does not begin and end with output quality.

Legal work is sensitive because it creates consequences:

  • a filing can misstate authority
  • a letter can waive leverage
  • a summary can omit a fact that matters
  • a demand can overstate a record
  • a citation can be presented as reliable when it has not been verified

The question is not just whether the model is strong. The question is whether the surrounding system treats its output as draft, reviewable, and bounded before anyone relies on it.

That is a workflow problem, not a model-selection problem.

ABA 512 points in that direction

ABA Formal Opinion 512 was issued on July 29, 2024. It remains the clearest ABA statement of the duties lawyers still carry when generative AI is part of a representation.

Those duties include:

  • competence
  • confidentiality
  • supervisory responsibilities
  • candor
  • reasonable fees

Read at the systems level, 512 does not say "pick the best model." It says lawyers remain responsible for what the system does and how they use it.

That implies a design requirement:

AI output should move through a reviewable workflow before it becomes externally effective legal work.

Source: ABA Formal Opinion 512, "Generative Artificial Intelligence Tools" (July 29, 2024).

Heppner shows why soft review language is not enough

United States v. Heppner, No. 25-cr-00503-JSR (S.D.N.Y.), makes the same point from a different direction.

Judge Rakoff ruled from the bench on February 10, 2026, and issued a written memorandum on February 17, 2026. The court held that the defendant's written exchanges with Anthropic's consumer version of Claude were protected by neither the attorney-client privilege nor the work product doctrine on the facts presented.

Heppner is not a general anti-AI ruling. It is a reminder that weak workflow boundaries create legal consequences.

That is why generic review language is not enough. The system has to make review real.

Source: United States v. Heppner, No. 25-cr-00503-JSR (S.D.N.Y. Feb. 17, 2026) (memorandum opinion).

What a real review boundary looks like

A real review boundary is not a disclaimer that says lawyers should check the output.

It is enforced system state.

That means the product can distinguish between:

  • generated draft
  • pending review
  • edited draft
  • approved output
  • rejected output

And it means the system can prevent certain things from happening until review occurs.

For legal AI, that should include at least:

  • sending externally effective communications
  • exporting final legal work product
  • filing or issuing operative output
  • marking generated material as final approved work

That is what makes review architectural rather than aspirational.
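
To make that concrete, here is a minimal sketch in TypeScript of what an enforced boundary can look like. It is illustrative only: the state names, the Draft shape, and functions like assertReviewed and approve are assumptions for this example, not any particular product's API.

  // Hypothetical draft lifecycle. All names here are illustrative,
  // not any particular product's API.
  type ReviewState =
    | "generated_draft"
    | "pending_review"
    | "edited_draft"
    | "approved"
    | "rejected";

  interface Draft {
    id: string;
    state: ReviewState;
    approvedBy?: string; // set only by an explicit approval
  }

  // Actions with external legal effect that the system gates.
  type GatedAction =
    | "send_external"
    | "export_final"
    | "file_or_issue"
    | "mark_final";

  // The boundary itself: gated actions fail unless the draft is approved.
  function assertReviewed(draft: Draft, action: GatedAction): void {
    if (draft.state !== "approved") {
      throw new Error(
        `Blocked: cannot ${action} while draft ${draft.id} is "${draft.state}". ` +
          "Attorney approval is required first."
      );
    }
  }

  // Approval is an explicit, attributed state transition.
  function approve(draft: Draft, attorneyId: string): Draft {
    if (draft.state !== "pending_review" && draft.state !== "edited_draft") {
      throw new Error(`Cannot approve from state "${draft.state}".`);
    }
    return { ...draft, state: "approved", approvedBy: attorneyId };
  }

The detail that matters is that the refusal lives in code paths the product actually takes, not in a disclaimer a user can scroll past.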

Why this matters more than model choice

Imagine two legal AI systems.

System A uses the best available model, but the workflow is loose. Generated output can be copied, exported, or acted on with minimal structure. Review is expected, but not strongly enforced.

System B uses a slightly weaker model, but the workflow is strict. Output is task-bounded, provenance is visible, and the system keeps draft and final states separate until a lawyer acts.

For real legal practice, System B is often the better system.

That is not because model quality does not matter. It is because review boundaries determine whether model quality is being used inside a controlled legal workflow at all.

Review boundaries also improve supervision

Partners and supervising attorneys do not just need better drafts. They need to know:

  • what ran
  • what information was used
  • what changed during review
  • what became final
  • who approved it

That is much easier when review is a system boundary instead of a cultural expectation.
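
One way to see what that implies: each of those supervisory questions should map to a recorded fact. A minimal sketch, again in TypeScript, with assumed, illustrative field names:

  // Hypothetical audit record for one piece of generated work.
  // Each supervisory question maps to a recorded field, not a recollection.
  interface ReviewAuditRecord {
    draftId: string;
    task: string;               // what ran
    sourcesUsed: string[];      // what information was used
    edits: Array<{              // what changed during review
      editedBy: string;
      timestamp: string;        // ISO 8601
      summary: string;
    }>;
    finalVersionId?: string;    // what became final
    approvedBy?: string;        // who approved it
    approvedAt?: string;
  }

None of those fields are exotic. The discipline is in requiring them to exist before anything becomes final.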

Without that structure, legal AI starts to behave like consumer software with a professional disclaimer attached.

That is not enough.

What a serious buyer should examine

Model branding is easy to compare. Workflow discipline is harder to compare, but much more important.

A serious buyer should examine:

  • where draft becomes final
  • what requires human review before external effect
  • what records exist of generation, editing, and approval
  • what the system can block automatically
  • what a supervising attorney can actually see

Those answers tell you more about whether a legal AI system belongs in real practice than any benchmark chart.

The real comparison

The legal AI market likes model comparisons because they are easy to market.

Review boundaries are harder to market because they require the product to be disciplined.

But that is the real comparison that matters.

The best legal AI systems will not be the ones with the most impressive demos. They will be the ones that make legal judgment visible, enforce review before legal effect, and keep draft output inside a real workflow boundary until a lawyer acts.

That is why review boundaries matter more than model choice.
