On April 18, 2026, Andrew Dietderich, co-head of Sullivan & Cromwell's global restructuring practice, sent an apology letter to Chief Judge Martin Glenn of the U.S. Bankruptcy Court for the Southern District of New York.
The apology concerned an emergency motion filed on April 9 in the Chapter 15 proceeding of Prince Global Holdings Limited. The motion contained inaccurate citations and other errors, including AI hallucinations: public coverage describes wrong or fabricated cases and citation numbers, inaccurate article titles, misquoted authority, and sentences corrected in the firm's later redline.
Opposing counsel at Boies Schiller Flexner caught the errors. Sullivan & Cromwell did not.
The letter is important because of what it admits about the firm's internal safeguards. Dietderich told the court that Sullivan & Cromwell had policies, training, and review processes designed to prevent this exact problem. The firm's AI-use policies were not followed in preparing the motion. The citation review process did not identify the AI-generated inaccuracies. Some of the inaccuracies also appear to have been introduced manually.
Dietderich signed the apology himself. He did not point at an associate. That is worth noting. Accountability matters, especially when the easier move would be to turn the incident into a junior-lawyer story.
But the useful takeaway is not personal blame.
The useful takeaway is architecture.
The easy read is the wrong read
The easy read of this story is that a prestigious Wall Street firm got humbled, and that even elite lawyers are vulnerable to AI hallucinations.
That read is incomplete.
Sullivan & Cromwell is not a random AI novice. The firm has a formal Artificial Intelligence Practice. Its own materials say it advises OpenAI on major AI infrastructure and partnership transactions, including OpenAI's Microsoft partnership. The firm also lists AI-ecosystem work spanning Character.AI, xAI, and major compute partnerships across AMD, CoreWeave, and Oracle.
If any law firm has both the sophistication and the incentive to avoid an AI-citation failure in a court filing, Sullivan & Cromwell is on the short list.
And the policies, training, and verification layers still did not catch the problem.
The headline-friendly version is irony.
The useful version is systems design.
Policies sit on top of architecture
This is the pattern that keeps repeating.
Damien Charlotin maintains a public database of AI hallucination cases. NPR reported earlier this month that his worldwide tally had passed 1,200 incidents, about 800 of them from U.S. courts. A Stanford Cyberlaw analysis of U.S. attorney cases in the database through October 2025 found that 90% of the law firms involved were solo practices or small firms.
Sullivan & Cromwell breaks the small-firm pattern cleanly.
It does not break the pattern underneath.
The pattern underneath is not simply lawyer laziness. Some cases involve carelessness. Some involve pressure. Some involve lawyers misunderstanding the tool. Some involve ordinary human review missing something a second person was supposed to catch.
The deeper pattern is that general-purpose generative AI can produce plausible-sounding legal output that cannot be made reliable by policy alone.
Training modules do not change what the model is doing underneath. Reminder emails do not change probabilistic generation. A secondary review process depends on a human second-checker noticing the false citation, false quote, or misstated source in the middle of a dense filing.
Sometimes they notice.
Sometimes they do not.
As I wrote in "LLMs Do Not Reason. Legal AI Has to Account for That," hallucinations are not a bolt-on bug in an otherwise reasoning system. They are a natural consequence of probabilistic generation when the surrounding context does not constrain the output to something specifically true.
Retrieval, verification, grounding, and review architecture can reduce the risk materially. Model branding cannot make the problem disappear. Neither can policy documents.
The Sullivan & Cromwell letter is unusually useful because it names the policy layer and admits the policy layer did not catch the problem.
That is a systems report.
It should be read that way.
The tool-selection question most firms are not asking
The Sullivan & Cromwell letter and public reporting do not identify which AI tool produced the hallucinated material. The firm declined to say.
That matters less than the category question.
There is a real distinction between:
- a public or general-purpose AI tool used as a drafting surface
- a firm-scoped legal workflow built around controlled retrieval, bounded context, provenance, citation verification, and review states
Those are not the same category of tool.
They should not be evaluated by the same standard.
They should not be used for the same kind of legal work.
United States v. Heppner, decided February 17, 2026, made part of this distinction legally visible. As I wrote in "What ABA 512 and Heppner Together Require From Legal AI Systems," Judge Rakoff held that exchanges with Anthropic's consumer version of Claude were protected by neither the attorney-client privilege nor the work-product doctrine on the facts before the court.
Heppner addressed one side of the tool-selection problem: what information reaches the model and under what workflow conditions.
The Sullivan & Cromwell incident addresses the other side: what comes back out and how the firm knows whether it is safe to rely on.
Both point to the same question:
Does the firm know, for every legal task, which tool is appropriate, why, and what must happen around that tool before work leaves the system?
That is the tool-selection question most firms are still not asking with enough rigor.
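One way to make that rigor concrete is to treat tool selection as recorded policy rather than individual habit. The sketch below is illustrative only; the task names, tool categories, and post-conditions are hypothetical, not any firm's or vendor's actual schema.

```python
# Hypothetical sketch: tool selection as an explicit, auditable policy
# rather than an ad hoc choice. All names here are illustrative.
from enum import Enum, auto

class ToolCategory(Enum):
    CONSUMER_CHATBOT = auto()    # public drafting surface, no provenance
    GENERAL_ASSISTANT = auto()   # firm account, still unbounded context
    LEGAL_RESEARCH = auto()      # citation-aware research tool
    SCOPED_WORKFLOW = auto()     # controlled retrieval plus review states

# For each task type: which categories are permitted, and what must
# happen around the tool before work leaves the system.
TOOL_POLICY = {
    "court_filing_draft": {
        "permitted": {ToolCategory.SCOPED_WORKFLOW},
        "required_postconditions": ["citation_verification", "attorney_approval"],
    },
    "internal_research_summary": {
        "permitted": {ToolCategory.LEGAL_RESEARCH, ToolCategory.SCOPED_WORKFLOW},
        "required_postconditions": ["attorney_review"],
    },
}

def tool_allowed(task_type: str, tool: ToolCategory) -> bool:
    """Answer the selection question before any drafting starts."""
    policy = TOOL_POLICY.get(task_type)
    return policy is not None and tool in policy["permitted"]
```

The value is not the code. It is that the answer to which tool, for which task, under what conditions exists before drafting starts and can be audited afterward.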
Does the vendor care about the outcome?
Legal AI vendors fall into two broad camps right now.
One camp optimizes for the demo. Bigger context windows. Smoother drafting surfaces. Better benchmark performance. Confident claims about reasoning, autonomy, and agentic legal work.
Those products can be impressive. They also tend to be evaluated the way consumer chatbots are: by output polish rather than workflow architecture.
The other camp optimizes for the outcome the lawyer is actually accountable for.
That means, concretely (a code sketch follows the list):
- bounded retrieval
- visible review states
- provenance
- approval gates
- controlled storage boundaries
- a clear line between draft and approved work
- a record of what ran, what source material was used, what changed during review, and what became operative
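As a minimal sketch of what that list implies in practice, consider the record an outcome-oriented system might keep for every AI-assisted run. Everything here is hypothetical (the field names are mine, not a real product's API), but each bullet above maps to a field captured at run time rather than reconstructed later.

```python
# Hypothetical sketch of a per-run record in a firm-scoped workflow.
# Field names are illustrative, not a real vendor schema.
from dataclasses import dataclass, field

@dataclass
class RunRecord:
    tool_id: str                      # which tool produced the work
    sources_retrieved: list[str]      # bounded retrieval: what went in
    sources_excluded: list[str]       # what was deliberately kept out
    generated_text: str               # raw model output, frozen as drafted
    review_state: str = "draft"       # draft -> in_review -> approved
    review_edits: list[str] = field(default_factory=list)  # what changed
    citation_checks: dict[str, bool] = field(default_factory=dict)
    approved_by: str | None = None    # approval gate names an attorney
    operative_text: str | None = None # what actually left the system
```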
The difference is not marketing.
The difference is whether, when something goes wrong, the system can tell a supervising attorney what happened, or whether the attorney has to reconstruct the middle of the workflow from memory and hope the error is visible.
A buyer can tell which camp a vendor is in by the questions the vendor welcomes.
A vendor optimizing for the demo tends to redirect toward model branding, context-window size, and productivity claims.
A vendor optimizing for the outcome tends to welcome questions about retrieval scoping, provenance, review enforcement, and what the system does when it is wrong.
The Sullivan & Cromwell incident is a good forcing function for buyers to ask the harder questions.
What supervising attorneys should be able to answer
A supervising attorney in a firm using AI should be able to answer, for any AI-assisted work product that leaves the firm (one way to enforce this is sketched after the list):
- which tool produced which part of the work
- what context and source material the tool was given
- what was excluded from the run
- what draft status the output carried when it left the tool
- what changed during review
- what state the work entered before it reached a court, a client, or an adverse party
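To make the enforcement side concrete, here is a sketch of an approval gate that refuses to release work unless those questions are answerable from the record. It builds on the hypothetical RunRecord above and assumes review_state reaches "approved" only after a named reviewer signs off.

```python
# Hypothetical approval gate. Work cannot leave the system while any of
# the supervising attorney's questions is unanswerable from the record.
class ReleaseBlocked(Exception):
    pass

def release(record: RunRecord, supervising_attorney: str) -> str:
    if not record.citation_checks:
        raise ReleaseBlocked("no citation verification was run")
    unverified = [c for c, ok in record.citation_checks.items() if not ok]
    if unverified:
        raise ReleaseBlocked(f"unverified citations: {unverified}")
    if record.review_state != "approved":
        raise ReleaseBlocked(f"work is still in state {record.review_state!r}")
    if record.operative_text is None:
        raise ReleaseBlocked("no operative version was recorded during review")
    record.approved_by = supervising_attorney  # the gate names a person
    return record.operative_text               # the only text that may leave
```

The design point is that the gate fails loudly and names a person, which is exactly where policy-only safeguards are weakest.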
Those are not theoretical requirements.
They are the operational translation of the duties ABA Formal Opinion 512 says lawyers still carry when AI is in the workflow. Competence, confidentiality, candor toward tribunals, supervisory responsibility, and reasonable fees do not transfer to a model because the model produced fluent output.
If a firm's current tooling cannot support those answers, the firm is carrying the risk Sullivan & Cromwell just wrote a letter about.
Policies help at the margins.
They do not substitute for architecture.
The pro se asymmetry that rarely gets named
One dimension of the hallucination problem rarely appears in biglaw coverage.
In the Stanford analysis of Charlotin's database, pro se litigants accounted for 160 U.S. cases in the dataset the author downloaded, more than the lawyer-only sample used for the main firm-size analysis.
That matters.
People without lawyers are often using free consumer AI tools because they cannot afford anything else. They submit briefs with fabricated citations they have no realistic way to verify. The sanction may fall on a person with the least capacity to absorb it.
Sullivan & Cromwell had Boies Schiller reading its motion.
A pro se tenant facing eviction does not have that.
The access-to-justice consequences of unbounded consumer AI use are not symmetrical. When an elite law firm files a motion with AI hallucinations, opposing counsel catches it, the firm apologizes, and the case continues. When a pro se litigant files a brief with AI hallucinations, the filing itself can damage their position permanently.
That asymmetry is part of why legal infrastructure matters.
The tools that serve solo practitioners, small firms, legal aid organizations, and public legal-help layers need the same architectural discipline as the tools serving elite firms.
The consumer AI shortcut is most dangerous for the people with the least recourse when it fails.
The standard the market should adopt
The useful takeaway from the Sullivan & Cromwell apology is not that elite firms are fallible.
Everyone already knew that.
The useful takeaway is that tool selection has to become a first-class legal AI decision.
A consumer drafting surface, a general-purpose AI assistant, a legal research tool, and a firm-scoped workflow with provenance and review states are not the same category of system. Policies matter, but they cannot erase those differences.
Serious buyers should ask whether the system can show what happened in the middle of the workflow (illustrated in code after this list):
- what was retrieved
- what was excluded
- what was generated
- what was reviewed
- what was approved
- what left the system
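Continuing the hypothetical RunRecord sketch from earlier, each of those questions becomes a field lookup rather than a reconstruction:

```python
# Illustrative only: the buyer's six questions answered directly from
# the hypothetical RunRecord, with nothing rebuilt from memory.
def middle_of_workflow(record: RunRecord) -> dict[str, object]:
    return {
        "retrieved": record.sources_retrieved,
        "excluded": record.sources_excluded,
        "generated": record.generated_text,
        "reviewed": record.review_edits,
        "approved": record.approved_by,
        "left_the_system": record.operative_text,
    }
```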
The difference between legal AI that produces impressive drafts and legal AI that can hold up in practice is not marketing.
It is architecture.
FlowCounsel™ builds AI-enabled software for legal teams. FlowLawyers is the consumer-facing legal help platform with attorney discovery, legal aid routing, state-specific legal information, and document tools. Neither provides legal advice. Attorney supervision of legal AI output is required.
Sources
- Sullivan & Cromwell Apologizes to Judge for AI Hallucinations, Bloomberg Law
- Sullivan & Cromwell law firm apologizes for AI "hallucinations" in court filing, Reuters via Investing.com
- Sullivan & Cromwell Artificial Intelligence Practice
- S&C Advises OpenAI in Its Partnership with Microsoft
- Penalties stack up as AI spreads through the legal system, NPR via Jefferson Public Radio
- Who's Submitting AI-Tainted Filings in Court?, Stanford Cyberlaw Blog