The Risk Is the Scaffold

June 3, 2026·11 min read·AI & Technology

Anthropic published a useful security report this week: a year of AI-enabled cyber threat activity mapped against MITRE ATT&CK.

The report is about cyber operations, not legal AI. But the lesson transfers cleanly. The highest-risk behavior is not only bad text, bad prompts, or bad model output. It is the scaffold around the model: the tools, connectors, runtime permissions, execution loops, and decision paths that let a model move from one step to the next.

In Anthropic's analysis, malicious actors were not becoming dangerous simply because they asked a model for isolated help. The risk increased when AI moved deeper into the operation: account discovery, lateral movement, credential access, exfiltration, and chained execution. The most important distinction was not the interface the actor used. It was how much operational work the scaffold allowed the model to carry.

Legal AI needs to read that carefully.

Not because legal work is cyber operations. Because the architecture problem is similar.

A model that drafts a bad paragraph is one class of risk. A model connected to documents, client communications, matter records, intake routing, payment systems, signing flows, court portals, government workflows, and external delivery paths is a different class of risk.

The second system is not dangerous because the model is smarter. It is dangerous because the scaffold can act.

The Old Risk Frame Is Too Narrow

Legal AI has spent too much time treating the primary risk as bad generation.

That risk is real. Hallucinated citations, fabricated quotes, weak summaries, and unsupported legal conclusions can create sanctions, malpractice exposure, client harm, and loss of trust.

But generation risk is only one part of the system.

The next risk class is execution risk.

Execution risk appears when AI is no longer just producing output for a lawyer to read. It appears when the system can retrieve more data, choose tools, update records, send communications, route intakes, stage documents, transfer files, or prepare something for external effect.

The question changes.

Not only: did the model say something wrong?

Also: what was the system allowed to do after it said it?

That distinction is where many legal AI evaluations are still underbuilt. They ask about model quality, citation behavior, or confidentiality in isolation. Those are necessary questions. They do not answer whether the connected system can chain actions across a workflow without the right control state.

Interface Choice Is Not the Boundary

One of the more useful findings in Anthropic's analysis is that the interface did not cleanly distinguish risk. A chat surface, API access, or an agentic coding tool did not by itself explain which actors were higher risk. The more durable signal was where and how AI was applied inside the operation.

Legal buyers should internalize the same point.

"We use a safe model" is not a complete answer.

"We use a secure cloud provider" is not a complete answer.

"We have an agent runtime" is not a complete answer.

"We have a sandbox" is not a complete answer.

Each can be useful. None is the legal control layer.

The relevant question is how authority is assigned and constrained. What tools can the system call? What data can those tools reach? What happens if a tool returns hostile or misleading content? What state transition is required before anything leaves draft status? What is logged? What can be replayed? What gets blocked? What requires approval?

A weakly governed agent running through a polished interface is still weakly governed.

A model behind enterprise infrastructure can still become risky if the application gives it broad tool authority.

Scaffolding Is Where Risk Compounds

The word "scaffold" is useful because it moves the discussion away from model personality and toward system design.

The scaffold is everything around the model that lets it do work:

  • the runtime
  • the tool registry
  • the connector permissions
  • the context assembly rules
  • the retry behavior
  • the state machine
  • the approval rules
  • the audit trail
  • the external-action locks

In legal AI, the scaffold determines whether a model is merely assisting a lawyer or operating inside a workflow with legal consequences.

That is why tool connectors deserve more scrutiny than they usually receive. A connector is not just an integration. It is an authority grant. It can decide what the model sees, what action becomes available, what records are touched, what data comes back, and whether a downstream system treats the output as ready to use.

MCP-style connectors and similar tool ecosystems will be useful. They will also become one of the largest trust boundaries in legal AI. Every connector should be treated as untrusted until its authority class, ownership, permissions, audit behavior, and review requirements are explicit.

The default posture should be boring:

  • read-only before write-capable
  • draft-only before external-send
  • internal records before client-facing delivery
  • explicit approval before external effect
  • per-call audit before trust
  • bounded context before broad retrieval

That posture is slower than pretending every connector is safe.

It is faster than explaining later why the system sent, filed, exported, or shared something no attorney approved.
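
A minimal sketch of the "untrusted until explicit" rule for connectors, assuming a hypothetical manifest format. The field names below are illustrative; they mirror the five things the connector must declare (authority class, ownership, permissions, audit behavior, review requirements) and do not come from any real connector specification.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical manifest fields: every one must be declared before the
# runtime treats the connector as anything other than untrusted.
REQUIRED = ("authority_class", "owner", "permissions",
            "audit_behavior", "review_requirement")


@dataclass(frozen=True)
class ConnectorManifest:
    name: str
    authority_class: Optional[str] = None
    owner: Optional[str] = None
    permissions: Optional[Tuple[str, ...]] = None
    audit_behavior: Optional[str] = None
    review_requirement: Optional[str] = None


def missing_fields(manifest: ConnectorManifest) -> List[str]:
    """List the required declarations the manifest leaves unset."""
    return [name for name in REQUIRED if getattr(manifest, name) is None]


def effective_authority(manifest: ConnectorManifest) -> str:
    """Grant the declared class only when nothing is left implicit."""
    if missing_fields(manifest):
        return "untrusted"
    return manifest.authority_class
```

In this sketch a connector registered with only a name resolves to "untrusted", and it stays that way until someone fills in every field. The point is the default, not the data model.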

Agent Worms Make The Same Point More Sharply

The same lesson is now appearing in agent-worm research.

One recent arXiv paper on autonomous LLM-agent worms describes a risk pattern that should be familiar to anyone building agentic software: long-running agents with persistent workspaces, memory files, scheduled task state, and messaging integrations can allow attacker-influenced content to be written into state, re-enter later model context, and drive high-risk actions.

Another line of research demonstrated self-propagating attacks across LLM-agent ecosystems by abusing persistent configuration, tool-execution privileges, and cross-agent messaging.

Those papers are not legal-industry product reviews. They are controlled security research, and they should be read with that limit in mind.

But they expose the same architectural fault line.

Once a model is connected to memory, files, tools, scheduled work, and other agents, the risk is no longer contained in the answer the model gives a user. The risk can live in what gets persisted, what gets reloaded, what gets trusted on the next run, what tool authority is still available, and what other system receives the output.

Legal systems are full of durable state: matter records, document stores, intake histories, client communications, deadlines, templates, approval history, firm playbooks, and privileged work product. If those states are loaded into agentic workflows without strict boundaries, the problem is not just hallucination. It is contaminated context, stale authority, cross-matter exposure, and action chains that become difficult to reconstruct after the fact.

"Read-only" cannot be treated as automatically safe. In an ordinary SaaS workflow, read access may look low risk. In an agentic workflow, read access can change future context, summaries, memory candidates, routing decisions, or downstream drafts. A bad read can become a bad premise. A bad premise can become a bad action.

The better legal AI posture is not panic. It is separation:

  • untrusted content does not promote itself into trusted memory
  • external reads reduce, rather than expand, later action authority
  • persistent state has promotion rules
  • tool output does not get to rewrite tool policy
  • workflow records preserve what re-entered context and why
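
Several of the separation rules above can be sketched in a few lines. This is an illustration under stated assumptions, not a reference design: the `Session` and `ContextItem` names, the source labels, and the four-level authority ceiling are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumed ordering of authority ceilings, lowest first.
CEILINGS = ["read_only", "draft_only", "internal_write", "external_send"]


@dataclass(frozen=True)
class ContextItem:
    text: str
    source: str            # hypothetical label, e.g. "inbound_email"
    trusted: bool = False


@dataclass
class Session:
    ceiling: str = "external_send"
    context_log: List[Tuple[str, bool, str]] = field(default_factory=list)
    memory: List[Tuple[ContextItem, str]] = field(default_factory=list)

    def load(self, item: ContextItem, reason: str) -> None:
        # Preserve what entered context and why.
        self.context_log.append((item.source, item.trusted, reason))
        if not item.trusted:
            # An untrusted read can only lower the ceiling, never raise it.
            self.ceiling = min(self.ceiling, "draft_only", key=CEILINGS.index)

    def promote_to_memory(self, item: ContextItem,
                          reviewer: Optional[str] = None) -> None:
        # Untrusted content cannot promote itself into persistent state.
        if not item.trusted and reviewer is None:
            raise PermissionError("untrusted content needs a named reviewer")
        self.memory.append((item, reviewer or "trusted-source"))
```

Loading an inbound email into a session that started with send authority drops the ceiling to draft-only for the rest of the run, and the email cannot reach memory unless a named reviewer promotes it.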

The scaffold question keeps returning.

Legal Needs Authority Classes

Legal AI products should classify tools by what they can do, not by how impressive they sound.

A useful starting taxonomy is:

  • read_only: can inspect approved records or sources
  • draft_only: can prepare text or artifacts without changing external state
  • internal_write: can update internal workflow state
  • external_send: can send email, SMS, letters, or client communications
  • external_file_or_export: can share, download, or transfer files
  • financial: can affect payment, billing, trust, settlement, or distribution records
  • court_or_government: can touch filings, court systems, government workflows, or public-sector records

Those classes should not be marketing labels. They should drive policy.

A read-only connector may require ordinary logging. A draft-only specialist may require source provenance and review state. An external-send workflow should require approval. Financial and court/government workflows need stricter authority, stronger audit, and narrower eligibility.
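
One way to make the classes drive policy rather than labels is to bind each class to an explicit policy record. The sketch below uses the taxonomy from this post; the `Policy` fields and the specific values for internal_write and external_file_or_export are assumptions added for illustration, since the post does not specify them.

```python
from dataclasses import dataclass
from enum import Enum


class AuthorityClass(Enum):
    READ_ONLY = "read_only"
    DRAFT_ONLY = "draft_only"
    INTERNAL_WRITE = "internal_write"
    EXTERNAL_SEND = "external_send"
    EXTERNAL_FILE_OR_EXPORT = "external_file_or_export"
    FINANCIAL = "financial"
    COURT_OR_GOVERNMENT = "court_or_government"


@dataclass(frozen=True)
class Policy:
    requires_approval: bool
    audit: str                      # "standard" or "strict" (assumed levels)
    requires_provenance: bool = False


POLICY = {
    AuthorityClass.READ_ONLY: Policy(False, "standard"),
    AuthorityClass.DRAFT_ONLY: Policy(False, "standard", requires_provenance=True),
    AuthorityClass.INTERNAL_WRITE: Policy(False, "standard"),          # assumption
    AuthorityClass.EXTERNAL_SEND: Policy(True, "standard"),
    AuthorityClass.EXTERNAL_FILE_OR_EXPORT: Policy(True, "standard"),  # assumption
    AuthorityClass.FINANCIAL: Policy(True, "strict"),
    AuthorityClass.COURT_OR_GOVERNMENT: Policy(True, "strict"),
}


def may_execute(cls: AuthorityClass, approved: bool) -> bool:
    """A tool call proceeds only if its class needs no approval or has one."""
    return approved or not POLICY[cls].requires_approval
```

Because the table is keyed by the enum, a new class cannot be added without someone deciding its policy, which is the behavior the taxonomy is meant to force.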

The system should also care about chains.

A single read operation may be low risk. A chain that reads a document, extracts a fact, updates a matter record, drafts a client communication, attaches a file, and prepares an outbound message is not the same risk class. The chain changed the posture of the work.

This is where generic agent traces are not enough. A trace may show what happened. A legal control layer has to decide what is allowed to happen before the trace exists.
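
A rough sketch of a pre-execution check for chains, assuming a simple rule: a chain takes the posture of its highest-authority step, and the check runs before any step does. The numeric ranking follows the order of the taxonomy above and is an assumption, as is the choice of external_send as the approval floor.

```python
from typing import List, Tuple

# Assumed ranking: the order in which the taxonomy lists the classes.
RANK = {
    "read_only": 0,
    "draft_only": 1,
    "internal_write": 2,
    "external_send": 3,
    "external_file_or_export": 4,
    "financial": 5,
    "court_or_government": 6,
}
APPROVAL_FLOOR = RANK["external_send"]  # assumed threshold

Step = Tuple[str, str]  # (description, authority class)


def chain_posture(steps: List[Step]) -> str:
    """The chain is as risky as its highest-authority step."""
    return max((cls for _, cls in steps), key=RANK.__getitem__)


def preflight(steps: List[Step], approved: bool) -> str:
    """Decide before execution, instead of reading the trace afterwards."""
    posture = chain_posture(steps)
    if RANK[posture] >= APPROVAL_FLOOR and not approved:
        return f"blocked: chain reaches {posture} without approval"
    return f"allowed: chain posture is {posture}"


chain = [
    ("read document", "read_only"),
    ("extract fact", "read_only"),
    ("update matter record", "internal_write"),
    ("draft client communication", "draft_only"),
    ("attach file", "draft_only"),
    ("stage outbound message", "external_send"),  # classification is an assumption
]
```

Here `preflight(chain, approved=False)` blocks the whole chain, while the first step on its own passes as read_only. A fuller design would likely score combinations as well as the maximum, but the maximum already captures the shift in posture.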

Approval Is Not A Button

The legal profession already has the right instinct: lawyers approve legal work.

The product question is whether approval is real architecture or a decorative button.

A real approval system has state. It knows what artifact is being approved, what sources supported it, what context was loaded, which tools ran, what reviewer had authority, what policy version applied, and what downstream effect the approval unlocks.

It can also handle reversal. If an approval is recalled or superseded, the system can identify dependent work and quarantine or re-review what relied on the old approval.

That is different from letting a user click "approve" on a generated output and hoping the surrounding workflow behaves.
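
To make the contrast concrete, here is a minimal sketch of approval as state, including recall. All names (`Approval`, `ApprovalLedger`, the field names) are hypothetical, and the dependency tracking is deliberately shallow: it follows direct reliance only, not transitive chains.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Approval:
    approval_id: str
    artifact_id: str                # what was approved
    reviewer: str                   # who had authority
    policy_version: str
    sources: Tuple[str, ...]        # what supported the artifact
    context_ids: Tuple[str, ...]    # what context was loaded
    tools_run: Tuple[str, ...]
    unlocks: str                    # downstream effect this approval enables
    status: str = "active"          # "active" or "recalled"


@dataclass
class ApprovalLedger:
    approvals: Dict[str, Approval] = field(default_factory=dict)
    relied_on: Dict[str, Set[str]] = field(default_factory=dict)
    quarantined: Set[str] = field(default_factory=set)

    def record(self, approval: Approval) -> None:
        self.approvals[approval.approval_id] = approval
        self.rely(approval.artifact_id, approval.approval_id)

    def rely(self, artifact_id: str, approval_id: str) -> None:
        if approval_id not in self.approvals:
            raise KeyError(f"unknown approval: {approval_id}")
        self.relied_on.setdefault(artifact_id, set()).add(approval_id)

    def recall(self, approval_id: str) -> List[str]:
        """Mark the approval recalled and quarantine direct dependents."""
        self.approvals[approval_id].status = "recalled"
        dependents = sorted(a for a, deps in self.relied_on.items()
                            if approval_id in deps)
        self.quarantined.update(dependents)
        return dependents

    def may_release(self, artifact_id: str) -> bool:
        deps = self.relied_on.get(artifact_id, set())
        return (artifact_id not in self.quarantined and bool(deps)
                and all(self.approvals[d].status == "active" for d in deps))
```

An artifact with no recorded approval cannot be released, and recalling an approval returns the list of work that relied on it so that work can be re-reviewed.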

The Anthropic report is useful here because it shows why autonomous chains are hard to reason about after the fact. Once a model can move through operational stages, risk is no longer contained in one answer. It is distributed across the sequence.

Legal AI has the same problem in a different domain.

The record has to carry the sequence.

The approval state has to govern the sequence.

The audit trail has to reconstruct the sequence.

Security Pages Need To Prove The Product

Most software security pages were built for the SaaS era.

They cover encryption, subprocessors, hosting, access controls, security contacts, uptime, and compliance posture. Those still matter. They are not enough for AI systems that can use tools.

The AI-era security page has to become part of the product argument. It should answer a different set of questions:

  • what can the model do directly?
  • what can tools do on the model's behalf?
  • which actions are read-only, draft-only, internal-write, or externally effective?
  • what requires approval?
  • how are tool calls logged?
  • how are hostile inputs handled?
  • how are connectors scoped?
  • how is context assembled?
  • what prevents one workflow from crossing into another without authority?
  • how can a firm reconstruct what happened?

Legal AI buyers should demand that security conversation, and vendors should be willing to publish it without hiding behind generic "enterprise-grade" language.

Security is not only where the data is hosted.

Security is what the system is allowed to do.

The Legal Control Layer

The right response is not to reject agents, connectors, or runtime engineering.

Legal AI will need stronger runtimes. It will need better connectors. It will need better sandboxes for workflows that execute code or touch untrusted tool surfaces. It will need observability, evals, policy enforcement, and incident response.

But the legal control layer sits above that.

It includes:

  • record truth
  • bounded context
  • source provenance
  • verifier state
  • tool authority
  • workflow state
  • review state
  • approval state
  • external-effect locks

That layer keeps capability from outrunning judgment.

The lesson from AI-enabled cyber misuse is not that every AI system is a cyber weapon. The lesson is that scaffolding changes the risk profile. A model with tools is not just a model. It is part of an operating system.

In legal work, operating systems need legal control.

The risk is not only the model.

The risk is the scaffold.


FlowCounsel builds AI-enabled software for legal teams. FlowLawyers is the consumer-facing legal help platform with attorney discovery, legal-aid routing, state-specific legal information, and document tools. Neither provides legal advice. Attorney supervision of legal AI output is required.
