Memory class architectures for firm-scoped legal intelligence

Abstract

Most legal-AI products describe memory by claiming their system "learns from your firm," then implement that claim as chat-history retention, embedding-cached past sessions, or silent base-model fine-tuning. None of those are memory architectures. They are storage choices. This paper presents FirmIQ™: a three-class memory architecture (episodic, semantic, procedural) grounded in long-standing cognitive-science distinctions (Tulving, 1972; Squire, 1992; Anderson, 1996) and recent agent-systems work (Packer et al., 2023; Park et al., 2023; Wang et al., 2023), translated for legal workflow. Each class is governed independently with explicit promotion discipline, diff-aware learning from attorney review, and context-manifest-bounded retrieval scoped to firm, client, matter, jurisdiction, and specialist. The supervisory framing throughout is the lawyer's non-delegable duty under ABA Model Rule 5.3 and ABA Formal Opinion 512 (2024).

§1 Thesis

Most legal-AI memory claims resolve to storage, not architecture.

Almost every legal AI product now describes its system as "learning from your firm." In observed practice that claim resolves to one of three storage patterns: conversation history retained in the model context window, embeddings of past chat sessions cached for retrieval, or user inputs pooled (sometimes silently) into shared base-model fine-tunes. None of these patterns are memory architectures. They are storage choices framed as intelligence.

The cognitive-science distinction between episodic and semantic memory (Tulving, 1972; Squire, 1992) and the procedural-vs-declarative split (Anderson, 1996) maps onto a problem legal AI has not solved: how does a system retain the firm's accumulated judgment (which redlines a partner refused, which jurisdictional defaults a client requires, which procedural sequences resolved a class of matters) in a form that is governable, scoped, and revocable? Raw chat history cannot. Embeddings of chat sessions cannot. Silent fine-tunes can retain it, but only in a form that fails attorney-supervision tests (ABA Model Rule 5.3; ABA Formal Op. 512, 2024).

FirmIQ™ is the FlowCounsel™ layer that distills attorney review signal into firm-scoped patterns the system retrieves on the next matter. It is a memory architecture, not a context cache. The remainder of this paper specifies its three classes, the capture discipline, the attribution mechanism, the promotion state machine, and the retrieval boundary. The framing extends an earlier argument in the FlowCounsel blog post "Why Legal AI Memory Is a Systems Problem, Not a Prompt Problem" (FlowCounsel, April 2026).

Chat history is not memory. Memory is the firm's accumulated judgment, captured in a form the system can govern and retrieve.

§2 Three classes

Memory in legal work has three distinct shapes.

The three-class taxonomy (episodic, semantic, procedural) is grounded in cognitive-science research on long-term memory systems (Tulving, 1972; Squire, 1992; Anderson, 1996) and has informed several recent agent-systems architectures (Park et al., 2023; Packer et al., 2023; Wang et al., 2023). The translation for legal workflow requires each class to answer a different question, carry a different lifecycle, and run under a different governance discipline.

Episodic memory remembers what happened. Semantic memory holds what the firm has approved. Procedural memory learns how work moves toward approval.

Episodic memory holds what happened on a single matter: draft versions, attorney redlines, reviewer comments, approval decisions, recall events, the context manifest used to produce each draft, and the verifier results available at the time. The closest published analog is the memory-stream architecture used in interactive simulacra (Park et al., 2023). Implementation: Postgres for metadata, S3 for full artifacts and version snapshots.
Semantic memory holds what the firm has approved: firm preference rules, client guidelines, jurisdictional playbooks, fallback clause libraries, approved practice-pack rules, source-linked reference blocks. Each entry is typed, versioned, scoped, and provenance-linked. The constraint is closer to a curated knowledge base than to model fine-tuning.
Procedural memory holds how approved work moves: the order of review operations that resolved a matter, the escalation sequence that worked, the negotiation progression that produced the approved settlement, the recovery path after a verifier flagged a problem. The published analog is the skill-library pattern (Wang et al., 2023), modified so promotion to the active library requires the governance described in §5.
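The three classes can be sketched as separate record types, which makes it harder to blur their lifecycles. This is an illustrative sketch only: the type names, field names, and the Scope key are assumptions made for the example, not the FirmIQ™ schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class MemoryClass(Enum):
    EPISODIC = "episodic"      # what happened on a single matter
    SEMANTIC = "semantic"      # what the firm has approved
    PROCEDURAL = "procedural"  # how approved work moves


@dataclass(frozen=True)
class Scope:
    # Hypothetical scope key; only firm_id is mandatory in this sketch.
    firm_id: str
    client_id: Optional[str] = None
    matter_id: Optional[str] = None
    jurisdiction: Optional[str] = None


@dataclass(frozen=True)
class EpisodicEvent:
    scope: Scope
    kind: str                 # e.g. "draft_version", "redline", "approval"
    artifact_ref: str         # pointer to the stored artifact or snapshot
    context_manifest_id: str
    memory_class: MemoryClass = MemoryClass.EPISODIC


@dataclass(frozen=True)
class SemanticEntry:
    scope: Scope
    entry_type: str           # e.g. "firm_preference_rule", "client_guideline"
    version: int
    body: str
    provenance: Tuple[str, ...]  # ids of the source artifacts
    memory_class: MemoryClass = MemoryClass.SEMANTIC


@dataclass(frozen=True)
class ProceduralPattern:
    scope: Scope
    name: str
    steps: Tuple[str, ...]    # ordered operations that led to approval
    memory_class: MemoryClass = MemoryClass.PROCEDURAL
```

Keeping the classes as distinct types, rather than one table with a "kind" column, lets each one carry its own required fields: a semantic entry cannot exist without a version and provenance, while an episodic event cannot exist without the context manifest it was produced under.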

§3 Capture discipline

Document-level diffs, not character-level surveillance.

The strongest learning signal in legal work is the final diff between what the specialist produced and what the attorney approved. That diff carries the firm's judgment in a form the system can attribute. Character-by-character keystroke telemetry is rarely the high-signal artifact and is almost always the wrong privacy posture (FlowCounsel, "Legal AI Has a Surveillance Problem Before It Has a Regulation Problem," May 2026).

FirmIQ™ default capture is structured: artifact version before review, artifact version after review, structured diff, changed span locators, reviewer comments, approval state, reviewer identity and role, matter and client metadata, specialist contract version, context manifest id, and verifier results available at the time. The diff-from-feedback pattern parallels the verbal-reinforcement learning used in Reflexion (Shinn et al., 2023), with the verbal feedback replaced by structured attorney review.

Editing session duration, review sequence, comment resolution order, and clause movement are captured selectively when a specific workflow benefits. Raw keystroke surveillance is excluded by default: its signal-to-burden ratio is poor, and its privacy posture conflicts with the confidentiality obligations under ABA Model Rule 1.6.

Captured by default: artifact versions, structured diff, changed spans, reviewer comments, approval/rejection state, reviewer role, scope metadata, context manifest, verifier results.
Captured selectively: editing session duration, review sequence, comment resolution order, clause movement, repeated edit clusters.
Excluded by default: raw keystroke telemetry, character-by-character behavioral surveillance, hidden monitoring, promotion from transient draft edits before approval.
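Document-level capture of this kind needs only the two artifact versions, not an editing trace. The sketch below derives changed-span locators from the before and after versions with Python's standard difflib module. The ReviewCapture and ChangedSpan names and fields are hypothetical, and a line-level diff is an assumption; a production system might diff at clause or token level instead.

```python
import difflib
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ChangedSpan:
    op: str                  # "replace", "delete", or "insert"
    before: Tuple[int, int]  # half-open line range in the pre-review version
    after: Tuple[int, int]   # half-open line range in the post-review version


@dataclass(frozen=True)
class ReviewCapture:
    artifact_before: str
    artifact_after: str
    spans: Tuple[ChangedSpan, ...]
    reviewer_role: str
    approval_state: str      # e.g. "approved" or "rejected"
    context_manifest_id: str


def changed_spans(before: str, after: str) -> List[ChangedSpan]:
    """Line-level diff between two artifact versions; no keystroke data involved."""
    a, b = before.splitlines(), after.splitlines()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        ChangedSpan(tag, (i1, i2), (j1, j2))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def capture_review(before: str, after: str, reviewer_role: str,
                   approval_state: str, context_manifest_id: str) -> ReviewCapture:
    """Bundle the default-capture fields for one completed review."""
    spans = tuple(changed_spans(before, after))
    return ReviewCapture(before, after, spans, reviewer_role,
                         approval_state, context_manifest_id)
```

Because the capture is computed from two stored versions, it can be produced once, after approval, which is consistent with excluding transient draft edits from promotion.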

§4 Decision attribution

Attribute the earliest meaningful divergence, not the final document.

A trajectory intelligence extractor reads the workflow run after approval and identifies the earliest meaningful divergence between what the specialist produced and what the attorney approved. That divergence is the candidate learning event. The pattern is analogous to Reflexion's episodic reflection on task feedback (Shinn et al., 2023), narrowed to structured artifact diffs rather than free-text reflection.

Worked example: a specialist drafted a liability cap at 2x fees. The attorney changed the data-breach liability to uncapped. The attribution is not "attorney edited the draft." The attribution is "the draft failed to account for sensitive PII triggering the firm's uncapped-liability exception." The resulting candidate becomes a firm rule scoped to the firm, the client, the contract type, and the data/privacy context, with evidence captured: artifact diff, reviewer identity, approval state, context manifest id.

The extractor does not mutate active FirmIQ™ rules. It produces candidates. Promotion requires the explicit governance described in §5.
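A minimal sketch of the attribution step, under stated assumptions: the run is modeled as an ordered list of review steps, and "meaningful" is approximated as "still different after whitespace and case are normalized." Both the ReviewStep shape and that test are stand-ins invented for the example; the paper does not specify how meaningfulness is judged. The point the sketch preserves is that the extractor returns a candidate and touches no active rule.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class ReviewStep:
    index: int     # position of this step within the workflow run
    before: str    # text the specialist produced
    after: str     # text the attorney left in place
    reviewer: str


@dataclass(frozen=True)
class Candidate:
    source_step: int
    reviewer: str
    status: str = "candidate"   # never "approved" at extraction time


def _normalize(text: str) -> str:
    # Collapse whitespace and case so formatting-only edits do not count.
    return " ".join(text.split()).lower()


def is_meaningful(step: ReviewStep) -> bool:
    """Assumed heuristic: the edit survives whitespace/case normalization."""
    return _normalize(step.before) != _normalize(step.after)


def extract_candidate(
    steps: Iterable[ReviewStep],
    meaningful: Callable[[ReviewStep], bool] = is_meaningful,
) -> Optional[Candidate]:
    """Return a candidate for the earliest meaningful divergence, if any."""
    for step in sorted(steps, key=lambda s: s.index):
        if meaningful(step):
            return Candidate(source_step=step.index, reviewer=step.reviewer)
    return None
```

Passing the meaningfulness test in as a parameter keeps the ordering logic separate from the judgment about what counts as a substantive edit, which is the part most likely to differ by practice area.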

Diagram · Trajectory Intelligence Extractor

The extractor reads the completed run, attributes the earliest meaningful divergence, and produces a scoped candidate. Promotion is a separate governed decision.

Attribution is to the earliest meaningful divergence, not to the final document. A specialist drafted a liability cap at 2x fees; the attorney changed it to uncapped for data-breach exposure. The candidate is "draft failed to account for sensitive PII triggering the firm's uncapped-liability exception," scoped accordingly.

§5 Promotion discipline

No attorney edit silently becomes firm doctrine.

Promotion from candidate to active firm pattern requires explicit governance. This constraint maps directly to ABA Model Rule 5.3, which obligates lawyers to ensure nonlawyer assistance (including software tools, per ABA Formal Op. 512, 2024) operates under supervision compatible with the lawyer's professional obligations. A learning system that silently promotes one attorney's edit into a firm-wide rule moves outside that supervisory boundary.

Allowed promotion sources: approved artifacts, repeated approved edits across multiple matters, validated outcome reviews, explicit operator-authored rules, and explicit inversion events. A single edit on one matter is not enough to change firm doctrine.
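One way to read the recurrence requirement is as a gate in front of review. The sketch below is an assumption-heavy illustration: the evidence kinds, the two-matter threshold, and the choice to let explicit sources pass on their own are all invented for the example, and the "approved artifacts" source is deliberately left out because the text does not say how it interacts with recurrence.

```python
from dataclasses import dataclass
from typing import Iterable

# Assumed labels for the explicit promotion sources named in the text.
EXPLICIT_SOURCES = {"operator_rule", "validated_outcome_review", "inversion_event"}


@dataclass(frozen=True)
class Evidence:
    kind: str       # "approved_edit" or one of EXPLICIT_SOURCES
    matter_id: str


def eligible_for_review(evidence: Iterable[Evidence], min_matters: int = 2) -> bool:
    """True if the evidence is enough to open a governance review.

    Passing this gate does not approve anything; it only allows a candidate
    to be put in front of a reviewer.
    """
    items = list(evidence)
    if any(e.kind in EXPLICIT_SOURCES for e in items):
        return True
    # Repeated approved edits only count when they span distinct matters.
    matters = {e.matter_id for e in items if e.kind == "approved_edit"}
    return len(matters) >= min_matters
```

Counting distinct matters rather than raw edits is what keeps several edits on one document from looking like a firm-wide pattern.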

A rule that becomes firm doctrine because it appeared in one edit is not memory. It is leakage.

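Required states: candidate, under review, approved, rejected, superseded, recalled. The six are not a single linear path: review ends in approved or rejected, and superseded and recalled apply to records that were previously approved.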
Required metadata per record: firm id, scope, source artifact ids, diff references, approving actor, confidence or recurrence class, effective date, expiration or review date where appropriate.
Every promoted record is scoped, versioned, provenance-linked, and revocable.
Promotion provenance is auditable end-to-end: which artifacts and diffs led to this rule, who approved it, when, and why.

Diagram · Promotion state machine

Every candidate moves through these six states. No attorney edit silently becomes firm doctrine.

Recall is a first-class event: the system can show what work product was produced under a recalled pattern and re-flag affected artifacts.
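The six states can be sketched as an explicit transition table, so that any move outside the table fails loudly. The transition edges below are an assumption: the paper names the states but does not enumerate the allowed moves. The PatternRecord and PromotionError names are likewise hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set, Tuple


class State(Enum):
    CANDIDATE = "candidate"
    UNDER_REVIEW = "under review"
    APPROVED = "approved"
    REJECTED = "rejected"
    SUPERSEDED = "superseded"
    RECALLED = "recalled"


# Assumed edges: review decides, and only approved records can be retired.
TRANSITIONS: Dict[State, Set[State]] = {
    State.CANDIDATE: {State.UNDER_REVIEW},
    State.UNDER_REVIEW: {State.APPROVED, State.REJECTED},
    State.APPROVED: {State.SUPERSEDED, State.RECALLED},
    State.REJECTED: set(),
    State.SUPERSEDED: set(),
    State.RECALLED: set(),
}


class PromotionError(Exception):
    """Raised when a transition is not allowed or is not attributable."""


@dataclass
class PatternRecord:
    record_id: str
    firm_id: str
    state: State = State.CANDIDATE
    # Audit trail of (from_state, to_state, actor, reason).
    history: List[Tuple[State, State, str, str]] = field(default_factory=list)

    def transition(self, target: State, actor: str, reason: str) -> None:
        if not actor or not reason:
            raise PromotionError("every transition needs an actor and a reason")
        if target not in TRANSITIONS[self.state]:
            raise PromotionError(
                f"{self.state.value} -> {target.value} is not allowed"
            )
        self.history.append((self.state, target, actor, reason))
        self.state = target
```

Because there is no edge from candidate straight to approved, a silent promotion is not a policy violation to be detected later; it is a call that raises. The history list is what lets a recall be traced back to who approved the pattern and why.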

§6 Bounded retrieval

Every run loads only the firm patterns eligible for the work in front of it.

FirmIQ™ does not load all firm memory into every run. Retrieval is shaped by firm, client, matter, practice area, jurisdiction, artifact type, specialist type, procedural stage, and verifier requirements. The context manifest records which records were loaded, why each was eligible, what was excluded, and which promotion or approval basis supports each loaded record.

The bounded-retrieval discipline parallels MemGPT's tiered memory management (Packer et al., 2023), which treats the model context window as a constrained resource that must be managed with explicit eligibility rules. The legal-workflow version replaces "what fits in the window" eligibility with "what is in scope for this matter under this jurisdiction" eligibility.

A firm can ask "what did this run consider, and why was each record eligible?" and get a structured answer pointing back to the manifest. A vendor that cannot answer that question does not have a memory architecture (FlowCounsel, "Why Legal AI Needs Bounded Memory, Not Bigger Prompts," April 2026).
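A minimal sketch of manifest-bounded retrieval, assuming each approved pattern carries a scope expressed as key-value pairs and a run supplies its own context in the same vocabulary. The matching rule (every scope key on the pattern must equal the run's value) and all names are assumptions for illustration; the real eligibility rules would also cover procedural stage and verifier requirements.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ApprovedPattern:
    record_id: str
    scope: Dict[str, str]    # e.g. {"firm": "f1", "jurisdiction": "NY"}
    approval_basis: str      # reference to the promotion decision


def build_manifest(patterns: List[ApprovedPattern],
                   run_context: Dict[str, str]) -> dict:
    """Select eligible patterns and record why each was loaded or excluded."""
    loaded, excluded = [], []
    for p in patterns:
        # A pattern is eligible only if every scope key it declares matches.
        mismatched = sorted(
            k for k, v in p.scope.items() if run_context.get(k) != v
        )
        if mismatched:
            excluded.append({
                "record_id": p.record_id,
                "excluded_because": "scope mismatch: " + ", ".join(mismatched),
            })
        else:
            loaded.append({
                "record_id": p.record_id,
                "eligible_because": "scope match: " + ", ".join(sorted(p.scope)),
                "approval_basis": p.approval_basis,
            })
    return {"run_context": dict(run_context), "loaded": loaded, "excluded": excluded}
```

The manifest records exclusions as well as inclusions, so the question "what did this run consider, and why?" can be answered from the returned structure alone, without re-running retrieval.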

§7 Non-goals

What FirmIQ™ is not.

Stating non-goals explicitly is part of the architecture. The patterns excluded by design are the patterns most often conflated with memory in current vendor pitches.

No automatic fine-tuning of base models from attorney edits. ABA Formal Op. 512 (2024) flags the supervision and confidentiality risks; the architecture excludes the path rather than mitigating it.
No cross-firm pattern pooling. Patterns learned at one firm stay at that firm. The architectural boundary is the firm tenant.
No raw keystroke surveillance as default capture.
No silent strategy promotion without governance review.
No treating model self-reflection as proof of approved learning. RLHF-style preference learning (Christiano et al., 2017; Ouyang et al., 2022) is a useful pattern for model alignment but is not a substitute for attorney-supervised rule promotion.
No replacing attorney review with trajectory scoring.

§8 Why it matters

The structural advantage compounds; the marketing-label version does not.

A legal AI product that claims "your firm gets smarter from your work" without articulating which memory class, what governance, what retrieval boundary, and what promotion discipline is making a marketing claim, not an architectural one. The two are distinguishable on a single question: can the vendor produce, on demand, a structured record of what the system considered on a specific run and why each record was eligible?

FirmIQ™ exists so that this question has an architectural answer, not just a policy one. Memory architecture is governance applied to learning.

The structural advantage compounds over time as approved work accumulates and promotion runs through real attorney review. The marketing-label version does not compound. It conflates storage with learning, and the storage choice does not become more useful as the firm uses the system.

References

Anderson, J. R. (1996). ACT: A simple theory of complex cognition. American Psychologist, 51(4), 355–365.
American Bar Association (2024). Formal Opinion 512: Generative Artificial Intelligence Tools. Standing Committee on Ethics and Professional Responsibility, July 29, 2024. https://www.americanbar.org/news/abanews/aba-news-archives/2024/07/aba-issues-first-ethics-guidance-ai-tools/
American Bar Association. Model Rule 5.3: Responsibilities Regarding Nonlawyer Assistance. https://www.americanbar.org/groups/professional_responsibility/publications/model_rules_of_professional_conduct/rule_5_3_responsibilities_regarding_nonlawyer_assistant/
Christiano, P. F., Leike, J., Brown, T. B., Martic, M., Legg, S., & Amodei, D. (2017). Deep Reinforcement Learning from Human Preferences. arXiv:1706.03741. https://arxiv.org/abs/1706.03741
FlowCounsel (2026, April 2). Why Legal AI Memory Is a Systems Problem, Not a Prompt Problem. FlowCounsel Blog. https://flowcounsel.com/blog/why-legal-ai-memory-is-a-systems-problem-not-a-prompt-problem
FlowCounsel (2026, April 2). Why Legal AI Needs Bounded Memory, Not Bigger Prompts. FlowCounsel Blog. https://flowcounsel.com/blog/why-legal-ai-needs-bounded-memory-not-bigger-prompts
FlowCounsel (2026, May 14). Legal AI Has a Surveillance Problem Before It Has a Regulation Problem. FlowCounsel Blog. https://flowcounsel.com/blog/legal-ai-has-a-surveillance-problem-before-it-has-a-regulation-problem
FlowCounsel (2026, May 21). The Next Category in Legal AI Is Governed Execution. FlowCounsel Blog. https://flowcounsel.com/blog/governed-execution-is-the-category
Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., et al. (2022). Training Language Models to Follow Instructions with Human Feedback. arXiv:2203.02155. https://arxiv.org/abs/2203.02155
Packer, C., Wooders, S., Lin, K., Fang, V., Patil, S. G., Stoica, I., & Gonzalez, J. E. (2023). MemGPT: Towards LLMs as Operating Systems. arXiv:2310.08560. https://arxiv.org/abs/2310.08560
Park, J. S., O'Brien, J. C., Cai, C. J., Morris, M. R., Liang, P., & Bernstein, M. S. (2023). Generative Agents: Interactive Simulacra of Human Behavior. arXiv:2304.03442. https://arxiv.org/abs/2304.03442
Shinn, N., Cassano, F., Berman, E., Gopinath, A., Narasimhan, K., & Yao, S. (2023). Reflexion: Language Agents with Verbal Reinforcement Learning. arXiv:2303.11366. https://arxiv.org/abs/2303.11366
Squire, L. R. (1992). Memory and the hippocampus: A synthesis from findings with rats, monkeys, and humans. Psychological Review, 99(2), 195–231.
Tulving, E. (1972). Episodic and semantic memory. In E. Tulving & W. Donaldson (Eds.), Organization of Memory (pp. 381–403). New York: Academic Press.
Wang, G., Xie, Y., Jiang, Y., Mandlekar, A., Xiao, C., Zhu, Y., Fan, L., & Anandkumar, A. (2023). Voyager: An Open-Ended Embodied Agent with Large Language Models. arXiv:2305.16291. https://arxiv.org/abs/2305.16291

How to cite this paper

FlowCounsel Research (2026). Memory class architectures for firm-scoped legal intelligence. FlowCounsel™. https://flowcounsel.com/research/memory-classes
