Attorney Review Isn't Enough Anymore: The Case for a Formal AI Hallucination Audit Protocol

For the past three years, the dominant conversation about AI in legal practice has been philosophical: Should lawyers trust AI output? How much? Under what circumstances? These are reasonable questions, and they have generated a substantial body of commentary, bar guidance, and continuing education programming.

That conversation is now beside the point.

The more consequential question is actuarial. Malpractice insurers are not asking whether your attorneys believe in AI. They are asking whether your firm has written policies, documented verification steps, and named accountability for AI governance. Firms that can answer clearly are presenting as a manageable risk. Firms that cannot are beginning to see that judgment reflected in their premiums and, in some cases, their coverage terms. The philosophical debate has given way to a coverage prerequisite.

What Carriers Are Actually Asking

The shift became concrete in late 2025 and has accelerated into 2026. Insurers including the Attorneys Liability Assurance Society and several Lloyd's syndicates have introduced or materially expanded AI governance questions in renewal questionnaires. The questions are not vague. Firms are being asked whether they have a written AI use policy, whether AI-generated output is independently verified before filing or client delivery, and who holds supervisory responsibility for AI use across the firm.

A vague answer, or a negative one, signals a concern to the underwriter. That concern has a dollar value attached to it.

This is not a surprise to anyone who has watched how malpractice markets respond to new categories of documented loss. Carriers build actuarial pictures from claims patterns. The claims pattern for AI-assisted legal work is now established enough to price. The 2023 Mata v. Avianca sanctions decision was the first prominent data point; the Rule 11 actions and disciplinary proceedings that followed through 2024 and 2025 confirmed the pattern. AI-fabricated citations are not a hypothetical failure mode. They are a documented, recurring mechanism by which plausible-looking errors travel from a model's output to a filed document when no verification step exists.

Carriers have noticed. The question for firm leadership is whether their governance practices have kept pace.

The Gap Between Review and Verified Review

The most underappreciated nuance in this conversation is the distinction between an attorney reading AI output and an attorney conducting a competent verification of that output. These are not the same thing, and several state bar ethics opinions issued through 2025 make that distinction explicit.

New York, California, and Florida have each issued guidance drawing a meaningful line between passive review and review conducted with sufficient understanding of a tool's specific limitations to actually catch errors. An attorney who reads a research memo generated by a large language model, finds the analysis plausible, and approves it for submission has performed a review. Whether that review constitutes competent supervision depends on whether the attorney understood what error types the model was likely to produce, checked the specific outputs most susceptible to hallucination, and documented that process.

From an insurer's perspective, the same distinction applies. "An attorney reviewed all AI output before filing" is not a verification protocol. It is a description of an unstructured habit. A verification protocol specifies what was checked, how it was checked, by whom, and with what result. The difference matters enormously when a claim is filed and a carrier is reconstructing what the firm's quality control actually looked like.

Firms that have not yet drawn this distinction internally should do so before their next renewal questionnaire arrives. The question is no longer whether review occurred; it is whether the review was designed to catch the specific failure modes that AI tools introduce.

What a Hallucination Audit Protocol Actually Contains

The term "hallucination audit" is more operational than it sounds. It refers to a structured, repeatable internal process for catching, logging, and learning from AI errors, applied consistently across the firm rather than ad hoc after a near-miss. Building one does not require a large investment. It requires specificity.

A functional protocol has four components. First, pre-submission checklists tied to specific AI task types. Research outputs require citation verification against primary sources. Drafted documents require comparison of any factual assertions or case quotations against the underlying record. Summarization outputs require spot-checks against the source material for completeness and distortion. The checklist is task-specific because the hallucination risks are task-specific; a single generic "AI review" step conflates genuinely different error profiles.
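One way to make the task-specific checklist concrete is a small lookup keyed by task type. The sketch below is illustrative only: the names `TaskType`, `CHECKLISTS`, `outstanding_items`, and `ready_for_submission` are hypothetical, and the item wording paraphrases the three task types described above rather than quoting any bar or carrier standard.

```python
from enum import Enum


class TaskType(Enum):
    RESEARCH = "research"
    DRAFTING = "drafting"
    SUMMARIZATION = "summarization"


# Hypothetical checklist wording, paraphrased from the task types above.
CHECKLISTS = {
    TaskType.RESEARCH: (
        "Each citation verified against a primary source",
    ),
    TaskType.DRAFTING: (
        "Factual assertions compared against the underlying record",
        "Case quotations compared against the underlying record",
    ),
    TaskType.SUMMARIZATION: (
        "Spot-check against source material for completeness",
        "Spot-check against source material for distortion",
    ),
}


def outstanding_items(task, completed):
    """Return the checklist items for `task` not yet marked complete."""
    return [item for item in CHECKLISTS[task] if item not in completed]


def ready_for_submission(task, completed):
    """True only when every item on the task's checklist is complete."""
    return not outstanding_items(task, completed)
```

Keeping one checklist per task type, instead of a single shared list, is what prevents a generic "AI review" sign-off from standing in for a citation check.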

Second, a logging system that records when AI output was used, by whom, and what verification steps were performed. This does not need to be elaborate. A structured entry in matter management software, a completed checklist stored in the file, a confirmation field in a document workflow: these are sufficient. The goal is to create a retrievable record that answers the insurer's question directly. Some firms will build this into their existing document management infrastructure; others will layer it onto their AI platform's session logs. Either approach works as long as the record is consistent and complete.
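As a rough illustration of how little such a record needs to contain, the sketch below defines one log entry and serializes it to a single line of JSON. Every field name here (`matter_id`, `used_by`, `verified_by`, and so on) is an assumption made for the example, not a schema drawn from any carrier questionnaire or matter management product.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Tuple


def _now_utc():
    """Timestamp in ISO 8601 format, UTC."""
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class VerificationEntry:
    # All field names are hypothetical; adapt them to the firm's own systems.
    matter_id: str                        # which matter the output was used on
    used_by: str                          # who used the AI output
    task_type: str                        # e.g. "research", "drafting"
    verification_steps: Tuple[str, ...]   # what was checked, and how
    verified_by: str                      # who performed the checks
    result: str                           # outcome of the verification
    used_at: str = field(default_factory=_now_utc)  # when the output was used


def to_log_line(entry):
    """Serialize an entry as one JSON line suitable for an append-only file."""
    return json.dumps(asdict(entry), sort_keys=True)
```

A record of this shape answers what was checked, how, by whom, and with what result, whether it lives in a matter management field or a flat file.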

Third, periodic sampling reviews conducted on a schedule rather than triggered only by errors. Waiting for a problem to surface before evaluating AI output quality means the firm never develops a baseline picture of how its tools actually perform on its specific work. Monthly or quarterly sampling, conducted by a practice group leader or designated reviewer, generates that baseline. It also creates the kind of documented ongoing oversight that distinguishes a serious governance program from a policy written to satisfy a questionnaire.
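A scheduled sample can be drawn mechanically from the log so that the reviewer does not choose which matters to examine. The following is a minimal sketch under stated assumptions: the function name `draw_sample`, the 10 percent fraction, and the floor of five entries are all placeholders, not recommended figures.

```python
import math
import random


def draw_sample(entry_ids, fraction=0.1, minimum=5, seed=None):
    """Pick log entries for a periodic review.

    `fraction` and `minimum` are illustrative defaults only. Passing the
    review period as `seed` (for example "2026-Q3") makes the selection
    reproducible, so the sample itself can be documented.
    """
    ids = sorted(entry_ids)
    if not ids:
        return []
    # Take at least `minimum` entries, but never more than exist.
    size = min(len(ids), max(minimum, math.ceil(len(ids) * fraction)))
    return sorted(random.Random(seed).sample(ids, size))
```

Seeding the draw with the review period means the same quarter always yields the same sample, which makes the selection itself part of the retrievable record.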

Fourth, a designated AI Governance Officer or committee with defined accountability for protocol updates as model versions change. This is not a ceremonial appointment. AI tools update frequently, sometimes in ways that affect error rates or output characteristics. The person or group responsible for AI governance needs to track those changes and adjust verification procedures accordingly. A protocol written for the tool your firm deployed in 2024 may not be adequate for the version running in 2026.
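The version-tracking duty can be reduced to a simple comparison: which tool version was the protocol last reviewed against, and which version is running now? The sketch below assumes a hypothetical `ProtocolRecord` and a dictionary of currently deployed versions maintained by the firm; neither corresponds to any vendor's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtocolRecord:
    # Hypothetical fields for illustration.
    tool: str                      # name of the AI tool
    reviewed_against_version: str  # version the protocol was last checked against
    owner: str                     # named person or committee accountable


def needs_review(record, deployed_versions):
    """True when the deployed version differs from the one last reviewed.

    `deployed_versions` maps tool name to the version currently in use.
    A tool absent from that mapping is treated as not deployed.
    """
    deployed = deployed_versions.get(record.tool)
    return deployed is not None and deployed != record.reviewed_against_version


def stale_protocols(records, deployed_versions):
    """Return (tool, owner) pairs whose verification protocol needs review."""
    return [(r.tool, r.owner) for r in records if needs_review(r, deployed_versions)]
```

The output pairs each stale protocol with its named owner, which keeps the accountability concrete rather than ceremonial.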

Where the Exposure Is Accumulating

AmLaw 100 firms have largely built AI governance infrastructure, driven by general counsel pressure and institutional risk management capacity. The picture is different further down the market.

Mid-size and boutique firms, those in the 50-to-500-attorney range, are in many cases heavier per-attorney users of third-party legal AI tools than their larger counterparts. Resource constraints make AI efficiency gains more operationally significant for a 75-attorney firm than for one with 1,500 attorneys and a dedicated knowledge management department. These firms are adopting AI quickly. They are building governance infrastructure more slowly. That gap is where malpractice exposure is accumulating.

The same firms face renewal cycles with the same carriers, the same questionnaires, and the same underwriting scrutiny. The difference is that they often lack a legal operations function with dedicated bandwidth to build a governance program, and they may not have direct access to the general-counsel-level institutional knowledge that has driven governance investment at larger firms. The managing partner at such a firm is often the decision-maker, the risk manager, and the supervising attorney simultaneously. The argument for a structured protocol is, for that partner, also an argument for a manageable process rather than an open-ended commitment.

The protocol described above is designed to be that. It is not a compliance department. It is a set of documented habits, applied consistently, that produces the paper trail a carrier needs to underwrite the risk confidently.

Platform Auditability as a Risk Criterion

Legal AI platforms vary considerably in the degree to which they support the kind of documentation a verification protocol requires. Source citations, confidence indicators, session logs, and output provenance are features that exist in some platforms and are absent or limited in others. For firms building or revising a hallucination audit protocol, this variation is not a secondary product consideration. It is a risk management criterion.

A platform that generates retrievable session logs and cites primary sources in a format that supports verification makes it significantly easier to run the protocol described above. A platform that produces output without traceable provenance places the entire verification burden on the attorney's independent research, which is slower and more likely to be cut short under deadline pressure. When evaluating or renegotiating legal AI contracts, firms should ask vendors directly: what does your platform produce that supports audit documentation? The answer will vary, and it should influence the decision.

The broader principle is that AI governance is not separable from AI procurement. The tools a firm chooses shape what its governance infrastructure can realistically accomplish. Firms that treat auditability as a criterion at the contract stage are in a materially better position than those that discover the gap when building a verification protocol after the fact.

The Deadline Is the Renewal Date

Legal risk commentary often suffers from the absence of a hard deadline. Renewal season provides one.

Firms approaching Q3 or Q4 2026 renewal cycles, a group that includes a substantial portion of the market, will receive underwriting questionnaires that ask directly about AI governance. The firms that can answer those questions with documented protocols, named accountability, and a traceable verification history are presenting as a well-managed risk. The firms that cannot are presenting an uncertainty that carriers will price accordingly.

Building a hallucination audit protocol is not a large project. It requires clarity about task types, a logging mechanism, a sampling schedule, and a named owner. Most firms can establish the basic architecture within a few weeks if the decision to do so is made at the managing partner level.

The philosophical question, whether to trust AI output, has been answered in practice: firms are using these tools at scale, and they are not going back. The operational question is whether that use is documented and governed well enough to present as a manageable risk to the people whose job it is to price that risk. That question has a calendar attached to it. The answer should be ready before the questionnaire arrives.