Independent review of Anthropic's Claude for Legal for disputes lawyers. Real test results, pricing analysis, data security evaluation, and what AI for law firms means for arbitration teams in 2026.

Claude for Legal: What Disputes Lawyers Need to Know [2026]

By Ahmad Gado, cofounder of Kallam AI. This is written from the perspective of someone who builds dispute document workflows for a living, and who tested Claude for Legal with real documents the week it launched. That perspective is explicit throughout.

The Announcement, and the Questions It Left Unanswered

On May 12, 2026, Anthropic published what it called "Claude for Legal," one of the most significant AI for law firms announcements this year. The short version of what I found: Claude for Legal offers genuine capabilities for certain workflows, but disputes teams handling high-volume multilingual arbitrations will encounter specific limitations in usage economics, document processing at scale, and team collaboration. Data security requires Enterprise-grade controls, and the economics of token-based pricing need careful evaluation against the volume of documents your practice handles. The details follow.

The release itself: twelve practice-area plugins, over twenty connectors linking Claude to existing legal platforms, and integration with Microsoft 365. The whole package was released as open-source software under the Apache 2.0 license, meaning firms can use, modify, and redistribute it freely for commercial purposes, and it is hosted on GitHub (a public platform where software code is stored and shared). (Source: LawNext coverage of Claude for Legal launch)

The response was immediate. Legal technology stocks dropped. Over 20,000 people had signed up for the earlier Claude for Legal webinar. LinkedIn filled with commentary about AI reshaping the practice of law.

Most of that commentary was written by and for people who already understand what a "plugin" or "MCP connector" is. For the arbitration partner reading LinkedIn between hearings, or the mid-level associate managing a 3,000-document production in a construction dispute, the question is much simpler. What does this actually mean for my practice? What does it cost? Can I trust it with client documents? And does it actually work at the scale I need?

This post answers those questions. It walks through what Anthropic released, what it can do, where specific gaps exist for disputes work, and what the economics look like. Where something works well, it says so. Where something does not fit a particular workflow, it says that too and describes what I observed.

A note on transparency: I build Kallam, a platform designed specifically for disputes document workflows. I will be explicit about where Claude for Legal overlaps with what we do and where the differences are. You can weigh that accordingly.

What Anthropic Actually Released

To understand Claude for Legal, it helps to understand three separate things that are often conflated in coverage of this launch.

Claude Cowork is the application. It is Anthropic's collaborative workspace product where users interact with Claude through a chat-and-document interface. Think of it as the environment where work happens.

Inside Claude Cowork, users select which model powers their session. The available models include Claude Sonnet 4.6, Claude Opus 4.6, and Claude Opus 4.7, each with different capabilities, speeds, and costs. The model is the reasoning engine. It is what processes your text, interprets your documents, and generates responses.

Claude for Legal is neither the application nor the model. It is a collection of twelve practice-area plugins and over twenty MCP connectors that users can add to their Claude Cowork workspace. These plugins and connectors provide legal-specific instruction sets and connect Claude to external legal platforms. They do not change the underlying model. They layer legal workflow structure on top of whichever model the user has selected.

Here is what the Claude for Legal collection actually contains:

Twelve practice-area plugins. Each plugin is a packaged set of instructions that tell Claude how to approach a specific type of legal work. The twelve cover commercial law, corporate law, employment law, privacy, IP, litigation, regulatory compliance, AI governance, and a few others including a law student plugin, a legal clinic plugin, and a legal builder hub. For disputes lawyers, the litigation plugin is the most relevant. It includes capabilities for chronology building from document sets, deposition preparation, first-pass privilege log review, claim chart construction, and brief section drafting. (Source: LawNext Claude for Legal analysis; MindStudio MCP analysis)

Over twenty MCP connectors. These link Claude to external legal platforms. The list includes Relativity, iManage, Westlaw (via Thomson Reuters CoCounsel), LexisNexis, Everlaw, Consilio, CourtListener, Ironclad, DocuSign, Box, NetDocuments, Datasite, Midpage, Trellis, Legal Data Hunter, Slack, Google Drive, and Microsoft 365. (Source: LegalTechnology.com analysis)

Microsoft 365 integration. Claude now works inside Word, Outlook, Excel, and PowerPoint through an add-in.

Open source. The entire codebase is publicly available. Firms with technical teams can download it, modify it, and adapt it to their own workflows without licensing restrictions.

Managed Agents API. For scheduled background tasks that run without manual intervention, such as monitoring dockets, tracking regulatory changes, or watching contract renewal deadlines.

This is available to all paid Claude customers: Pro at $20 per month, Max at $100 or $200 per month, Team plans, and Enterprise. Enterprise admins can enable specific plugins and connectors in workspace settings.

What Plugins, Skills, and MCP Connectors Actually Mean

Three terms keep appearing in every discussion of Claude for Legal. Here is what they mean in plain language.

A plugin is a briefing pack for Claude. Think of it like onboarding a new associate to your practice area. You hand them your firm's playbook, your standard checklists, your house style guide, and examples of good work product. A plugin does the same thing for Claude. It tells the model how to approach a specific type of legal work. The litigation plugin, for example, instructs Claude on how to build chronologies, structure deposition outlines, and organize privilege logs.

Each plugin includes what Anthropic calls a "cold-start interview," a 10-to-20-minute guided setup where the lawyer provides examples of their own work: signed contracts, playbooks, escalation matrices, preferred formats. Claude uses these to create a practice profile that all instructions within that plugin reference. Skip this step and you get generic output. The GitHub README explicitly warns that this is the single most common reason a skill produces boilerplate results.

A skill is a specific instruction card within a plugin. If the plugin is the playbook, a skill is a single page of that playbook covering one task. Slash commands are typed commands that activate specific skills. When you type /litigation-legal:chronology into Claude, you are telling it to read and follow a particular set of instructions for building a chronology. When you type /corporate-legal:tabular-review, you are asking it to follow instructions for producing multi-sheet Excel workbooks with a sources sheet. Skills are plain-text instruction files. They do not add legal knowledge to Claude. They do not fine-tune the model. They do not give Claude access to case law databases. They simply tell Claude how to structure its behavior and format its outputs for a particular task. The quality of the output depends entirely on how well the selected model follows those instructions, and as I will discuss, instruction-following varies depending on which model you choose and the complexity of the task.

An MCP connector (Model Context Protocol) is a cable connecting Claude to your existing software. MCP is an open standard that lets Claude read from and write to external systems. A connector to iManage means Claude can pull documents from your document management system. A connector to Everlaw means Claude can access your eDiscovery workspace. A connector to CourtListener means Claude can look up public US federal court dockets.

For disputes lawyers, the relevant connectors are Everlaw, Relativity, iManage, CourtListener, Thomson Reuters CoCounsel for Westlaw research, and Definely for in-document drafting.

Here is the important caveat. These connectors work within Claude's ecosystem. If you are not already using these platforms, the connectors add nothing. Most small and mid-size disputes practices handling construction, energy, or infrastructure disputes are not using Relativity or Everlaw. They are working with shared drives, email attachments, and Excel spreadsheets. The connectors are valuable for large litigation shops that already have an integrated legal technology stack. For firms not on those platforms, the connectors are not relevant today.

What Claude for Legal Can Do for Disputes Work

The litigation plugin capabilities are worth walking through, because this is where the legal AI tools in Claude for Legal are most relevant to disputes practitioners.

The /litigation-legal:chronology skill reconstructs a timeline from declared sources. You can feed Claude a set of documents and ask it to extract events, dates, and references into a chronological structure. For a partner preparing hearing submissions or a mid-level building a statement of case, this is genuinely useful functionality.

The /litigation-legal:claim-chart skill builds element-by-element claim charts with citation columns. This is most directly relevant for patent matters and structured civil cases, though the format can be adapted for arbitration claim schedules.

The /litigation-legal:deposition-prep skill prepares testimony outlines aligned with case theory. The /corporate-legal:tabular-review skill writes multi-sheet Excel workbooks with a sources sheet, useful for structured document analysis and due diligence.

There is also a docket watcher, a managed agent that monitors opposing filings in real time, and a privilege log review capability for first-pass screening with criticality flags.

These are genuinely useful capabilities for the right use case. In-house counsel reviewing standard NDAs, corporate lawyers doing compliance checks, US litigators tracking federal docket activity: the Claude for Legal plugins offer real value here. The performance of these plugins depends on which model the user selects within Claude Cowork. Claude Opus 4.7, Anthropic's most capable generally available model, scores 90.9% on the BigLaw Bench legal reasoning benchmark, with 45% of tasks receiving perfect scores. Its predecessor, Opus 4.6, achieved 90.2% with 40% perfect scores. (Source: Harvey AI, "Opus 4.7, Now Live in Harvey"; Anthropic, Claude Opus 4.6 announcement, quoting Harvey's Head of AI Research) The Sonnet 4.6 model, which is faster but less capable on complex reasoning tasks, is what many users will interact with on lower-tier plans where speed matters more than peak accuracy.

The question is what happens when you apply these plugins to the workflow of a disputes team handling a multilingual, document-heavy arbitration.

Testing Claude for Legal with Real Documents

I tested the Claude for Legal plugins with my Claude Max subscription ($100 per month, the 5x plan). Here is what I did.

I downloaded the plugins from GitHub, zipped each one individually, and added them to Claude Cowork. The setup process involves several steps. First, you navigate to the GitHub repository and download the code package. Then you study the file structure to identify which folders correspond to which plugins. Finally, you compress each plugin folder into a zip file and upload it into Claude Cowork. A firm with a technical support team can handle this quickly.
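For anyone repeating the zipping step on many plugins, it can be scripted. The following is a minimal sketch, not Anthropic's tooling: it assumes each plugin sits in its own top-level folder of the downloaded repository (check the actual file structure first), and the function name and paths are hypothetical. Whether Claude Cowork expects the folder itself or only its contents at the root of the zip is also an assumption to verify against a manual upload.

```python
import shutil
from pathlib import Path


def zip_plugin_folders(repo_dir, out_dir):
    """Create one zip per immediate subfolder of repo_dir, written to out_dir.

    Assumption: every non-hidden top-level folder is a plugin. Adjust the
    filter if the repository layout differs. Keep out_dir outside repo_dir.
    """
    repo = Path(repo_dir)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    archives = []
    folders = sorted(
        p for p in repo.iterdir() if p.is_dir() and not p.name.startswith(".")
    )
    for folder in folders:
        # make_archive appends ".zip" to the base name and returns the path.
        archive = shutil.make_archive(
            str(out / folder.name), "zip", root_dir=str(repo), base_dir=folder.name
        )
        archives.append(Path(archive))
    return archives
```

A call such as `zip_plugin_folders("downloads/claude-for-legal", "downloads/plugin-zips")` (both paths hypothetical) would leave one archive per plugin folder ready to upload by hand.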

I then ran a tabular review on 28 small public documents, approximately 60 pages in total, using Sonnet 4.6 as the model. This is an important detail: I used Sonnet 4.6, not Opus. Sonnet is faster and cheaper per token, but it is not the model that scores 90.9% on BigLaw Bench. That benchmark score belongs to Opus 4.7. The model you select in Claude Cowork directly affects the quality, cost, and speed of the output. I asked Claude to extract the contextual date, title, summary, and document category for each document. This is essentially what Kallam does automatically when you upload a document set.

That single tabular review consumed 12% of my 5-hour usage window.

On the Pro plan at $20 per month, which provides roughly one-fifth of the Max 5x allowance, the same task would consume approximately 60% of the 5-hour window (12% multiplied by five). One review of 60 pages, and more than half of your session is gone.

I then ran a second, much smaller query extracting information from just 2 of those 28 documents. That consumed another 4% of my window.

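What this means for a real arbitration matter: a mid-size construction arbitration might involve 500 to 5,000 documents. If 60 pages consumed 12% of a Max window, one full window covers roughly 500 pages. My test documents averaged about two pages each, and arbitration exhibits and bundles routinely run far longer. Assuming an average of even 20 pages per document, a modest 500-document case comes to about 10,000 pages, or roughly 20 full Max windows, spread across multiple days with constant waiting for usage windows to reset. On the Pro plan, processing an arbitration-scale document set within the usage limits would not be feasible.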
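The extrapolation can be made explicit. This is a rough sketch that assumes usage scales linearly with page count, which my single test cannot confirm, and the 20-pages-per-document average is a hypothetical figure chosen for illustration. Only the 60-page and 12% numbers come from the test above.

```python
import math
from fractions import Fraction

# Measured in the test above: 60 pages used 12% of one Max 5x five-hour window.
TEST_PAGES = 60
TEST_WINDOW_SHARE = Fraction(12, 100)


def windows_needed(total_pages, allowance_vs_max5x=Fraction(1)):
    """Estimate how many five-hour windows a page count would consume.

    Assumption: consumption is linear in pages. allowance_vs_max5x is the
    plan's allowance relative to Max 5x (Pro is roughly one-fifth).
    """
    share_per_page = TEST_WINDOW_SHARE / TEST_PAGES / allowance_vs_max5x
    return math.ceil(total_pages * share_per_page)


# Hypothetical matter: 500 documents averaging 20 pages each.
matter_pages = 500 * 20
max_windows = windows_needed(matter_pages)                   # Max 5x plan
pro_windows = windows_needed(matter_pages, Fraction(1, 5))   # Pro plan
```

Under these assumptions the hypothetical matter needs about 20 windows on Max 5x and about 100 on Pro. Real consumption will vary with document density, the model selected, and the complexity of the extraction.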

The output included a structured spreadsheet with extracted dates, titles, summaries, and document categories for each of the 28 documents. It did not include clickable citations linking each extracted fact to the specific page in the source document. It did not include a side-by-side view of the source alongside the extracted data. It did not provide a way to verify a date or title by clicking through to the original page. In disputes workflows, traceability to source pages is a standard requirement. The current output format does not provide this.

How Usage Limits Affect Disputes Workflows

A point of clarification on how Claude's usage works, because this directly affects disputes practitioners evaluating legal AI software for their teams.

Anthropic does not publish exact message counts for its subscription tiers. The Pro plan provides approximately 5x the free-tier usage. The Max 5x plan provides 5x the Pro allowance. Usage is measured in tokens, which are the units that language models use to process text. Roughly speaking, one token equals about three-quarters of a word. A single message that includes a large document attachment can consume ten or more times the tokens of a short conversational query. This means that document-heavy workflows deplete usage windows far faster than general question-and-answer conversations.

The experiment results above are the most concrete data point available: 60 pages of documents consumed 12% of the Max 5x window in a single operation.

Rate limits have been a persistent discussion point in the Claude user community. In early 2026, users on the Max 5x plan reported that their sessions were depleting significantly faster than expected when running document-intensive or code-intensive workflows. Community forums documented numerous accounts of this experience. (Source: Reddit r/ClaudeAI rate limit discussion; Reddit r/Anthropic user feedback thread) Anthropic acknowledged the issue. The company identified that a prompt caching implementation was, in some cases, inflating token consumption beyond what users anticipated.

On May 6, 2026, Anthropic announced changes in response. The company doubled Claude Code five-hour limits for Pro, Max, Team, and Enterprise plans, and removed peak-hour usage reductions for Pro and Max accounts. Alongside this, Anthropic announced a compute partnership with SpaceX, gaining access to more than 300 megawatts of new GPU capacity (over 220,000 NVIDIA GPUs) through SpaceX's Colossus 1 data center. The company has publicly described itself as compute-constrained, noting that new data center capacity takes 12 to 24 months to come online. (Source: Anthropic announcement: "Higher usage limits for Claude and a compute deal with SpaceX")

This is relevant context for disputes teams evaluating Claude for Legal. Usage limits are not static. They have been increasing and will likely continue to increase as Anthropic adds compute capacity. At the same time, document-intensive legal work consumes tokens at a rate that general-purpose subscribers may not encounter, and the economics of that consumption should be evaluated against the specific volume of documents your practice handles.

OCR, Multilingual Processing, and PDF Page Limits

Claude can process PDFs and has vision capabilities that function like OCR (optical character recognition, the technology that converts images of text into machine-readable text). It can read text from images, interpret tables, and understand visual layouts. For single documents or small batches of clean English-language PDFs, this works well.

The 100-Page Visual Analysis Threshold

However, the page limits involve three distinct thresholds that are often conflated. Anthropic's help center states that Claude can analyze "both text and visual elements (like images, charts, and graphics) in PDFs that are under 100 pages." For PDFs exceeding 100 pages, Claude switches to text-only processing, meaning it can no longer interpret images, charts, tables rendered as graphics, or page layouts. It processes only the embedded text layer. For PDFs exceeding 1,000 pages, Claude will not process the file at all. (Source: Anthropic help center, "Upload files to Claude")
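As a quick illustration, the thresholds above can be written as a pre-upload check. This is a sketch of the help-center wording as quoted in this post, applied per file, and not an Anthropic API. The function name is hypothetical, and treating a PDF of exactly 100 pages as text-only is my conservative assumption, since the quoted wording covers only "under 100" and "exceeding 100".

```python
def pdf_processing_mode(page_count):
    """Classify a PDF by the page thresholds quoted above.

    Assumption: exactly 100 pages is treated as text-only (conservative).
    """
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    if page_count > 1000:
        return "rejected"          # file will not be processed at all
    if page_count >= 100:
        return "text_only"         # embedded text layer only, no vision
    return "text_and_visual"       # text plus images, charts, layouts
```

Running a folder of exhibits through a check like this before upload shows which bundles would lose visual analysis and need splitting or OCR first.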

Separately, the API documentation lists a maximum of 600 pages (or images) per API request for models with 1-million-token context windows, and 100 pages per request for models with 200,000-token context windows. This 600-page figure is the total media count across all attachments in a single API request, not a per-file visual analysis limit. It is the ceiling on how many page-images the API will accept in one call. The maximum request payload size is 32 MB. Each page processed visually consumes approximately 1,500 to 3,000 tokens depending on content density. (Source: Anthropic PDF support documentation; Anthropic vision documentation)
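For teams building on the API, those request ceilings can be checked locally before anything is sent. The sketch below uses only the figures cited in the paragraph above. It makes no network call, the function and constant names are hypothetical, and reading "32 MB" as 32 × 1024 × 1024 bytes is an assumption.

```python
# Page ceilings per request, keyed by context window size, as cited above.
MAX_PAGES_BY_CONTEXT = {1_000_000: 600, 200_000: 100}
MAX_PAYLOAD_BYTES = 32 * 1024 * 1024  # assumption: "32 MB" read as 32 MiB


def api_request_problems(total_pages, payload_bytes, context_window_tokens):
    """Return a list of reasons a request would exceed the cited limits."""
    problems = []
    limit = MAX_PAGES_BY_CONTEXT.get(context_window_tokens)
    if limit is None:
        problems.append("unknown context window size")
    elif total_pages > limit:
        problems.append(f"{total_pages} pages exceeds the {limit}-page limit")
    if payload_bytes > MAX_PAYLOAD_BYTES:
        problems.append("payload exceeds 32 MB")
    return problems
```

An empty list means the request fits the page and size ceilings. It says nothing about whether the pages also fit the token budget of the context window, which is a separate constraint.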

For disputes workflows involving multilingual scanned documents at scale, the picture changes.

Scanned PDF quality is variable at scale. Claude's documentation describes scanned PDF processing as inconsistent. Quality degrades with faint text, skewed pages, stylized fonts, compression artifacts, and multi-column layouts. Arbitration document sets frequently include poor-quality photocopies, fax transmissions, and handwritten annotations. These are the same documents that challenge human reviewers, and Claude encounters the same difficulties.

The 100-page visual analysis threshold has a specific implication for scanned document bundles. Claude's ability to interpret visual elements, including reading text from scanned images of pages, depends on its vision capabilities. For a scanned PDF under 100 pages, Claude processes each page as both an image and extracted text, which allows it to read text that exists only as an image (such as a photographed or scanned page with no embedded text layer). Beyond 100 pages, Claude falls back to text-only processing. For scanned documents that lack an embedded text layer, this means Claude receives no usable content from pages beyond the 100-page threshold. A 300-page scanned exhibit bundle with no OCR text layer would yield visual analysis of the first 100 pages and nothing from the remaining 200. In arbitration, document bundles routinely exceed 100 pages.

Each page processed visually costs approximately 1,500 to 3,000 tokens. For a scanned document bundle where visual processing is available (under 100 pages), processing a 90-page bundle could consume 135,000 to 270,000 tokens before any analysis begins. On subscription plans, this has a direct impact on usage window consumption.
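The arithmetic behind that range is simple enough to reuse for any bundle size. A minimal sketch, using the approximate per-page figures above as defaults (the function name is hypothetical):

```python
def visual_token_range(pages, low_per_page=1_500, high_per_page=3_000):
    """Return (low, high) token estimates for visually processing `pages` pages.

    The per-page defaults are the approximate figures cited above; actual
    consumption depends on content density.
    """
    return pages * low_per_page, pages * high_per_page


low, high = visual_token_range(90)  # the 90-page bundle from the text
```

For the 90-page bundle this gives 135,000 to 270,000 tokens, which at the upper end is more than a quarter of a 1-million-token context window before a single question has been asked.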

No Automated Processing Pipeline

There is no automated processing pipeline. Claude processes documents one at a time in conversation. There is no batch upload, no automatic splitting of large bundles, no parallel processing, no error recovery for failed pages. For a document production of 2,000 files, each file must be handled individually.

Claude's document processing capabilities are strong for the use cases they were designed for. Reliable, high-volume, multilingual document processing at arbitration scale requires additional infrastructure: OCR engines optimized for Arabic, French, and English, batch processing pipelines, error handling, and cost optimization. The Claude for Legal plugins were not designed to provide this infrastructure. They were designed to provide legal reasoning instructions and workflow structure on top of Claude's general capabilities, which are different things.

How Context Windows Affect Document Processing

There is a technical concept that matters for anyone planning to load large document sets into an AI model, and it applies to all large language models, not just Claude.

A context window is the total amount of text a model can hold in its working memory at one time. The three current Claude models used in Claude Cowork (Opus 4.7, Opus 4.6, and Sonnet 4.6) each have 1-million-token context windows, roughly equivalent to 750,000 words. Claude Haiku 4.5, the fastest and cheapest model, has a 200,000-token context window. One technical note: Opus 4.7 uses a newer tokenizer than the 4.6 models, which means the same English text may produce up to 35% more tokens on Opus 4.7. In practice, this means a given document consumes a larger share of the context window on Opus 4.7 than on Sonnet 4.6 or Opus 4.6. (Source: Anthropic models overview; Anthropic context windows documentation)
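To see what the tokenizer difference means in words, here is a back-of-the-envelope sketch. It assumes the three-quarters-of-a-word-per-token rule of thumb applies to the 4.6 models and treats the "up to 35%" figure as a worst case, so the second result is a lower bound rather than a typical value. The function name is hypothetical.

```python
def approx_word_capacity(context_tokens, words_per_token=0.75, token_inflation=0.0):
    """Rough number of English words that fit in a context window.

    token_inflation is the fraction of extra tokens the tokenizer produces
    for the same text (0.35 models the worst case cited for Opus 4.7).
    """
    effective_tokens = context_tokens / (1 + token_inflation)
    return int(effective_tokens * words_per_token)


baseline_words = approx_word_capacity(1_000_000)                          # 750,000
worst_case_words = approx_word_capacity(1_000_000, token_inflation=0.35)  # about 555,000
```

In the worst case, the same 1-million-token window holds roughly 195,000 fewer words on Opus 4.7 than the headline figure suggests.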

A large context window does not mean that everything placed inside it receives equal attention. Research shows that performance degrades as the window fills, even before it reaches its nominal limit.

Chroma, an AI research company, published a study in July 2025 titled "Context Rot: How Increasing Input Tokens Impacts LLM Performance." The study tested 18 large language models, including models from the Claude, GPT, Gemini, and Qwen families. The central finding was that the common assumption that models process all content in their context window uniformly is incorrect. As the number of input tokens increases, accuracy on retrieval tasks and position accuracy for target information fall progressively, and hallucination rates rise. The degradation is not uniform across models, but it was present in every model tested. (Source: Chroma "Context Rot" study, July 2025)

A related and earlier finding comes from a paper published in Transactions of the Association for Computational Linguistics (TACL) in 2024. Researchers from Stanford, UC Berkeley, and Samaya AI found what they called the "lost in the middle" effect: language models attend well to information positioned near the beginning and end of the context, but performance degrades significantly for information positioned in the middle. This held true even for models explicitly designed for long contexts. (Source: Liu et al., "Lost in the Middle: How Language Models Use Long Contexts," TACL 2024)

Think of it as giving a junior associate a stack of 50 files to review. The associate reads the first five and last five carefully, but skims the 40 in the middle. The associate may still produce useful work, but you would not rely on their review of file number 27 without checking it yourself.

Practical Implications for Legal Document Review

What this means for legal document processing: loading a full arbitration document set into a single context window does not guarantee that the model will retrieve information from all documents with equal reliability. The documents loaded first and last receive more reliable attention than those in the middle. This is an architectural characteristic of the transformer models that power Claude, GPT, Gemini, and all other current large language models. It is not a bug, and it is not Claude-specific. It is a design reality that any legal workflow relying on AI for document review must account for.

Context engineering, the discipline of managing what information is in the model's working memory at any given time, matters as much as the quality of the model itself. When conversations grow long enough to approach the context window boundary, Anthropic offers a feature called server-side compaction, which automatically condenses earlier parts of the conversation into summaries to free up space. This allows conversations to continue, but the condensation means older details may be summarized rather than preserved verbatim. Purpose-built document processing systems take a different approach, breaking large document sets into manageable batches, processing them individually, and aggregating the results. A general-purpose chat interface may not do this automatically. (Source: Anthropic compaction documentation)
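The batching approach described above can be sketched in a few lines. This is an illustration of the general idea and not any vendor's pipeline: documents are packed greedily, in their original order, into batches that stay under a token budget, so each batch can be processed in a fresh context and the results merged afterwards. The per-document token estimates and the budget are inputs you would supply, and the function name is hypothetical.

```python
def batch_documents(docs, max_tokens_per_batch):
    """Group (name, estimated_tokens) pairs into batches under a token budget.

    Documents keep their original order. A document larger than the budget
    raises an error so it can be split before processing.
    """
    batches, current, used = [], [], 0
    for name, tokens in docs:
        if tokens > max_tokens_per_batch:
            raise ValueError(f"{name} alone exceeds the batch budget")
        if current and used + tokens > max_tokens_per_batch:
            batches.append(current)   # close the full batch
            current, used = [], 0
        current.append(name)
        used += tokens
    if current:
        batches.append(current)
    return batches
```

Keeping batches well below the model's nominal context limit is the design choice that matters here, because the research above suggests reliability drops before the window is full.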

Why Skills Are Instructions, Not Guarantees

This section addresses a technical reality that most coverage of Claude for Legal does not discuss, and it is relevant to any AI legal assistant or legal AI tool that relies on instruction-following.

Skills are plain-text instruction files. They tell Claude how to behave. But large language models do not follow instructions with 100% reliability. This is not a bug in Claude specifically. It is a documented characteristic of all large language models, and understanding it matters for anyone relying on AI output in professional legal work.

IBM Research's work on instruction-following at scale found that adherence rates degrade as the number of instructions increases. The IFEval++ benchmark from December 2025 found that current models can experience performance drops of up to 61.8% with nuanced prompt modifications. OpenAI's "Instruction Hierarchy" research documented that large language models often treat system-level instructions (like skills) with the same priority as user messages, meaning the model may override the skill's instructions based on what the user asks.

What this means practically: when you invoke /litigation-legal:chronology, Claude reads the skill's instruction file and attempts to follow it. It may skip steps, reinterpret instructions, produce outputs in a different format than specified, or ignore certain checklist items. The more complex the task and the longer the conversation, the more likely drift becomes. This is consistent with the context rot research described above: as more content fills the context window, the model's adherence to its original instructions also degrades.

Mark Pike, Anthropic's associate general counsel and the product lead for Claude for Legal, has himself stated that the plugins provided are "just the start" and need to be tailored to your practice. (Source: CIOLanding Claude for Legal analysis) The GitHub README warns that skipping the cold-start interview is the single most common reason a skill produces generic output.

The Engineering Work Behind Reliable Legal AI

Building reliable legal AI workflows where Claude consistently follows complex multi-step instructions across hundreds of documents requires what the AI engineering community calls agentic engineering. This is not a matter of downloading instruction files from GitHub. It is serious software engineering work. Prompt chaining breaks complex tasks into sequences of smaller, more reliable steps. Multi-step tool orchestration coordinates Claude's interactions with external systems in a defined order. Deterministic output validation programmatically checks that every output meets structural and factual requirements before it reaches a user.
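To make "deterministic output validation" concrete, here is a minimal sketch of a structural check on one chronology row. The field names and rules are hypothetical choices for illustration and are not part of the Anthropic plugins. A production system would check far more, but the principle is the same: code, not the model, decides whether an output is well-formed.

```python
from datetime import date

# Hypothetical schema for one chronology entry.
REQUIRED_FIELDS = ("date", "event", "source_document", "source_page")


def validate_chronology_row(row):
    """Return a list of problems; an empty list means the row passes."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not row.get(field):
            problems.append(f"missing {field}")

    if row.get("date"):
        try:
            date.fromisoformat(row["date"])   # expects e.g. "2021-03-03"
        except (TypeError, ValueError):
            problems.append("date is not ISO format (YYYY-MM-DD)")

    page = row.get("source_page")
    if page and (not isinstance(page, int) or page < 1):
        problems.append("source_page must be a positive integer")
    return problems
```

Rows that fail are sent back for reprocessing or human review instead of reaching the lawyer as if they were finished work.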

Hallucination guardrails must be built to catch cases where Claude fabricates citations, misattributes dates, or invents parties that do not appear in the source documents. Citation verification pipelines must confirm that every claim in a generated chronology or brief actually traces back to a specific page in a specific document. Regression test suites must verify that changes to prompts or model updates do not silently degrade output quality on previously working tasks.
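The simplest form of citation verification is a literal check that a quoted passage appears on the page it is attributed to. The sketch below assumes you already have the text of each page, keyed by page number, and that the model was asked to return verbatim quotes. The function names are hypothetical. Exact matching of this kind is a floor, not a complete solution: it will reject paraphrases and can fail on OCR noise.

```python
import re


def _normalize(text):
    """Collapse whitespace and lowercase so line breaks do not block a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def verify_citation(quote, document_pages, cited_page):
    """Return True only if `quote` appears on `cited_page`.

    document_pages maps page number -> extracted page text.
    """
    page_text = document_pages.get(cited_page)
    needle = _normalize(quote)
    if page_text is None or not needle:
        return False
    return needle in _normalize(page_text)
```

A citation that fails this check is not necessarily fabricated, but it should never reach a submission without a human opening the source page.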

The Cold-Start and Evaluation Problems

There is the cold-start problem: each new conversation in Claude Cowork begins with an empty context window and no memory of previous sessions. A workflow that requires building on yesterday's analysis must be manually reconstructed or require custom infrastructure to persist state across sessions. There is the evaluation problem: how do you measure, at scale, whether a legal AI output is correct? A human reviewer can check one chronology, but a firm processing 50 matters simultaneously needs automated quality measurement, and defining "correct" for legal output is itself a difficult problem. And there is the agent loop failure problem: when an AI agent is given a multi-step task, each step can compound errors from previous steps. A misidentified date in step one becomes a misplaced event in the chronology in step two, which becomes a flawed narrative in the brief draft in step three. Without validation gates between steps, these errors propagate silently.
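A validation gate is easier to picture in code. In this minimal sketch, each step in a multi-step task is paired with a check on its output, and the run stops at the first failed check instead of passing a bad result downstream. The step and gate functions are placeholders you would supply, and the names are hypothetical.

```python
def run_with_gates(steps, initial):
    """Run steps in order, checking each output before the next step starts.

    steps is a list of (name, step_fn, gate_fn). gate_fn returns a list of
    problems; any non-empty list halts the run with an error.
    """
    value = initial
    for name, step_fn, gate_fn in steps:
        value = step_fn(value)
        problems = gate_fn(value)
        if problems:
            raise ValueError(f"gate failed after step '{name}': {problems}")
    return value
```

In the chronology example above, a gate after date extraction would stop the run at step one, before a misidentified date could shape the timeline or the brief draft.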

The skills that Anthropic has released are a starting point for this work. Firms that invest in customizing and testing them will get better results than those that use them as-is. Firms without technical teams will need to evaluate whether that investment is practical, or whether a platform that has already done this engineering work is a more efficient path.

Data Security and Compliance for Law Firms Using AI

For any law firm considering Claude for Legal or any other legal AI software, data security is not optional. It is an ethical obligation. Here are the facts as of May 2026.

Where Is Data Processed?

Anthropic stores and processes data in the United States. There is no EU data residency option for Claude Cowork or claude.ai. Claude Code CLI can route through AWS Bedrock in Frankfurt or Google Vertex AI in Belgium for EU-resident processing, but Claude Code is a developer tool with none of the user interface and collaboration features of Cowork. GitHub issues 40526 and 40530, both requesting EU data residency support for Cowork, remain open and unresolved. (Source: GitHub issue 40526; GitHub issue 40530; Compound.law EU analysis)

GDPR and European Data Residency

Anthropic provides a Data Processing Addendum with EU Standard Contractual Clauses for commercial customers. However, the DPA alone does not satisfy the stricter data residency requirements of many European firms, particularly in regulated industries where data must be processed within EU territory. (Source: Prompt-guide GDPR analysis)

Consumer Plans, Team Plans, and Enterprise: What Protects Client Data?

On consumer plans (Free, Pro, Max), Anthropic's October 2025 policy update introduced a user setting that controls whether conversations are used for AI training. The setting was "on" by default for existing users who did not manually adjust it, so in practice it works as an opt-out rather than an opt-in. If training use is enabled, data can be retained for up to five years. If it is disabled, retention is 30 days. No DPA is provided by default on these plans. (Source: Anthropic consumer terms update, October 2025; Bitdefender privacy analysis)

The Heppner ruling matters here. In United States v. Heppner, decided in the Southern District of New York on February 10, 2026, with a written memorandum on February 17, 2026, Judge Jed S. Rakoff found that communications with a public AI platform like Claude were not protected by attorney-client privilege. The court's reasoning was direct: Claude is not an attorney and cannot form a fiduciary relationship. Anthropic's privacy policy at the time allowed user inputs to be retained, used for model training, and potentially disclosed to third parties. There was no reasonable expectation of confidentiality. The court treated disclosure to a publicly available AI platform as equivalent to disclosure to any third party, which is a well-established basis for privilege waiver. The ruling also noted that documents created using AI without direction from legal counsel do not become privileged merely because they are later shared with an attorney. (Source: Husch Blackwell analysis; Venable analysis; Chapman analysis; Perkins Coie analysis)

European bar authorities have also addressed this issue. The UK Solicitors Regulation Authority published a risk outlook report on AI in the legal market in November 2023, warning that staff inputting confidential case details into public AI systems poses a direct confidentiality threat, and that transferring client data to AI providers for model training creates additional exposure. (Source: SRA, "The use of artificial intelligence in the legal market," November 2023)

The Team Standard plan, at $25 per seat per month ($20 per seat per month on annual billing, minimum 5 users), commits to not using customer content for model training and includes SSO, central billing, admin controls for connectors, and enterprise desktop deployment. It does not include SCIM, audit logs, custom data retention controls, or a HIPAA-ready offering; those features are available on the Enterprise plan. Firms evaluating Team Standard for privileged legal work should weigh this distinction carefully: SSO provides identity management, but without audit logs and custom retention controls the compliance tooling at the Team tier is limited compared to Enterprise. Data is still processed in the US. (Source: Anthropic pricing page, verified May 2026)

The Team Premium plan, at $125 per seat per month ($100 annual), provides the same security features as Team Standard with 5x more usage than standard seats and includes Claude Code access.

Enterprise-Grade Controls for Privileged Legal Work

The Enterprise plan is available in both self-serve and sales-assisted configurations. The self-serve Enterprise plan is priced at $20 per seat per month, billed annually, with all usage charged at standard API rates on top. It includes everything in Team plus role-based access with fine-grained permissions, SCIM, audit logs, Compliance API for monitoring, custom data retention controls, network-level access controls, IP allowlisting, HIPAA-ready offering, and the option for Zero Data Retention. The sales-assisted Enterprise plan adds custom agreements (MSA), purchase orders, and usage commitments. It carries SOC 2 Type II, ISO 27001, and ISO 42001 certifications. (Source: Anthropic pricing page) This is the tier designed for privileged legal work, but data still processes in the US with no EU data residency option for Cowork.

The Heppner ruling is a precedent that every firm considering AI tools should evaluate. If your firm enters privileged client material into a consumer-grade AI tool that may use data for training or allows the provider's employees to review conversations for safety purposes, a court could find that privilege has been waived. Enterprise-grade Zero Data Retention is the mechanism designed to prevent this, but it is available only on the Enterprise plan under a direct contract with Anthropic, and processing remains US-based.

Pricing Economics for Disputes Teams

Understanding the cost structure matters, because disputes work is volume-intensive and Claude's pricing is consumption-based.

Pro ($20/month): Approximately 5x the free-tier usage per 5-hour window. For disputes work, this plan is not designed for sustained document processing.
Max 5x ($100/month): 5x the Pro allowance. In my experiment, processing 60 pages consumed 12% of the 5-hour usage window.
Max 20x ($200/month): 20x Pro. More headroom, but still a rolling 5-hour window with resets.
Team Standard ($25/seat/month, $20 annual, minimum 5 seats): 1.25x more usage than Pro per session. For a 5-person disputes team, that is at least $125 per month on monthly billing ($100 on annual billing), with limited usage per person.
Team Premium ($125/seat/month, $100 annual): 5x more usage than Team Standard seats. Includes Claude Code. For a 5-person team, $625 per month on monthly billing ($500 on annual billing).
Enterprise ($20/seat/month base, plus usage-based API billing): The actual cost depends heavily on volume. API rates (application programming interface, the technical method for connecting software to Claude directly) for Sonnet 4.6 are $3 per million input tokens and $15 per million output tokens. Opus 4.6 and Opus 4.7 both cost $5 per million input tokens and $25 per million output tokens. (Source: Anthropic models overview and pricing; Anthropic pricing page)

The economic question for disputes teams is whether a consumption-based pricing model works for document-heavy practices where volume is measured in thousands of pages, not dozens. A firm processing 500 documents might need multiple sessions across several days on the Max plan, or a significant monthly API spend on the Enterprise plan. These costs should be compared against purpose-built alternatives that price for volume.
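To make the arithmetic above concrete, here is a minimal Python sketch that estimates seat costs and API input cost for a document set. The seat and token prices are the ones quoted in this section, and the 1,500 to 3,000 tokens-per-page range comes from Anthropic's PDF support documentation listed in the sources. Everything else is an assumption: the 20-pages-per-document average is hypothetical, the estimate covers a single read of each page, and it ignores output tokens, retries, and multi-turn re-reading. Treat the result as a floor, not a forecast.

```python
"""Rough cost floor for a disputes document set. Illustrative only."""

# Seat prices per month quoted in this article (USD).
SEAT_PRICE = {
    "team_standard_monthly": 25,
    "team_standard_annual": 20,
    "team_premium_monthly": 125,
    "team_premium_annual": 100,
    "enterprise_self_serve": 20,  # usage is billed separately at API rates
}

# API input prices per million tokens quoted in this article (USD).
INPUT_PRICE_PER_MTOK = {"sonnet-4.6": 3.0, "opus-4.7": 5.0}

# Tokens per PDF page, per Anthropic's PDF support documentation.
TOKENS_PER_PAGE_LOW, TOKENS_PER_PAGE_HIGH = 1_500, 3_000


def seat_cost(plan: str, seats: int) -> int:
    """Monthly seat cost in USD for a given plan and team size."""
    return SEAT_PRICE[plan] * seats


def input_cost_range(pages: int, model: str) -> tuple[float, float]:
    """Low and high input-token cost (USD) for reading every page once."""
    price = INPUT_PRICE_PER_MTOK[model]
    low = pages * TOKENS_PER_PAGE_LOW / 1_000_000 * price
    high = pages * TOKENS_PER_PAGE_HIGH / 1_000_000 * price
    return low, high


def pages_per_window(pages_processed: int, window_share_used: float) -> float:
    """Linear extrapolation from one observation. ASSUMPTION: usage scales linearly."""
    return pages_processed / window_share_used


pages = 500 * 20  # HYPOTHETICAL: 500 documents averaging 20 pages each
for model in INPUT_PRICE_PER_MTOK:
    low, high = input_cost_range(pages, model)
    print(f"{model}: ${low:,.0f} to ${high:,.0f} input cost for {pages:,} pages")
print(f"5 Team Standard seats: ${seat_cost('team_standard_monthly', 5)}/month")
print(f"Pages per 5-hour window (from 60 pages at 12%): {pages_per_window(60, 0.12):.0f}")
```

The last line extrapolates from the single 60-page observation reported above. If usage scaled linearly, one window would cover roughly 500 pages, which is why a multi-thousand-page production spreads across many sessions. Real workloads re-read documents across turns, so actual consumption is likely higher than this single-pass figure.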

Why Team Collaboration Is Missing from Claude Cowork

Claude Cowork runs on each person's local machine. There is no shared workspace. There is no built-in mechanism to share document sets across team members, see what a colleague has already reviewed, build a cumulative chronology that the whole team contributes to, assign document review tasks, or track progress on a shared matter.

Disputes work typically involves teams. A construction arbitration involves a lead partner, one or two senior associates, a junior or two doing document preparation, and often a paralegal managing the exhibit bundles. These people need to see the same documents, build on each other's work, and know what has already been reviewed and what remains.

In Claude Cowork's current design, each user's work exists in their own workspace. If three associates each build a partial chronology from different document sets, merging those chronologies requires manual effort outside the platform. If a partner wants to see the current state of a document review, there is no dashboard or progress view within Claude Cowork itself. Anthropic may add team collaboration features in future updates, but as of May 2026, this is the current state.
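As an illustration of what that manual merge involves, here is a minimal Python sketch that combines partial chronologies produced by different reviewers. The ChronologyEntry fields and the merge_chronologies function are hypothetical names chosen for this example, and the sketch assumes each associate can export entries in this shape. Nothing here reflects an actual Cowork export format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ChronologyEntry:
    """One dated event. Field names are hypothetical, chosen for illustration."""
    event_date: date
    description: str
    source_doc: str   # exhibit or file reference
    source_page: int


def merge_chronologies(*partials: list[ChronologyEntry]) -> list[ChronologyEntry]:
    """Combine partial chronologies, drop exact duplicates, sort by date."""
    seen: set[ChronologyEntry] = set()
    merged: list[ChronologyEntry] = []
    for partial in partials:
        for entry in partial:
            if entry not in seen:
                seen.add(entry)
                merged.append(entry)
    return sorted(merged, key=lambda e: (e.event_date, e.source_doc, e.source_page))


associate_a = [ChronologyEntry(date(2023, 5, 2), "Notice of delay issued", "C-12", 3)]
associate_b = [
    ChronologyEntry(date(2023, 1, 15), "Contract signed", "C-1", 1),
    ChronologyEntry(date(2023, 5, 2), "Notice of delay issued", "C-12", 3),
]
for entry in merge_chronologies(associate_a, associate_b):
    print(entry.event_date.isoformat(), entry.description, entry.source_doc)
```

Exact-match deduplication will not catch two associates describing the same event in different words. That reconciliation still needs a human, which is the point: the merge is work the team does outside the tool.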

How This Compares to Purpose-Built Disputes Platforms

At this point, the comparison to purpose-built platforms is worth stating directly. I build Kallam, so I will be transparent about what it does and you can evaluate the claims on their merits.

Kallam was built from the ground up for disputes document workflows. When you upload a document set, the platform automatically structures every document: extracting dates, titles, parties, and document types without prompting. The OCR pipeline is optimized for Arabic, English, and French, with dedicated engines for each language and error correction built into the processing chain. Every extracted fact links to the exact page in the source document with a clickable citation. You can view the source alongside the extraction, side by side, and verify any claim against the original page.

Teams work on shared cases. Multiple users see the same document set, contribute to the same chronology, and track progress on the same matter. The platform includes a chronology builder, semantic search, question-and-answer with citations, exhibit table generation, tabular extraction, and a Word citation plugin that lets you cite source pages directly from Microsoft Word while drafting submissions.

The platform is hosted and processes data on Microsoft Azure within the European Union. Pricing is structured for volume: thousands of pages, not dozens.

These are architectural differences. Claude for Legal is a collection of plugins and connectors that add legal workflow instructions to Claude Cowork, a general-purpose AI application powered by whichever model the user selects. Kallam is a purpose-built system where the document processing pipeline, the OCR, the citation system, the collaboration layer, and the cost model were all designed for one type of work: disputes.

The two approaches serve different needs. Claude's underlying models are among the most capable language models available, and the Claude for Legal plugins give users the flexibility to apply that capability across dozens of practice areas, with the ability to customize through open-source code. Kallam gives you a dedicated system that handles the specific, high-volume, multilingual, team-based document workflows that define disputes practice.

What Disputes Lawyers Should Take Away

If you read nothing else, here is what matters.

Claude for Legal is real, and it is useful. The underlying models are strong: Opus 4.7 at 90.9% on BigLaw Bench represents genuine legal reasoning capability, and even Sonnet 4.6 provides useful results for many tasks. The litigation plugin provides genuine value for chronology building, deposition preparation, and structured document review in small-to-moderate batches. For in-house teams doing contract review, compliance checks, or US docket monitoring, this is a meaningful step forward.
For high-volume disputes work, the current usage limits and absence of batch processing make it impractical at arbitration scale today. A 500-document case would require multiple sessions over several days and produce output without source-linked citations. Anthropic is actively expanding compute capacity and increasing usage limits, so this may change over time.
Data security requires evaluation before adoption. The Heppner ruling is a precedent that every firm should assess. Consumer-grade plans do not provide the safeguards that privileged legal work requires. Team Standard includes SSO but does not include SCIM, audit logs, custom data retention, or HIPAA readiness. Enterprise, with Zero Data Retention, audit logs, SCIM, and custom data retention controls, remains the tier designed specifically for privileged legal work, and even there data processes in the US with no EU data residency option for Claude Cowork.
Skills are a starting point. Instruction-following is probabilistic, not deterministic. Building reliable legal AI workflows requires serious agentic engineering: prompt chaining, output validation, citation verification, regression testing, and solutions for the cold-start and evaluation problems. This goes well beyond downloading instruction files from GitHub. Firms that invest in customization and testing will get better results.
Context window size is not the same as context window reliability. All three current Claude models (Opus 4.7, Opus 4.6, and Sonnet 4.6) offer 1-million-token context windows, but research shows that all current language models degrade as context fills up, particularly for information positioned in the middle of the input. Document processing workflows need to account for this through careful context engineering.
Purpose-built platforms exist for specific reasons. General-purpose AI models are powerful, but disputes-specific requirements, including multilingual OCR, source-page citations, team collaboration, and volume pricing, require dedicated architecture. The right tool depends on the problem you are solving.
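One of the engineering steps named above, citation verification, can be sketched in a few lines of Python. This is a minimal illustration, assuming each model-extracted fact carries a verbatim quote and a page number, and that page text is available from your own OCR or PDF extraction step. The Citation class and verify_citation function are hypothetical names for this example, not part of any Claude or Kallam API.

```python
import re
from dataclasses import dataclass


@dataclass
class Citation:
    """A model-produced citation. Field names are hypothetical."""
    quote: str   # verbatim text the model claims appears in the source
    page: int    # 1-based page number the model cites


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks do not cause false misses."""
    return re.sub(r"\s+", " ", text).strip().lower()


def verify_citation(citation: Citation, pages: dict[int, str]) -> bool:
    """Return True only if the quote is found on the cited page."""
    page_text = pages.get(citation.page)
    if page_text is None:
        return False  # the cited page does not exist in the source
    quote = _normalize(citation.quote)
    return bool(quote) and quote in _normalize(page_text)


# Example page text, invented for illustration.
pages = {
    1: "The Contractor shall\ncomplete the Works by 1 March 2024.",
    2: "Liquidated damages accrue at 0.1% per day of delay.",
}
claimed = Citation(quote="complete the Works by 1 March 2024", page=1)
print(verify_citation(claimed, pages))
```

A failed check should route the fact to human review instead of silently dropping it. The matching is strict on purpose: text with OCR noise would need fuzzy matching, which this sketch omits.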

If your practice handles document-heavy arbitrations and you want to see what a purpose-built alternative looks like, Kallam offers a walkthrough for disputes teams. Explore the platform or reach out directly.

Sources

The following sources are cited in this article:

Harvey AI, "Opus 4.7, Now Live in Harvey" (BigLaw Bench: Opus 4.7 = 90.9%, 45% perfect scores): https://www.harvey.ai/blog/opus-4-7-now-live-in-harvey
Anthropic, Claude Opus 4.6 announcement (BigLaw Bench: Opus 4.6 = 90.2%, 40% perfect scores, per Harvey's Head of AI Research Niko Grupen): https://www.anthropic.com/news/claude-opus-4-6
Anthropic PDF support documentation (API limit: 600 pages per request for 1M-context models, 100 for 200K-context models; 1,500-3,000 tokens/page; 32MB max request size): https://platform.claude.com/docs/en/build-with-claude/pdf-support
Anthropic context windows documentation (model-specific context window sizes, compaction, context awareness): https://platform.claude.com/docs/en/build-with-claude/context-windows
Anthropic compaction documentation (server-side context compaction, beta feature): https://platform.claude.com/docs/en/build-with-claude/compaction
Anthropic models overview (context windows, pricing, model comparison table, tokenizer differences): https://platform.claude.com/docs/en/about-claude/models/overview
Husch Blackwell, Heppner privilege waiver analysis: https://huschblackwell.com/newsandinsights/heppner-v-claude-the-first-privilege-waiver-by-ai-rulingwhat-lawyers-and-clients-must-know
Venable, Heppner ruling analysis: https://venable.com/insights/publications/2026/02/ai-privilege-and-the-heppner-ruling-what-the-court
Chapman, federal court ruling on AI-generated privilege: https://chapman.com/publication-federal-court-rules-that-ai-generated-documents-are-not-protected-by-privilege
Perkins Coie, Heppner and Gilbarco privilege analysis: https://perkinscoie.com/insights/update/heppner-and-gilbarco-courts-apply-privilege-and-work-product-protection-generative
Anthropic consumer terms update, October 2025 (training data opt-in policy): https://anthropic.com/news/updates-to-our-consumer-terms
Bitdefender, Anthropic privacy policy analysis: https://bitdefender.com/en-us/blog/hotforsecurity/anthropic-shifts-privacy-stance-lets-users-share-data-for-ai-training
Anthropic pricing page (Team Standard, Team Premium, Enterprise features and pricing): https://claude.com/pricing
LawNext, Claude for Legal launch coverage (12 plugins, 20+ connectors): https://lawnext.com/2026/05/anthropic-goes-all-in-on-legal-releasing-more-than-20-connectors-and-12-practice-area-plugins-for-claude.html
MindStudio, Claude for Legal MCP connector analysis: https://mindstudio.ai/blog/claude-legal-mcp-connectors-contract-review-compliance
LegalTechnology.com, Claude for Legal industry analysis: https://legaltechnology.com/claude-for-legal-what-the-industry-needs-to-know
CIOLanding, Mark Pike statements and Claude for Legal analysis: https://ciolanding.com/claude-for-legal-what-law-firms-need-to-know
IBM Research, "Boosting Instruction Following at Scale" (2025); IFEval++ benchmark (December 2025); OpenAI "Instruction Hierarchy" paper
Chroma, "Context Rot: How Increasing Input Tokens Impacts LLM Performance" (July 2025): https://www.trychroma.com/research/context-rot
Liu et al. (Stanford, UC Berkeley, Samaya AI), "Lost in the Middle: How Language Models Use Long Contexts" (TACL 2024): https://arxiv.org/abs/2307.03172
Anthropic, "Higher usage limits for Claude and a compute deal with SpaceX" (May 6, 2026): https://www.anthropic.com/news/higher-limits-spacex
Reddit r/ClaudeAI, rate limit user discussions: https://reddit.com/r/ClaudeAI/comments/1htuxni/hitting_claude_limits_almost_immediately_its
Reddit r/Anthropic, user feedback on rate limits: https://reddit.com/r/Anthropic/comments/1sla14y/anthropic_faces_user_backlash_over_reported
GitHub issues on EU data residency: https://github.com/anthropics/claude-code/issues/40526 and https://github.com/anthropics/claude-code/issues/40530
Compound.law, Claude Enterprise EU analysis: https://compound.law/en-DE/tools/claude-enterprise
Prompt-guide, Claude Cowork GDPR analysis: https://prompt-guide.com/en/blog/claude-cowork-securite-rgpd
Anthropic help center, "Upload files to Claude" (visual analysis under 100 pages, text-only over 1,000 pages): https://support.claude.com/en/articles/8241126-upload-files-to-claude
Anthropic vision documentation (image/page limits per request: 600 for 1M-context models, 100 for 200K-context models): https://platform.claude.com/docs/en/build-with-claude/vision
SRA, "The use of artificial intelligence in the legal market" (November 2023, confidentiality risks of AI in legal practice): https://www.sra.org.uk/sra/research-publications/artificial-intelligence-legal-market/