Product

May 14, 2026

A scoring framework to help disputes partners pick the highest-leverage, lowest-risk AI use case. Six dimensions, one page, one meeting. Start with document-heavy work.

Choosing the Right First Use Case for AI for Law Firms in Disputes Work

Part 3 of The Practitioner's Guide to AI in Disputes (10-post series).


Every failed AI adoption I have seen in a disputes practice started with the same mistake. The firm picked the wrong task. Not a bad task. The wrong first task. They chose something high-risk, low-volume, or impossible to supervise. Then the pilot failed. Then the partners said "we tried AI" and moved on.

When it comes to AI for law firms, task selection is the decision that separates successful pilots from expensive lessons. This post gives you a scoring framework for picking the right task. Three leverage dimensions. Three risk dimensions. One page. You can use it in a Monday morning discussion about what to pilot. The punchline, for those in a hurry: document-heavy sorting and structuring work scores highest in almost every disputes practice. But the framework matters more than the answer. Your practice is not identical to mine.


Why Task Selection Determines Whether Legal AI Tools Succeed or Fail

Post 1 of this series named three reasons AI adoptions stall. Post 2 mapped the economic model. This post addresses the second cause of failure: choosing the wrong starting point.

The temptation is to start with the most impressive use case. Legal research. Drafting motions. Constructing arguments. These are the tasks vendors demonstrate. They are also the tasks where AI errors are hardest to catch and most expensive to fix.

The firms seeing early results are doing something less glamorous. They are starting with tasks that have three properties. High volume. Low judgment threshold. Mandatory human review already built into the workflow. That combination is not accidental. It is the only combination where you can evaluate AI output without creating new supervision overhead.


The Scoring Framework: Leverage vs. Risk for Legal AI Software

Here is the framework. Six dimensions. Score each candidate task from 1 (low) to 5 (high).

Leverage Dimensions

Volume: How many hours does this task consume per matter?

A task that takes two hours on a small matter and forty hours on a large one scores higher than a task that takes three hours regardless of matter size. You want AI applied to the work that scales with matter size. Document sorting, chronology building, and production review all scale. Drafting a standard directions application does not.

Recall: How much does accuracy depend on finding everything?

Some tasks require completeness. Missing a relevant document in a production is worse than miscategorizing one. If the task demands finding every relevant item in a large set, AI's recall advantage over manual review is substantial. AI-assisted review tools can process thousands of documents in the time it takes a human reviewer to read dozens, while maintaining consistency that manual review cannot match (CS Disco). Score recall-dependent tasks higher.

Structuring: How much time is spent organizing rather than analyzing?

In most disputes practices, the gap between receiving documents and understanding them is filled with structuring work. Date extraction. Entity identification. Timeline assembly. Exhibit numbering. These tasks are labor-intensive and repetitive. They require attention, not judgment. Legal AI tools handle structuring work well because the output format is predictable and verifiable. Legal timeline software, for example, can extract dates and build chronologies from unstructured document sets in a fraction of the time manual assembly requires.

Risk Dimensions

Privilege exposure: Does the task touch privileged communications?

This is the hardest risk dimension and it deserves careful thought. A federal court has already ruled that documents generated through AI tools may not receive attorney-client privilege protection (Baker & Hostetler). If the task requires uploading privileged material to an external AI tool, the risk score is high. If the task processes only non-privileged documents, or if the tool runs within your firm's own environment, the risk score is lower. Post 10 of this series covers data handling in detail. For now, note this: privilege exposure is not binary. It depends on the tool's architecture, not just the task itself.

Liability if wrong: What is the consequence of an AI error?

A misclassified document in a first-pass review is caught in the second pass. A hallucinated case citation in a filed brief triggers sanctions. The difference in consequence is orders of magnitude. Score liability by asking: if the AI output is wrong and nobody catches it, what happens? If the answer is "we waste time but nothing reaches a client or tribunal," liability is low. If the answer is "we file something inaccurate," liability is high.

Supervision cost: How much partner time is needed to verify AI output?

This is the dimension most firms underestimate. AI does not eliminate review. It shifts what gets reviewed. The question is whether reviewing AI output takes less partner time than the original task took. For structuring tasks, a partner can scan a generated timeline in minutes. For drafted arguments, a partner must read every sentence. The supervision cost for structuring work is low. The supervision cost for generative drafting is nearly equivalent to the original task. ABA Formal Opinion 512 makes this explicit: the supervising attorney is responsible for the AI output (UNC Law Library). European equivalents impose comparable obligations (the CCBE's guidance on generative AI use by lawyers, the SRA Code of Conduct, and the EU AI Act all place supervision responsibility on the practitioner). Post 7 of this series covers the supervision protocol in detail.


The Framework Applied: How Five Common Disputes Tasks Score

Here is how five candidate tasks score on the framework. These scores reflect a typical midsize disputes practice handling document-intensive arbitration and litigation. Your numbers will differ. The framework is the tool. The scores are illustrations.

| Task | Volume | Recall | Structuring | Privilege | Liability | Supervision | Net Score |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Document sorting and classification | 5 | 5 | 5 | 2 | 1 | 1 | 11 |
| Chronology and timeline building | 4 | 4 | 5 | 2 | 1 | 1 | 9 |
| First-pass production review | 5 | 5 | 3 | 3 | 2 | 2 | 6 |
| Legal research | 3 | 3 | 2 | 1 | 4 | 4 | -1 |
| Motion or memorial drafting | 2 | 2 | 2 | 2 | 5 | 5 | -6 |

Net Score = (Volume + Recall + Structuring) - (Privilege + Liability + Supervision)
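The net-score arithmetic is simple enough to sanity-check in a few lines. The sketch below is illustrative Python, not part of any tool; the task names and scores mirror the illustrative table for a typical midsize disputes practice:

```python
# Net Score = (Volume + Recall + Structuring) - (Privilege + Liability + Supervision)
# Scores are the illustrative values from the table above, not prescriptions.
tasks = {
    "Document sorting and classification": (5, 5, 5, 2, 1, 1),
    "Chronology and timeline building":    (4, 4, 5, 2, 1, 1),
    "First-pass production review":        (5, 5, 3, 3, 2, 2),
    "Legal research":                      (3, 3, 2, 1, 4, 4),
    "Motion or memorial drafting":         (2, 2, 2, 2, 5, 5),
}

def net_score(volume, recall, structuring, privilege, liability, supervision):
    """Leverage minus risk: a higher score means a better first use case."""
    return (volume + recall + structuring) - (privilege + liability + supervision)

# Rank candidate tasks from best to worst first use case.
for task, scores in sorted(tasks.items(), key=lambda kv: -net_score(*kv[1])):
    print(f"{task}: {net_score(*scores)}")
```

Running a version of this with your own practice group's scores is a quick way to surface disagreements about individual dimensions, which is where the valuable conversation happens.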

Document sorting scores highest because it combines maximum leverage with minimum risk. The volume is enormous on any matter with more than a few hundred documents. Recall matters because missing a relevant document has downstream consequences. The task is almost entirely structuring. And the output is always reviewed before it goes anywhere.

Motion drafting scores lowest not because AI cannot draft. It can. But the supervision cost is nearly identical to doing the work manually. And the liability if a drafted argument contains an error that reaches a tribunal is severe. It is the wrong first task. It may be a good third or fourth task, once your team has built confidence and protocols.


Why Document-Heavy Disputes Work Is the Right Starting Point for AI for Law Firms

The scoring framework points to document-intensive work for structural reasons, not because of any particular tool's capabilities.

Disputes and arbitration practices carry a structural advantage when it comes to AI adoption. Document volumes are high. Tasks are repeatable. And the performance standard for a first pass is specific and testable: faster than your best associate, reviewable in minutes rather than days.

Roughly 86% of midsize law firms report using AI tools in some form (Clio). Adoption is broad. But the firms reporting measurable results tend to share a pattern: they started with document-intensive tasks. Sorting productions. Building chronologies. Flagging relevant passages in large bundles. These tasks consume serious associate time. They require attention, not judgment. And the output is always reviewed before it matters.

That combination of high volume, low judgment threshold, and mandatory human review is the right starting condition for any AI adoption in a disputes practice. It is also the combination where the economic case is clearest. Post 2 of this series showed that on fixed fees, every hour saved is margin gained. Document structuring tasks are where those hours accumulate fastest.


The Counterargument: Why Not Start with Legal Research?

The case for starting with legal research is not trivial. Some attorneys report significant time savings on AI-assisted research tasks. And the task is universal. Every disputes lawyer does legal research.

But the risk profile is different. AI-generated legal research can hallucinate citations. The Stanford RegLab study (Magesh et al., 2025) found that a leading legal AI platform produced hallucinations in more than 17% of queries. The supervision cost is high because every citation must be independently verified. And the consequence of an undetected error is a sanctions motion, not a misclassified document.

Legal research is a good second or third use case. Once your team has built a review protocol on lower-risk tasks, the discipline transfers. But as a starting point, the risk-to-leverage ratio is unfavorable.

There is a second counterargument worth addressing. Some firms argue that case strategy and predictive analytics should come first because they deliver the highest strategic value. But strategic tools require the largest investment in data preparation, the most sophisticated supervision, and the most mature internal protocols. They are destination tasks, not starting tasks.


How to Use This Framework to Evaluate Legal AI Tools in Practice

Print the blank framework. Sit down with your practice group. List the five tasks that consume the most associate time on a typical matter. Score each one. The conversation is more valuable than the scores.

Three things to watch for:

The supervision trap. A task that looks high-leverage on volume may have hidden supervision costs. If reviewing AI output takes as long as doing the task manually, the net score collapses. Be honest about this column.

The privilege question is task-plus-tool. The same task can have different privilege scores depending on the tool. A document classification task processed in legal AI software that runs within your firm's environment scores differently from the same task processed through a tool that sends data to external servers. Score the task-tool combination, not the task alone.
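To make the task-plus-tool point concrete, score the same classification task twice, varying only the privilege dimension by tool architecture. A minimal sketch; the specific scores are hypothetical illustrations, not recommendations:

```python
def net_score(volume, recall, structuring, privilege, liability, supervision):
    # Net Score = leverage (volume + recall + structuring)
    #           - risk (privilege + liability + supervision)
    return (volume + recall + structuring) - (privilege + liability + supervision)

# Same document classification task, two hypothetical tool architectures.
# Only the privilege score moves; every other dimension is held constant.
in_firm_tool = net_score(5, 5, 5, privilege=1, liability=1, supervision=1)   # runs in-environment
external_tool = net_score(5, 5, 5, privilege=4, liability=1, supervision=1)  # sends data to external servers

print(in_firm_tool, external_tool)  # the gap is entirely the privilege dimension
```

The point of the exercise: a "high-scoring task" can become a marginal one purely because of where the tool sends your data, so the scoring unit is the task-tool pair.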

Do not let vendor demos set the agenda. Vendors demonstrate their strongest use case. That is almost always generative drafting or legal research, because those tasks are visually impressive. Neither is the right first task for most disputes practices. Use the framework to set your own priority. Then evaluate tools against that priority.


If You Read Nothing Else

Pick the task with the highest volume, the greatest recall dependency, and the most structuring time. Then verify that it has low privilege exposure, low liability if wrong, and low supervision cost. In most disputes practices, that task is document sorting and classification. Start there. Build confidence. Build protocols. Then expand. The framework is one page. The conversation takes one meeting. The decision is worth getting right because it determines whether AI becomes a permanent capability or a failed experiment in your practice.

Post 4 covers building a one-page AI usage policy. Post 7 covers the supervision protocol. Post 10 covers data handling and privilege in detail.


Looking for legal AI tools built specifically for document-heavy disputes work? Explore how Kallam AI handles document sorting, chronology building, and document structuring for arbitration and litigation teams. Or start a conversation about what your first use case should be.


Sources

  1. CS Disco, "Judgment Day: The Rise of Artificial Intelligence in Dispute Resolution": AI can process thousands of documents in the time a human reads dozens. https://csdisco.com/blog/judgment-day-the-rise-of-artificial-intelligence-in-dispute-resolution

  2. Baker & Hostetler LLP, "AI Is Not Your Lawyer: Federal Court Rules AI-Generated Documents Are Not Privileged": federal court ruling on privilege and AI-generated documents. https://www.bakerlaw.com/insights/ai-is-not-your-lawyer-federal-court-rules-ai-generated-documents-are-not-privileged/

  3. UNC School of Law Library, "ABA Formal Opinion 512: The Paradigm for Generative AI in Legal Practice": summary of ABA supervision requirements. https://library.law.unc.edu/2025/02/aba-formal-opinion-512-the-paradigm-for-generative-ai-in-legal-practice/

  4. Clio, "AI Is Reshaping How Mid-Sized Law Firms Scale": 86% of midsize firms report using AI. https://www.clio.com/about/press/ai-is-reshaping-how-mid-sized-law-firms-scale-clio-reports/

  5. Stanford RegLab, Magesh et al. (2025): leading legal AI platform hallucinated on more than 17% of queries. https://hai.stanford.edu/news/ai-trial-legal-models-hallucinate-1-out-6-or-more-benchmarking-queries