TAR vs. Keyword Search:
Which Actually Saves More Money?

Technology-assisted review promises massive cost savings over traditional keyword culling — and it delivers, but not always and not automatically. Here's an honest, scenario-by-scenario comparison so you can make the right call for your matter.

The Honest Answer First

TAR — technology-assisted review, also called predictive coding — saves significant money in large, complex matters. It often doesn't in small ones. The crossover point, roughly speaking, is around 75,000–100,000 documents. Below that threshold, the setup costs and validation requirements of TAR can exceed the savings. Above it, TAR's cost advantage becomes increasingly dramatic.

Here's what that looks like in practice:

Cost comparison between TAR and keyword review by matter size

| Matter Size | Keyword Review Cost | TAR Cost | Winner |
|---|---|---|---|
| 25,000 docs | $1,500–$3,000 | $2,500–$5,000 | Keywords |
| 75,000 docs | $4,500–$9,000 | $3,750–$7,500 | TAR (slight edge) |
| 200,000 docs | $12,000–$24,000 | $4,000–$8,000 | TAR (clear win) |
| 1,000,000 docs | $60,000–$120,000 | $8,000–$20,000 | TAR (decisive) |
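The crossover arithmetic behind a table like this is easy to sanity-check yourself. The sketch below uses hypothetical per-document rates and a hypothetical fixed TAR setup cost chosen only to roughly match the ranges above; they are illustrative assumptions, not quoted prices.

```python
# Illustrative break-even sketch: linear keyword review vs. TAR.
# All rates below are hypothetical assumptions, not real vendor pricing.

KEYWORD_RATE = 0.06   # assumed cost per document for linear keyword review ($)
TAR_SETUP = 2500.0    # assumed fixed cost: TAR setup + validation ($)
TAR_RATE = 0.025      # assumed marginal cost per document under TAR ($)

def keyword_cost(docs: int) -> float:
    return docs * KEYWORD_RATE

def tar_cost(docs: int) -> float:
    return TAR_SETUP + docs * TAR_RATE

def break_even_docs() -> float:
    # Solve TAR_SETUP + d * TAR_RATE == d * KEYWORD_RATE for d.
    return TAR_SETUP / (KEYWORD_RATE - TAR_RATE)

for docs in (25_000, 75_000, 200_000, 1_000_000):
    kw, tar = keyword_cost(docs), tar_cost(docs)
    winner = "keywords" if kw < tar else "TAR"
    print(f"{docs:>9,} docs: keywords ${kw:>10,.0f}  TAR ${tar:>10,.0f} -> {winner}")

print(f"break-even at roughly {break_even_docs():,.0f} documents")
```

With these assumed rates the break-even lands near the 75,000-document crossover the article describes; change the rates and the crossover moves accordingly, which is exactly why the threshold is a range, not a constant.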

How Each Method Works

Traditional Method

Keyword Search Culling

  • Legal team develops search terms with counsel
  • Terms run against full dataset
  • All "hits" enter linear review queue
  • Contract attorneys review every document
  • Responsive/non-responsive/privileged calls made per doc
  • High recall risk if terms are under-inclusive
  • High cost if terms are over-inclusive

Technology-Assisted Review

TAR / Predictive Coding

  • Small seed set of documents reviewed by senior attorney
  • Machine learning model trained on seed set decisions
  • Model scores entire corpus for relevance probability
  • Review focused on high-probability documents
  • Continuous active learning refines model iteratively
  • Statistical validation confirms recall targets are met
  • Documentation trail supports defensibility
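The statistical validation step in the list above can be sketched numerically: a senior attorney codes a random sample independently of the TAR results, and you estimate what share of the sample's responsive documents the TAR review actually surfaced. The Wilson interval below is one standard choice of confidence interval, and the counts are hypothetical; real validation protocols vary by matter and vendor.

```python
# Sketch of recall validation via an independently coded random sample.
# Sample counts are hypothetical; the Wilson score interval is one
# common way to put a confidence bound on the recall estimate.

import math

def wilson_interval(successes: int, n: int, z: float = 1.96):
    """95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

def estimate_recall(sample_responsive: int, sample_found_by_tar: int):
    """Of the responsive docs in the validation sample, what share did
    the TAR review actually surface?"""
    point = sample_found_by_tar / sample_responsive
    low, high = wilson_interval(sample_found_by_tar, sample_responsive)
    return point, low, high

# Hypothetical numbers: 120 responsive docs in the sample, 102 retrieved by TAR.
point, low, high = estimate_recall(120, 102)
print(f"recall ~ {point:.1%}, 95% CI [{low:.1%}, {high:.1%}]")
```

The interval, not just the point estimate, is what belongs in the documentation trail: it shows whether the sample was large enough to support the recall target you claimed.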

TAR 1.0 vs. TAR 2.0: The Practical Difference

The eDiscovery industry distinguishes between two generations of TAR, and the cost difference between them is significant enough to matter in your vendor conversations.

TAR 1.0 (also called Simple Passive Learning or SPL) requires a large, carefully constructed seed set reviewed by senior attorneys before the model trains. It's effective but front-loaded — the upfront attorney time is substantial, which erodes cost savings on smaller matters.

TAR 2.0 (Continuous Active Learning, or CAL) trains the model continuously as reviewers code documents during the normal review workflow. There's no separate seed-set phase. The model improves in real time, and the review team doesn't need to do anything differently. This is the dominant modern approach and the one worth requesting from vendors.
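The CAL loop described above has a simple shape: score the unreviewed corpus, review the highest-scoring batch, fold those coding decisions back into the model, repeat. The toy sketch below shows only that loop shape; the word-overlap "model" is a crude stand-in for the trained classifiers real platforms use, and every name in it is hypothetical.

```python
# Toy sketch of a continuous-active-learning (CAL) loop. The word-overlap
# scorer is a stand-in for a real classifier; only the loop shape matters:
# score -> review top-ranked batch -> refit on the new decisions -> repeat.

from collections import Counter

def fit(responsive_docs):
    """'Train': count words seen in documents coded responsive so far."""
    vocab = Counter()
    for doc in responsive_docs:
        vocab.update(doc.lower().split())
    return vocab

def score(doc, vocab):
    """Fraction of the doc's words seen in responsive training docs."""
    words = doc.lower().split()
    return sum(w in vocab for w in words) / len(words) if words else 0.0

def cal_review(corpus, is_responsive, batch_size=2):
    """corpus: {doc_id: text}. is_responsive: a callable standing in for
    the human reviewer's coding decision."""
    unreviewed = dict(corpus)
    responsive, coded = [], {}
    while unreviewed:
        vocab = fit(responsive)
        # Rank the remaining docs by current model score; review the top batch.
        batch = sorted(unreviewed, key=lambda d: score(unreviewed[d], vocab),
                       reverse=True)[:batch_size]
        for doc_id in batch:
            text = unreviewed.pop(doc_id)
            coded[doc_id] = is_responsive(text)  # reviewer's call feeds the model
            if coded[doc_id]:
                responsive.append(text)
    return coded
```

Note what's absent: there is no separate seed-set phase. The first batch is effectively random, and every subsequent batch is prioritized by whatever the model has learned so far, which is the practical difference from TAR 1.0.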

When evaluating vendor platforms, ask specifically whether they support CAL-based TAR 2.0. Several major platforms still default to TAR 1.0 workflows unless you ask otherwise.

The Step Nobody Skips: Pre-Review Culling

Before either keyword review or TAR begins, there are culling steps that cost almost nothing but can reduce your reviewable population by 25–40%. These should be standard practice on every matter regardless of review method:

  • Date range filtering — eliminate documents clearly outside the relevant period
  • Custodian filtering — if only 8 of 40 custodians are relevant, process only their data
  • Near-duplicate detection — collapses clusters of near-identical documents so the reviewer sees one representative copy
  • Email thread suppression — review only the last, most inclusive email in a thread; the earlier messages are already contained within it
  • System file exclusion — remove operating system files, executables, and known non-responsive file types via NIST NSRL

Proper culling before review — regardless of method — consistently produces 25–40% volume reductions at minimal cost. On a 200,000-document corpus, that's 50,000–80,000 fewer documents entering the review queue before TAR or keywords even start.
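These filters compose into a simple pipeline. The sketch below covers date, custodian, and system-file filtering plus exact-hash deduplication (a deliberate simplification of true near-duplicate detection), with thread suppression omitted; all field names and data are hypothetical, and real culling runs inside a processing platform against extracted metadata.

```python
# Minimal sketch of pre-review culling on a toy in-memory corpus.
# Exact SHA-256 dedup here is a simplification of near-duplicate detection.

import hashlib
from datetime import date

docs = [
    {"id": 1, "custodian": "alice", "date": date(2021, 3, 1),
     "ext": "docx", "text": "Q1 pricing proposal"},
    {"id": 2, "custodian": "bob",   "date": date(2021, 3, 2),
     "ext": "exe",  "text": "installer binary"},
    {"id": 3, "custodian": "alice", "date": date(2018, 1, 5),
     "ext": "docx", "text": "old newsletter"},
    {"id": 4, "custodian": "carol", "date": date(2021, 4, 1),
     "ext": "docx", "text": "Q1 pricing proposal"},  # duplicate of doc 1
]

RELEVANT_CUSTODIANS = {"alice", "carol"}
DATE_START, DATE_END = date(2020, 1, 1), date(2022, 12, 31)
SYSTEM_EXTS = {"exe", "dll", "sys"}  # stand-in for an NSRL-style known-file list

def cull(documents):
    seen_hashes = set()
    kept = []
    for d in documents:
        if d["custodian"] not in RELEVANT_CUSTODIANS:
            continue                 # custodian filtering
        if not (DATE_START <= d["date"] <= DATE_END):
            continue                 # date range filtering
        if d["ext"] in SYSTEM_EXTS:
            continue                 # system file exclusion
        h = hashlib.sha256(d["text"].encode()).hexdigest()
        if h in seen_hashes:
            continue                 # exact-duplicate suppression
        seen_hashes.add(h)
        kept.append(d)
    return kept

print([d["id"] for d in cull(docs)])  # only doc 1 survives all four filters
```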

The Defensibility Question

Courts have broadly accepted TAR as a defensible review methodology, with significant case law establishing that TAR is at minimum as defensible as keyword search when properly validated. The key requirements are documentation: you need to be able to demonstrate what your recall targets were, how you validated against them, and that your process was reasonable.

The practical implication: TAR requires more upfront documentation than keyword review. This is a one-time cost per matter, and it's worth it — but factor it into your cost comparison. A TAR protocol that can't be explained to opposing counsel or a court isn't defensible regardless of how good the technology is.

Decision Framework: When to Use Each Method

When to use TAR versus keyword review based on matter characteristics
| Use Keywords When… | Use TAR When… |
|---|---|
| Matter has fewer than 75,000 documents | Matter has more than 75,000–100,000 documents |
| Issues are narrow and well-defined | Issues are broad or evolving during review |
| Timeline is extremely compressed | Budget is the primary constraint |
| Terms are highly specific (product codes, account numbers) | Relevant documents use varied, non-predictable language |
| Small team, limited TAR experience | Complex, multi-issue litigation |
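One way to use a framework like this is as a first-pass triage function before the real discussion starts. The sketch below encodes the rows above as hedged heuristics; the threshold and the input names are illustrative starting points, not rules, and a real decision weighs all the factors together.

```python
# First-pass triage sketch of the decision framework. Thresholds and
# inputs are illustrative heuristics, not rules.

def suggest_review_method(doc_count: int,
                          issues_well_defined: bool,
                          timeline_compressed: bool,
                          terms_highly_specific: bool,
                          team_has_tar_experience: bool) -> str:
    if doc_count < 75_000:
        return "keywords"   # below the rough crossover point
    if timeline_compressed and not team_has_tar_experience:
        return "keywords"   # no time to stand up and validate a TAR workflow
    if terms_highly_specific and issues_well_defined:
        return "keywords"   # product codes, account numbers, narrow issues
    return "TAR"            # large matter, broad issues, varied language

print(suggest_review_method(200_000,
                            issues_well_defined=False,
                            timeline_compressed=False,
                            terms_highly_specific=False,
                            team_has_tar_experience=True))
```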

The most sophisticated legal teams don't choose one method permanently. They evaluate each matter independently using a framework like the one above, and they build the capability to use both. Vendors who tell you TAR is always better — or always worse — are selling a product, not giving advice.