Why Legal AI Needs Hallucination-Free Tabular Review

Harvey and Legora built tabular review tools on generative models. Isaacus just showed how to do it better—without hallucinations.

The approach is worth studying because it reveals something fundamental about AI architecture: sometimes the right tool isn't the one that can do everything.

The Problem with Generative Review

Tabular review is the bread and butter of legal work. You have dozens or hundreds of documents—contracts, leases, agreements—and you need to extract the same fields from each one into a comparison table.

Generative models are convenient for this. You feed in documents, ask for fields, and they produce structured output. The problem: they hallucinate.

Not often, but enough to matter in legal work. A generative model might extract a governing law that doesn't exist in the document, or merge clauses from different sections, or invent terms that sound plausible but aren't there.
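One way to see the failure mode concretely is a verbatim grounding check on generated output. The sketch below is illustrative only: the contract text, the field names, and the `is_grounded` helper are hypothetical, and a literal substring match is a deliberately crude test that will miss paraphrases.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks do not cause false alarms."""
    return re.sub(r"\s+", " ", text).strip().lower()


def is_grounded(value: str, document: str) -> bool:
    """Return True only if the non-empty value appears verbatim in the document."""
    needle = normalize(value)
    return bool(needle) and needle in normalize(document)


# Hypothetical contract text and model output, for illustration only.
contract = """This Agreement is governed by the laws of
New South Wales, Australia."""

extracted = {
    "governing_law": "New South Wales, Australia",
    "termination_notice": "30 days written notice",  # plausible, but not in the text
}

# Fields whose values cannot be found in the source document.
ungrounded = [
    field for field, value in extracted.items()
    if not is_grounded(value, contract)
]
print(ungrounded)
```

A check like this catches invented values after the fact. An extractive pipeline avoids the problem by construction, because it can only return spans of the source.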

In legal review, one wrong extraction can be a liability.

The Isaacus Approach

Isaacus built a tabular review pipeline that avoids generative models entirely. Instead, they use three specialized models:

Kanon 2 Enricher extracts entities and relationships into a knowledge graph schema (ILGS). It identifies people, organizations, dates, locations, clauses, and document segments—all with token-level annotations and confidence scores.

Kanon 2 Embedder handles semantic search over document spans. You embed natural-language queries like "confidentiality obligations" and match against indexed segments, then filter by similarity threshold.
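The threshold step can be sketched in a few lines. This is not the Isaacus API: the three-dimensional vectors below are hand-written stand-ins for real embeddings, and `match_segments`, the segment IDs, and the 0.8 threshold are assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_segments(query_vec, segment_vecs, threshold=0.8):
    """Return (segment_id, score) pairs at or above the threshold, best first."""
    scored = [(seg_id, cosine(query_vec, vec)) for seg_id, vec in segment_vecs.items()]
    hits = [(seg_id, score) for seg_id, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy vectors standing in for embeddings of indexed document segments.
segments = {
    "seg-1": [0.9, 0.1, 0.0],
    "seg-2": [0.1, 0.9, 0.1],
    "seg-3": [0.7, 0.3, 0.1],
}
# Toy vector standing in for an embedded query such as "confidentiality obligations".
query = [1.0, 0.2, 0.0]

hits = match_segments(query, segments, threshold=0.8)
for seg_id, score in hits:
    print(seg_id, round(score, 3))
```

Here seg-1 and seg-3 clear the threshold and seg-2 is dropped. In practice the threshold would be tuned against labeled examples rather than picked by hand.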

Kanon Answer Extractor does extractive QA—returning answer spans from the source document, which you then cross-reference against the knowledge graph to link entities.
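The cross-referencing step might look like the following sketch, which links an answer span to every entity whose character offsets overlap it. The `Entity` class, the IDs, and the sample sentence are hypothetical. The actual ILGS schema and the models' output format are not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    entity_id: str
    name: str
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive


def link_entities(answer_start, answer_end, entities):
    """Return the entities whose spans overlap the half-open answer span."""
    return [e for e in entities if e.start < answer_end and answer_start < e.end]


document = (
    "This Agreement is between Acme Corporation and Beta Pty Ltd. "
    "It is governed by the laws of Victoria."
)


def mention(entity_id, name):
    """Build an Entity from the first occurrence of its name in the document."""
    start = document.index(name)
    return Entity(entity_id, name, start, start + len(name))


# Stand-ins for entities an enrichment step might have annotated.
entities = [
    mention("ent-1", "Acme Corporation"),
    mention("ent-2", "Beta Pty Ltd"),
    mention("ent-3", "Victoria"),
]

# Stand-in for an extractive answer to "Who are the parties?"
answer_text = "Acme Corporation and Beta Pty Ltd"
answer_start = document.index(answer_text)
answer_end = answer_start + len(answer_text)

linked = link_entities(answer_start, answer_end, entities)
print([e.entity_id for e in linked])
```

Because the answer is a span with offsets rather than free text, linking reduces to an interval-overlap test. No string matching against entity names is needed.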

No generation. Every extraction is grounded in the original text.

Why This Works Better

The key insight: legal documents are already structured. They have defined sections, named entities, numbered clauses, and nested hierarchies.

A specialized enrichment model can recognize this structure without hallucinating it. It assigns unique IDs to every entity, segment, and relationship, which means:

Backlinks work. Click any cell in the table and navigate directly to the source span in the document.
Entity panels make sense. Because entities are linked across documents, you can see all mentions of "Acme Corporation" across your entire corpus, not just in one extraction.
Classification is query-driven. Add a new column by typing a natural-language query. The embedding model finds matching spans among the segments that are already indexed, so you don't need to re-process the documents.
Confidence scores are meaningful. Because the models are specialized, they produce calibrated scores. A 0.9 similarity means something.
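The backlink idea in the first point can be illustrated with a table cell that stores offsets into the source instead of a copied string. This is a minimal sketch under assumed names: `Cell`, `cell_for`, and the sample lease text are hypothetical and are not taken from the Isaacus guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    document_id: str
    column: str
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive

    def text(self, documents):
        """Resolve the cell value by slicing the source document."""
        return documents[self.document_id][self.start:self.end]

    def context(self, documents, width=20):
        """Return the value with surrounding text, as a viewer might show it."""
        source = documents[self.document_id]
        return source[max(0, self.start - width):self.end + width]


# Hypothetical corpus with a single document.
documents = {
    "lease-1": "The term of this Lease is five years from the Commencement Date.",
}


def cell_for(document_id, column, phrase):
    """Build a cell from the first occurrence of a phrase (stand-in for a model's span)."""
    start = documents[document_id].index(phrase)
    return Cell(document_id, column, start, start + len(phrase))


cell = cell_for("lease-1", "Term", "five years")
print(cell.text(documents))     # five years
print(cell.context(documents))
```

Storing offsets rather than strings means the displayed value cannot drift from the source, and jumping to the source span is a lookup rather than a search.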

The Trade-off

The Isaacus approach requires specialized models trained on legal text. You can't just swap in a general-purpose embedder or use an off-the-shelf NER model.

But for the tabular review use case, that's the right trade. Legal documents have domain-specific structures—citations, defined terms, cross-references—that general models miss.

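The result: higher accuracy, lower latency, no fabricated text, and a richer interface that lets lawyers actually verify what the AI extracted. An extractive model can still select the wrong span, but it cannot return words that are not in the source.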

What General-Purpose Models Can Learn

This isn't about generative vs. extractive. It's about fit-for-purpose.

When you're building an AI product for a specific domain, ask:

Does the domain have inherent structure? If yes, specialized extraction may beat generation.
Is groundedness non-negotiable? In legal, medical, and financial applications, hallucinations aren't just annoyances—they're liabilities.
Can you encode domain knowledge? Legal documents follow conventions. A model that knows those conventions outperforms one that doesn't.

The Isaacus tabular review guide is open source. You can clone it, adapt it, or extend it. The lesson is architectural: sometimes the best AI product doesn't use the most capable AI model.
