Complexity: Intermediate · Type: Architecture / Verification · Accuracy

Hallucination Block

Before an AI-generated answer is shown, a second agent ("The Judge") verifies that it is grounded in the provided context, blocking unverified statements from reaching the user.

1. Trust Challenge

What is the core risk to user trust, and when does it matter most?

LLMs are confident storytellers. When data is missing, unclear, or contradictory, they often fill gaps with fluent but incorrect answers instead of admitting uncertainty.

From a user's perspective, this is worse than "I don't know." It feels like being lied to by a system that sounds authoritative.

Critical moments where this pattern matters most:

  • The answer depends on live or factual data (records, schedules, prices, policies, status).

  • The system is expected to be source-of-truth, not just a brainstorming tool.

  • Mistakes have real consequences (money, access, health, legal, safety).

Without a verification layer, users are forced to double-check every single output manually, negating the efficiency gains of using AI in the first place.

2. Desired Outcome

What does 'trust done right' look like for this pattern?

Hallucination Block is working when answers are grounded, not guessed.

Evidence-Backed Responses

Factual answers are consistent with the underlying data and policies the product is built on.

Honest Uncertainty

When data is missing or ambiguous, the assistant says "I'm not sure" or offers a safe fallback instead of inventing.

Reduced "Confidently Wrong" Moments

Users see fewer situations where the AI says something fluent but obviously untrue.

Success State

Users come to expect that if the assistant states a fact about their account, a policy, or the system, it is either correct or clearly marked as uncertain or pending verification.

3. Implementation Constraints

What limitations or requirements shape how this pattern can be applied?

To apply Hallucination Block effectively, you need:

Requirements

  • Trusted Context: Clear access to the data and rules that should constrain answers (APIs, databases, documents, policies). The judge cannot evaluate answers in a vacuum.
  • Separation of Roles: A "generator" that drafts responses and a "judge" that evaluates them; these can be separate models or the same model in a different mode, but their responsibilities are distinct.
  • Evaluation Rubric: The Judge needs clear instructions on what constitutes an error. Is a slight rephrasing okay? Or must it be an exact semantic match? You need to define these thresholds rigorously.
  • Structured Feedback Channel: A way for the judge to label an answer as supported, unsupported, unsafe, or incomplete—and for the system to act on that signal.
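The structured feedback channel above can be as simple as a small data type shared between the judge and the rest of the pipeline. A minimal sketch (the class and field names are illustrative, not a prescribed API):

```python
from dataclasses import dataclass
from enum import Enum

class Verdict(Enum):
    SUPPORTED = "supported"
    UNSUPPORTED = "unsupported"
    UNSAFE = "unsafe"
    INCOMPLETE = "incomplete"

@dataclass
class JudgeResult:
    verdict: Verdict
    unsupported_claims: list   # specific claims the judge could not ground
    feedback: str              # free-text guidance for a regeneration attempt

def should_block(result: JudgeResult) -> bool:
    # Only fully supported responses go straight to the user.
    return result.verdict is not Verdict.SUPPORTED

flagged = JudgeResult(Verdict.UNSUPPORTED,
                      ["Refunds are available within 90 days"],
                      "No refund window appears in the provided context.")
print(should_block(flagged))
```

Keeping the verdict machine-readable (rather than free prose) is what lets the system act on the signal automatically.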

Constraints / Limitations

  • Latency & Cost: A second pass over each answer adds overhead; you may need to apply it selectively (e.g., only to certain intents or high-risk domains).
  • Coverage Limits: The judge can only compare answers to the context it has. If your grounding data is incomplete, the judge can't magically fill the gaps.
  • Design Discipline: The judge must be configured to prioritize "I don't know" over guessing, even if that feels less impressive.
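Selective application, as suggested under Latency & Cost, can be a cheap gate in front of the judge. A sketch, assuming you already classify intents and have some risk score (the intent names and threshold are placeholders):

```python
# Illustrative intent labels; substitute your own taxonomy.
HIGH_RISK_INTENTS = {"billing", "refund_policy", "medical", "legal"}

def needs_verification(intent: str, risk_score: float, threshold: float = 0.5) -> bool:
    # Gate the second pass: always judge high-risk intents,
    # otherwise only when a cheap upstream classifier flags elevated risk.
    return intent in HIGH_RISK_INTENTS or risk_score >= threshold

print(needs_verification("smalltalk", 0.1))
print(needs_verification("billing", 0.1))
```

This keeps the judge's latency and cost concentrated where hallucination is actually expensive.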

4. Pattern in Practice

What specific mechanism or behavior will address the risk in the product?

Core mechanism:

The product introduces a Verification Loop into the generation pipeline.

  1. Generate: The primary LLM creates a draft response based on the user's prompt and retrieved context.
  2. Critique: The "Judge" model receives the draft response plus the original source context and is asked: "Does the response contain any claims not supported by the context?"
  3. Act (Pass): If the Judge finds no errors, the response is streamed to the user.
  4. Act (Fail): If errors are found, the system triggers a regeneration (asking the main model to try again with specific feedback) or tags the response with a low-confidence warning.
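The generate–critique–act cycle can be sketched as a bounded loop. Here `generate` and `judge` are assumed callables wrapping your LLM calls; the toy stubs exist only to make the sketch runnable:

```python
def verification_loop(prompt, context, generate, judge, max_retries=2):
    # Generate -> critique -> pass/fail, with bounded regeneration.
    feedback = None
    draft = None
    for _ in range(max_retries + 1):
        draft = generate(prompt, context, feedback)
        passed, feedback = judge(draft, context)
        if passed:
            return draft, "verified"
    # The judge never approved: surface the last draft with a warning
    # instead of looping forever.
    return draft, "low_confidence"

# Toy stand-ins for real model calls:
def toy_generate(prompt, context, feedback):
    return context if feedback else "a guess"

def toy_judge(draft, context):
    return (draft in context, "stick to the context")

print(verification_loop("q", "the policy text", toy_generate, toy_judge))
```

Bounding the retries matters: without `max_retries`, a judge that never approves would stall the response indefinitely.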

Behavior in the UI / conversation:

For the user, this process is largely seamless, but transparency can be added.

  • The Verified Badge: A subtle checkmark indicating the response passed the integrity check.
  • The Citation Check: When hovering over a citation, the UI might show a snippet of the source text that validates the claim, proving the connection.
  • The Uncertainty Disclaimer: If the Judge is unsure, the AI output might shift tone: "Based on the documents, it appears that X, though the text is ambiguous regarding Y."

Use these components to visualize the verification layer.

1. Confidence Banner

Purpose: Signal when an answer is grounded vs. uncertain in a single line.

Placement: Top or bottom of the assistant’s message.

Variants:

  • “Checked against current data” (normal / neutral styling).
  • “Some parts may be incomplete or outdated” (subtle warning styling).

2. “Limited Confidence” Answer Style

Purpose: Provide a visual style for answers that passed the judge with low support.

Structure: Regular answer text with a short disclaimer and suggested next step.

Elements:

  • One-line caveat: “I’m not fully confident in this answer based on the data I have.”
  • Action buttons: “See source data”, “Contact support”, “Refresh data”.

3. Fallback / Escalation Card

Purpose: Handle cases where the judge blocks the answer entirely.

Text: “I’m not confident enough in my answer to show it to you.”

Actions: “Ask a human”, “View related help articles”, “Try a different question”.

These UI elements make the invisible judgment pass tangible: grounded answers feel solid, and uncertain ones are clearly marked as such.
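The verdict-to-component routing can live in one small function so the UI stays consistent. A sketch; the component keys and copy are illustrative, not a real design-system API:

```python
def ui_treatment(verdict: str) -> dict:
    # Map a judge verdict onto the three components above.
    if verdict == "supported":
        return {"component": "confidence_banner",
                "copy": "Checked against current data",
                "show_answer": True}
    if verdict == "incomplete":
        return {"component": "limited_confidence_answer",
                "copy": "I'm not fully confident in this answer "
                        "based on the data I have.",
                "show_answer": True}
    # unsupported / unsafe: block the answer entirely
    return {"component": "fallback_card",
            "copy": "I'm not confident enough in my answer to show it to you.",
            "show_answer": False}

print(ui_treatment("unsupported")["component"])
```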

5. Best Used When

In which contexts does this pattern create the greatest trust value?

Hallucination Block is especially valuable when:

RAG Systems

Where the primary promise is "Chat with your data." The user expects the AI to be a faithful scribe, not a creative writer.

Compliance & Legal

Where accuracy is legally mandated. A hallucinated clause in a contract summary is unacceptable.

Medical & Scientific Q&A

Where specific protocols or dosage information must be exact.

Customer Support Automation

To ensure the bot doesn't invent return policies or promise refunds that don't exist.

In these scenarios, the added latency and cost are justified by the massive reduction in liability and the increase in user confidence.

6. Use With Caution

When could applying this pattern create friction or unintended effects?

Risks and Anti-Patterns:

The "Paralyzed" AI

If the Judge is set to "Zero Tolerance," the AI might refuse to answer anything that isn't a verbatim quote, frustrating users who want synthesis or summary.

Latency Fatigue

If the verification step adds 5+ seconds to every answer, users will abandon the tool for a faster, dumber alternative.

False Security

Users might blindly trust a "Verified" badge, assuming it guarantees absolute truth, when it only guarantees consistency with the provided context (which itself might be wrong).

To use this pattern safely:

  • Asynchronous Verification: For lower-stakes queries, stream the answer first and run the Judge in the background, updating the UI with a "Verified" badge only after the check is done.
  • Tunable Strictness: Allow admins to adjust the "Temperature" of the Judge based on the use case (e.g., strict for Legal, lenient for Creative Writing).
  • Human Override: Always provide a link to the source so the human can be the "Supreme Court" if they suspect the Judge is wrong.
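Asynchronous verification can be sketched with a background task: the draft renders immediately, and the badge arrives once the judge finishes. `judge` and `on_badge` are assumed callables (the real judge would be a second model call; `on_badge` would update the UI):

```python
import asyncio

async def answer_with_deferred_badge(draft, context, judge, on_badge):
    # Runs after the draft is already on screen; attaches the badge
    # only once the judge has finished its pass.
    verdict = await judge(draft, context)
    on_badge("verified" if verdict else "unverified")

async def demo():
    badges = []

    async def slow_judge(draft, context):
        await asyncio.sleep(0)   # stands in for the second model call
        return draft in context

    draft = "the policy text"    # shown to the user right away...
    task = asyncio.create_task(  # ...while verification runs in the background
        answer_with_deferred_badge(draft, "the policy text",
                                   slow_judge, badges.append))
    await task
    return badges

print(asyncio.run(demo()))
```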

7. How to Measure Success

How will we know this pattern is strengthening trust?

North Star Metric

Regeneration Rate

How often does the Judge reject a draft? A persistently high rate suggests the primary model needs better prompting or grounding data; a very low rate suggests the pipeline is stable (or that the Judge is too lenient).

Citation Click-through Rate

If users are clicking citations less often over time, it implies they trust the AI's summary is accurate (or they are disengaged—context matters).

User Feedback (Thumbs Down)

Specifically tagging "Inaccurate" or "Made this up."

Latency Impact

Monitoring the average time-to-first-token to ensure the verification step isn't killing the UX.
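The metrics above fall out of a simple event log, assuming each answered query records whether the judge rejected a draft and how long the judge pass took (field names are illustrative):

```python
def regeneration_rate(events):
    # Fraction of drafts the judge rejected at least once.
    if not events:
        return 0.0
    return sum(1 for e in events if e["rejected"]) / len(events)

def avg_judge_latency_ms(events):
    # Average overhead the verification pass adds per answer.
    return sum(e["judge_ms"] for e in events) / len(events) if events else 0.0

log = [{"rejected": True,  "judge_ms": 420},
       {"rejected": False, "judge_ms": 380},
       {"rejected": False, "judge_ms": 400}]
print(regeneration_rate(log), avg_judge_latency_ms(log))
```

Tracking both together shows whether a lower regeneration rate is being bought with unacceptable latency.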