The verifier runs server-side after a human submits, before the task lands in
COMPLETED. It does two things in one LLM call:
- Quality check — was the response acceptable per your instructions?
- NL parsing (optional) — if the human replied in free text (Slack thread, email body), extract structured data into the response schema.
If the check fails, the submission is REJECTED (non-terminal), the reason is shown to the reviewer, and they get to retry. After max_attempts, it’s VERIFICATION_EXHAUSTED (terminal) and the agent gets a typed error.
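For orientation, here are those verdict states as a plain Python sketch. The state names mirror this page, but the enum itself is purely illustrative; the SDK does not necessarily expose one.

```python
from enum import Enum

# Illustrative only: the names come from the docs, but this enum is not the SDK's API.
class TaskState(Enum):
    COMPLETED = "completed"                   # verifier accepted the submission; task is done
    REJECTED = "rejected"                     # non-terminal: reviewer sees the reason and may retry
    VERIFICATION_EXHAUSTED = "exhausted"      # terminal: max_attempts rejections; agent gets a typed error

TERMINAL_STATES = {TaskState.COMPLETED, TaskState.VERIFICATION_EXHAUSTED}
```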
BYOK — bring your own key
Each provider needs two things on the server: the SDK extra installed, and the API key exported as an env var.

| Provider | Install | Required env var | Default model |
|---|---|---|---|
| claude | pip install "awaithumans[verifier-claude]" | ANTHROPIC_API_KEY | claude-sonnet-4-5 |
| openai | pip install "awaithumans[verifier-openai]" | OPENAI_API_KEY | gpt-4o-2024-11-20 |
| gemini | pip install "awaithumans[verifier-gemini]" | GEMINI_API_KEY | gemini-2.0-flash |
| azure | pip install "awaithumans[verifier-azure]" | AZURE_OPENAI_API_KEY + AZURE_OPENAI_ENDPOINT | (operator-supplied deployment name) |
Override the env var name with api_key_env= if you’d rather not use the default — e.g. claude_verifier(api_key_env="ACME_CLAUDE_KEY").
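A minimal sketch of the override; claude_verifier and api_key_env= are from this page, while the import path is an assumption:

```python
# Sketch: the import path is an assumption; claude_verifier and api_key_env= are documented above.
from awaithumans import claude_verifier

default_cfg = claude_verifier()                              # server reads ANTHROPIC_API_KEY
custom_cfg = claude_verifier(api_key_env="ACME_CLAUDE_KEY")  # server reads ACME_CLAUDE_KEY instead
```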
If the key isn’t set, the server returns HTTP 500 VERIFIER_API_KEY_MISSING with a docs link to the troubleshooting page. Set the key, restart the server, and the human can resubmit the failed task — the verifier didn’t burn an attempt.
Quickstart
claude_verifier(...) doesn’t take an API key — it just declares the config. The agent ships that config to the server, and the server reads ANTHROPIC_API_KEY from its own env.
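A minimal quickstart sketch. claude_verifier and await_human(verifier=...) are named on this page; the import path, the instructions= and max_attempts= keywords, and the prompt argument are assumptions:

```python
# Sketch only. claude_verifier and await_human(verifier=...) come from the docs;
# the import path, instructions=, max_attempts=, and the prompt argument are assumptions.
from awaithumans import await_human, claude_verifier

verifier = claude_verifier(
    instructions="Approve only if a refund amount and an order ID are both present.",
    max_attempts=3,  # default per the docs
)

# No API key in the config: the server reads ANTHROPIC_API_KEY from its own environment.
result = await_human(
    "Please review this refund request.",
    verifier=verifier,
)
print(result)
```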
Other providers
Every provider produces the same thing: a VerifierConfig you pass to await_human(verifier=...).
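A sketch of the swap, with the caveat that only claude_verifier is named on this page; parallel helpers for the other providers are assumptions:

```python
from awaithumans import await_human, claude_verifier   # documented on this page
# from awaithumans import gemini_verifier              # assumed: a parallel helper per provider

# Whichever provider you pick, the result is a VerifierConfig handed to await_human(verifier=...).
result = await_human("Please review this refund request.", verifier=claude_verifier())
```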
NL parsing
When the human replies in free text (Slack thread reply, email body), the verifier extracts structured data into the response schema. No code changes — the server detects raw text vs. structured form and runs the verifier in NL-parsing mode. The operator’s instructions should anticipate the likely free-text replies: e.g. an approval in a Slack thread gets parsed into {approved: true, notes: "looks legit"}.
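A sketch of instructions written with free-text replies in mind. The instructions= keyword and the exact schema wording are assumptions; the {approved, notes} fields and the NL-parsing behaviour are from this page:

```python
# Sketch: instructions= and the schema wording are assumptions; NL-parsing behaviour is documented above.
from awaithumans import claude_verifier

verifier = claude_verifier(
    instructions=(
        "The response schema is {approved: bool, notes: str}. "
        "Reviewers often reply in a Slack thread with plain text such as 'looks legit' or 'no, wrong amount'. "
        "Treat clear approval language as approved=true and put the reviewer's own words in notes. "
        "Reject if the reply is ambiguous about approval."
    ),
)
```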
redact_payload
If the task’s redact_payload=True, the verifier is skipped entirely. The operator marked the payload sensitive; we don’t ship it to a third-party LLM regardless of verifier config.
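A short sketch of the interaction, assuming redact_payload is passed as a keyword on the await_human call (this page only says it is set on the task):

```python
# Sketch: redact_payload as a keyword argument is an assumption; the skip behaviour is documented above.
from awaithumans import await_human, claude_verifier

result = await_human(
    "Review this customer's bank statement.",
    redact_payload=True,            # payload marked sensitive by the operator
    verifier=claude_verifier(),     # configured, but skipped: the payload never reaches a third-party LLM
)
```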
Error handling
Provider failures (vendor outage, missing API key, network blip) propagate as typed errors WITHOUT consuming a max_attempts slot. The human gets a fresh shot once the operator fixes the config.
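A sketch of what the agent might catch. The exception class names are assumptions; the VERIFIER_API_KEY_MISSING code and the no-attempt-consumed behaviour are from this page:

```python
# Sketch: the exception classes and their module path are assumptions;
# the error codes and retry semantics are from the docs.
from awaithumans import await_human, claude_verifier
from awaithumans.errors import VerifierConfigError, VerificationExhausted  # hypothetical names

try:
    result = await_human("Please review this refund request.", verifier=claude_verifier())
except VerifierConfigError as err:
    # e.g. VERIFIER_API_KEY_MISSING: fix the server env and resubmit; no max_attempts slot was consumed.
    print("operator action needed:", err)
except VerificationExhausted:
    # Terminal: the reviewer was rejected max_attempts times.
    print("verification exhausted")
```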
Best practices
- Keep instructions narrow. “Reject if X” / “Require Y when Z” — not “be a good reviewer.” Narrow rules are easier to debug when the verifier rejects something you didn’t expect (see the sketch after this list).
- Set max_attempts low. The default is 3. For high-stakes decisions, lower it (to 1) so a single rejection fails closed rather than letting the human grind through retries.
- Test the prompt. Run a few representative submissions through it before relying on the verifier in production. The dashboard’s audit trail records every verifier verdict so you can inspect them.
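The sketch referenced above: the same policy written as a narrow rule and as a vague one. The strings are illustrative only:

```python
# Narrow, rule-shaped instructions are easy to trace when a rejection surprises you.
narrow = (
    "Reject if the refund amount exceeds $500. "
    "Require a ticket ID when the reason is 'fraud'."
)

# Vague instructions make every rejection a mystery.
vague = "Be a good reviewer and use your judgement."
```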
Cost
The verifier fires on every submission. At ~1k input tokens + ~100 output per call:

- Claude Sonnet: ~$0.005 / call
- GPT-4o: ~$0.01 / call
- Gemini 2.0 Flash: ~$0.0005 / call
For high-volume workloads, the cheapest option is gemini-2.0-flash.
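To turn the per-call figures into a budget, a back-of-the-envelope sketch using the approximate prices quoted above (not live rates):

```python
# Rough monthly cost per provider, using the approximate per-call figures quoted above.
per_call_usd = {
    "claude-sonnet": 0.005,
    "gpt-4o": 0.01,
    "gemini-2.0-flash": 0.0005,
}

submissions_per_day = 200  # example volume; every submission triggers one verifier call

for model, cost in per_call_usd.items():
    monthly = cost * submissions_per_day * 30
    print(f"{model}: ~${monthly:.2f}/month")
```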
Runnable examples
| Example | Language | Walks through |
|---|---|---|
| examples/verifier-py/ | Python | All three verifier paths: pass, reject + retry, exhaust |
| examples/verifier/ | TypeScript | Same flow, TS SDK |
Export ANTHROPIC_API_KEY, run python refund.py (or npx tsx), and walk the dashboard through each path. A useful template for fixture-driven tests of your own verifier prompts.
Where to next
- Testing — patterns for testing verifier paths in CI
- Idempotency — what happens to in-flight tasks across verifier retries
- Troubleshooting — every verifier error code with the matching fix