A common skeptical question: “Won’t HITL go away once the models are good enough?” The answer is no, and it’s not a defensive answer. There are three walls between an AI agent and the world. Bigger models knock on the first wall harder; they don’t move the second or third.
Wall 1 — Judgment
The agent has a confidence score. The cost of being wrong isn’t symmetric. Someone has to make the call. Examples:
- KYC: the model says 73% likely match; the regulator says you can’t reject without a manual review
- Refunds: the model says fraud; the human knows the customer just had a bad week
- Content moderation: the model says borderline hate speech; the policy team owns the line
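
The asymmetry in the examples above can be made concrete with a small expected-cost comparison. This is a minimal sketch, not part of awaithumans: the function name `choose_action`, the cost figures, and the `reject_requires_review` flag are all assumptions made up for illustration.

```python
def choose_action(p_ok, cost_false_approve, cost_false_reject,
                  review_cost, reject_requires_review=False):
    """Pick the action with the lowest expected cost.

    p_ok is the model's probability that the case is legitimate.
    All costs are hypothetical numbers in the same unit (say, dollars).
    """
    expected_cost = {
        # Approving is only wrong when the case is actually bad.
        "approve": (1.0 - p_ok) * cost_false_approve,
        # Rejecting is only wrong when the case is actually fine.
        "reject": p_ok * cost_false_reject,
        # A human review has a fixed price and, in this sketch, no error.
        "human_review": review_cost,
    }
    if reject_requires_review:
        # Policy constraint: the model may not reject on its own.
        del expected_cost["reject"]
    return min(expected_cost, key=expected_cost.get)


# KYC-style case: 73% likely match, and auto-reject is not allowed.
kyc = choose_action(0.73, cost_false_approve=1000, cost_false_reject=200,
                    review_cost=15, reject_requires_review=True)
# Near-certain case: automation is cheaper than a review.
easy = choose_action(0.999, cost_false_approve=1000, cost_false_reject=200,
                     review_cost=15)
print(kyc, easy)  # human_review approve
```

A sharper model moves more cases into the `easy` bucket, but as long as the two error costs differ and a policy owner sets the constraint, some cases still land on a person.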
Wall 2 — System uncertainty
The agent doesn’t know what happened. The Stripe webhook never fired. The wire to the bank didn’t come back. The third-party API returned 200 but the downstream system is in an inconsistent state. The model can guess, but the only way to actually know is for a human to call the bank.
This is the most under-appreciated wall. Bigger models do nothing here: by definition, no amount of reasoning can recover information that wasn’t captured. The system needs a human to look at the actual external state and report back. Examples:
- Transaction reconciliation: payment provider didn’t ack; was the transfer applied?
- Distributed-system inconsistency: order shows as shipped in one DB and not-shipped in another
- Vendor outage during a long-running workflow: did the workflow’s last side effect take?
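
One way to encode this wall is to treat "no acknowledgement" as a third state instead of guessing. The sketch below assumes a hypothetical `resolve_transfer` helper and an `ask_human` callback standing in for whatever mechanism delivers the question to a person; neither is an awaithumans API.

```python
from typing import Callable, Optional, Tuple


def resolve_transfer(
    transfer_id: str,
    provider_ack: Optional[bool],
    ask_human: Callable[[str], bool],
) -> Tuple[bool, str]:
    """Return (applied, source_of_truth) for a transfer.

    provider_ack is True or False when the provider answered,
    and None when the acknowledgement never arrived.
    """
    if provider_ack is not None:
        # The system captured the outcome; no human needed.
        return provider_ack, "provider"
    # The information was never captured, so inference cannot recover it.
    # Someone has to check the external state and report back.
    question = f"Transfer {transfer_id} was never acknowledged. Did it post?"
    return ask_human(question), "human"


def bank_says_applied(question: str) -> bool:
    # Hypothetical stand-in for a person who called the bank.
    return True


print(resolve_transfer("tx_1", False, bank_says_applied))  # (False, 'provider')
print(resolve_transfer("tx_2", None, bank_says_applied))   # (True, 'human')
```

The point of the `None` branch is that it never falls back to a model's estimate: the only inputs it accepts are an observed ack or a human report.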
Wall 3 — Embodiment
The task needs a body. Examples:
- KYC ID-photo verification (a human compares face to document)
- Pickup-and-delivery (someone has to physically grab the thing)
- Phone calls to vendors who don’t have APIs
- Visits to physical locations (inspections, audits)
In the API, this shows up as a routing selector such as assign_to: { capability: "pickup-and-deliver", region: "SF" }. For v0.1 the embodiment wall is just “humans on your team, routed via assign_to.” The post-Phase-3 marketplace expansion targets the broader case where the embodied work is sourced from outside your team.
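
To picture what routing on a capability-and-region selector involves, here is a minimal sketch of matching it against a team roster. The `TEAM` data, the roster shape, and `eligible_assignees` are hypothetical; how awaithumans actually resolves assign_to is not described here.

```python
from typing import Dict, List

# Hypothetical roster: who can do what, and where.
TEAM: List[Dict] = [
    {"name": "ana", "capabilities": {"pickup-and-deliver", "phone-call"}, "region": "SF"},
    {"name": "ben", "capabilities": {"id-photo-check"}, "region": "SF"},
    {"name": "chi", "capabilities": {"pickup-and-deliver"}, "region": "NYC"},
]


def eligible_assignees(team: List[Dict], assign_to: Dict[str, str]) -> List[str]:
    """Return names of team members matching both capability and region."""
    return [
        member["name"]
        for member in team
        if assign_to["capability"] in member["capabilities"]
        and assign_to["region"] == member["region"]
    ]


candidates = eligible_assignees(
    TEAM, {"capability": "pickup-and-deliver", "region": "SF"}
)
print(candidates)  # ['ana']
```

An empty result is the interesting case: it means nobody on the team can do the embodied work, which is the gap the marketplace expansion is meant to cover.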
Why this matters for your stack
If you treat HITL as a temporary hack — a Slack channel where everyone yells, a spreadsheet someone updates by hand — you’ll outgrow it within months and have to rip-and-replace. If you treat it as permanent infrastructure with a clean primitive (await_human()), the same code that powers your scrappy v1 review queue still works when:
- You add your second reviewer (just assign_to=...)
- You add a fourth notification channel (just register it)
- You add an AI verifier (just pass verifier=)
- You move to durable workflows (swap to the Temporal adapter)
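
The "same code keeps working" claim can be sketched with a local stand-in. Everything below is an assumption for illustration: this `await_human` is a toy defined in the example, not the library's function, and its `respond` parameter, the `fake_reviewer` helper, and the shape of the answer are invented. Only the option names assign_to and verifier come from the list above.

```python
import asyncio
from typing import Callable, Optional


async def await_human(task: str, payload: dict, *, respond,
                      assign_to: Optional[str] = None,
                      verifier: Optional[Callable[[dict], bool]] = None) -> dict:
    """Local stand-in, NOT the awaithumans API: ask, then optionally verify."""
    answer = await respond(task, payload, assign_to)
    if verifier is not None and not verifier(answer):
        raise ValueError("verifier rejected the human's answer")
    return answer


async def fake_reviewer(task: str, payload: dict, assign_to: Optional[str]) -> dict:
    # Hypothetical human: approves small refunds.
    return {"approved": payload["amount"] < 100, "reviewer": assign_to or "anyone"}


async def review_refund(amount: int, **options) -> dict:
    # The call site is identical at every stage; only the options grow.
    return await await_human("Approve refund?", {"amount": amount},
                             respond=fake_reviewer, **options)


# Scrappy v1: no routing, no verification.
v1 = asyncio.run(review_refund(40))
# Later: a second reviewer and a verifier, with the same call site.
v2 = asyncio.run(review_refund(40, assign_to="second-reviewer",
                               verifier=lambda answer: "approved" in answer))
print(v1, v2)
```

The design choice being illustrated is that new requirements arrive as optional keyword arguments, so the agent code that awaits the answer does not change.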