await_human() blocks until a human acts. That’s the whole point in production, and the exact thing that makes naive tests hang for the full timeout_seconds. This page covers the four shippable patterns to test agent code that uses awaithumans, ranked from “least magic” to “most automated.”
An in-memory createTestClient() (no server, instant resolution) is on the roadmap; see the test-client tracking issue. Until it lands, the patterns below are the supported ways to write tests today.
1. Live dev server + real human (dev loop)
For local iteration on a single task (when you’re shaping the prompt and form, not running CI), the simplest approach is to keep awaithumans dev running in one terminal and click through the dashboard yourself.
2. File-transport email smoke (full automation, no GUI)
The email channel has a file transport that writes each email as a JSON file to a directory instead of sending it. Combined with the magic-link action URL, you can drive a full agent ↔ server ↔ “human” loop without ever opening a browser.
See examples/email-smoke-py/ (Python) and examples/email-smoke/ (TypeScript). Both run in CI and prove the SDK ↔ server contract on every push.
| Example | Language | What it tests |
|---|---|---|
| examples/email-smoke-py/ | Python | SDK → file-transport email → magic-link → SDK resolves |
| examples/email-smoke/ | TypeScript | Same loop, TS SDK |
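The "human" side of that loop can be sketched as a small poller that watches the file-transport directory and pulls the action URL out of the first email it finds. This is a minimal sketch, not the code from the example directories: the helper names are hypothetical, and because the JSON layout of the email files is not specified here, it scans every string value for an http(s) URL instead of assuming a field name.

```python
import json
import re
import time
from pathlib import Path
from typing import Any, Iterator, Optional

# Assumption: the magic link appears as a plain http(s) URL somewhere in the email JSON.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def iter_strings(value: Any) -> Iterator[str]:
    """Yield every string found anywhere inside a decoded JSON value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)


def find_action_url(outbox: Path, timeout: float = 10.0, poll: float = 0.1) -> Optional[str]:
    """Poll `outbox` for email JSON files; return the first URL found, or None on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        for path in sorted(outbox.glob("*.json")):
            try:
                payload = json.loads(path.read_text(encoding="utf-8"))
            except (json.JSONDecodeError, OSError):
                continue  # the file may still be mid-write; retry on the next poll
            for text in iter_strings(payload):
                match = URL_PATTERN.search(text)
                if match:
                    return match.group(0)
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

A test would start the agent, call find_action_url() on the transport directory, then request the returned URL to resolve the task. If the emails contain more than one link, filter on whatever distinguishes the action URL in your setup.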
3. Direct POST /complete (skip the channel)
If you’re testing agent logic — branching on decision.approved, error handling for TaskTimeoutError, etc. — you don’t need to exercise the channel layer. Hit the complete endpoint directly:
idempotency_key=f"test:{test_name}" lets a parallel test know exactly which task to complete without race conditions.
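As a rough illustration of the direct-complete pattern, the sketch below builds the POST with the standard library only. The helper name, the JSON body shape, and the route passed as complete_path are assumptions made for illustration; take the real route and payload from the API reference for your server version.

```python
import json
import urllib.request
from typing import Any, Dict


def build_complete_request(
    base_url: str, complete_path: str, response: Dict[str, Any]
) -> urllib.request.Request:
    """Build (but do not send) a POST that submits `response` as a JSON body.

    `complete_path` is supplied by the caller: the full route of the complete
    endpoint is not spelled out on this page, so it is not hard-coded here.
    """
    url = base_url.rstrip("/") + "/" + complete_path.lstrip("/")
    body = json.dumps(response).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

In a test, run the agent call in a background thread or task, send the request with urllib.request.urlopen(), and then assert on how the agent branched. Building the request separately from sending it keeps the helper easy to unit-test without a server.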
4. Verifier-path tests (three corner cases)
When you’ve configured a verifier, there are three behavioral paths to cover:
| Path | How to drive it | Expected SDK behavior |
|---|---|---|
| Pass | Submit a response that meets the instructions | await_human() returns the typed response |
| Reject + retry | Submit a response the verifier rejects, then resubmit with one that passes | Task transitions REJECTED → COMPLETED; SDK returns the second submission |
| Exhaust | Submit max_attempts rejected responses | SDK raises VerificationExhaustedError |
examples/verifier-py/ walks all three paths against a real Claude verifier — useful as a fixture template.
For tests that should not call a real LLM, set the verifier’s model to a deployment that returns canned responses (a stub Claude proxy, or LocalAI’s claude adapter), and override the API key env var per-test.
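One way to override the key per test is the standard library's unittest.mock.patch.dict, which restores the original environment when the block exits. The sketch assumes the verifier reads ANTHROPIC_API_KEY from the process environment when it is invoked; the helper name and the placeholder key are hypothetical.

```python
import os
from unittest import mock


def stub_verifier_key(fake_key: str = "test-key-not-real"):
    """Return a context manager (also usable as a decorator) that sets
    ANTHROPIC_API_KEY for the duration of one test and then restores it."""
    return mock.patch.dict(os.environ, {"ANTHROPIC_API_KEY": fake_key})


# Usage inside a test body:
with stub_verifier_key():
    # Code run here sees the stub key instead of a real credential.
    current_key = os.environ["ANTHROPIC_API_KEY"]
```

This only changes the environment of the test process. If the server runs as a separate process, the variable has to be set where that process is started.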
5. Pinning idempotency keys for parallel safety
If your test suite creates tasks in parallel, pin every test’s idempotency_key to its test name so a flaky run doesn’t accidentally pick up a sibling test’s task.
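A small helper keeps the key format consistent across a suite. This is a sketch: the function name is hypothetical, and the "test:" prefix simply mirrors the f"test:{test_name}" convention used on this page.

```python
import re


def idempotency_key_for(test_name: str, prefix: str = "test") -> str:
    """Derive a stable per-test idempotency key from the test's name."""
    # Collapse whitespace so parametrized test IDs stay on one token.
    cleaned = re.sub(r"\s+", "_", test_name.strip())
    if not cleaned:
        raise ValueError("test_name must not be empty")
    return f"{prefix}:{cleaned}"
```

With pytest, the built-in request fixture exposes the current test's name as request.node.name, so a test can pass idempotency_key=idempotency_key_for(request.node.name) when it creates its task.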
What to put in CI
The minimum useful CI loop:
- Boot awaithumans dev (or use Docker Compose in CI runners).
- Run the file-transport email smoke against it.
- Run your agent test suite using pattern 3 (direct complete).
examples/slack-native/.
Common pitfalls
- Tests timing out at the SDK’s timeout_seconds rather than your test runner’s timeout. Set timeout_seconds=60 in tests; the SDK’s minimum is 60s. Anything shorter belongs in a different layer (your test runner’s pytest-timeout, jest’s testTimeout).
- Polluting prod data with test tasks. Run tests against a separate AWAITHUMANS_URL. The dev server’s SQLite DB is at .awaithumans/dev.db; wipe it between test runs if you want isolation.
- Forgetting to clean up email identities. Each test creates a file identity; tear down with DELETE /api/channels/email/identities/{id} so they don’t accumulate.
- Verifier hits a real LLM in CI. That’s both a cost and a flakiness risk. Mock at the env-var level (point ANTHROPIC_API_KEY at a stub) or skip the verifier in non-prod-shape tests.
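The identity cleanup can be sketched as a teardown helper that builds the DELETE call named above. The route comes from this page; the helper name is hypothetical, and the sketch assumes AWAITHUMANS_URL holds the server's base URL and that no extra auth header is required, which may not hold for your deployment.

```python
import os
import urllib.parse
import urllib.request


def build_identity_delete(identity_id: str, base_url: str = "") -> urllib.request.Request:
    """Build (but do not send) the DELETE request for one email identity."""
    # Assumption: AWAITHUMANS_URL is the base URL of the server under test.
    base = (base_url or os.environ["AWAITHUMANS_URL"]).rstrip("/")
    quoted = urllib.parse.quote(identity_id, safe="")  # keep odd IDs URL-safe
    return urllib.request.Request(
        f"{base}/api/channels/email/identities/{quoted}",
        method="DELETE",
    )
```

Calling urllib.request.urlopen() on the result from a fixture's teardown step removes the identity after each test, even when the test body fails.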
Where to next
- Webhooks (callback_url): for testing Temporal/LangGraph workflow callbacks
- Idempotency: recover-from-crash semantics that test fixtures rely on
- Verifier: full provider list and prompt patterns