The contract
Every provider type plugs into the same extraction= parameter.
The SDK validates the provider’s output against your response_schema locally before sending. If the provider returns malformed JSON or fields that don’t match, you get an ExtractionFailedError and nothing is sent to the reviewer (no charge).
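The validation gate described above can be sketched as follows. This is a minimal, dependency-free illustration, not the SDK's implementation: a stdlib dataclass stands in for your Pydantic model, `ExtractionFailedError` is redefined locally as a stand-in for the SDK's exception, and `validate_extraction` and `Invoice` are hypothetical names.

```python
import json
from dataclasses import dataclass, fields


class ExtractionFailedError(Exception):
    """Local stand-in for the SDK exception of the same name."""


@dataclass
class Invoice:
    """Hypothetical response schema (a Pydantic model in real use)."""
    vendor: str
    total: float


def validate_extraction(raw: str, schema: type) -> object:
    """Parse provider output and check it against the schema before anything is sent."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ExtractionFailedError(f"malformed JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ExtractionFailedError("expected a JSON object")

    expected = {f.name: f.type for f in fields(schema)}
    if set(data) != set(expected):
        raise ExtractionFailedError(f"fields do not match schema: {sorted(data)}")
    for name, typ in expected.items():
        value = data[name]
        # JSON has one number type, so accept ints where a float is expected.
        if typ is float and isinstance(value, int) and not isinstance(value, bool):
            data[name] = float(value)
        elif not isinstance(value, typ):
            raise ExtractionFailedError(f"{name!r} is not of type {typ.__name__}")
    return schema(**data)


invoice = validate_extraction('{"vendor": "Acme", "total": 12}', Invoice)
```

Only a value that survives this check would go on to the reviewer; any failure raises before a task is created.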
LLM providers (vision capable)
These read the document image directly and return structured output matching your schema.
OpenAI
Reads OPENAI_API_KEY from your environment if api_key= isn’t passed. Uses OpenAI’s structured-output mode (response_format=json_schema) to guarantee the output matches your Pydantic schema.
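As a rough sketch of the two behaviours above: the key falls back to the environment, and the schema is wrapped in OpenAI's `json_schema` response format. The helper names `resolve_api_key` and `build_response_format` and the `invoice_schema` example are hypothetical, and the SDK's own internals may differ. In real use the JSON Schema would come from your Pydantic model rather than being written by hand.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Prefer the explicit argument, then fall back to OPENAI_API_KEY."""
    key = api_key or os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("pass api_key= or set OPENAI_API_KEY")
    return key


def build_response_format(name: str, json_schema: dict) -> dict:
    """Wrap a JSON Schema in OpenAI's structured-output envelope."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "schema": json_schema, "strict": True},
    }


# Hand-written example schema. Strict mode expects every property to be
# listed in "required" and additionalProperties to be false.
invoice_schema = {
    "type": "object",
    "properties": {"vendor": {"type": "string"}, "total": {"type": "number"}},
    "required": ["vendor", "total"],
    "additionalProperties": False,
}
response_format = build_response_format("invoice", invoice_schema)
```

The resulting dictionary is the shape passed as `response_format` on a chat completion request.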
Anthropic (Claude)
Uses tool use, with response_schema as the tool schema; the tool call response is validated against your Pydantic model.
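To make the tool-use approach concrete, here is a minimal sketch that works on plain dictionaries in the shape of Anthropic's Messages API (`name`, `description`, `input_schema` for the tool; a `tool_use` content block in the reply). The helper names and the `record_invoice` tool are hypothetical, and forcing the tool via `tool_choice` is an assumption about how a provider wrapper might behave, not something this page confirms.

```python
def build_tool(name: str, json_schema: dict) -> dict:
    """Describe the response schema as an Anthropic tool definition."""
    return {
        "name": name,
        "description": "Record the fields extracted from the document.",
        "input_schema": json_schema,
    }


def force_tool(name: str) -> dict:
    """tool_choice value that makes the model call the named tool."""
    return {"type": "tool", "name": name}


def tool_input(content_blocks: list, name: str) -> dict:
    """Return the arguments of the first matching tool_use block."""
    for block in content_blocks:
        if block.get("type") == "tool_use" and block.get("name") == name:
            return block["input"]
    raise ValueError(f"no tool_use block for {name!r}")


schema = {
    "type": "object",
    "properties": {"vendor": {"type": "string"}, "total": {"type": "number"}},
    "required": ["vendor", "total"],
}
tool = build_tool("record_invoice", schema)

# Example reply content, written by hand to mirror the API's JSON shape.
reply_content = [
    {"type": "text", "text": "Recording the invoice."},
    {"type": "tool_use", "id": "toolu_01", "name": "record_invoice",
     "input": {"vendor": "Acme", "total": 12.0}},
]
extracted = tool_input(reply_content, "record_invoice")
```

The dictionary in `extracted` is what would then be validated against the Pydantic model.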
Azure OpenAI
Document extraction providers (SaaS)
These do OCR + layout analysis and return structured JSON. Pair them with a StructuringConfig to map the raw extraction to your Pydantic schema.
Reducto
Reducto’s /extract endpoint returns structured data; the StructuringConfig configures the LLM that maps it to your specific Pydantic shape. Best for documents with complex layouts (multi-column, mixed text + tables).
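The two-stage flow (raw extraction first, structuring second) can be pictured with the sketch below. Everything here is illustrative: `run_two_stage`, `fake_extract`, and `fake_structure` are hypothetical names, the raw response shape is invented, and a fixed key mapping stands in for the structuring LLM.

```python
from typing import Callable

RawExtractor = Callable[[bytes], dict]
Structurer = Callable[[dict, list], dict]


def run_two_stage(document: bytes, extract: RawExtractor,
                  structure: Structurer, schema_fields: list) -> dict:
    """Stage 1: OCR/layout extraction. Stage 2: map the raw result onto the schema."""
    raw = extract(document)
    return structure(raw, schema_fields)


def fake_extract(document: bytes) -> dict:
    """Placeholder for the extraction provider's response (shape is invented)."""
    return {"Vendor Name": "Acme", "Total Due": "12.00", "Page": 1}


def fake_structure(raw: dict, schema_fields: list) -> dict:
    """Placeholder for the structuring LLM: a fixed key mapping, not a model call."""
    aliases = {"vendor": "Vendor Name", "total": "Total Due"}
    return {field: raw[aliases[field]] for field in schema_fields}


result = run_two_stage(b"%PDF-", fake_extract, fake_structure, ["vendor", "total"])
```

Keeping the stages separate means the extraction provider never needs to know your schema; only the structuring step does.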
Azure Document Intelligence
Local providers (no API calls)
Run entirely on your machine. No credentials, no per-request cost. Usually slower than the hosted providers, depending on your hardware.
Docling
PaddleOCR
Comparison
| Provider | What it does | Best for | Cost model |
|---|---|---|---|
| OpenAI | Vision LLM, structured output | Any text or tables | Per token |
| Anthropic | Vision LLM, tool use | Any text or tables | Per token |
| Azure OpenAI | Vision LLM via Azure | Azure-only deployments | Per token |
| Reducto | SaaS OCR + layout | Complex layouts, multi-column | Per page |
| Azure DI | SaaS OCR + prebuilt models | Invoices, receipts, IDs | Per page |
| Docling | Local OCR + layout | Air-gapped envs, batch jobs | Compute only |
| PaddleOCR | Local OCR | Plain text, batch jobs | Compute only |
Credentials never leave your machine
The SDK calls every provider from your Python process. Your OPENAI_API_KEY, REDUCTO_API_KEY, etc. are read from your environment (or passed to the constructor) and used to make the provider request directly. AwaitVerify’s managed backend never sees provider credentials. We only receive:
- The encrypted document fragments
- The extracted result (after your provider returned it)
- The task metadata you attach
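The separation above can be pictured with a small sketch: the provider key is used locally, and the payload sent onward is built from the three listed items only. The function name `build_backend_payload` and the payload field names are hypothetical, chosen for illustration rather than taken from the SDK's wire format.

```python
import os


def build_backend_payload(encrypted_fragments: list, extracted: dict,
                          metadata: dict) -> dict:
    """Assemble only the three items listed above; credentials are not an input."""
    return {
        "fragments": encrypted_fragments,
        "extraction": extracted,
        "metadata": metadata,
    }


# The provider key stays in this process and is used for the provider call only.
provider_key = os.environ.get("OPENAI_API_KEY") or "sk-example"

payload = build_backend_payload(
    encrypted_fragments=[b"\x8f\x02..."],          # placeholder ciphertext
    extracted={"vendor": "Acme", "total": 12.0},   # what the provider returned
    metadata={"task_id": "demo-1"},                # whatever you attach
)
```

Because the key is never an argument to the payload builder, there is no code path that could include it by accident.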
What if my provider isn’t on the list?
Two options:
- Use Flow A. Run your provider on your machine, then pass the result as prior_extraction=YourModel(...). We don’t need to support your provider directly; we just need the Pydantic instance.
- Open an issue. New providers land based on real demand. PRs are welcome too. The provider interface is in awaithumans.providers.base.
Where to go next
The three flows
Flow A (you bring the extraction) vs Flow B (SDK runs the provider).
Response schemas
The Pydantic shape determines the provider’s output schema too.