TL;DR
Provide Cardinal with the schema you want to extract. We’ll return that schema populated with the matching values found in your document.
Endpoint
POST https://api.trycardinal.ai/split
Content-Type: multipart/form-data
Auth: X-API-KEY: <API_KEY>
You may provide either file or fileUrl.
Mode: Set fast to true (fast path) or false (standard path).
fast defaults to false.
Use customContext (optional, string) to steer extraction with short, domain-specific hints.
Methods of Schema Extraction
We offer two modes on the same endpoint:
1) Fast Schema (/extract with fast: true)
Use Fast when you need a quick, low-latency, lower-cost pass.
- Prioritizes speed over deep parsing
- Input: PDF or image (.pdf, .jpg, .jpeg, .png)
- Provide your schema as a string
Example Schema (JSON Schema)
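As a minimal sketch of what a stringified JSON Schema might look like, the snippet below builds a hypothetical invoice schema as a Python dict and serializes it with json.dumps. The field names (invoice_number, total, line_items) are illustrative assumptions, not fields Cardinal requires.

```python
import json

# Hypothetical invoice schema; the field names are illustrative only.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "amount": {"type": "number"},
                },
            },
        },
    },
    "required": ["invoice_number"],
}

# The schema field expects a string, so serialize the dict before sending.
schema_string = json.dumps(invoice_schema)
```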
Example Schema (Zod-style)
Request (requests)
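A fast-mode upload could be sketched as follows. This is an illustration under stated assumptions, not the official client: the URL joins the host from the Endpoint section with the /extract path used throughout this page, booleans are assumed to travel as lowercase strings, and the helper names (build_fast_request, extract_fast) are hypothetical.

```python
from pathlib import Path

# Assumption: host from the Endpoint section plus the "/extract" path this
# page refers to. Confirm the exact URL before relying on it.
EXTRACT_URL = "https://api.trycardinal.ai/extract"


def build_fast_request(api_key: str, schema: str) -> dict:
    """Return keyword arguments for requests.post, minus the file upload."""
    return {
        "url": EXTRACT_URL,
        "headers": {"X-API-KEY": api_key},
        # Assumption: form booleans are sent as the strings "true"/"false".
        "data": {"schema": schema, "fast": "true"},
    }


def extract_fast(path: Path, api_key: str, schema: str) -> dict:
    """Upload a local PDF or image in fast mode and return the JSON body."""
    import requests  # third-party; imported lazily so the builder works without it

    kwargs = build_fast_request(api_key, schema)
    with path.open("rb") as handle:
        # Passing files= makes requests encode the body as multipart/form-data.
        reply = requests.post(files={"file": handle}, timeout=60, **kwargs)
    reply.raise_for_status()
    return reply.json()
```

Keeping the field construction separate from the network call makes the payload easy to inspect or log before anything is sent.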
Example Response
2) Standard Schema (/extract with fast: false)
- Runs the full parsing pipeline first, then aligns to your schema
- Best for complex layouts (dense tables, annotations, multi-page forms)
- Slower, but more reliable for production workloads
- Input: PDF or image via upload or fileUrl
Example Schema (JSON Schema)
Request (requests)
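For the standard path with a remote document, a request might look like the sketch below. It assumes the same URL caveat as above (host from the Endpoint section, /extract path from this page), that booleans are sent as lowercase strings, and that fileUrl and customContext are ordinary form fields; the helper names are hypothetical. Because no file is uploaded, the fields are passed as (None, value) tuples through files=, which is the usual requests idiom for forcing a multipart/form-data body.

```python
from typing import Optional

# Assumption: host from the Endpoint section plus the "/extract" path this
# page refers to. Confirm the exact URL before relying on it.
EXTRACT_URL = "https://api.trycardinal.ai/extract"


def standard_form_parts(
    file_url: str, schema: str, custom_context: Optional[str] = None
) -> dict:
    """Build multipart form parts for a standard-mode request by URL."""
    # A (None, value) tuple is sent as a plain form field with no filename.
    parts = {
        "fileUrl": (None, file_url),
        "schema": (None, schema),
        "fast": (None, "false"),  # assumption: lowercase string booleans
    }
    if custom_context:
        parts["customContext"] = (None, custom_context)
    return parts


def extract_standard(
    api_key: str, file_url: str, schema: str, custom_context: Optional[str] = None
) -> dict:
    """Run the standard pipeline on a remote document and return the JSON body."""
    import requests  # third-party; imported lazily so the builder works without it

    reply = requests.post(
        EXTRACT_URL,
        headers={"X-API-KEY": api_key},
        files=standard_form_parts(file_url, schema, custom_context),
        timeout=300,  # the standard path is slower, so allow more time
    )
    reply.raise_for_status()
    return reply.json()
```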
Example Response
Supported Schema Formats
Pass your schema as a string in the schema field:
- JSON Schema (stringified object)
- Zod (string)
- TypeScript (interface/type as string)
- Pydantic (model definition as string)
- OpenAPI (schema as string)
- Custom (any structured format as string)
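Since every format is ultimately sent as a string, a small normalizer can keep call sites uniform. The sketch below is one possible approach: to_schema_string is a hypothetical helper, and the TypeScript and Pydantic snippets are illustrative schemas, not ones taken from Cardinal's documentation.

```python
import json
from typing import Any, Mapping, Union


def to_schema_string(schema: Union[str, Mapping[str, Any]]) -> str:
    """Return a schema ready for the schema form field.

    Dicts are treated as JSON Schema and serialized; strings (Zod,
    TypeScript, Pydantic, OpenAPI, custom) pass through after trimming.
    """
    if isinstance(schema, str):
        stripped = schema.strip()
        if not stripped:
            raise ValueError("schema must not be empty")
        return stripped
    return json.dumps(schema)


# Illustrative TypeScript interface, sent as plain text.
ts_schema = """
interface Invoice {
  invoiceNumber: string;
  total: number;
}
"""

# Illustrative Pydantic model definition, also sent as plain text.
pydantic_schema = """
class Invoice(BaseModel):
    invoice_number: str
    total: float
"""
```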
Custom Context
You can provide additional context to guide the extraction process using the optional customContext parameter:
- Parameter: customContext (string, optional)
- Purpose: Provides additional context or instructions to help the AI better understand your document or extraction requirements
- Examples:
  - "This document contains medical terminology and abbreviations"
  - "Focus on financial data and ignore header/footer information"
  - "The document may contain handwritten notes in the margins"
  - "This is a form from 1995, so date formats may be non-standard"
API Usage Snippets
Python (generic)
cURL
Response Format
/extract returns:
- response: the extracted data (either an object or a JSON string)
- method: "fast" or "slow"
- pages_processed: present in slow mode
- confidence_score: confidence score (0-100) indicating extraction reliability, present in slow mode
- success: present in slow mode
If response is a string, parse it as JSON.
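A minimal sketch of that parsing step, assuming the reply body has already been decoded into a dict (parse_extract_reply is a hypothetical helper name):

```python
import json
from typing import Any


def parse_extract_reply(reply: dict) -> Any:
    """Return the extracted data as a Python object.

    The response field may arrive either as an object or as a JSON
    string, so decode it only when it is a string.
    """
    data = reply["response"]
    if isinstance(data, str):
        data = json.loads(data)
    return data
```

Handling both shapes in one place means callers do not need to care which mode produced the reply.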
Long Document Caveats
Longer files (especially with large arrays) may produce truncated arrays or partial results.
Tips
- Use fast: false for complex, multi-page documents
- Paginate large PDFs and run /extract per chunk, then merge arrays client-side
- Leverage customContext to provide domain-specific guidance for better extraction accuracy
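The client-side merge from the pagination tip could be sketched like this. The merge policy is an assumption made for illustration: arrays are concatenated in chunk order, and for any other field the first non-null value wins. merge_chunk_results is a hypothetical helper, and it expects each chunk's response already parsed into a dict.

```python
from typing import Iterable


def merge_chunk_results(chunks: Iterable[dict]) -> dict:
    """Merge per-chunk extraction results into a single object."""
    merged: dict = {}
    for chunk in chunks:
        for key, value in chunk.items():
            if isinstance(value, list):
                # Concatenate arrays in the order the chunks were processed.
                existing = merged.get(key)
                merged[key] = (existing if isinstance(existing, list) else []) + value
            elif merged.get(key) is None:
                # Keep the first non-null scalar or object seen for this key.
                merged[key] = value
    return merged
```

Documents whose scalar fields legitimately differ between chunks (for example, per-page totals) would need a different policy than first-value-wins.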