Extract

TL;DR

Provide Cardinal with the schema you want to extract. We’ll return that schema populated with the matching values found in your document.

Endpoint

POST https://api.trycardinal.ai/split
Content-Type: multipart/form-data
Auth: X-API-KEY: <API_KEY>

You may provide either file or fileUrl.

Mode: Set fast to true (fast path) or false (standard path) Default mode is false. Use customContext (optional, string) to steer extraction with short, domain-specific hints.

Methods of Schema Extraction

We offer two modes on the same endpoint:

1) Fast Schema — `/extract` with `fast: true`

Use Fast when you need a quick, low-latency, lower-cost pass.

Prioritizes speed over deep parsing
Input: PDF or image (.pdf, .jpg, .jpeg, .png)
Provide your schema as a string

Example Schema (JSON Schema)

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "BlurryStatus",
  "type": "object",
  "properties": {
    "is_blurry": {
      "type": "boolean",
      "description": "Indicates whether something is blurry"
    }
  },
  "required": ["is_blurry"]
}

Example Schema (Zod-style)

{
  "schema": "z.object({ is_blurry: z.boolean().describe('Indicates whether something is blurry') })"
}

Request (requests)

import requests
import json

schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "title": "BlurryStatus",
    "type": "object",
    "properties": {"is_blurry": {"type": "boolean"}},
    "required": ["is_blurry"]
}

data = {
    "fileUrl": "https://example.com/doc.pdf",
    "schema": json.dumps(schema),
    "fast": "true",
    "customContext": "Focus on image quality and clarity when determining blur status"  # optional
}

response = requests.post("https://api.trycardinal.ai/extract", data=data)
data = response.json()

Example Response

{
  "response": "{\"is_blurry\": false}",
  "method": "fast"
}

2) Standard Schema — `/extract` with `fast: false`

Runs the full parsing pipeline first, then aligns to your schema
Best for complex layouts (dense tables, annotations, multi-page forms)
Slower, but more reliable for production workloads
Input: PDF or image via upload or fileUrl

Example Schema (JSON Schema)

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Invoice",
  "type": "object",
  "properties": {
    "invoice_number": { "type": "string" },
    "due_date": { "type": "string", "format": "date" },
    "total_amount": { "type": "number" },
    "line_items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "description": { "type": "string" },
          "amount": { "type": "number" }
        },
        "required": ["description", "amount"]
      }
    }
  },
  "required": ["invoice_number", "due_date", "total_amount", "line_items"]
}

Request (requests)

import requests
import json

schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "title": "Invoice",
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "due_date": {"type": "string", "format": "date"},
        "total_amount": {"type": "number"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "amount": {"type": "number"}
                },
                "required": ["description", "amount"]
            }
        }
    },
    "required": ["invoice_number", "due_date", "total_amount", "line_items"]
}

data = {
    "fileUrl": "https://example.com/invoice.pdf",
    "schema": json.dumps(schema),
    "fast": "false",  # standard mode
    "customContext": "This is a medical billing invoice. Pay attention to procedure codes and insurance information."  # optional
}

response = requests.post("https://api.trycardinal.ai/extract", data=data)
data = response.json()

Example Response

{
  "success": true,
  "response": "{\"invoice_number\":\"INV-00392\",\"due_date\":\"2025-09-01\",\"total_amount\":1042.50,\"line_items\":[{\"description\":\"Consulting\",\"amount\":900.00},{\"description\":\"Tax\",\"amount\":142.50}]}",
  "method": "slow",
  "pages_processed": 6,
  "confidence_score": 0.95
}

Supported Schema Formats

Pass your schema as a string in the schema field:

JSON Schema (stringified object)
Zod (string)
TypeScript (interface/type as string)
Pydantic (model definition as string)
OpenAPI (schema as string)
Custom (any structured format as string)

Custom Context

You can provide additional context to guide the extraction process using the optional customContext parameter:

Parameter: customContext (string, optional)
Purpose: Provides additional context or instructions to help the AI better understand your document or extraction requirements
Examples:
- "This document contains medical terminology and abbreviations"
- "Focus on financial data and ignore header/footer information"
- "The document may contain handwritten notes in the margins"
- "This is a form from 1995, so date formats may be non-standard"

API Usage Snippets

Python (generic)

import requests
import json

data = {
    "fileUrl": "https://example.com/doc.pdf",
    "schema": "z.object({ field: z.string().nullable() })",
    "fast": "true",  # or "false"
    "customContext": "Document contains technical jargon specific to aerospace industry"
}

response = requests.post("https://api.trycardinal.ai/extract", data=data)
result = response.json()

# `response` may be an object or a stringified JSON depending on config/model:
if isinstance(result["response"], str):
    extracted = json.loads(result["response"])
else:
    extracted = result["response"]

cURL

curl -X POST https://api.trycardinal.ai/extract \
  -H "x-api-key: YOUR_KEY" \
  -F fileUrl="https://example.com/doc.pdf" \
  -F schema='{"$schema":"https://json-schema.org/draft/2020-12/schema","type":"object","properties":{"field":{"type":"string"}}}' \
  -F fast=true \
  -F customContext="Pay special attention to dates and numerical values"

Response Format

/extract returns:

response – the extracted data (either an object or a JSON string)
method – “fast” or “slow”
pages_processed – present in slow mode
confidence_score – confidence score (0-100) indicating extraction reliability, present in slow mode
success – present in slow mode

If response is a string, parse it:

import json

if isinstance(result["response"], str):
    extracted = json.loads(result["response"])
else:
    extracted = result["response"]

Long Document Caveats

Longer files (especially with large arrays) may produce truncated arrays or partial results.

Tips

Use fast: false for complex, multi-page documents
Paginate large PDFs and run /extract per chunk, then merge arrays client-side
Leverage customContext to provide domain-specific guidance for better extraction accuracy

Introduction

Building Blocks

Accessories

Eval

Common Questions

Recipes

Security

On-Premise VPC Deployment

Uptime

Changelog

TL;DR

Endpoint

Methods of Schema Extraction

1) Fast Schema — `/extract` with `fast: true`

Example Schema (JSON Schema)

Example Schema (Zod-style)

Request (requests)

Example Response

2) Standard Schema — `/extract` with `fast: false`

Example Schema (JSON Schema)

Request (requests)

Example Response

Supported Schema Formats

Custom Context

API Usage Snippets

Python (generic)

cURL

Response Format

Long Document Caveats

Tips

API Reference

Extract API

Introduction

Building Blocks

Accessories

Eval

Common Questions

Recipes

Security

On-Premise VPC Deployment

Uptime

Changelog

​TL;DR

​Endpoint

​Methods of Schema Extraction

​1) Fast Schema — /extract with fast: true

​Example Schema (JSON Schema)

​Example Schema (Zod-style)

​Request (requests)

​Example Response

​2) Standard Schema — /extract with fast: false

​Example Schema (JSON Schema)

​Request (requests)

​Example Response

​Supported Schema Formats

​Custom Context

​API Usage Snippets

​Python (generic)

​cURL

​Response Format

​Long Document Caveats

​Tips

​API Reference

Extract API

TL;DR

Endpoint

Methods of Schema Extraction

1) Fast Schema — `/extract` with `fast: true`

Example Schema (JSON Schema)

Example Schema (Zod-style)

Request (requests)

Example Response

2) Standard Schema — `/extract` with `fast: false`

Example Schema (JSON Schema)

Request (requests)

Example Response

Supported Schema Formats

Custom Context

API Usage Snippets

Python (generic)

cURL

Response Format

Long Document Caveats

Tips

API Reference