Split

TL;DR

Send an array of queries describing the pages you want (e.g., “invoices”, “contracts”, “account number xyz”). Cardinal returns the pages that match each query. Note: each page is matched to at most one partition, for accuracy.

Endpoint

POST https://api.trycardinal.ai/split
Content-Type: multipart/form-data
Auth: X-API-KEY: <API_KEY>

You may provide either file or fileUrl.

Required parameters

queries (string) — JSON-encoded array of query objects (see format below)
file (file upload) OR fileUrl (string)

Optional parameters

customContext (string) — Background that helps the model classify pages more accurately, such as the document type, the industry, or domain-specific terminology.
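The parameters above map onto a handful of multipart form fields. The sketch below assembles them for the `fileUrl` case. `build_split_form` is a hypothetical helper, and the URL and context string are placeholders. Because the endpoint is documented as `multipart/form-data` and `requests` only builds a multipart body when `files=` is given, the text fields are passed as `(None, value)` tuples. This assumes the API accepts that encoding when no file is uploaded.

```python
import json


def build_split_form(queries, file_url=None, custom_context=None):
    """Build the text fields for POST /split (hypothetical helper).

    Leave file_url as None when uploading a file; the upload itself goes in
    requests' `files=` argument rather than in these fields.
    """
    fields = {"queries": json.dumps(queries)}  # must be a JSON-encoded string
    if file_url is not None:
        fields["fileUrl"] = file_url
    if custom_context:
        fields["customContext"] = custom_context
    return fields


form = build_split_form(
    [{"name": "invoices", "description": "Pages with invoice numbers and totals due"}],
    file_url="https://example.com/statement.pdf",  # placeholder URL
    custom_context="Accounts-payable packets from a freight company; 'BOL' means bill of lading",
)

# requests only builds a multipart body when `files=` is given, so each text
# field is passed as a (filename=None, value) tuple to keep multipart/form-data.
multipart = {name: (None, value) for name, value in form.items()}
# Network call, not run here:
# requests.post("https://api.trycardinal.ai/split",
#               headers={"X-API-KEY": "YOUR_API_KEY"}, files=multipart)
```

When uploading a file instead, omit `file_url` and add the open file handle to the same `files=` dict under the `file` key.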

Query object format

Each query is:

name (string) — label you’ll see in the response (e.g., "invoices")
description (string, optional) — natural-language hint used to find relevant pages

Example queries value

[
  {"name":"invoices","description":"Pages with invoice numbers, totals due, remittance sections"},
  {"name":"contracts","description":"Legal agreements with parties, terms, signatures"},
  {"name":"financial_statements","description":"Balance sheets, income statements, cash flow tables"}
]
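Because `queries` travels as a JSON string, malformed objects are easy to send by accident. The following is a minimal client-side check against the query format described above: `name` must be a non-empty string and `description`, if present, must be a string. `encode_queries` is a hypothetical helper. Rejecting duplicate names is our own assumption, since names label partitions in the response; it is not a documented API rule.

```python
import json


def encode_queries(queries):
    """Validate query objects against the documented shape, then JSON-encode them."""
    seen = set()
    for i, q in enumerate(queries):
        name = q.get("name")
        if not isinstance(name, str) or not name.strip():
            raise ValueError(f"query {i}: 'name' must be a non-empty string")
        if "description" in q and not isinstance(q["description"], str):
            raise ValueError(f"query {i}: 'description' must be a string")
        # Assumption: duplicate names would make response partitions ambiguous.
        if name in seen:
            raise ValueError(f"query {i}: duplicate name {name!r}")
        seen.add(name)
    return json.dumps(queries)


queries_value = encode_queries([
    {"name": "invoices", "description": "Pages with invoice numbers, totals due, remittance sections"},
    {"name": "contracts"},  # description is optional
])
```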

Example requests

import json

import requests

queries = [
    {"name": "cover_pages", "description": "Title pages or covers"},
    {"name": "data_tables", "description": "Pages with structured tables"},
    {"name": "appendices", "description": "Supplemental materials or references"},
]

# Authenticate with the X-API-KEY header and upload the file as multipart/form-data
with open("quarterly-report.pdf", "rb") as fh:
    resp = requests.post(
        "https://api.trycardinal.ai/split",
        headers={"X-API-KEY": "YOUR_API_KEY"},
        files={"file": fh},
        data={"queries": json.dumps(queries)},  # queries must be a JSON-encoded string
    )
resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
print(resp.json())

Example Response (the pages array is abbreviated to its first two entries)

{
  "success": true,
  "pages": [
    {
      "content": "Page 1 content...",
      "page_number": 1
    },
    {
      "content": "Page 2 content...",
      "page_number": 2
    }
  ],
  "partitions": [
    {
      "name": "cover_pages",
      "description": "Title pages or covers",
      "pages": [1]
    },
    {
      "name": "data_tables",
      "description": "Pages with structured tables",
      "pages": [3, 4, 7, 8]
    },
    {
      "name": "appendices",
      "description": "Supplemental materials or references",
      "pages": [9, 10, 11]
    }
  ]
}

Response Format

The response includes:

success - Boolean indicating if partitioning completed successfully
pages - Array of page objects with content and metadata
partitions - Array of partition results, each containing:
- name - The partition name from your query
- description - The partition description from your query
- pages - Array of page numbers that match this partition (sorted)
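Since partitions list page numbers, a common next step is inverting them into a page-to-partition lookup and finding pages no query claimed. The sketch below does that on a parsed response. `page_to_partition`, `unmatched_pages`, and the `sample` data are hypothetical. The duplicate check encodes the documented rule that a page belongs to at most one partition.

```python
def page_to_partition(result):
    """Map each page number to the partition name that claimed it."""
    owner = {}
    for part in result.get("partitions", []):
        for page in part["pages"]:
            # The docs say a page is matched to at most one partition;
            # fail loudly if a response ever breaks that assumption.
            if page in owner:
                raise ValueError(f"page {page} appears in {owner[page]!r} and {part['name']!r}")
            owner[page] = part["name"]
    return owner


def unmatched_pages(result):
    """Page numbers present in `pages` that no partition claimed."""
    owner = page_to_partition(result)
    return sorted(p["page_number"] for p in result.get("pages", []) if p["page_number"] not in owner)


# Made-up response data for illustration
sample = {
    "success": True,
    "pages": [{"content": "...", "page_number": n} for n in range(1, 6)],
    "partitions": [
        {"name": "cover_pages", "description": "Title pages or covers", "pages": [1]},
        {"name": "data_tables", "description": "Pages with structured tables", "pages": [3, 4]},
    ],
}
print(page_to_partition(sample))  # {1: 'cover_pages', 3: 'data_tables', 4: 'data_tables'}
print(unmatched_pages(sample))    # [2, 5]
```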

Supported File Types

This endpoint supports:

PDF files (.pdf)
Images (.jpg, .jpeg, .png)

Writing Effective Queries

Good query descriptions:

Be specific about content type: “Financial tables with revenue data”
Include context clues: “Pages with signatures or sign-off sections”
Mention visual indicators: “Charts, graphs, or data visualizations”

Less effective queries:

Too vague: “Important pages”
Overly restrictive: “Page 5 specifically about Q3 sales in the northeast region”

Under the Hood

We run the document through our full Markdown pipeline first, converting it into a precise text representation. Only then do we split, ensuring the results are consistent and accurate.

Extraction Endpoint

After classifying pages with /split, use /split/extract to download specific pages as a separate PDF.

POST https://api.trycardinal.ai/split/extract
Content-Type: multipart/form-data
Auth: X-API-KEY: <API_KEY>

Required parameters

pages (string) — JSON-encoded array of page numbers (e.g., "[1, 3, 5, 7]")
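file (file upload) OR fileUrl (string)

Optional parameters

filename (string) — Name for the returned PDF; it appears in the Content-Disposition response header.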
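The `pages` field is a JSON-encoded array of page numbers, which are 1-based like `page_number` in /split responses. `encode_pages` is a hypothetical helper that rejects non-positive or non-integer values before sending. Sorting and de-duplicating is a client-side choice; the endpoint's behaviour with unsorted or repeated numbers is not documented here.

```python
import json


def encode_pages(pages):
    """JSON-encode 1-based page numbers for the `pages` field of /split/extract."""
    cleaned = set()
    for p in pages:
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(p, bool) or not isinstance(p, int) or p < 1:
            raise ValueError(f"invalid page number: {p!r}")
        cleaned.add(p)
    return json.dumps(sorted(cleaned))


print(encode_pages([5, 1, 3, 3]))  # [1, 3, 5]
```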

Example requests

import json

import requests

HEADERS = {"X-API-KEY": "YOUR_API_KEY"}

# First, get page classifications from /split
with open("document.pdf", "rb") as fh:
    split_resp = requests.post(
        "https://api.trycardinal.ai/split",
        headers=HEADERS,
        files={"file": fh},
        data={"queries": json.dumps([
            {"name": "invoices", "description": "Invoice pages"}
        ])},
    )
split_resp.raise_for_status()

partitions = split_resp.json()["partitions"]
# Fall back to an empty list if no "invoices" partition is returned
invoice_pages = next((p["pages"] for p in partitions if p["name"] == "invoices"), [])

if invoice_pages:
    # Extract those pages as a new PDF (the source file is uploaded again)
    with open("document.pdf", "rb") as fh:
        extract_resp = requests.post(
            "https://api.trycardinal.ai/split/extract",
            headers=HEADERS,
            files={"file": fh},
            data={
                "pages": json.dumps(invoice_pages),  # e.g., "[1, 3, 5]"
                "filename": "invoices.pdf",
            },
        )
    extract_resp.raise_for_status()

    # The response body is the PDF itself; write it to disk
    with open("invoices.pdf", "wb") as f:
        f.write(extract_resp.content)
else:
    print("No invoice pages found")
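To split a document into one PDF per partition rather than just invoices, the /split result can be turned into a list of extraction jobs. The sketch below is one way to do that. `plan_extractions` and its filename scheme (sanitized partition name plus `.pdf`) are hypothetical. Partitions with no pages are skipped because there is nothing to extract.

```python
import json
import re


def plan_extractions(partitions):
    """Turn /split partitions into (filename, pages_json) pairs for /split/extract."""
    plans = []
    for part in partitions:
        if not part["pages"]:
            continue  # nothing to extract for this query
        # Hypothetical naming scheme: replace unsafe characters with "_"
        safe = re.sub(r"[^A-Za-z0-9_-]+", "_", part["name"]) or "partition"
        plans.append((f"{safe}.pdf", json.dumps(sorted(part["pages"]))))
    return plans


# With `partitions` from a /split call (network calls, not run here):
# for filename, pages_json in plan_extractions(partitions):
#     with open("document.pdf", "rb") as fh:
#         resp = requests.post("https://api.trycardinal.ai/split/extract",
#                              headers={"X-API-KEY": "YOUR_API_KEY"}, files={"file": fh},
#                              data={"pages": pages_json, "filename": filename})
#     resp.raise_for_status()
#     with open(filename, "wb") as out:
#         out.write(resp.content)
```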

Response Format

The /split/extract endpoint returns a PDF file directly.

Content-Type: application/pdf
Content-Disposition: attachment; filename="<your-filename>.pdf"

To save the extracted PDF, simply write the response body to a .pdf file.
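If a request fails in a way that still returns a body, writing it blindly produces a broken `.pdf` file. The sketch below checks the response before saving it. `save_pdf` is a hypothetical helper. It checks the documented `application/pdf` Content-Type and the `%PDF-` signature that PDF files begin with. Treating anything else as an error payload is our assumption.

```python
def save_pdf(content_type, body, path):
    """Write an /split/extract response body to `path` after sanity checks."""
    # Ignore parameters such as "; charset=..." when comparing the media type
    if content_type.split(";")[0].strip().lower() != "application/pdf":
        raise ValueError(f"expected application/pdf, got {content_type!r}")
    if not body.startswith(b"%PDF-"):
        raise ValueError("response body does not look like a PDF")
    with open(path, "wb") as f:
        f.write(body)
    return len(body)  # bytes written


# Usage with the earlier example:
# save_pdf(extract_resp.headers.get("Content-Type", ""), extract_resp.content, "invoices.pdf")
```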
