POST /extract
Extract Structured Data by Schema
curl --request POST \
  --url https://api.trycardinal.ai/extract \
  --header 'Content-Type: multipart/form-data' \
  --header 'x-api-key: <api-key>' \
  --form 'fileUrl=<string>' \
  --form 'schema=<string>' \
  --form fast=false \
  --form 'customContext=<string>' \
  --form imageMetadataDetect=false \
  --form file=@example-file
{
  "response": "<string>",
  "method": "fast",
  "pages_processed": 123,
  "confidence_score": 123,
  "image_metadata": [
    {
      "figure_id": 123,
      "caption": "<string>",
      "cropped_image_url": "<string>",
      "metadata": {},
      "bounding_box": {
        "original": {
          "x": 123,
          "y": 123,
          "w": 123,
          "h": 123
        },
        "pixel": {
          "x": 123,
          "y": 123,
          "w": 123,
          "h": 123
        }
      },
      "subfigure_count": 123,
      "processing_times": {
        "detection_ms": 123,
        "crop_ms": 123,
        "ocr_ms": 123,
        "total_ms": 123
      }
    }
  ]
}

Authorizations

x-api-key
string
header
required

Body

multipart/form-data
file
file
required

PDF or image to upload (required if no fileUrl). Allowed: .pdf, .jpg, .jpeg, .png

schema
string
required

Required schema definition describing the fields to extract.
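
Because schema is sent as a string form field, a structured schema has to be serialized before it is attached to the request. The sketch below shows one way to do that, assuming the endpoint accepts a JSON-Schema-style object serialized as JSON; this page does not specify the schema dialect, and the field names invoice_number and total are hypothetical.

```python
import json

# Hypothetical schema: the field names and the JSON-Schema-style layout
# are assumptions, since this page does not specify the schema dialect.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["invoice_number"],
}


def schema_to_form_value(schema: dict) -> str:
    """Serialize a schema dict into a string for the `schema` form field."""
    return json.dumps(schema, separators=(",", ":"))


schema_field = schema_to_form_value(invoice_schema)
```

The resulting string can be passed as the value of the schema form field in the same way as the `--form 'schema=<string>'` argument in the curl sample.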

fileUrl
string<uri>

Publicly accessible URL of the file to process (required if no file).
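
The file and fileUrl descriptions imply that at least one of the two must be supplied. A minimal sketch of a client-side check that builds the text form fields follows; build_extract_fields is a hypothetical helper, not part of any official SDK, and the page does not say how the server behaves when both file and fileUrl are sent, so that case is not rejected here.

```python
from typing import Optional

# Extensions listed as allowed for the `file` field on this page.
ALLOWED_EXTENSIONS = (".pdf", ".jpg", ".jpeg", ".png")


def build_extract_fields(
    schema: str,
    file_path: Optional[str] = None,
    file_url: Optional[str] = None,
    fast: bool = False,
    custom_context: Optional[str] = None,
    image_metadata_detect: bool = False,
) -> dict:
    """Return the text form fields for POST /extract (hypothetical helper).

    The uploaded file itself is not included; it would be attached
    separately as the `file` part of the multipart body.
    """
    if not schema:
        raise ValueError("schema is required")
    if file_path is None and file_url is None:
        raise ValueError("provide either file or fileUrl")
    if file_path is not None and not file_path.lower().endswith(ALLOWED_EXTENSIONS):
        raise ValueError(f"file must be one of {ALLOWED_EXTENSIONS}")

    # Booleans are sent as the lowercase strings used in the curl sample.
    fields = {
        "schema": schema,
        "fast": "true" if fast else "false",
        "imageMetadataDetect": "true" if image_metadata_detect else "false",
    }
    if file_url is not None:
        fields["fileUrl"] = file_url
    if custom_context is not None:
        fields["customContext"] = custom_context
    return fields
```

Validating locally avoids a round trip for requests that are missing their input or use an unsupported file type.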

fast
boolean
default:false

Fast path that extracts directly from pages without full pipeline post-processing.

customContext
string

Optional additional context or instructions to guide the extraction process. Useful for providing domain-specific guidance or clarifications about the document.

imageMetadataDetect
boolean
default:false

If true, the response includes image metadata for the processed pages.

Response

Successful schema extraction

response
string
required

The model's structured output, matching the provided schema. Returned as a stringified object.
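
Since response arrives as a string rather than a nested object, it needs a second decoding step after the HTTP body has been parsed. The sketch below assumes the string is valid JSON, which the page implies but does not state outright; parse_extraction and the sample values are illustrative, not real API output.

```python
import json


def parse_extraction(body: dict) -> dict:
    """Decode the stringified `response` field into a Python object."""
    return json.loads(body["response"])


# Illustrative body only: the extracted values are made up for this sketch.
sample_body = {
    "response": "{\"invoice_number\": \"INV-001\", \"total\": 42.5}",
    "method": "fast",
}

extracted = parse_extraction(sample_body)
```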

method
enum<string>

Available options: fast, slow

pages_processed
integer

Number of pages processed. Present in slow mode.

confidence_score
number

Confidence score (0-100) indicating extraction reliability, present in slow mode.
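
Because confidence_score is only present in slow mode, client code should treat it as optional. A minimal sketch of one way to gate results on it is shown below; needs_review is a hypothetical helper and the threshold of 80 is an arbitrary assumption, not a value recommended by this page.

```python
from typing import Optional


def needs_review(body: dict, threshold: float = 80.0) -> Optional[bool]:
    """Flag low-confidence extractions.

    Returns None when no score is available (for example in fast mode),
    True when the score is below the threshold, and False otherwise.
    """
    score = body.get("confidence_score")
    if score is None:
        return None
    return score < threshold
```

Returning None for a missing score keeps "no score available" distinct from "score is high enough", so callers can decide how to handle fast-mode results.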

image_metadata
object[]

Image metadata entries for this extraction (present if imageMetadataDetect=true).
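
Each image_metadata entry in the sample response carries a bounding_box with original and pixel coordinates. The sketch below collects the pixel boxes into tuples; pixel_boxes is a hypothetical helper, and it relies only on the field names shown in the sample response above, skipping entries that lack a pixel box.

```python
def pixel_boxes(body: dict) -> list:
    """Return (figure_id, x, y, w, h) tuples from the pixel bounding boxes."""
    boxes = []
    # image_metadata is absent unless imageMetadataDetect=true was sent.
    for entry in body.get("image_metadata") or []:
        pixel = (entry.get("bounding_box") or {}).get("pixel")
        if pixel is None:
            continue
        boxes.append(
            (entry.get("figure_id"), pixel["x"], pixel["y"], pixel["w"], pixel["h"])
        )
    return boxes
```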