
API Documentation

This API allows you to submit a file (or file URL) along with a JSON schema that describes the structure of the data you want to extract. Once submitted, the request is queued for processing, and you can later poll for the result.

How It Works

1. Requirements and Optional Inputs

  • A file in a supported media type, see below.
  • A Talonic API key. Contact Talonic for details.
  • Optional: A valid JSON schema. See JSON-Schema.org for instructions.
  • Optional: A description of the data contained in the file; increases accuracy.

2. Submit a Request

  • Use the /process endpoint to submit a full job (extract + optional recommend + convert + optional validate). You can either upload a file or provide a URL to one, along with the JSON schema describing the expected results.
  • Alternatively, use /extract to only extract markdown from the source without conversion, or /recommend to only generate a recommended JSON schema for the source.

Sample cURL to submit a file directly:

curl -X PUT "https://api.talonic.ai/data-extractor/process" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@/path/to/your/file.pdf" \
  -F 'json_schema={"$schema":"http://json-schema.org/draft-07/schema#","type":"object",...}' \
  -F "description=Optional description of the file"

Sample cURL to submit a file URL:

curl -X PUT "https://api.talonic.ai/data-extractor/process" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file_url=https://example.com/path/to/file.pdf" \
  -F 'json_schema={"$schema":"http://json-schema.org/draft-07/schema#","type":"object",...}' \
  -F "description=Optional description of the file"
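The URL submission can also be sketched in Python. This is a minimal illustration using only the standard library, not an official client: the helper name build_process_request and the tiny schema passed to it are hypothetical, and the request is only built, not sent. It assumes the endpoint accepts the same multipart/form-data fields that curl -F sends.

```python
import json
import urllib.request
import uuid

BASE_URL = "https://api.talonic.ai/data-extractor"


def build_process_request(api_key, file_url, json_schema, description=None):
    """Build (but do not send) a multipart PUT request for /process."""
    fields = {"file_url": file_url, "json_schema": json.dumps(json_schema)}
    if description is not None:
        fields["description"] = description

    # Encode the text fields as multipart/form-data, like curl -F does.
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
    parts.append(f"--{boundary}--\r\n")
    body = "".join(parts).encode("utf-8")

    return urllib.request.Request(
        f"{BASE_URL}/process",
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Hypothetical minimal schema, for illustration only.
request = build_process_request(
    "YOUR_API_KEY",
    "https://example.com/path/to/file.pdf",
    {"$schema": "http://json-schema.org/draft-07/schema#", "type": "object"},
    description="Optional description of the file",
)
# Sending it would be: urllib.request.urlopen(request)
```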

3. Poll for Status

  • To check the status and get the result of your processing job, use the /process/{job_id} endpoint with the job_id returned in the submit response.

Sample cURL to poll job status:

curl -X GET "https://api.talonic.ai/data-extractor/process/YOUR_JOB_ID" \
  -H "Authorization: Bearer YOUR_API_KEY"

  • The response (ProcessStatusResponse) shows the current status of the job.
  • If the status is "success", it also includes the data extracted according to your JSON schema.
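A polling loop could look roughly like the sketch below. It assumes that "success", "failed" and "cancelled" are the terminal values of the status enum (the reference lists them but does not label them as terminal). The function poll_until_done, its parameters, and the default interval are hypothetical. The HTTP call is passed in as fetch_status so the loop can be exercised offline.

```python
import time

# Assumed terminal statuses, taken from the status enum in the reference.
TERMINAL_STATUSES = {"success", "failed", "cancelled"}


def poll_until_done(fetch_status, job_id, interval_seconds=5.0,
                    max_attempts=60, sleep=time.sleep):
    """Call fetch_status(job_id) until the job reaches a terminal status."""
    for attempt in range(max_attempts):
        response = fetch_status(job_id)
        if response["status"] in TERMINAL_STATUSES:
            return response
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} polls")


# Stand-in for a real GET /process/{job_id} call, so the sketch runs offline.
_responses = iter([
    {"status": "queued"},
    {"status": "processing"},
    {"status": "success", "result": {"invoiceId": "INV-000001"}},
])
final = poll_until_done(lambda job_id: next(_responses), "YOUR_JOB_ID",
                        sleep=lambda seconds: None)
print(final["status"])  # prints: success
```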

Notes

  • Replace YOUR_API_KEY with your actual API key.
  • Replace placeholders like /path/to/your/file.pdf and YOUR_JOB_ID with your actual file path and job identifier.
  • Use the json_schema field to clearly define what data you expect to be extracted from the file.
  • Use the description field to give the system additional context about the file; this can improve extraction and mapping.
  • If a file_url is submitted, ensure that it is publicly accessible. Any errors in file validation will result in a "failed" processing status.
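Because file validation problems only show up later as a "failed" status, a client may want to check the file extension before submitting. The sketch below copies the extension-to-media-type pairs from the supported formats listed in the /process request body. The helper media_type_for is hypothetical, and the server may apply additional checks that are not modelled here.

```python
from pathlib import PurePosixPath

# Extensions and media types as listed for the "file" field of /process.
SUPPORTED_MEDIA_TYPES = {
    ".pdf": "application/pdf",
    ".csv": "text/csv",
    ".doc": "application/msword",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".xls": "application/vnd.ms-excel",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".ods": "application/vnd.oasis.opendocument.spreadsheet",
    ".odt": "application/vnd.oasis.opendocument.text",
    ".numbers": "application/vnd.apple.numbers",
    ".pages": "application/vnd.apple.pages",
    ".jpg": "image/jpeg",
    ".png": "image/png",
    ".txt": "text/plain",
    ".mp3": "audio/mpeg",
    ".wav": "audio/wav",
    ".ogg": "audio/ogg",
    ".oga": "audio/ogg",
}


def media_type_for(filename):
    """Return the listed media type for a filename, or None if unsupported."""
    suffix = PurePosixPath(filename).suffix.lower()
    return SUPPORTED_MEDIA_TYPES.get(suffix)


print(media_type_for("/path/to/your/file.pdf"))  # prints: application/pdf
print(media_type_for("archive.zip"))             # prints: None
```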

As the API is currently in testing, all endpoints and schemas are subject to change.

Servers
Computed URL: https://api.talonic.ai/data-extractor

Processing

PUT
/process
Submit a Processing Request

Submit a file or a file URL along with a JSON schema for processing.

No parameters

Request body


{ "file": "", "json_schema": { "$schema": "http://json-schema.org/draft-07/schema#", "title": "Invoice", "description": "ACME Invoice", "type": "object", "properties": { "invoiceId": { "type": "string", "description": "A unique identifier for the invoice.", "pattern": "^[A-Z]{2,3}-\\d{6}$", "examples": [ "INV-000001", "AB-123456" ] }, "date": { "type": "string", "description": "The date when the invoice was issued, in YYYY-MM-DD format.", "pattern": "^\\d{4}-\\d{2}-\\d{2}$", "examples": [ "2025-01-01", "2024-12-31" ] }, "dueDate": { "type": "string", "description": "The payment due date for the invoice, in YYYY-MM-DD format.", "pattern": "^\\d{4}-\\d{2}-\\d{2}$", "examples": [ "2025-01-15", "2024-12-31" ] }, "billTo": { "type": "object", "description": "Details of the entity being billed.", "properties": { "name": { "type": "string", "description": "Name of the customer or client.", "examples": [ "Acme Corporation", "John Doe" ] }, "address": { "type": "string", "description": "Billing address of the customer or client.", "examples": [ "123 Main St, Anytown, USA", "456 Elm St, Othertown, USA" ] }, "email": { "type": "string", "description": "Email address of the customer or client.", "format": "email", "examples": [ "contact@acme.com", "johndoe@example.com" ] } }, "required": [ "name", "address", "email" ] }, "items": { "type": "array", "description": "List of items or services included in the invoice.", "items": { "type": "object", "properties": { "description": { "type": "string", "description": "Description of the item or service.", "examples": [ "Web design services", "Consulting hours" ] }, "quantity": { "type": "integer", "description": "Quantity of the item or hours of service.", "minimum": 1, "examples": [ 10, 5 ] }, "unitPrice": { "type": "number", "description": "Price per single unit or hour, in the specified currency.", "minimum": 0, "pattern": "^[0-9]+(\\.[0-9]{2})$", "examples": [ 150, 75.5 ] }, "total": { "type": "number", "description": "Total price for the item (quantity * unitPrice).", "minimum": 0, "pattern": "^[0-9]+(\\.[0-9]{2})$", "examples": [ 1500, 377.5 ] } }, "required": [ "description", "quantity", "unitPrice", "total" ] }, "minItems": 1 }, "subtotal": { "type": "number", "description": "Sum of all item totals before taxes and discounts.", "minimum": 0, "pattern": "^[0-9]+(\\.[0-9]{2})$", "examples": [ 1877.5 ] }, "tax": { "type": "number", "description": "Tax amount applied to the subtotal.", "minimum": 0, "pattern": "^[0-9]+(\\.[0-9]{2})$", "examples": [ 150 ] }, "total": { "type": "number", "description": "Total amount due, including taxes and any additional charges.", "minimum": 0, "pattern": "^[0-9]+(\\.[0-9]{2})$", "examples": [ 2027.5 ] }, "currency": { "type": "string", "description": "ISO 4217 currency code.", "pattern": "^[A-Z]{3}$", "examples": [ "USD", "EUR" ] }, "terms": { "type": "string", "description": "Payment terms and conditions.", "examples": [ "Payment is due within 15 days.", "Net 30 days." ] } }, "required": [ "invoiceId", "date", "dueDate", "billTo", "items", "subtotal", "tax", "total", "currency" ] }, "fast_extraction": false, "description": "Generic invoice document for shop orders, all values are in USD if not otherwise stated." }
    One of (object | object)
        #0 object
            file string (binary)
                Any of the following media types:
                    #0 application/pdf: .pdf file (Adobe Acrobat)
                    #1 text/csv: .csv file (Comma-Separated Values)
                    #2 application/msword: .doc file (Microsoft Word)
                    #3 application/vnd.openxmlformats-officedocument.wordprocessingml.document: .docx file (Microsoft Word)
                    #4 application/vnd.ms-excel: .xls file (Microsoft Excel)
                    #5 application/vnd.openxmlformats-officedocument.spreadsheetml.sheet: .xlsx file (Microsoft Excel)
                    #6 application/vnd.oasis.opendocument.spreadsheet: .ods file (Open Document Sheet)
                    #7 application/vnd.oasis.opendocument.text: .odt file (Open Document Text)
                    #8 application/vnd.apple.numbers: .numbers file (Apple Numbers)
                    #9 application/vnd.apple.pages: .pages file (Apple Pages)
                    #10 image/jpeg: .jpg file (JPEG Image)
                    #11 image/png: .png file (PNG Image)
                    #12 text/plain: .txt file (Plaintext)
                    #13 audio/mpeg: .mp3 file (MP3 Audio)
                    #14 audio/wav: .wav file (Waveform Audio)
                    #15 audio/ogg: .ogg/.oga file (Ogg Audio)

            json_schema string, media type: application/json
            Stringified JSON schema describing the desired result.

            fast_extraction string
            Enable fast extraction method for simple documents, which significantly increases processing speed, but potentially reduces accuracy.

                Enum array
                    #0=true
                    #1=false
                Default=false    
            validation string
            Validation policy to apply. 'lax' collects concerns but does not fail the job; 'strict' fails the job if max_errors is reached or the result is invalid overall; 'none' disables validation.

                Enum array
                    #0="lax"
                    #1="strict"
                    #2="none"
                Default="lax"
            description string, ≤ 1000 characters
            Optional description of or context for the provided file.

        #1 object
            file_url string (uri)
            Publicly accessible URL of the file to be processed. (See ProcessRequestFile for supported file formats.)

            json_schema string, media type: application/json
            Stringified JSON schema describing the desired result.

            fast_extraction string
            Enable fast extraction method for simple documents, which significantly increases processing speed, but potentially reduces accuracy.

                Enum array
                    #0=true
                    #1=false
                Default=false
            validation string
            Validation policy to apply. 'lax' collects concerns but does not fail the job; 'strict' fails the job if max_errors is reached or the result is invalid overall; 'none' disables validation.

                Enum array
                    #0="lax"
                    #1="strict"
                    #2="none"
                Default="lax"
            description string, ≤ 1000 characters
            Optional description of or context for the provided file.
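The constraints in the request body schema above can be checked on the client before submitting. The following is a sketch under stated assumptions: build_form_fields and its parameter names are hypothetical, the rule "exactly one of file or file_url" is inferred from the "One of" structure, and the file part itself is expected to be attached separately by whatever HTTP client is used.

```python
import json

ALLOWED_VALIDATION = {"lax", "strict", "none"}
MAX_DESCRIPTION_LENGTH = 1000  # "description string, ≤ 1000 characters"


def build_form_fields(json_schema=None, file_path=None, file_url=None,
                      fast_extraction=False, validation="lax", description=None):
    """Validate inputs and return the text form fields for PUT /process."""
    # Assumption: a request carries either a file upload or a file_url, not both.
    if (file_path is None) == (file_url is None):
        raise ValueError("provide exactly one of file_path or file_url")
    if validation not in ALLOWED_VALIDATION:
        raise ValueError(f"validation must be one of {sorted(ALLOWED_VALIDATION)}")
    if description is not None and len(description) > MAX_DESCRIPTION_LENGTH:
        raise ValueError("description must be at most 1000 characters")

    fields = {
        # fast_extraction is a string enum ("true"/"false") in the schema.
        "fast_extraction": "true" if fast_extraction else "false",
        "validation": validation,
    }
    if json_schema is not None:
        fields["json_schema"] = json.dumps(json_schema)  # stringified schema
    if file_url is not None:
        fields["file_url"] = file_url
    if description is not None:
        fields["description"] = description
    return fields


fields = build_form_fields(
    json_schema={"type": "object"},
    file_url="https://example.com/path/to/file.pdf",
    validation="strict",
)
```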

Response

Code Description Links
202

Processing request accepted and queued.

{ "correlation_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6", "job_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6", "status": "queued", "start_time": "2025-09-09T06:41:31.848Z", "estimated_time_seconds": 0, "message": "string", "filename": "string" }
ProcessResponse object
    correlation_id string uuid
    Unique correlation ID for the request.

    job_id string uuid
    Unique job ID for polling status.

    status string
    Initial status of the request.

        Enum array
            #0="queued"
            #1="processing"
            #2="failed"
            #3="success"
            #4="cancelled"

    start_time string date-time
    ISO 8601 timestamp when processing started.

    estimated_time_seconds integer
    Estimated time in seconds for the processing to finish. Only present if status is queued or processing.

    message string
    Informational message about the request.

    filename string
    Original name of the submitted or linked file, including extension.
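A client might parse the 202 response into a typed object along these lines. The ProcessResponse dataclass below mirrors the fields documented above; the parser function name is hypothetical, and estimated_time_seconds is treated as optional because the reference says it is only present while the job is queued or processing.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ProcessResponse:
    correlation_id: str
    job_id: str
    status: str
    start_time: datetime
    message: str
    filename: str
    estimated_time_seconds: Optional[int] = None


def parse_process_response(body: str) -> ProcessResponse:
    """Parse the JSON body returned by PUT /process."""
    data = json.loads(body)
    return ProcessResponse(
        correlation_id=data["correlation_id"],
        job_id=data["job_id"],
        status=data["status"],
        # Replace the trailing "Z" so older Python versions accept the timestamp.
        start_time=datetime.fromisoformat(data["start_time"].replace("Z", "+00:00")),
        message=data.get("message", ""),
        filename=data.get("filename", ""),
        estimated_time_seconds=data.get("estimated_time_seconds"),
    )


# The example body from the reference above.
sample_body = (
    '{ "correlation_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6", '
    '"job_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6", "status": "queued", '
    '"start_time": "2025-09-09T06:41:31.848Z", "estimated_time_seconds": 0, '
    '"message": "string", "filename": "string" }'
)
parsed = parse_process_response(sample_body)
```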
400

Bad Request. Invalid input parameters.

{ "detail": "string" }
ErrorResponse object
    detail string
    Error message detailing what went wrong.
401

Unauthorized. Missing or invalid API key.

{ "detail": "string" }
ErrorResponse object
    detail string
    Error message detailing what went wrong.
413

Payload Too Large. Submitted payload is larger than the maximum allowable size.

{ "detail": "string" }
ErrorResponse object
    detail string
    Error message detailing what went wrong.
415

Unsupported Media Type. The server does not support the provided media type.

{ "detail": "string" }
ErrorResponse object
    detail string
    Error message detailing what went wrong.
429

Too Many Requests. Wait a minute and try again.

{ "detail": "string" }
ErrorResponse object
    detail string
    Error message detailing what went wrong.
500

Server error.

{ "detail": "string" }
ErrorResponse object
    detail string
    Error message detailing what went wrong.
GET
/process/{job_id}
Get Processing Status

Retrieve the status and result of a processing job using its ID.

Parameters

Name Description
job_id (path) ID of the job to query, as returned in the submit response.

Response

Code Description Links
200

Processing status retrieved successfully.

{ "correlation_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6", "job_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6", "status": "queued", "start_time": "2025-09-10T10:08:56.445Z", "estimated_time_seconds": 0, "finish_time": "2025-09-10T10:08:56.445Z", "message": "string", "filename": "string", "result": {}, "json_schema": {}, "markdown": "string", "validation_result": { "concerns": [ { "path": "string", "text": "string", "level": "error", "code": "missing_value" } ], "summary": "string" } }
ProcessStatusResponse object
    correlation_id string uuid
    Correlation ID of the request.

    job_id string uuid
    Job ID for polling.

    status string
    Current status of the processing job.

        Enum array
        #0="queued"
        #1="processing"
        #2="failed"
        #3="success"
        #4="cancelled"

    start_time string date-time
    ISO 8601 timestamp when processing started.

    estimated_time_seconds integer
    Estimated time in seconds for the processing to finish. Only present if status is queued or processing.

    finish_time string | null date-time
    ISO 8601 timestamp when processing finished, or null if not finished.

    message string
    Status message, if any.

    filename string
    Original name of the submitted or linked file, including extension.

    result object | null
    Processing result following the provided JSON schema, or null if not finished.

    json_schema object
    JSON schema used to create the result JSON. Only present once the job has finished and include-schema is true.

    markdown string
    Markdown representation of the source data. Only present once the job has finished and include-markdown is true.

    validation_result object
    Validation result of the extracted JSON. Only present once the job has finished and validate is true.

        concerns array
        List of concerns with their JSON paths.

            Items object
                path string
                JSON path of the field related to the concern

                text string
                Human-readable description of the concern

                level string
                Severity level of the concern

                    Enum array
                    #0="error"
                    #1="warning"
                    #2="info"

                code string
                Code of the concern

                    Enum array
                    #0="missing_value"
                    #1="null_value"
                    #2="additional_value"
                    #3="format_inconsistent"
                    #4="numeric_mismatch"
                    #5="floating_precision_diff"
                    #6="semantic_conflict"
                    #7="array_length_mismatch"
                    #8="type_mismatch"
                    #9="out_of_range"
                    #10="duplicate_value"
                    #11="incomplete_object"
                    #12="extra_fields"
                    #13="order_difference"

        summary string
        Executive summary of the validation results.
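Once a status response includes validation_result, the concerns can be grouped by severity to decide how to react. The helper below is a hypothetical sketch; the sample paths and texts are invented for illustration, while the field names, levels, and codes follow the schema above.

```python
from collections import defaultdict


def concerns_by_level(status_response):
    """Group validation concerns by their level ("error", "warning", "info")."""
    validation = status_response.get("validation_result") or {}
    grouped = defaultdict(list)
    for concern in validation.get("concerns", []):
        grouped[concern["level"]].append(concern)
    return dict(grouped)


# Made-up sample data shaped like ProcessStatusResponse.
sample_status = {
    "status": "success",
    "validation_result": {
        "concerns": [
            {"path": "$.total", "text": "Total differs from item sum.",
             "level": "error", "code": "numeric_mismatch"},
            {"path": "$.terms", "text": "No payment terms found.",
             "level": "warning", "code": "missing_value"},
        ],
        "summary": "string",
    },
}
grouped = concerns_by_level(sample_status)
print(sorted(grouped))  # prints: ['error', 'warning']
```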
401

Unauthorized. Missing or invalid API key.

{ "detail": "string" }
ErrorResponse object
    detail string
    Error message detailing what went wrong.
404

Not Found. No job found with the provided ID.

{ "detail": "string" }
ErrorResponse object
    detail string
    Error message detailing what went wrong.
429

Too Many Requests. Wait a minute and try again.

{ "detail": "string" }
ErrorResponse object
    detail string
    Error message detailing what went wrong.
500

Server error.

{ "detail": "string" }
ErrorResponse object
    detail string
    Error message detailing what went wrong.
DELETE
/process/{job_id}
Cancel Processing Job

Cancel a running job by its job_id.

Parameters

Name Description
job_id (path) ID of the job to cancel, as returned in the submit response.

Response

Code Description Links
202

Cancellation request accepted.

400

Bad Request.

{ "detail": "string" }
ErrorResponse object
    detail string
    Error message detailing what went wrong.
401

Unauthorized. Missing or invalid API key.

{ "detail": "string" }
ErrorResponse object
    detail string
    Error message detailing what went wrong.
404

Not Found. No job found with the provided ID.

{ "detail": "string" }
ErrorResponse object
    detail string
    Error message detailing what went wrong.
409

Conflict. Job is already finished or cancelled.

{ "detail": "string" }
ErrorResponse object
    detail string
    Error message detailing what went wrong.
429

Too Many Requests. Wait a minute and try again.

{ "detail": "string" }
ErrorResponse object
    detail string
    Error message detailing what went wrong.
500

Server error.

{ "detail": "string" }
ErrorResponse object
    detail string
    Error message detailing what went wrong.
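A cancellation call can be sketched the same way as the other requests. The helper names below are hypothetical and the request is only built, not sent. The outcome table restates the response codes documented for DELETE /process/{job_id}, so a caller can treat 409 (already finished or cancelled) differently from a real failure.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "https://api.talonic.ai/data-extractor"

# Response codes documented for DELETE /process/{job_id}.
CANCEL_OUTCOMES = {
    202: "cancellation accepted",
    400: "bad request",
    401: "missing or invalid API key",
    404: "no job found with the provided ID",
    409: "job is already finished or cancelled",
    429: "too many requests, wait and retry",
    500: "server error",
}


def build_cancel_request(api_key, job_id):
    """Build (but do not send) a DELETE request for /process/{job_id}."""
    return urllib.request.Request(
        f"{BASE_URL}/process/{quote(job_id, safe='')}",
        method="DELETE",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def describe_cancel_outcome(status_code):
    """Translate a status code into a short description for logging."""
    return CANCEL_OUTCOMES.get(status_code, f"unexpected status {status_code}")


cancel_request = build_cancel_request(
    "YOUR_API_KEY", "3fa85f64-5717-4562-b3fc-2c963f66afa6"
)
print(cancel_request.get_method())   # prints: DELETE
print(describe_cancel_outcome(409))  # prints: job is already finished or cancelled
```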