Documents API

Manage documents in the Data Pillar for RAG (Retrieval-Augmented Generation) pipelines.

Overview

The Documents API provides endpoints to:

Ingest documents for RAG pipelines
List and retrieve documents
Delete documents (GDPR Article 17 compliant)
Get document chunks for direct injection into prompts

Documents are automatically processed through:

Text extraction (PDF, DOCX, DOC, TXT, MD, HTML)
PII/PHI detection and optional redaction
Chunking (semantic, fixed, or paragraph-based)
Embedding generation
Vector storage for semantic search

Endpoints

Method	Endpoint	Description
POST	`/v1/data/documents`	Upload and process document
GET	`/v1/data/documents`	List documents
GET	`/v1/data/documents/:id`	Get document details
GET	`/v1/data/documents/:id/chunks`	Get document chunks
DELETE	`/v1/data/documents/:id`	Delete document

Upload Document

Upload and process a document for RAG.

POST /v1/data/documents
Content-Type: multipart/form-data

Request

bash

curl -X POST https://api.gateflow.ai/v1/data/documents \
  -H "Authorization: Bearer gw_prod_..." \
  -F "file=@document.pdf" \
  -F "name=Company Policies" \
  -F "data_classification=internal" \
  -F "chunking_strategy=semantic" \
  -F "chunk_size_tokens=512" \
  -F "residency_region=eu"

python

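import requests

# Open the file in a context manager so the handle is closed after the upload.
with open("document.pdf", "rb") as f:
    response = requests.post(
        "https://api.gateflow.ai/v1/data/documents",
        headers={"Authorization": "Bearer gw_prod_..."},
        files={"file": f},
        data={
            "name": "Company Policies",
            "data_classification": "internal",
            "chunking_strategy": "semantic",
            "chunk_size_tokens": 512,
            "residency_region": "eu",
        },
    )

# Raise on 4xx/5xx responses, then read the JSON body described under Response.
response.raise_for_status()
document = response.json()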

Parameters

Parameter	Type	Required	Default	Description
`file`	file	Yes	-	Document file to upload
`name`	string	No	filename	Document name
`data_classification`	string	No	`internal`	Classification level
`chunking_strategy`	string	No	`semantic`	How to split document
`chunk_size_tokens`	integer	No	512	Target chunk size (100-2000)
`chunk_overlap_tokens`	integer	No	50	Overlap between chunks (0-200)
`residency_region`	string	No	`eu`	Data residency region
`retention_days`	integer	No	null	Auto-delete after N days
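
The ranges and defaults in the table can be checked on the client before any bytes are sent. The following is a minimal sketch, not part of the API: `build_upload_fields` is a hypothetical helper, and the allowed values are copied from the Data Classifications and Chunking Strategies tables in this section.

```python
# Allowed values copied from the Data Classifications and Chunking Strategies tables.
CLASSIFICATIONS = {"public", "internal", "confidential", "phi", "privileged"}
STRATEGIES = {"semantic", "fixed", "paragraph"}


def build_upload_fields(
    name=None,
    data_classification="internal",
    chunking_strategy="semantic",
    chunk_size_tokens=512,
    chunk_overlap_tokens=50,
    residency_region="eu",
    retention_days=None,
):
    """Hypothetical helper: validate values and return the multipart form fields."""
    if data_classification not in CLASSIFICATIONS:
        raise ValueError(f"unknown data_classification: {data_classification!r}")
    if chunking_strategy not in STRATEGIES:
        raise ValueError(f"unknown chunking_strategy: {chunking_strategy!r}")
    if not 100 <= chunk_size_tokens <= 2000:
        raise ValueError("chunk_size_tokens must be between 100 and 2000")
    if not 0 <= chunk_overlap_tokens <= 200:
        raise ValueError("chunk_overlap_tokens must be between 0 and 200")

    fields = {
        "data_classification": data_classification,
        "chunking_strategy": chunking_strategy,
        "chunk_size_tokens": chunk_size_tokens,
        "chunk_overlap_tokens": chunk_overlap_tokens,
        "residency_region": residency_region,
    }
    # Optional fields are left out so the server applies its own defaults.
    if name is not None:
        fields["name"] = name
    if retention_days is not None:
        fields["retention_days"] = retention_days
    return fields
```

The returned dictionary can be passed as the `data` argument of `requests.post` alongside the `files` argument.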

Data Classifications

Level	Description
`public`	No restrictions
`internal`	Internal use only
`confidential`	Sensitive business data
`phi`	Protected Health Information
`privileged`	Attorney-client privilege

Chunking Strategies

Strategy	Description
`semantic`	Split on semantic boundaries (paragraphs, sections)
`fixed`	Fixed-size chunks with overlap
`paragraph`	Split on paragraph boundaries
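
To make the interaction between `chunk_size_tokens` and `chunk_overlap_tokens` concrete, here is an illustrative sketch of fixed-size chunking with overlap. It shows the general technique only; the service's actual tokenizer and splitting logic are not documented here, and `fixed_chunks` is a hypothetical name operating on an already tokenized list.

```python
def fixed_chunks(tokens, chunk_size_tokens=512, chunk_overlap_tokens=50):
    """Split a token list into fixed-size windows; neighbours share the overlap."""
    if chunk_overlap_tokens >= chunk_size_tokens:
        raise ValueError("overlap must be smaller than the chunk size")
    # Each new window starts this many tokens after the previous one.
    step = chunk_size_tokens - chunk_overlap_tokens
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size_tokens])
        if start + chunk_size_tokens >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks
```

With the defaults, consecutive chunks start 462 tokens apart, so the last 50 tokens of one chunk reappear as the first 50 tokens of the next.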

Response

json

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "name": "Company Policies",
  "original_filename": "document.pdf",
  "mime_type": "application/pdf",
  "file_size_bytes": 102400,
  "data_classification": "internal",
  "chunking_strategy": "semantic",
  "chunk_count": 45,
  "embedding_model": "text-embedding-3-small",
  "status": "completed",
  "error_message": null,
  "residency_region": "eu",
  "retention_days": null,
  "metadata": {},
  "created_at": "2024-01-15T10:30:00Z",
  "updated_at": "2024-01-15T10:30:15Z",
  "processing_time_seconds": 15.2
}
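
A client may want to turn this JSON into a typed object. The sketch below assumes only the field names shown in the example response; the `Document` class and `parse_timestamp` helper are hypothetical and cover a subset of the fields.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


def parse_timestamp(value: str) -> datetime:
    # Older Python versions reject a trailing "Z", so use an explicit UTC offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


@dataclass
class Document:
    id: str
    name: str
    status: str
    chunk_count: int
    created_at: datetime
    updated_at: datetime
    error_message: Optional[str] = None

    @classmethod
    def from_json(cls, payload: dict) -> "Document":
        """Build a Document from the decoded response body."""
        return cls(
            id=payload["id"],
            name=payload["name"],
            status=payload["status"],
            chunk_count=payload["chunk_count"],
            created_at=parse_timestamp(payload["created_at"]),
            updated_at=parse_timestamp(payload["updated_at"]),
            error_message=payload.get("error_message"),
        )

    @property
    def is_ready(self) -> bool:
        # "completed" is the status documented as ready for search.
        return self.status == "completed"
```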

List Documents

List documents for the current organization.

GET /v1/data/documents

Parameters

Parameter	Type	Default	Description
`page`	integer	1	Page number
`page_size`	integer	20	Items per page (1-100)
`status`	string	null	Filter by status
`classification`	string	null	Filter by classification

Response

json

{
  "documents": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "name": "Company Policies",
      "status": "completed",
      "chunk_count": 45,
      "created_at": "2024-01-15T10:30:00Z"
    }
  ],
  "total": 150,
  "page": 1,
  "page_size": 20,
  "has_more": true
}
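
The `has_more` flag makes it straightforward to walk all pages. The following sketch assumes a caller-supplied `fetch_page(page, page_size)` function (a hypothetical name) that performs the GET request and returns the decoded JSON body.

```python
from typing import Callable, Iterator


def iter_documents(
    fetch_page: Callable[[int, int], dict], page_size: int = 20
) -> Iterator[dict]:
    """Yield every document by following `has_more` across pages."""
    page = 1
    while True:
        body = fetch_page(page, page_size)
        yield from body["documents"]
        if not body.get("has_more"):
            break
        page += 1
```

In practice `fetch_page` could wrap a call such as `requests.get(url, headers=headers, params={"page": page, "page_size": page_size}).json()`.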

Get Document

Retrieve the details of a single document by its ID. An unknown ID returns 404 (Document not found).

GET /v1/data/documents/:id

Get Document Chunks

Retrieve document chunks for full document injection into prompts.

GET /v1/data/documents/:id/chunks

Parameters

Parameter	Type	Default	Description
`limit`	integer	50	Maximum chunks (1-500)
`offset`	integer	0	Skip N chunks

Response

json

{
  "document_id": "550e8400-e29b-41d4-a716-446655440000",
  "document_name": "Company Policies",
  "total_chunks": 45,
  "total_tokens": 23400,
  "chunks": [
    {
      "chunk_id": "chunk-001",
      "chunk_index": 0,
      "content": "Chapter 1: Introduction...",
      "token_count": 512,
      "page_number": 1,
      "section_header": "Introduction",
      "metadata": {}
    }
  ],
  "offset": 0,
  "limit": 50,
  "has_more": false
}
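
For full-document injection, the chunks can be paged with `offset` and `limit` and then joined in `chunk_index` order. This is a sketch under assumptions: `fetch_chunks(offset, limit)` is a hypothetical caller-supplied function that returns the decoded JSON body shown above.

```python
def collect_document_text(fetch_chunks, limit=50, separator="\n\n"):
    """Page through all chunks and join their content in index order."""
    chunks = []
    offset = 0
    while True:
        body = fetch_chunks(offset, limit)
        chunks.extend(body["chunks"])
        # Stop on the last page, or on an empty page to avoid looping forever.
        if not body.get("has_more") or not body["chunks"]:
            break
        offset += limit
    chunks.sort(key=lambda chunk: chunk["chunk_index"])
    return separator.join(chunk["content"] for chunk in chunks)
```

If the document was ingested with a non-zero `chunk_overlap_tokens`, adjacent chunks may share text, so the joined result can contain repeated passages.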

Delete Document

Delete a document and all its chunks. Implements GDPR Article 17 (Right to Erasure).

DELETE /v1/data/documents/:id

Response

json

{
  "message": "Document deleted successfully",
  "document_id": "550e8400-e29b-41d4-a716-446655440000",
  "chunks_deleted": 45
}
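
A deletion call can verify that the confirmation refers to the document that was requested. In this sketch, `delete_document` is a hypothetical helper and `session` is assumed to be a `requests.Session` (or any object with a compatible `delete` method).

```python
def delete_document(session, base_url, api_key, document_id):
    """Send the DELETE request and return the number of chunks removed."""
    response = session.delete(
        f"{base_url}/v1/data/documents/{document_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    response.raise_for_status()
    body = response.json()
    # Guard against acting on a confirmation for a different document.
    if body["document_id"] != document_id:
        raise RuntimeError("deletion response does not match the requested document")
    return body["chunks_deleted"]
```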

Document Status

Status	Description
`pending`	Awaiting processing
`processing`	Being processed
`completed`	Ready for search
`failed`	Processing failed
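
If an upload returns while the document is still `pending` or `processing` (an assumption; the example response above shows `completed`), a client can poll the Get Document endpoint until a terminal status is reached. In this sketch, `fetch_document` is a hypothetical zero-argument function that returns the decoded document JSON.

```python
import time


def wait_until_processed(
    fetch_document, timeout_seconds=120.0, interval_seconds=2.0, sleep=time.sleep
):
    """Poll until the status is `completed` or `failed`, or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        document = fetch_document()
        status = document["status"]
        if status == "completed":
            return document
        if status == "failed":
            raise RuntimeError(document.get("error_message") or "processing failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"document still {status!r} after {timeout_seconds} seconds"
            )
        sleep(interval_seconds)
```

The `sleep` parameter exists so that tests can replace the real delay with a no-op.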

Supported File Types

Type	Extensions	Max Size
PDF	`.pdf`	50 MB
Word	`.docx`, `.doc`	25 MB
Text	`.txt`, `.md`	10 MB
HTML	`.html`, `.htm`	10 MB
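
Checking the extension and size locally avoids an upload that would be rejected with 400 or 413. The sketch below encodes the table above; `check_upload` is a hypothetical helper, and treating "MB" as 1024 × 1024 bytes is an assumption, since the table does not say whether the limits are binary or decimal.

```python
from pathlib import Path

MB = 1024 * 1024  # assumption: limits are binary megabytes

# Per-extension limits copied from the Supported File Types table.
MAX_SIZE_BYTES = {
    ".pdf": 50 * MB,
    ".docx": 25 * MB,
    ".doc": 25 * MB,
    ".txt": 10 * MB,
    ".md": 10 * MB,
    ".html": 10 * MB,
    ".htm": 10 * MB,
}


def check_upload(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the extension is unsupported or the file is too large."""
    extension = Path(filename).suffix.lower()
    limit = MAX_SIZE_BYTES.get(extension)
    if limit is None:
        raise ValueError(f"unsupported file type: {extension or filename!r}")
    if size_bytes > limit:
        raise ValueError(
            f"{filename} is {size_bytes} bytes; the limit for {extension} is {limit}"
        )
```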

Error Codes

Code	Description
400	Invalid file type or parameters
404	Document not found
413	File too large
422	Processing failed
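
Client code is often easier to read when these status codes map to distinct exceptions. The following is one possible sketch; every class and function name in it is hypothetical, and the format of the error response body is not documented here, so only the status code is used.

```python
class DocumentsApiError(Exception):
    """Base class for the hypothetical client-side errors below."""


class InvalidRequestError(DocumentsApiError):
    """400: invalid file type or parameters."""


class DocumentNotFoundError(DocumentsApiError):
    """404: document not found."""


class FileTooLargeError(DocumentsApiError):
    """413: file too large."""


class ProcessingFailedError(DocumentsApiError):
    """422: processing failed."""


ERRORS_BY_STATUS = {
    400: InvalidRequestError,
    404: DocumentNotFoundError,
    413: FileTooLargeError,
    422: ProcessingFailedError,
}


def raise_for_api_error(status_code: int, detail: str = "") -> None:
    """Raise a specific exception for documented codes; do nothing below 400."""
    if status_code < 400:
        return
    # Undocumented error codes fall back to the base class.
    error_class = ERRORS_BY_STATUS.get(status_code, DocumentsApiError)
    message = f"HTTP {status_code}" + (f": {detail}" if detail else "")
    raise error_class(message)
```

A caller would typically invoke `raise_for_api_error(response.status_code, response.text)` right after each request.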
