Documents API
Manage documents in the Data Pillar for RAG (Retrieval-Augmented Generation) pipelines.
Overview
The Documents API provides endpoints to:
- Ingest documents for RAG pipelines
- List and retrieve documents
- Delete documents (GDPR Article 17 compliant)
- Get document chunks for direct injection
Documents are automatically processed through:
- Text extraction (PDF, DOCX, TXT, MD)
- PII/PHI detection and optional redaction
- Chunking (semantic, fixed, or paragraph-based)
- Embedding generation
- Vector storage for semantic search
Endpoints
| Method | Endpoint | Description |
|---|---|---|
| POST | /v1/data/documents | Upload and process document |
| GET | /v1/data/documents | List documents |
| GET | /v1/data/documents/:id | Get document details |
| GET | /v1/data/documents/:id/chunks | Get document chunks |
| DELETE | /v1/data/documents/:id | Delete document |
Upload Document
Upload and process a document for RAG.
POST /v1/data/documents
Content-Type: multipart/form-data
Request
bash
curl -X POST https://api.gateflow.ai/v1/data/documents \
  -H "Authorization: Bearer gw_prod_..." \
  -F "file=@document.pdf" \
  -F "name=Company Policies" \
  -F "data_classification=internal" \
  -F "chunking_strategy=semantic" \
  -F "chunk_size_tokens=512" \
  -F "residency_region=eu"
python
import requests

# Open the file in a context manager so the handle is closed after the upload.
with open("document.pdf", "rb") as f:
    response = requests.post(
        "https://api.gateflow.ai/v1/data/documents",
        headers={"Authorization": "Bearer gw_prod_..."},
        files={"file": f},
        data={
            "name": "Company Policies",
            "data_classification": "internal",
            "chunking_strategy": "semantic",
            "chunk_size_tokens": 512,
            "residency_region": "eu",
        },
    )
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `file` | file | Yes | - | Document file to upload |
| `name` | string | No | filename | Document name |
| `data_classification` | string | No | internal | Classification level |
| `chunking_strategy` | string | No | semantic | How to split document |
| `chunk_size_tokens` | integer | No | 512 | Target chunk size (100-2000) |
| `chunk_overlap_tokens` | integer | No | 50 | Overlap between chunks (0-200) |
| `residency_region` | string | No | eu | Data residency region |
| `retention_days` | integer | No | null | Auto-delete after N days |
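The ranges and defaults in the table above can be checked on the client before a file is sent. The following is a minimal sketch, not part of the API: `build_upload_fields` is a hypothetical helper, and the allowed values are copied from the Data Classifications and Chunking Strategies tables in this section. The server remains the source of truth for validation.

```python
# Allowed values copied from the tables in this section.
DATA_CLASSIFICATIONS = {"public", "internal", "confidential", "phi", "privileged"}
CHUNKING_STRATEGIES = {"semantic", "fixed", "paragraph"}


def build_upload_fields(
    name=None,
    data_classification="internal",
    chunking_strategy="semantic",
    chunk_size_tokens=512,
    chunk_overlap_tokens=50,
    residency_region="eu",
    retention_days=None,
):
    """Validate upload parameters and return the form fields to send.

    Hypothetical helper: it only catches obvious mistakes before the
    request is made; the API performs its own validation.
    """
    if data_classification not in DATA_CLASSIFICATIONS:
        raise ValueError(f"unknown data_classification: {data_classification!r}")
    if chunking_strategy not in CHUNKING_STRATEGIES:
        raise ValueError(f"unknown chunking_strategy: {chunking_strategy!r}")
    if not 100 <= chunk_size_tokens <= 2000:
        raise ValueError("chunk_size_tokens must be between 100 and 2000")
    if not 0 <= chunk_overlap_tokens <= 200:
        raise ValueError("chunk_overlap_tokens must be between 0 and 200")

    fields = {
        "data_classification": data_classification,
        "chunking_strategy": chunking_strategy,
        "chunk_size_tokens": chunk_size_tokens,
        "chunk_overlap_tokens": chunk_overlap_tokens,
        "residency_region": residency_region,
    }
    # Optional fields are left out so the server applies its documented defaults.
    if name is not None:
        fields["name"] = name
    if retention_days is not None:
        fields["retention_days"] = retention_days
    return fields
```

The returned dictionary can be passed as the `data` argument of the `requests.post` call shown in the request example.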
Data Classifications
| Level | Description |
|---|---|
| `public` | No restrictions |
| `internal` | Internal use only |
| `confidential` | Sensitive business data |
| `phi` | Protected Health Information |
| `privileged` | Attorney-client privilege |
Chunking Strategies
| Strategy | Description |
|---|---|
| `semantic` | Split on semantic boundaries (paragraphs, sections) |
| `fixed` | Fixed-size chunks with overlap |
| `paragraph` | Split on paragraph boundaries |
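To make the interaction between `chunk_size_tokens` and `chunk_overlap_tokens` concrete, here is an illustrative sketch of what a `fixed` strategy generally does. It is not the service's implementation: the `fixed_chunks` function is hypothetical, and it works on a ready-made token list, whereas the service uses its own tokenizer.

```python
def fixed_chunks(tokens, chunk_size=512, overlap=50):
    """Split a token list into fixed-size windows that share `overlap` tokens.

    Illustration only: each window starts `chunk_size - overlap` tokens
    after the previous one, so consecutive chunks repeat `overlap` tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


words = "a b c d e f g".split()
print(fixed_chunks(words, chunk_size=3, overlap=1))
# [['a', 'b', 'c'], ['c', 'd', 'e'], ['e', 'f', 'g']]
```

A larger overlap keeps more shared context between neighbouring chunks at the cost of more chunks per document.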
Response
json
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "name": "Company Policies",
  "original_filename": "document.pdf",
  "mime_type": "application/pdf",
  "file_size_bytes": 102400,
  "data_classification": "internal",
  "chunking_strategy": "semantic",
  "chunk_count": 45,
  "embedding_model": "text-embedding-3-small",
  "status": "completed",
  "error_message": null,
  "residency_region": "eu",
  "retention_days": null,
  "metadata": {},
  "created_at": "2024-01-15T10:30:00Z",
  "updated_at": "2024-01-15T10:30:15Z",
  "processing_time_seconds": 15.2
}
List Documents
List documents for the current organization.
GET /v1/data/documents
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `page` | integer | 1 | Page number |
| `page_size` | integer | 20 | Items per page (1-100) |
| `status` | string | null | Filter by status |
| `classification` | string | null | Filter by classification |
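Listing every document means following `page` until the response reports `has_more: false`. The sketch below shows one way to do that. `iter_documents` and the injected `fetch_page` callable are hypothetical names, chosen so the loop can be exercised without a network connection.

```python
def iter_documents(fetch_page, page_size=20, **filters):
    """Yield every document by following `has_more` across pages.

    `fetch_page` is a caller-supplied function (an assumption of this
    sketch) that takes the query parameters as a dict and returns the
    parsed JSON body of GET /v1/data/documents.
    """
    if not 1 <= page_size <= 100:
        raise ValueError("page_size must be between 1 and 100")

    page = 1
    while True:
        params = {"page": page, "page_size": page_size, **filters}
        body = fetch_page(params)
        yield from body["documents"]
        if not body.get("has_more"):
            break
        page += 1


# Possible wiring with `requests` (not executed here):
#
#   def fetch_page(params):
#       r = requests.get(
#           "https://api.gateflow.ai/v1/data/documents",
#           headers={"Authorization": "Bearer gw_prod_..."},
#           params=params,
#       )
#       r.raise_for_status()
#       return r.json()
#
#   for doc in iter_documents(fetch_page, status="completed"):
#       print(doc["id"], doc["name"])
```

Because the function is a generator, a caller that stops iterating early does not request the remaining pages.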
Response
json
{
  "documents": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "name": "Company Policies",
      "status": "completed",
      "chunk_count": 45,
      "created_at": "2024-01-15T10:30:00Z"
    }
  ],
  "total": 150,
  "page": 1,
  "page_size": 20,
  "has_more": true
}
Get Document
Get document details by ID.
GET /v1/data/documents/:id
Get Document Chunks
Retrieve document chunks for full document injection into prompts.
GET /v1/data/documents/:id/chunks
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `limit` | integer | 50 | Maximum chunks (1-500) |
| `offset` | integer | 0 | Skip N chunks |
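For full document injection, a client typically pages through the chunks with `limit` and `offset` and then joins their `content`. The following is a minimal sketch under that assumption. `fetch_all_chunks`, `build_context`, and the injected `fetch_chunks` callable are hypothetical names; the field names (`chunks`, `has_more`, `chunk_index`, `content`, `token_count`) come from the response documented in this section.

```python
def fetch_all_chunks(fetch_chunks, limit=50):
    """Collect every chunk of one document, sorted by `chunk_index`.

    `fetch_chunks` is a caller-supplied function (an assumption of this
    sketch) that takes {"limit": ..., "offset": ...} and returns the parsed
    JSON body of GET /v1/data/documents/:id/chunks.
    """
    if not 1 <= limit <= 500:
        raise ValueError("limit must be between 1 and 500")

    chunks = []
    offset = 0
    while True:
        body = fetch_chunks({"limit": limit, "offset": offset})
        batch = body["chunks"]
        chunks.extend(batch)
        # An empty batch also ends the loop, as a guard against spinning forever.
        if not body.get("has_more") or not batch:
            break
        offset += len(batch)
    return sorted(chunks, key=lambda c: c["chunk_index"])


def build_context(chunks, max_tokens=None):
    """Join chunk contents, stopping before `max_tokens` would be exceeded."""
    parts, used = [], 0
    for chunk in chunks:
        if max_tokens is not None and used + chunk["token_count"] > max_tokens:
            break
        parts.append(chunk["content"])
        used += chunk["token_count"]
    return "\n\n".join(parts)
```

The `total_tokens` field of the response can be compared against a model's context window first, to decide whether full injection is feasible or a token budget is needed.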
Response
json
{
  "document_id": "550e8400-e29b-41d4-a716-446655440000",
  "document_name": "Company Policies",
  "total_chunks": 45,
  "total_tokens": 23400,
  "chunks": [
    {
      "chunk_id": "chunk-001",
      "chunk_index": 0,
      "content": "Chapter 1: Introduction...",
      "token_count": 512,
      "page_number": 1,
      "section_header": "Introduction",
      "metadata": {}
    }
  ],
  "offset": 0,
  "limit": 50,
  "has_more": false
}
Delete Document
Delete a document and all its chunks. Implements GDPR Article 17 (Right to Erasure).
DELETE /v1/data/documents/:id
Response
json
{
  "message": "Document deleted successfully",
  "document_id": "550e8400-e29b-41d4-a716-446655440000",
  "chunks_deleted": 45
}
Document Status
| Status | Description |
|---|---|
| `pending` | Awaiting processing |
| `processing` | Being processed |
| `completed` | Ready for search |
| `failed` | Processing failed |
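If a document is returned in `pending` or `processing`, a client can poll the document details until it reaches a terminal status. This is a sketch under the assumption that `GET /v1/data/documents/:id` returns the same `status` field as the upload response. `wait_until_processed` and the injected `fetch_document` callable are hypothetical, and the timeout and interval values are arbitrary.

```python
import time

TERMINAL_STATUSES = {"completed", "failed"}


def wait_until_processed(
    fetch_document,
    timeout_seconds=120.0,
    poll_interval=2.0,
    sleep=time.sleep,
    clock=time.monotonic,
):
    """Poll until the document is `completed` or `failed`, then return it.

    `fetch_document` is a caller-supplied function (an assumption of this
    sketch) that returns the parsed JSON for one document. `sleep` and
    `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_seconds
    while True:
        doc = fetch_document()
        if doc["status"] in TERMINAL_STATUSES:
            return doc
        if clock() >= deadline:
            raise TimeoutError(
                f"document still {doc['status']!r} after {timeout_seconds}s"
            )
        sleep(poll_interval)
```

A caller should still inspect the returned `status`: a `failed` document ends the wait just like a `completed` one, and its `error_message` field describes what went wrong.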
Supported File Types
| Type | Extensions | Max Size |
|---|---|---|
| PDF | .pdf | 50MB |
| Word | .docx, .doc | 25MB |
| Text | .txt, .md | 10MB |
| HTML | .html, .htm | 10MB |
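The limits in this table can be enforced locally so that oversized or unsupported files are rejected before any bytes are uploaded. A minimal sketch follows. `check_upload` is a hypothetical helper, and it assumes "MB" means 2**20 bytes, which the table does not specify.

```python
import os

MB = 1024 * 1024  # assumption: the table's "MB" is treated as 2**20 bytes

# Extension limits copied from the Supported File Types table.
MAX_SIZE_BY_EXTENSION = {
    ".pdf": 50 * MB,
    ".docx": 25 * MB,
    ".doc": 25 * MB,
    ".txt": 10 * MB,
    ".md": 10 * MB,
    ".html": 10 * MB,
    ".htm": 10 * MB,
}


def check_upload(filename, size_bytes):
    """Return the normalized extension, or raise ValueError if the file
    type is unsupported or the file exceeds its size limit."""
    ext = os.path.splitext(filename)[1].lower()
    limit = MAX_SIZE_BY_EXTENSION.get(ext)
    if limit is None:
        raise ValueError(f"unsupported file type: {ext or '(no extension)'}")
    if size_bytes > limit:
        raise ValueError(
            f"{filename} is {size_bytes} bytes; limit for {ext} is {limit} bytes"
        )
    return ext
```

For a file on disk, `check_upload(path, os.path.getsize(path))` performs the check before the file is opened for upload.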
Error Codes
| Code | Description |
|---|---|
| 400 | Invalid file type or parameters |
| 404 | Document not found |
| 413 | File too large |
| 422 | Processing failed |
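Client code can translate these status codes into a single exception type that carries the documented meaning. The sketch below is one way to do so. `DocumentsAPIError` and `raise_for_documents_error` are hypothetical names, and the shape of the error response body is not documented here, so it is passed through as opaque text.

```python
# Meanings copied from the Error Codes table.
ERROR_MEANINGS = {
    400: "Invalid file type or parameters",
    404: "Document not found",
    413: "File too large",
    422: "Processing failed",
}


class DocumentsAPIError(Exception):
    """Raised for an error status from the Documents API (hypothetical class)."""

    def __init__(self, status_code, meaning, detail=""):
        message = f"{status_code} {meaning}"
        if detail:
            message += f": {detail}"
        super().__init__(message)
        self.status_code = status_code
        self.meaning = meaning
        self.detail = detail


def raise_for_documents_error(status_code, body_text=""):
    """Do nothing for statuses below 400; otherwise raise DocumentsAPIError."""
    if status_code < 400:
        return
    meaning = ERROR_MEANINGS.get(status_code, "Undocumented error")
    raise DocumentsAPIError(status_code, meaning, body_text)
```

With `requests`, this could be called as `raise_for_documents_error(response.status_code, response.text)` right after each request.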
See Also
- Search API - Query documents
- PII Detection - PII/PHI handling
- Data Isolation - Multi-tenancy