Document Ingestion

Upload and process documents for semantic search and retrieval-augmented generation (RAG).

Supported Formats

| Format | Extensions | Notes |
|---|---|---|
| PDF | .pdf | Text and scanned (via OCR) |
| Word | .docx, .doc | Full formatting preserved |
| Text | .txt | Plain text |
| Markdown | .md | Rendered to text |
| HTML | .html | Tags stripped |
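
A client can filter files against the extensions listed above before uploading, so unsupported files never leave the machine. The following is a minimal sketch in Python; the helper names `is_supported` and `split_by_support` are hypothetical, and the case-insensitive match is an assumption (the list above only shows lowercase extensions).

```python
from pathlib import Path

# Extensions copied from the Supported Formats list above.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".doc", ".txt", ".md", ".html"}


def is_supported(path):
    """Return True if the file extension appears in the supported list.

    Assumption: matching is case-insensitive; the service may be stricter.
    """
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def split_by_support(paths):
    """Partition paths into (uploadable, skipped) by extension only."""
    ok = [p for p in paths if is_supported(p)]
    skipped = [p for p in paths if not is_supported(p)]
    return ok, skipped


ok, skipped = split_by_support(["contract.pdf", "notes.MD", "scan.tiff"])
print(ok)       # → ['contract.pdf', 'notes.MD']
print(skipped)  # → ['scan.tiff']
```

This checks the file name only; it does not inspect file contents.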

Upload Documents

Single Document

bash
curl -X POST https://api.gateflow.ai/v1/data/documents \
  -H "Authorization: Bearer gw_prod_..." \
  -F "file=@contract.pdf" \
  -F "classification=confidential" \
  -F "metadata={\"department\": \"legal\", \"year\": 2024}"

Response:

json
{
  "document_id": "doc_abc123",
  "filename": "contract.pdf",
  "status": "processing",
  "classification": "confidential",
  "created_at": "2024-01-15T10:30:00Z"
}
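
The `metadata` field in the request above is a JSON object passed as a form value, and escaping its quotes by hand in the shell is easy to get wrong. One way to avoid that is to build the curl arguments in Python and let `json.dumps` produce the JSON. This is a sketch, not an official client: `build_upload_command` is a hypothetical helper, while the endpoint and field names are taken from the example above. It passes the text fields with curl's `--form-string`, which sends the value literally, instead of `-F`.

```python
import json
import shlex

# Endpoint taken from the single-document example above.
API_URL = "https://api.gateflow.ai/v1/data/documents"


def build_upload_command(path, classification, metadata, token):
    """Return the curl argument list for a single-document upload.

    Hypothetical helper: it only builds the command and sends nothing.
    Paths containing ',' or ';' would need curl's quoted @"..." form,
    which this sketch does not handle.
    """
    return [
        "curl", "-X", "POST", API_URL,
        "-H", f"Authorization: Bearer {token}",
        "-F", f"file=@{path}",
        # --form-string sends the value as-is, so ';' or a leading '@'
        # inside the text is not interpreted by curl.
        "--form-string", f"classification={classification}",
        "--form-string", "metadata=" + json.dumps(metadata),
    ]


cmd = build_upload_command(
    "contract.pdf",
    "confidential",
    {"department": "legal", "year": 2024},
    token="gw_prod_...",  # placeholder, as in the example above
)
print(shlex.join(cmd))  # shell-quoted command line, ready to paste
```

The returned list can also be passed directly to `subprocess.run`, which avoids shell quoting altogether.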

Batch Upload

bash
curl -X POST https://api.gateflow.ai/v1/data/documents/batch \
  -H "Authorization: Bearer gw_prod_..." \
  -F "files=@doc1.pdf" \
  -F "files=@doc2.pdf" \
  -F "files=@doc3.pdf" \
  -F "classification=internal"

Processing Pipeline

Processing Status

bash
curl https://api.gateflow.ai/v1/data/documents/doc_abc123 \
  -H "Authorization: Bearer gw_prod_..."

Response:

json
{
  "document_id": "doc_abc123",
  "status": "ready",
  "pages": 15,
  "chunks": 45,
  "pii_detected": true,
  "pii_entities": 3
}

Status Values

| Status | Description |
|---|---|
| uploading | File being received |
| processing | Being parsed and embedded |
| ready | Available for search |
| failed | Processing error |
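
Because a document starts in `uploading` or `processing` and ends in `ready` or `failed`, a client usually polls the status endpoint until it sees one of the two final values. The sketch below shows one way to do that with the Python standard library. The function names, the 2-second interval, and the 300-second timeout are assumptions for illustration; the source does not state a recommended polling rate, and only the `status` field shown above is read.

```python
import json
import time
import urllib.request

# Endpoint taken from the Processing Status example above.
BASE_URL = "https://api.gateflow.ai/v1/data/documents"
FINAL_STATUSES = {"ready", "failed"}  # from the Status Values list


def fetch_document(document_id, token):
    """GET the document record and return it as a dict."""
    req = urllib.request.Request(
        f"{BASE_URL}/{document_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_until_done(document_id, token, fetch=fetch_document,
                    interval=2.0, timeout=300.0, sleep=time.sleep):
    """Poll until status is 'ready' or 'failed', or raise TimeoutError.

    interval and timeout are illustrative defaults, not documented values.
    fetch and sleep are injectable so the loop can be tested offline.
    """
    deadline = time.monotonic() + timeout
    while True:
        doc = fetch(document_id, token)
        if doc["status"] in FINAL_STATUSES:
            return doc
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{document_id} still {doc['status']!r}")
        sleep(interval)


# Usage (performs real HTTP requests, so it is left commented out):
# doc = wait_until_done("doc_abc123", token="gw_prod_...")
# print(doc["status"])
```

The caller still has to check whether the returned status is `ready` or `failed`; the loop only waits for processing to finish.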

Configuration

Chunking

json
{
  "chunking": {
    "strategy": "semantic",
    "max_chunk_size": 1000,
    "overlap": 100
  }
}
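
To make `max_chunk_size` and `overlap` concrete, the sketch below splits text into fixed-size windows where each chunk repeats the tail of the previous one. It rests on two assumptions that the source does not confirm: sizes are counted in characters (they may be tokens), and chunk boundaries are fixed rather than chosen by the `semantic` strategy. It illustrates how the two numbers interact and is not the service's algorithm; `chunk_text` is a hypothetical name.

```python
def chunk_text(text, max_chunk_size=1000, overlap=100):
    """Split text into windows of at most max_chunk_size characters.

    Consecutive chunks share `overlap` characters. Assumption: units are
    characters and boundaries are fixed, unlike a semantic strategy.
    """
    if max_chunk_size <= 0:
        raise ValueError("max_chunk_size must be positive")
    if not 0 <= overlap < max_chunk_size:
        raise ValueError("overlap must be in [0, max_chunk_size)")

    step = max_chunk_size - overlap  # how far each window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chunk_size])
        if start + max_chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


sample = "".join(chr(97 + i % 26) for i in range(2500))
chunks = chunk_text(sample)
print([len(c) for c in chunks])            # → [1000, 1000, 700]
print(chunks[0][-100:] == chunks[1][:100])  # → True
```

With these settings each window advances 900 characters, so a larger `overlap` yields more chunks for the same input.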

PII Handling

json
{
  "pii": {
    "detect": true,
    "action": "redact",
    "types": ["PERSON", "SSN", "PHONE"]
  }
}
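
As a rough picture of what `detect` plus `action: "redact"` means for the listed types, the sketch below replaces US-style SSNs and phone numbers with a type label and counts the matches. Everything in it is an assumption for illustration: the regular expressions, the `[TYPE]` placeholder format, and the `redact` helper are hypothetical and do not describe the service's detector. `PERSON` is skipped because names cannot be found reliably with a pattern.

```python
import re

# Illustrative patterns only; a real detector would be far more thorough.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def redact(text, types=("PERSON", "SSN", "PHONE")):
    """Replace matches of each known type with '[TYPE]'.

    Returns (redacted_text, entity_count). Types without a pattern,
    such as PERSON, are skipped in this sketch.
    """
    count = 0
    for pii_type in types:
        pattern = PATTERNS.get(pii_type)
        if pattern is None:
            continue  # e.g. PERSON would need named-entity recognition
        text, found = pattern.subn(f"[{pii_type}]", text)
        count += found
    return text, count


clean, entities = redact("SSN 123-45-6789, call 555-867-5309.")
print(clean)     # → SSN [SSN], call [PHONE].
print(entities)  # → 2
```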
