Document Ingestion
Upload and process documents for semantic search and RAG.
Supported Formats
| Format | Extensions | Notes |
|---|---|---|
| PDF | .pdf | Text and scanned (via OCR) |
| Word | .docx, .doc | Full formatting preserved |
| Text | .txt | Plain text |
| Markdown | .md | Rendered to text |
| HTML | .html | Tags stripped |
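A client can check file extensions against the table above before uploading, which avoids sending files the service would likely reject. The following is a minimal sketch, not part of the GateFlow API: the helper names `is_supported` and `split_by_support` are hypothetical, the `.pdf` entry is assumed from the `contract.pdf` upload example, and case-insensitive matching is also an assumption.

```python
from pathlib import Path

# Extensions taken from the Supported Formats table. ".pdf" is an assumption
# based on the contract.pdf upload example in this document.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".doc", ".txt", ".md", ".html"}


def is_supported(filename: str) -> bool:
    """Return True if the file's extension is in the supported set."""
    # Lower-casing is an assumption: the docs do not say whether the
    # service treats extensions case-insensitively.
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def split_by_support(filenames):
    """Partition filenames into (supported, unsupported) lists."""
    supported = [name for name in filenames if is_supported(name)]
    unsupported = [name for name in filenames if not is_supported(name)]
    return supported, unsupported
```

For example, `split_by_support(["contract.pdf", "photo.png"])` separates the PDF from the PNG so only the former is queued for upload.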
Upload Documents
Single Document
bash
curl -X POST https://api.gateflow.ai/v1/data/documents \
  -H "Authorization: Bearer gw_prod_..." \
  -F "file=@contract.pdf" \
  -F "classification=confidential" \
  -F "metadata={\"department\": \"legal\", \"year\": 2024}"

Response:
json
{
  "document_id": "doc_abc123",
  "filename": "contract.pdf",
  "status": "processing",
  "classification": "confidential",
  "created_at": "2024-01-15T10:30:00Z"
}

Batch Upload
bash
curl -X POST https://api.gateflow.ai/v1/data/documents/batch \
  -H "Authorization: Bearer gw_prod_..." \
  -F "files=@doc1.pdf" \
  -F "files=@doc2.pdf" \
  -F "files=@doc3.pdf" \
  -F "classification=internal"

Processing Pipeline
Processing Status
bash
curl https://api.gateflow.ai/v1/data/documents/doc_abc123 \
  -H "Authorization: Bearer gw_prod_..."

Response:
json
{
  "document_id": "doc_abc123",
  "status": "ready",
  "pages": 15,
  "chunks": 45,
  "pii_detected": true,
  "pii_entities": 3
}

Status Values
| Status | Description |
|---|---|
| uploading | File being received |
| processing | Being parsed and embedded |
| ready | Available for search |
| failed | Processing error |
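Because a document moves through `uploading` and `processing` before reaching `ready` or `failed`, a client usually polls the status endpoint until it sees one of the two final values. The following is a minimal sketch using only the Python standard library. The function names, the 2-second interval, and the 300-second timeout are assumptions, not documented GateFlow behavior; the endpoint and `Authorization` header mirror the curl request above.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.gateflow.ai/v1/data/documents"
# Statuses from the table above after which no further change is expected.
TERMINAL_STATUSES = {"ready", "failed"}


def fetch_document(document_id: str, api_key: str) -> dict:
    """GET the document record, mirroring the curl status request."""
    request = urllib.request.Request(
        f"{BASE_URL}/{document_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_until_processed(document_id, api_key, fetch=fetch_document,
                         interval=2.0, timeout=300.0, sleep=time.sleep):
    """Poll until status is 'ready' or 'failed'; raise TimeoutError otherwise.

    interval and timeout are illustrative defaults, not documented limits.
    fetch and sleep are injectable so the loop can be tested offline.
    """
    deadline = time.monotonic() + timeout
    while True:
        document = fetch(document_id, api_key)
        if document.get("status") in TERMINAL_STATUSES:
            return document
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{document_id} not processed after {timeout}s")
        sleep(interval)
```

A caller would check the returned `status` field: `ready` means the document can be searched, while `failed` means processing stopped with an error.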
Configuration
Chunking
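The `max_chunk_size` and `overlap` settings in this section can be pictured as a sliding window over the document text. The sketch here is a simplified fixed-size splitter, not the `semantic` strategy itself, whose internals are not described in this document. It assumes both values are measured in characters (the unit is not stated), and `chunk_text` is a hypothetical helper.

```python
def chunk_text(text: str, max_chunk_size: int = 1000, overlap: int = 100) -> list:
    """Split text into windows of at most max_chunk_size characters.

    Each window starts (max_chunk_size - overlap) characters after the
    previous one, so consecutive chunks share `overlap` characters.
    This is a fixed-size illustration only, not a semantic chunker.
    """
    if max_chunk_size <= 0:
        raise ValueError("max_chunk_size must be positive")
    if not 0 <= overlap < max_chunk_size:
        raise ValueError("overlap must be in [0, max_chunk_size)")

    step = max_chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chunk_size])
        # Stop once the current window reaches the end of the text.
        if start + max_chunk_size >= len(text):
            break
        start += step
    return chunks
```

With the values used in this section (1000 and 100), each chunk repeats the final 100 characters of the previous one, which helps keep a sentence that straddles a boundary retrievable from either chunk.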
json
{
  "chunking": {
    "strategy": "semantic",
    "max_chunk_size": 1000,
    "overlap": 100
  }
}

PII Handling
json
{
  "pii": {
    "detect": true,
    "action": "redact",
    "types": ["PERSON", "SSN", "PHONE"]
  }
}

Next Steps
- PII Detection - Configure PII handling
- Semantic Search - Query documents