Document Tools
Tools for processing, extracting, and managing documents.
Available Tools
| Tool | Description | Permission |
|---|---|---|
document/ocr | Extract text from images/PDFs | document/ocr |
document/process | Process and chunk documents | document/process |
document/status | Check processing status | document/status |
document/list | List processed documents | document/list |
document/delete | Delete a document | document/delete |
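Each tool has required parameters, which are listed in its table below. A client can therefore reject incomplete calls before they reach the gateway. The following is an illustrative sketch only. `REQUIRED_ARGS` and `check_arguments` are hypothetical helpers, not part of `gateflow_mcp`, and the required sets were copied by hand from this page rather than fetched from the server.

```python
# Hypothetical client-side check. The required parameters are copied from the
# parameter tables on this page and are not fetched from the server.
REQUIRED_ARGS: dict[str, frozenset[str]] = {
    "document/ocr": frozenset({"file"}),
    "document/process": frozenset({"file", "filename"}),
    "document/status": frozenset({"document_id"}),
    "document/list": frozenset(),
    "document/delete": frozenset({"document_id"}),
}


def check_arguments(tool: str, arguments: dict) -> None:
    """Raise ValueError if `tool` is unknown or a required argument is missing."""
    if tool not in REQUIRED_ARGS:
        raise ValueError(f"unknown document tool: {tool}")
    missing = REQUIRED_ARGS[tool] - set(arguments)
    if missing:
        raise ValueError(f"{tool} is missing required arguments: {sorted(missing)}")
```

Calling `check_arguments(name, arguments)` immediately before `client.call_tool(name=name, arguments=arguments)` catches typos early. The server still performs its own validation.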
document/ocr
Extract text from images and PDFs using OCR.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
file | string | Yes | Base64-encoded file or file URL |
model | string | No | OCR model (default: mistral-document-ai) |
pages | array | No | Specific pages to process |
extract_tables | boolean | No | Extract tables as structured data |
language | string | No | Hint for language detection |
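Because `file` accepts either base64 content or a URL, a small helper can build the `arguments` dict from either one. This is a minimal sketch with a hypothetical name (`build_ocr_arguments`). It assumes that any string starting with `http://` or `https://` should be sent as a URL. This page does not say whether `pages` is 0- or 1-based or which format `language` expects, so the helper passes both values through unchanged. Optional parameters are sent only when they are set, which lets the server defaults apply.

```python
import base64
from pathlib import Path
from typing import Optional


def build_ocr_arguments(
    source: str,
    pages: Optional[list[int]] = None,
    extract_tables: bool = False,
    language: Optional[str] = None,
    model: Optional[str] = None,
) -> dict:
    """Build `arguments` for document/ocr from a local path or an http(s) URL."""
    if source.startswith(("http://", "https://")):
        file_value = source  # assumption: URLs are passed through as-is
    else:
        file_value = base64.b64encode(Path(source).read_bytes()).decode("ascii")

    arguments = {"file": file_value}
    # Send optional parameters only when they are set, so server defaults apply.
    if model is not None:
        arguments["model"] = model
    if pages is not None:
        arguments["pages"] = list(pages)
    if extract_tables:
        arguments["extract_tables"] = True
    if language is not None:
        arguments["language"] = language
    return arguments
```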
Example
```python
import base64

from gateflow_mcp import MCPClient

client = MCPClient(agent_id="agent_abc123", api_key="gf-agent-...")

# Read and base64-encode the document
with open("contract.pdf", "rb") as f:
    file_b64 = base64.b64encode(f.read()).decode()

# Extract text
result = client.call_tool(
    name="document/ocr",
    arguments={
        "file": file_b64,
        "model": "mistral-document-ai",
        "extract_tables": True,
    },
)

print(f"Extracted {result['pages']} pages")
print(f"Text: {result['text'][:500]}...")

# Access tables
for table in result.get("tables", []):
    print(f"Table on page {table['page']}: {table['rows']} rows")
```
Response
```json
{
  "text": "CONSULTING AGREEMENT\n\nThis Agreement is entered into...",
  "pages": 5,
  "tables": [
    {
      "page": 2,
      "rows": 10,
      "columns": 4,
      "data": [["Item", "Qty", "Price", "Total"], ...]
    }
  ],
  "metadata": {
    "model": "mistral-document-ai",
    "processing_time_ms": 2500,
    "confidence": 0.95
  }
}
```
document/process
Process a document for storage and retrieval.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
file | string | Yes | Base64-encoded file |
filename | string | Yes | Original filename |
collection | string | No | Target collection |
chunk_size | integer | No | Characters per chunk (default: 1000) |
chunk_overlap | integer | No | Characters shared between consecutive chunks (default: 200) |
detect_pii | boolean | No | Scan for PII/PHI |
classification | string | No | Data classification level |
metadata | object | No | Custom metadata |
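The following local sketch shows how `chunk_size` and `chunk_overlap` interact, using fixed-size character chunking with overlap. It is an illustration only. It assumes that both values count characters, as the table states. This page does not document the service's actual splitter, which may, for example, respect sentence or page boundaries. Each chunk starts `chunk_size - chunk_overlap` characters after the previous one, so the overlap must be smaller than the chunk size.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # distance between consecutive chunk starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end of the text
            break
    return chunks
```

For example, `chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2)` returns `['abcd', 'cdef', 'efgh', 'ghij']`. Each neighboring pair shares two characters.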
Example
```python
import base64

# Reuses `client` from the document/ocr example; encode the file to process
with open("employee_handbook.pdf", "rb") as f:
    file_b64 = base64.b64encode(f.read()).decode()

result = client.call_tool(
    name="document/process",
    arguments={
        "file": file_b64,
        "filename": "employee_handbook.pdf",
        "collection": "hr-documents",
        "chunk_size": 1000,
        "chunk_overlap": 200,
        "detect_pii": True,
        "classification": "internal",
        "metadata": {
            "department": "HR",
            "version": "2026",
        },
    },
)

print(f"Document ID: {result['document_id']}")
print(f"Chunks created: {result['chunks']}")
print(f"PII findings: {result['pii_count']}")
```
Response
```json
{
  "document_id": "doc_xyz789",
  "status": "processed",
  "filename": "employee_handbook.pdf",
  "pages": 45,
  "chunks": 120,
  "characters": 95000,
  "pii_count": 5,
  "pii_types": ["email", "phone_number"],
  "collection": "hr-documents",
  "classification": "internal",
  "processed_at": "2026-02-16T10:00:00Z"
}
```
document/status
Check the processing status of a document.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
document_id | string | Yes | Document identifier |
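The `stages` object in the response below records the state of each pipeline step. A short helper can turn a status response into a one-line progress message. This sketch uses a hypothetical name, `summarize_status`. It relies only on the `status`, `progress`, and `stages` fields and the `in_progress` value shown in the example. Other stage states, such as failures, are not documented here and are ignored.

```python
def summarize_status(status: dict) -> str:
    """Return a one-line summary such as 'processing 65% (chunking)'."""
    stages = status.get("stages", {})
    # Names of stages currently running, in the order the server listed them.
    active = [name for name, state in stages.items() if state == "in_progress"]
    line = f"{status['status']} {status.get('progress', 0)}%"
    if active:
        line += f" ({', '.join(active)})"
    return line
```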
Example
```python
result = client.call_tool(
    name="document/status",
    arguments={"document_id": "doc_xyz789"},
)

print(f"Status: {result['status']}")
print(f"Progress: {result['progress']}%")
```
Response
```json
{
  "document_id": "doc_xyz789",
  "status": "processing",
  "progress": 65,
  "stages": {
    "upload": "complete",
    "ocr": "complete",
    "chunking": "in_progress",
    "embedding": "pending",
    "pii_scan": "pending"
  },
  "started_at": "2026-02-16T10:00:00Z",
  "estimated_completion": "2026-02-16T10:02:00Z"
}
```
document/list
List documents, optionally filtered by collection and status.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
collection | string | No | Filter by collection |
status | string | No | Filter by status |
limit | integer | No | Max results (default: 20) |
offset | integer | No | Pagination offset |
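Because results are paged with `limit` and `offset`, reading every matching document takes repeated calls. This is a minimal sketch with a hypothetical helper name, `iter_documents`. It assumes that `total` in the response counts all documents that match the filters, as the example response suggests. It accepts any client object with the `call_tool(name=..., arguments=...)` method used throughout this page.

```python
from collections.abc import Iterator
from typing import Optional


def iter_documents(
    client,
    collection: Optional[str] = None,
    status: Optional[str] = None,
    page_size: int = 20,
) -> Iterator[dict]:
    """Yield every document matching the filters, fetching one page at a time."""
    offset = 0
    while True:
        arguments = {"limit": page_size, "offset": offset}
        if collection is not None:
            arguments["collection"] = collection
        if status is not None:
            arguments["status"] = status
        page = client.call_tool(name="document/list", arguments=arguments)
        documents = page["documents"]
        yield from documents
        offset += len(documents)
        # Stop on an empty page or once all `total` matches have been seen.
        if not documents or offset >= page["total"]:
            break
```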
Example
```python
result = client.call_tool(
    name="document/list",
    arguments={
        "collection": "hr-documents",
        "status": "processed",
        "limit": 10,
    },
)

for doc in result["documents"]:
    print(f"{doc['filename']}: {doc['pages']} pages, {doc['chunks']} chunks")
```
Response
```json
{
  "documents": [
    {
      "document_id": "doc_xyz789",
      "filename": "employee_handbook.pdf",
      "collection": "hr-documents",
      "status": "processed",
      "pages": 45,
      "chunks": 120,
      "created_at": "2026-02-16T10:00:00Z"
    }
  ],
  "total": 15,
  "limit": 10,
  "offset": 0
}
```
document/delete
Delete a document and its embeddings.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
document_id | string | Yes | Document to delete |
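To clear a whole collection, combine document/list with document/delete. The sketch below uses a hypothetical helper name and is destructive. It first collects every ID and only then starts deleting. If it deleted while paging by offset, later documents would shift into pages already read and some would be skipped. The sketch assumes that the `collection` filter and `total` behave as in the document/list example, and it stops at the first delete that does not report `"deleted": true`.

```python
def delete_collection_documents(client, collection: str, page_size: int = 20) -> int:
    """Delete every document in `collection` and return the total chunks removed."""
    # 1. Collect all IDs first. Deleting while paging by offset would shift
    #    later documents into pages already read, so some would be skipped.
    document_ids = []
    offset = 0
    while True:
        page = client.call_tool(
            name="document/list",
            arguments={"collection": collection, "limit": page_size, "offset": offset},
        )
        documents = page["documents"]
        document_ids.extend(doc["document_id"] for doc in documents)
        offset += len(documents)
        if not documents or offset >= page["total"]:
            break

    # 2. Delete each document and tally the removed chunks.
    chunks_removed = 0
    for document_id in document_ids:
        result = client.call_tool(
            name="document/delete", arguments={"document_id": document_id}
        )
        if not result.get("deleted"):
            raise RuntimeError(f"failed to delete {document_id}")
        chunks_removed += result.get("chunks_removed", 0)
    return chunks_removed
```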
Example
```python
result = client.call_tool(
    name="document/delete",
    arguments={"document_id": "doc_xyz789"},
)

print(f"Deleted: {result['deleted']}")
```
Response
```json
{
  "document_id": "doc_xyz789",
  "deleted": true,
  "chunks_removed": 120,
  "deleted_at": "2026-02-16T11:00:00Z"
}
```
Supported File Types
| Type | Extensions | Max Size |
|---|---|---|
| PDF | .pdf | 50 MB |
| Word | .docx, .doc | 50 MB |
| Text | .txt, .md | 10 MB |
| Images | .png, .jpg, .tiff | 20 MB |
| Spreadsheets | .xlsx, .csv | 50 MB |
| HTML | .html | 10 MB |
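The extension and size limits above can be checked locally before a file is encoded and uploaded. This is a sketch with a hypothetical function name, `check_upload`. It assumes that "MB" means MiB (1024 × 1024 bytes) and that the limit applies to the raw file before base64 encoding, which inflates the payload by about a third. This page confirms neither point. The sketch accepts only the extensions listed in the table, so variants such as `.jpeg` or `.tif` are rejected.

```python
from pathlib import Path

# Limits copied from the table above. Assumption: "MB" means MiB and the
# limit applies to the raw file, not the base64-encoded payload.
MAX_MB_BY_EXTENSION = {
    ".pdf": 50,
    ".docx": 50, ".doc": 50,
    ".txt": 10, ".md": 10,
    ".png": 20, ".jpg": 20, ".tiff": 20,
    ".xlsx": 50, ".csv": 50,
    ".html": 10,
}


def check_upload(path: str) -> None:
    """Raise ValueError if the file's extension or size is outside the documented limits."""
    file = Path(path)
    extension = file.suffix.lower()
    if extension not in MAX_MB_BY_EXTENSION:
        raise ValueError(f"unsupported file type: {extension or '(none)'}")
    limit_bytes = MAX_MB_BY_EXTENSION[extension] * 1024 * 1024
    size = file.stat().st_size
    if size > limit_bytes:
        raise ValueError(
            f"{file.name} is {size} bytes; the limit for {extension} is {limit_bytes} bytes"
        )
```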
Permissions
Grant document tool access:
```yaml
permissions:
  tools:
    - document/ocr       # Extract text
    - document/process   # Process documents
    - document/status    # Check status
    - document/list      # List documents
    - document/delete    # Delete documents
  collections:
    - hr-documents       # Specific collection access
    - legal-contracts
```
Best Practices
- Use collections - Organize documents into logical groups
- Enable PII detection - Set detect_pii for sensitive documents
- Set classification - Apply the appropriate data classification level
- Add metadata - Improve searchability
- Check status - Poll document/status until large documents finish processing
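The status-polling practice can be sketched as a simple loop. `wait_for_document` is a hypothetical helper. It assumes that `"processing"` is the only in-progress value of `status` and treats any other value, such as `"processed"`, as final. This page does not document failure statuses, so callers should inspect the returned dict. The `sleep` parameter is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable


def wait_for_document(
    client,
    document_id: str,
    timeout_s: float = 300.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll document/status until the document leaves the "processing" state."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = client.call_tool(
            name="document/status", arguments={"document_id": document_id}
        )
        # Assumption: "processing" is the only in-progress status value.
        if status["status"] != "processing":
            return status
        if time.monotonic() + interval_s > deadline:
            raise TimeoutError(f"{document_id} still processing after {timeout_s} s")
        sleep(interval_s)
```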
Next Steps
- Retrieval Tools - Search documents
- PII Detection - PII handling
- Document Ingestion - Processing guide