Document Tools

Tools for processing, extracting, and managing documents.

Available Tools

Tool	Description	Permission
`document/ocr`	Extract text from images/PDFs	`document/ocr`
`document/process`	Process and chunk documents	`document/process`
`document/status`	Check processing status	`document/status`
`document/list`	List processed documents	`document/list`
`document/delete`	Delete a document	`document/delete`

document/ocr

Extract text from images and PDFs using OCR.

Parameters

Parameter	Type	Required	Description
`file`	string	Yes	Base64-encoded file or file URL
`model`	string	No	OCR model (default: `mistral-document-ai`)
`pages`	array	No	Specific pages to process
`extract_tables`	boolean	No	Extract tables as structured data
`language`	string	No	Hint for language detection

Example

python

import base64
from gateflow_mcp import MCPClient

client = MCPClient(agent_id="agent_abc123", api_key="gf-agent-...")

# Read and encode document
with open("contract.pdf", "rb") as f:
    file_b64 = base64.b64encode(f.read()).decode()

# Extract text
result = client.call_tool(
    name="document/ocr",
    arguments={
        "file": file_b64,
        "model": "mistral-document-ai",
        "extract_tables": True
    }
)

print(f"Extracted {result['pages']} pages")
print(f"Text: {result['text'][:500]}...")

# Access tables
for table in result.get("tables", []):
    print(f"Table on page {table['page']}: {table['rows']} rows")

Response

json

{
  "text": "CONSULTING AGREEMENT\n\nThis Agreement is entered into...",
  "pages": 5,
  "tables": [
    {
      "page": 2,
      "rows": 10,
      "columns": 4,
      "data": [["Item", "Qty", "Price", "Total"], ...]
    }
  ],
  "metadata": {
    "model": "mistral-document-ai",
    "processing_time_ms": 2500,
    "confidence": 0.95
  }
}
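
Because `file` accepts either a Base64-encoded payload or a file URL, it can help to build the arguments in one place. The following is a minimal sketch, not part of the GateFlow client: `build_ocr_arguments` is a hypothetical helper, and it assumes that any value starting with `http://` or `https://` should be sent as a URL and anything else is a local path to encode. Page numbering for `pages` is not specified above, so the values are passed through unchanged.

```python
import base64
from pathlib import Path
from typing import Any, Dict, Optional, Sequence


def build_ocr_arguments(
    source: str,
    pages: Optional[Sequence[int]] = None,
    extract_tables: bool = False,
) -> Dict[str, Any]:
    """Build an arguments dict for document/ocr (hypothetical helper)."""
    if source.startswith(("http://", "https://")):
        # Assumption: URLs are passed to the tool as-is.
        file_value = source
    else:
        # Local files are read and Base64-encoded, as in the example above.
        file_value = base64.b64encode(Path(source).read_bytes()).decode("ascii")

    arguments: Dict[str, Any] = {"file": file_value, "extract_tables": extract_tables}
    if pages is not None:
        arguments["pages"] = list(pages)
    return arguments
```

The result can be passed directly as `arguments` to `client.call_tool(name="document/ocr", ...)`.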

document/process

Process a document for storage and retrieval.

Parameters

Parameter	Type	Required	Description
`file`	string	Yes	Base64-encoded file
`filename`	string	Yes	Original filename
`collection`	string	No	Target collection
`chunk_size`	integer	No	Characters per chunk (default: 1000)
`chunk_overlap`	integer	No	Overlap between chunks (default: 200)
`detect_pii`	boolean	No	Scan for PII/PHI
`classification`	string	No	Data classification level
`metadata`	object	No	Custom metadata

Example

python

result = client.call_tool(
    name="document/process",
    arguments={
        "file": file_b64,
        "filename": "employee_handbook.pdf",
        "collection": "hr-documents",
        "chunk_size": 1000,
        "chunk_overlap": 200,
        "detect_pii": True,
        "classification": "internal",
        "metadata": {
            "department": "HR",
            "version": "2026"
        }
    }
)

print(f"Document ID: {result['document_id']}")
print(f"Chunks created: {result['chunks']}")
print(f"PII findings: {result['pii_count']}")

Response

json

{
  "document_id": "doc_xyz789",
  "status": "processed",
  "filename": "employee_handbook.pdf",
  "pages": 45,
  "chunks": 120,
  "characters": 95000,
  "pii_count": 5,
  "pii_types": ["email", "phone_number"],
  "collection": "hr-documents",
  "classification": "internal",
  "processed_at": "2026-02-16T10:00:00Z"
}
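
To see how `chunk_size` and `chunk_overlap` interact, the sketch below splits a string into fixed-size character windows that each share their first `chunk_overlap` characters with the previous window. It illustrates the general sliding-window technique only. The service's actual chunking algorithm is not described here and may differ (for example, by respecting sentence or page boundaries), and `chunk_text` is a hypothetical name.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into overlapping character windows (illustrative only)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

With the defaults, each new chunk starts 800 characters after the previous one, so a larger overlap produces more chunks for the same document.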

document/status

Check the processing status of a document.

Parameters

Parameter	Type	Required	Description
`document_id`	string	Yes	Document identifier

Example

python

result = client.call_tool(
    name="document/status",
    arguments={"document_id": "doc_xyz789"}
)

print(f"Status: {result['status']}")
print(f"Progress: {result['progress']}%")

Response

json

{
  "document_id": "doc_xyz789",
  "status": "processing",
  "progress": 65,
  "stages": {
    "upload": "complete",
    "ocr": "complete",
    "chunking": "in_progress",
    "embedding": "pending",
    "pii_scan": "pending"
  },
  "started_at": "2026-02-16T10:00:00Z",
  "estimated_completion": "2026-02-16T10:02:00Z"
}
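
For large documents, the status call is typically repeated until processing finishes. A minimal polling sketch follows. `wait_for_document` is a hypothetical helper that takes any callable with the same signature as `client.call_tool`. It assumes `"processed"` is the terminal success status (as in the `document/process` response), and since failure statuses are not documented here, it simply gives up after a timeout.

```python
import time
from typing import Any, Callable, Dict


def wait_for_document(
    call_tool: Callable[..., Dict[str, Any]],
    document_id: str,
    poll_interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll document/status until the document is processed or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = call_tool(
            name="document/status",
            arguments={"document_id": document_id},
        )
        # Assumption: "processed" marks successful completion.
        if status["status"] == "processed":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"{document_id} still {status['status']!r} after {timeout} seconds"
            )
        sleep(poll_interval)
```

Usage with the client from the earlier examples would look like `wait_for_document(client.call_tool, "doc_xyz789")`.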

document/list

List processed documents, optionally filtered by collection and status.

Parameters

Parameter	Type	Required	Description
`collection`	string	No	Filter by collection
`status`	string	No	Filter by status
`limit`	integer	No	Max results (default: 20)
`offset`	integer	No	Pagination offset

Example

python

result = client.call_tool(
    name="document/list",
    arguments={
        "collection": "hr-documents",
        "status": "processed",
        "limit": 10
    }
)

for doc in result["documents"]:
    print(f"{doc['filename']}: {doc['pages']} pages, {doc['chunks']} chunks")

Response

json

{
  "documents": [
    {
      "document_id": "doc_xyz789",
      "filename": "employee_handbook.pdf",
      "collection": "hr-documents",
      "status": "processed",
      "pages": 45,
      "chunks": 120,
      "created_at": "2026-02-16T10:00:00Z"
    }
  ],
  "total": 15,
  "limit": 10,
  "offset": 0
}
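
When `total` exceeds `limit`, as in the response above, the remaining documents can be fetched by advancing `offset`. The sketch below is one way to do that. `iter_documents` is a hypothetical helper that accepts any callable shaped like `client.call_tool`, and it assumes `total` is the number of matching documents across all pages.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_documents(
    call_tool: Callable[..., Dict[str, Any]],
    collection: Optional[str] = None,
    status: Optional[str] = None,
    page_size: int = 20,
) -> Iterator[Dict[str, Any]]:
    """Yield every document from document/list, one page at a time."""
    offset = 0
    while True:
        arguments: Dict[str, Any] = {"limit": page_size, "offset": offset}
        if collection is not None:
            arguments["collection"] = collection
        if status is not None:
            arguments["status"] = status

        page = call_tool(name="document/list", arguments=arguments)
        documents = page["documents"]
        yield from documents

        offset += len(documents)
        # Stop on an empty page or once every matching document has been seen.
        if not documents or offset >= page["total"]:
            break
```

For example, `for doc in iter_documents(client.call_tool, collection="hr-documents"): ...` walks the whole collection without handling offsets by hand.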

document/delete

Delete a document and its embeddings.

Parameters

Parameter	Type	Required	Description
`document_id`	string	Yes	Document to delete

Example

python

result = client.call_tool(
    name="document/delete",
    arguments={"document_id": "doc_xyz789"}
)

print(f"Deleted: {result['deleted']}")

Response

json

{
  "document_id": "doc_xyz789",
  "deleted": true,
  "chunks_removed": 120,
  "deleted_at": "2026-02-16T11:00:00Z"
}

Supported File Types

Type	Extensions	Max Size
PDF	`.pdf`	50 MB
Word	`.docx`, `.doc`	50 MB
Text	`.txt`, `.md`	10 MB
Images	`.png`, `.jpg`, `.tiff`	20 MB
Spreadsheets	`.xlsx`, `.csv`	50 MB
HTML	`.html`	10 MB
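
Checking a file against this table before encoding it avoids uploading something the tools will reject. The sketch below is a hypothetical client-side check, not part of the GateFlow client. It assumes 1 MB means 1024 × 1024 bytes and that the limit applies to the raw file size rather than the Base64-encoded size; neither is stated above.

```python
from pathlib import Path

MB = 1024 * 1024  # assumption: binary megabytes

# Limits copied from the Supported File Types table.
MAX_SIZE_BYTES = {
    ".pdf": 50 * MB,
    ".docx": 50 * MB,
    ".doc": 50 * MB,
    ".txt": 10 * MB,
    ".md": 10 * MB,
    ".png": 20 * MB,
    ".jpg": 20 * MB,
    ".tiff": 20 * MB,
    ".xlsx": 50 * MB,
    ".csv": 50 * MB,
    ".html": 10 * MB,
}


def check_upload(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the file type is unsupported or the file is too large."""
    extension = Path(filename).suffix.lower()
    limit = MAX_SIZE_BYTES.get(extension)
    if limit is None:
        raise ValueError(f"Unsupported file type: {extension or filename}")
    if size_bytes > limit:
        raise ValueError(
            f"{filename} is {size_bytes} bytes; the limit for {extension} is {limit} bytes"
        )
```

A call such as `check_upload("contract.pdf", Path("contract.pdf").stat().st_size)` would run before the file is read and encoded.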

Permissions

Grant an agent access to the document tools and collections it needs:

yaml

permissions:
  tools:
    - document/ocr        # Extract text
    - document/process    # Process documents
    - document/status     # Check status
    - document/list       # List documents
    - document/delete     # Delete documents
  collections:
    - hr-documents        # Specific collection access
    - legal-contracts

Best Practices

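Use collections - Group related documents so they can be filtered and granted access together
Enable PII detection - Set `detect_pii` when processing documents that may contain sensitive data
Set classification - Pass a `classification` level that matches the document's sensitivity
Add metadata - Attach custom `metadata` fields to improve searchability
Check status - Poll `document/status` until large documents finish processing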

Next Steps

Retrieval Tools - Search documents
PII Detection - PII handling
Document Ingestion - Processing guide
