
LLM Tools

Tools for accessing AI language models.

Available Tools

| Tool | Description | Permission |
|------|-------------|------------|
| `llm/chat` | Send messages to an LLM | `llm/chat` |
| `llm/embed` | Generate text embeddings | `llm/embed` |
| `llm/list_models` | List available models | `llm/list_models` |

llm/chat

Send messages to a language model and get a response.

Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `messages` | array | Yes | Conversation messages |
| `model` | string | No | Model to use (default: agent's default) |
| `temperature` | number | No | Sampling temperature (0-2) |
| `max_tokens` | integer | No | Maximum response tokens |
| `system` | string | No | System prompt |
| `tools` | array | No | Function-calling tool definitions |
| `stream` | boolean | No | Stream the response |
| `response_format` | object | No | Response format (e.g. `{"type": "json_object"}` for JSON mode) |

Basic Example

```python
from gateflow_mcp import MCPClient

client = MCPClient(agent_id="agent_abc123", api_key="gf-agent-...")

result = client.call_tool(
    name="llm/chat",
    arguments={
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ],
        "model": "gpt-5-mini"
    }
)

print(result["content"])
# Output: The capital of France is Paris.
```

With System Prompt

```python
result = client.call_tool(
    name="llm/chat",
    arguments={
        "messages": [
            {"role": "user", "content": "Summarize this document..."}
        ],
        "model": "gpt-5.2",
        "system": "You are a helpful assistant that summarizes documents concisely.",
        "temperature": 0.3,
        "max_tokens": 500
    }
)
```

Conversation History

```python
result = client.call_tool(
    name="llm/chat",
    arguments={
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "My name is Alice."},
            {"role": "assistant", "content": "Hello Alice! How can I help you today?"},
            {"role": "user", "content": "What's my name?"}
        ],
        "model": "gpt-5-mini"
    }
)

print(result["content"])
# Output: Your name is Alice.
```

Response

```json
{
  "content": "The capital of France is Paris.",
  "model": "gpt-5-mini",
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 8,
    "total_tokens": 23
  },
  "finish_reason": "stop",
  "cost": 0.000023
}
```

llm/embed

Generate vector embeddings for text.

Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `input` | string or array | Yes | Text to embed |
| `model` | string | No | Embedding model |

Single Text

```python
result = client.call_tool(
    name="llm/embed",
    arguments={
        "input": "The quick brown fox jumps over the lazy dog.",
        "model": "text-embedding-3-large"
    }
)

print(f"Embedding dimensions: {len(result['embedding'])}")
# Output: Embedding dimensions: 3072
```

Multiple Texts

```python
result = client.call_tool(
    name="llm/embed",
    arguments={
        "input": [
            "First sentence to embed.",
            "Second sentence to embed.",
            "Third sentence to embed."
        ],
        "model": "text-embedding-3-large"
    }
)

print(f"Generated {len(result['embeddings'])} embeddings")
```

Response

```json
{
  "embedding": [0.0023, -0.0045, 0.0089, ...],
  "model": "text-embedding-3-large",
  "dimensions": 3072,
  "usage": {
    "prompt_tokens": 10,
    "total_tokens": 10
  },
  "cost": 0.00001
}
```
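A common use for embeddings is similarity search. As a minimal sketch (pure Python, no client call needed), cosine similarity between two returned vectors can be computed like this; with a batch response you would pass `result["embeddings"][0]` and `result["embeddings"][1]`:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Identical vectors have similarity 1.0; orthogonal vectors have 0.0.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
```

For large collections, a vector database is usually a better fit than pairwise comparison, but the metric is the same.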

llm/list_models

List models available to the agent.

Parameters

None required.

Example

```python
result = client.call_tool(
    name="llm/list_models",
    arguments={}
)

print("Available models:")
for model in result["models"]:
    print(f"  {model['id']}: {model['type']}")
```

Response

```json
{
  "models": [
    {
      "id": "gpt-5-mini",
      "type": "chat",
      "provider": "openai",
      "context_window": 128000,
      "allowed": true
    },
    {
      "id": "gpt-5.2",
      "type": "chat",
      "provider": "openai",
      "context_window": 128000,
      "allowed": true
    },
    {
      "id": "text-embedding-3-large",
      "type": "embedding",
      "provider": "openai",
      "dimensions": 3072,
      "allowed": true
    },
    {
      "id": "claude-opus-4-5-20251107",
      "type": "chat",
      "provider": "anthropic",
      "context_window": 200000,
      "allowed": false,
      "reason": "Not in agent model allowlist"
    }
  ]
}
```
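Since the response includes both allowed and disallowed models, a small helper can filter it down to what the agent may actually call. A sketch against the response shape shown above (in practice you would pass `result["models"]`):

```python
def allowed_chat_models(models):
    """Return the IDs of chat models the agent is permitted to call."""
    return [m["id"] for m in models if m["type"] == "chat" and m["allowed"]]

models = [
    {"id": "gpt-5-mini", "type": "chat", "allowed": True},
    {"id": "text-embedding-3-large", "type": "embedding", "allowed": True},
    {"id": "claude-opus-4-5-20251107", "type": "chat", "allowed": False},
]
print(allowed_chat_models(models))  # ['gpt-5-mini']
```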

Advanced Usage

Streaming

```python
import asyncio

async def stream_response():
    async for chunk in client.stream_tool(
        name="llm/chat",
        arguments={
            "messages": [{"role": "user", "content": "Write a poem about AI."}],
            "model": "gpt-5.2",
            "stream": True
        }
    ):
        print(chunk["delta"], end="", flush=True)

asyncio.run(stream_response())
```

Function Calling

```python
result = client.call_tool(
    name="llm/chat",
    arguments={
        "messages": [
            {"role": "user", "content": "What's the weather in Paris?"}
        ],
        "model": "gpt-5.2",
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "description": "Get weather for a location",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "location": {"type": "string"}
                        },
                        "required": ["location"]
                    }
                }
            }
        ]
    }
)

if result.get("tool_calls"):
    for call in result["tool_calls"]:
        print(f"Function: {call['function']['name']}")
        print(f"Arguments: {call['function']['arguments']}")
```
JSON Mode

```python
import json

result = client.call_tool(
    name="llm/chat",
    arguments={
        "messages": [
            {"role": "user", "content": "Extract the name and age from: 'John is 30 years old.'"}
        ],
        "model": "gpt-5.2",
        "response_format": {"type": "json_object"}
    }
)

data = json.loads(result["content"])
print(data)  # {"name": "John", "age": 30}
```

Permissions

Grant LLM access:

```yaml
permissions:
  tools:
    - llm/chat
    - llm/embed
    - llm/list_models
  models:
    - gpt-5-mini        # Specific models allowed
    - gpt-5.2
    - text-embedding-3-large
```

Model Restrictions

If an agent tries to use a non-allowed model:

```json
{
  "error": {
    "code": "model_not_allowed",
    "message": "Model 'claude-opus-4-5-20251107' is not in the agent's model allowlist",
    "allowed_models": ["gpt-5-mini", "gpt-5.2"]
  }
}
```
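Rather than handling this error after the fact, an agent can check the allowlist up front with `llm/list_models` and fall back to a permitted model. A sketch — the `StubClient` below is a stand-in for illustration only; in real code you would pass your `MCPClient`:

```python
class StubClient:
    """Stand-in for MCPClient, returning a canned llm/list_models response."""
    def call_tool(self, name, arguments):
        return {"models": [
            {"id": "gpt-5-mini", "allowed": True},
            {"id": "claude-opus-4-5-20251107", "allowed": False},
        ]}

def pick_model(client, preferred, fallback):
    """Use the preferred model if the agent may call it, else the fallback."""
    result = client.call_tool(name="llm/list_models", arguments={})
    allowed = {m["id"] for m in result["models"] if m.get("allowed")}
    return preferred if preferred in allowed else fallback

print(pick_model(StubClient(), "claude-opus-4-5-20251107", "gpt-5-mini"))
# Output: gpt-5-mini
```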

Cost Tracking

All LLM calls include cost information:

```python
result = client.call_tool(name="llm/chat", arguments={...})

print(f"Cost: ${result['cost']:.6f}")
print(f"Tokens: {result['usage']['total_tokens']}")
```
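Because every result carries `cost` and `usage`, running totals are easy to keep per agent. A minimal illustrative helper (not part of the SDK) that accumulates across calls:

```python
class CostTracker:
    """Accumulate cost and token usage across LLM tool calls."""
    def __init__(self):
        self.total_cost = 0.0
        self.total_tokens = 0

    def record(self, result):
        """Add one tool-call result's cost and token usage to the totals."""
        self.total_cost += result.get("cost", 0.0)
        self.total_tokens += result.get("usage", {}).get("total_tokens", 0)

tracker = CostTracker()
tracker.record({"cost": 0.000023, "usage": {"total_tokens": 23}})
tracker.record({"cost": 0.00001, "usage": {"total_tokens": 10}})
print(f"Cost: ${tracker.total_cost:.6f}")  # Cost: $0.000033
print(f"Tokens: {tracker.total_tokens}")   # Tokens: 33
```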

Best Practices

  1. Use appropriate models - Match model to task complexity
  2. Set max_tokens - Prevent unexpectedly long responses
  3. Lower temperature - For factual/precise tasks
  4. Check allowed models - Use llm/list_models first
  5. Monitor costs - Track usage per agent
