
LLM Tools

Tools for accessing AI language models.

Available Tools

| Tool | Description | Permission |
|------|-------------|------------|
| `llm/chat` | Send messages to an LLM | `llm/chat` |
| `llm/embed` | Generate text embeddings | `llm/embed` |
| `llm/list_models` | List available models | `llm/list_models` |

llm/chat

Send messages to a language model and get a response.

Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `messages` | array | Yes | Conversation messages |
| `model` | string | No | Model to use (default: agent's default) |
| `temperature` | number | No | Sampling temperature (0-2) |
| `max_tokens` | integer | No | Maximum response tokens |
| `system` | string | No | System prompt |
| `tools` | array | No | Function-calling tool definitions |
| `stream` | boolean | No | Stream the response |
| `response_format` | object | No | Response format (e.g. `{"type": "json_object"}` for JSON mode) |

Basic Example

```python
from gateflow_mcp import MCPClient

client = MCPClient(agent_id="agent_abc123", api_key="gf-agent-...")

result = client.call_tool(
    name="llm/chat",
    arguments={
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ],
        "model": "gpt-5-mini"
    }
)

print(result["content"])
# Output: The capital of France is Paris.
```

With System Prompt

```python
result = client.call_tool(
    name="llm/chat",
    arguments={
        "messages": [
            {"role": "user", "content": "Summarize this document..."}
        ],
        "model": "gpt-5.2",
        "system": "You are a helpful assistant that summarizes documents concisely.",
        "temperature": 0.3,
        "max_tokens": 500
    }
)
```

Conversation History

```python
result = client.call_tool(
    name="llm/chat",
    arguments={
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "My name is Alice."},
            {"role": "assistant", "content": "Hello Alice! How can I help you today?"},
            {"role": "user", "content": "What's my name?"}
        ],
        "model": "gpt-5-mini"
    }
)

print(result["content"])
# Output: Your name is Alice.
```

Response

```json
{
  "content": "The capital of France is Paris.",
  "model": "gpt-5-mini",
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 8,
    "total_tokens": 23
  },
  "finish_reason": "stop",
  "cost": 0.000023
}
```

llm/embed

Generate vector embeddings for text.

Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `input` | string or array | Yes | Text to embed |
| `model` | string | No | Embedding model |

Single Text

```python
result = client.call_tool(
    name="llm/embed",
    arguments={
        "input": "The quick brown fox jumps over the lazy dog.",
        "model": "text-embedding-3-large"
    }
)

print(f"Embedding dimensions: {len(result['embedding'])}")
# Output: Embedding dimensions: 3072
```

Multiple Texts

```python
result = client.call_tool(
    name="llm/embed",
    arguments={
        "input": [
            "First sentence to embed.",
            "Second sentence to embed.",
            "Third sentence to embed."
        ],
        "model": "text-embedding-3-large"
    }
)

print(f"Generated {len(result['embeddings'])} embeddings")
```

Response

```json
{
  "embedding": [0.0023, -0.0045, 0.0089, ...],
  "model": "text-embedding-3-large",
  "dimensions": 3072,
  "usage": {
    "prompt_tokens": 10,
    "total_tokens": 10
  },
  "cost": 0.00001
}
```
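A common use for embeddings is similarity search. As a minimal sketch (pure Python, no client call needed), cosine similarity between two returned vectors can be computed like this; with a batch response you would pass `result["embeddings"][0]` and `result["embeddings"][1]`:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Identical vectors have similarity 1.0; orthogonal vectors have 0.0.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
```

For large collections, a vector database is usually a better fit than pairwise comparison, but the metric is the same.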

llm/list_models

List models available to the agent.

Parameters

None required.

Example

```python
result = client.call_tool(
    name="llm/list_models",
    arguments={}
)

print("Available models:")
for model in result["models"]:
    print(f"  {model['id']}: {model['type']}")
```

Response

```json
{
  "models": [
    {
      "id": "gpt-5-mini",
      "type": "chat",
      "provider": "openai",
      "context_window": 128000,
      "allowed": true
    },
    {
      "id": "gpt-5.2",
      "type": "chat",
      "provider": "openai",
      "context_window": 128000,
      "allowed": true
    },
    {
      "id": "text-embedding-3-large",
      "type": "embedding",
      "provider": "openai",
      "dimensions": 3072,
      "allowed": true
    },
    {
      "id": "claude-opus-4-5-20251107",
      "type": "chat",
      "provider": "anthropic",
      "context_window": 200000,
      "allowed": false,
      "reason": "Not in agent model allowlist"
    }
  ]
}
```
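Since the response includes both allowed and disallowed models, a small helper can filter it down to what the agent may actually call. A sketch against the response shape shown above (in practice you would pass `result["models"]`):

```python
def allowed_chat_models(models):
    """Return the IDs of chat models the agent is permitted to call."""
    return [m["id"] for m in models if m["type"] == "chat" and m["allowed"]]

models = [
    {"id": "gpt-5-mini", "type": "chat", "allowed": True},
    {"id": "text-embedding-3-large", "type": "embedding", "allowed": True},
    {"id": "claude-opus-4-5-20251107", "type": "chat", "allowed": False},
]
print(allowed_chat_models(models))  # ['gpt-5-mini']
```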

Advanced Usage

Streaming

```python
import asyncio

async def stream_response():
    async for chunk in client.stream_tool(
        name="llm/chat",
        arguments={
            "messages": [{"role": "user", "content": "Write a poem about AI."}],
            "model": "gpt-5.2",
            "stream": True
        }
    ):
        print(chunk["delta"], end="", flush=True)

asyncio.run(stream_response())
```

Function Calling

```python
result = client.call_tool(
    name="llm/chat",
    arguments={
        "messages": [
            {"role": "user", "content": "What's the weather in Paris?"}
        ],
        "model": "gpt-5.2",
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "description": "Get weather for a location",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "location": {"type": "string"}
                        },
                        "required": ["location"]
                    }
                }
            }
        ]
    }
)

if result.get("tool_calls"):
    for call in result["tool_calls"]:
        print(f"Function: {call['function']['name']}")
        print(f"Arguments: {call['function']['arguments']}")
```
JSON Mode

```python
import json

result = client.call_tool(
    name="llm/chat",
    arguments={
        "messages": [
            {"role": "user", "content": "Extract the name and age from: 'John is 30 years old.'"}
        ],
        "model": "gpt-5.2",
        "response_format": {"type": "json_object"}
    }
)

data = json.loads(result["content"])
print(data)  # {"name": "John", "age": 30}
```

Permissions

Grant LLM access:

```yaml
permissions:
  tools:
    - llm/chat
    - llm/embed
    - llm/list_models
  models:
    - gpt-5-mini        # Specific models allowed
    - gpt-5.2
    - text-embedding-3-large
```

Model Restrictions

If an agent tries to use a non-allowed model:

```json
{
  "error": {
    "code": "model_not_allowed",
    "message": "Model 'claude-opus-4-5-20251107' is not in the agent's model allowlist",
    "allowed_models": ["gpt-5-mini", "gpt-5.2"]
  }
}
```
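Rather than handling this error after the fact, an agent can check the allowlist up front with `llm/list_models` and fall back to a permitted model. A sketch — the `StubClient` below is a stand-in for illustration only; in real code you would pass your `MCPClient`:

```python
class StubClient:
    """Stand-in for MCPClient, returning a canned llm/list_models response."""
    def call_tool(self, name, arguments):
        return {"models": [
            {"id": "gpt-5-mini", "allowed": True},
            {"id": "claude-opus-4-5-20251107", "allowed": False},
        ]}

def pick_model(client, preferred, fallback):
    """Use the preferred model if the agent may call it, else the fallback."""
    result = client.call_tool(name="llm/list_models", arguments={})
    allowed = {m["id"] for m in result["models"] if m.get("allowed")}
    return preferred if preferred in allowed else fallback

print(pick_model(StubClient(), "claude-opus-4-5-20251107", "gpt-5-mini"))
# Output: gpt-5-mini
```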

Cost Tracking

All LLM calls include cost information:

```python
result = client.call_tool(name="llm/chat", arguments={...})

print(f"Cost: ${result['cost']:.6f}")
print(f"Tokens: {result['usage']['total_tokens']}")
```
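Because every result carries `cost` and `usage`, running totals are easy to keep per agent. A minimal illustrative helper (not part of the SDK) that accumulates across calls:

```python
class CostTracker:
    """Accumulate cost and token usage across LLM tool calls."""
    def __init__(self):
        self.total_cost = 0.0
        self.total_tokens = 0

    def record(self, result):
        """Add one tool-call result's cost and token usage to the totals."""
        self.total_cost += result.get("cost", 0.0)
        self.total_tokens += result.get("usage", {}).get("total_tokens", 0)

tracker = CostTracker()
tracker.record({"cost": 0.000023, "usage": {"total_tokens": 23}})
tracker.record({"cost": 0.00001, "usage": {"total_tokens": 10}})
print(f"Cost: ${tracker.total_cost:.6f}")  # Cost: $0.000033
print(f"Tokens: {tracker.total_tokens}")   # Tokens: 33
```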

Best Practices

  1. Use appropriate models - Match model to task complexity
  2. Set max_tokens - Prevent unexpectedly long responses
  3. Lower temperature - For factual/precise tasks
  4. Check allowed models - Use llm/list_models first
  5. Monitor costs - Track usage per agent
