# LLM Tools

Tools for accessing AI language models.
## Available Tools
| Tool | Description | Permission |
|---|---|---|
| llm/chat | Send messages to an LLM | llm/chat |
| llm/embed | Generate text embeddings | llm/embed |
| llm/list_models | List available models | llm/list_models |
## llm/chat

Send messages to a language model and get a response.
### Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| messages | array | Yes | Conversation messages |
| model | string | No | Model to use (default: agent's default) |
| temperature | number | No | Sampling temperature (0-2) |
| max_tokens | integer | No | Maximum response tokens |
| system | string | No | System prompt |
| tools | array | No | Function calling tools |
| response_format | object | No | Response format, e.g. {"type": "json_object"} |
| stream | boolean | No | Stream the response |
### Basic Example

```python
from gateflow_mcp import MCPClient
client = MCPClient(agent_id="agent_abc123", api_key="gf-agent-...")
result = client.call_tool(
    name="llm/chat",
    arguments={
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ],
        "model": "gpt-5-mini"
    }
)

print(result["content"])
# Output: The capital of France is Paris.
```

### With System Prompt
```python
result = client.call_tool(
    name="llm/chat",
    arguments={
        "messages": [
            {"role": "user", "content": "Summarize this document..."}
        ],
        "model": "gpt-5.2",
        "system": "You are a helpful assistant that summarizes documents concisely.",
        "temperature": 0.3,
        "max_tokens": 500
    }
)
```

### Conversation History
```python
result = client.call_tool(
    name="llm/chat",
    arguments={
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "My name is Alice."},
            {"role": "assistant", "content": "Hello Alice! How can I help you today?"},
            {"role": "user", "content": "What's my name?"}
        ],
        "model": "gpt-5-mini"
    }
)

print(result["content"])
# Output: Your name is Alice.
```
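The history pattern above can be wrapped in a small helper that appends each reply so later turns keep their context. A minimal sketch; `chat_turn` is an illustrative helper, not part of the client API:

```python
def chat_turn(client, history, user_text, model="gpt-5-mini"):
    """Append a user message, call llm/chat, and record the assistant reply."""
    history.append({"role": "user", "content": user_text})
    result = client.call_tool(
        name="llm/chat",
        arguments={"messages": history, "model": model},
    )
    # Keep the reply in the history so the next turn sees it.
    history.append({"role": "assistant", "content": result["content"]})
    return result["content"]
```

Each call sends the full history, so trim or summarize old turns if you approach the model's context window.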
### Response

```json
{
  "content": "The capital of France is Paris.",
  "model": "gpt-5-mini",
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 8,
    "total_tokens": 23
  },
  "finish_reason": "stop",
  "cost": 0.000023
}
```

## llm/embed
Generate vector embeddings for text.
### Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| input | string/array | Yes | Text to embed |
| model | string | No | Embedding model |
### Single Text

```python
result = client.call_tool(
    name="llm/embed",
    arguments={
        "input": "The quick brown fox jumps over the lazy dog.",
        "model": "text-embedding-3-large"
    }
)

print(f"Embedding dimensions: {len(result['embedding'])}")
# Output: Embedding dimensions: 3072
```

### Multiple Texts
```python
result = client.call_tool(
    name="llm/embed",
    arguments={
        "input": [
            "First sentence to embed.",
            "Second sentence to embed.",
            "Third sentence to embed."
        ],
        "model": "text-embedding-3-large"
    }
)

print(f"Generated {len(result['embeddings'])} embeddings")
```

### Response
```json
{
  "embedding": [0.0023, -0.0045, 0.0089, ...],
  "model": "text-embedding-3-large",
  "dimensions": 3072,
  "usage": {
    "prompt_tokens": 10,
    "total_tokens": 10
  },
  "cost": 0.00001
}
```
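Embeddings from llm/embed are typically compared with cosine similarity, e.g. to rank documents against a query. A minimal, self-contained sketch; the short vectors below are stand-ins for real `result["embedding"]` output:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Stand-in vectors; in practice use result["embedding"] / result["embeddings"].
query = [0.1, 0.8, 0.2]
docs = {"doc_a": [0.1, 0.7, 0.3], "doc_b": [0.9, 0.1, 0.0]}

# Rank documents by similarity to the query, most similar first.
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
print(ranked)  # ['doc_a', 'doc_b']
```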
## llm/list_models

List models available to the agent.
### Parameters

None required.

### Example
```python
result = client.call_tool(
    name="llm/list_models",
    arguments={}
)

print("Available models:")
for model in result["models"]:
    print(f"  {model['id']}: {model['type']}")
```

### Response
```json
{
  "models": [
    {
      "id": "gpt-5-mini",
      "type": "chat",
      "provider": "openai",
      "context_window": 128000,
      "allowed": true
    },
    {
      "id": "gpt-5.2",
      "type": "chat",
      "provider": "openai",
      "context_window": 128000,
      "allowed": true
    },
    {
      "id": "text-embedding-3-large",
      "type": "embedding",
      "provider": "openai",
      "dimensions": 3072,
      "allowed": true
    },
    {
      "id": "claude-opus-4-5-20251107",
      "type": "chat",
      "provider": "anthropic",
      "context_window": 200000,
      "allowed": false,
      "reason": "Not in agent model allowlist"
    }
  ]
}
```

## Advanced Usage
### Streaming

```python
async def stream_response():
    async for chunk in client.stream_tool(
        name="llm/chat",
        arguments={
            "messages": [{"role": "user", "content": "Write a poem about AI."}],
            "model": "gpt-5.2",
            "stream": True
        }
    ):
        print(chunk["delta"], end="", flush=True)
```

### Function Calling
```python
result = client.call_tool(
    name="llm/chat",
    arguments={
        "messages": [
            {"role": "user", "content": "What's the weather in Paris?"}
        ],
        "model": "gpt-5.2",
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "description": "Get weather for a location",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "location": {"type": "string"}
                        },
                        "required": ["location"]
                    }
                }
            }
        ]
    }
)

if result.get("tool_calls"):
    for call in result["tool_calls"]:
        print(f"Function: {call['function']['name']}")
        print(f"Arguments: {call['function']['arguments']}")
```
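After the model requests a function call, your code runs the function and sends the result back as a follow-up message so the model can produce a final answer. A minimal sketch of building that follow-up message list; the "tool" role and tool_call_id field follow OpenAI-style conventions and are an assumption here, as is the `tool_result_messages` helper itself:

```python
import json

def tool_result_messages(messages, assistant_msg, results):
    """Build the follow-up message list after executing requested tool calls.

    assistant_msg is the assistant turn containing tool_calls; results maps
    each tool-call id to the value your code computed for that call.
    """
    followup = list(messages) + [assistant_msg]
    for call in assistant_msg.get("tool_calls", []):
        followup.append({
            "role": "tool",                    # OpenAI-style tool-result turn
            "tool_call_id": call["id"],
            "content": json.dumps(results[call["id"]]),
        })
    return followup
```

Pass the returned list as `messages` in a second llm/chat call (with the same `tools`) to get the model's final reply.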
### JSON Mode

```python
import json

result = client.call_tool(
    name="llm/chat",
    arguments={
        "messages": [
            {"role": "user", "content": "Extract the name and age from: 'John is 30 years old.'"}
        ],
        "model": "gpt-5.2",
        "response_format": {"type": "json_object"}
    }
)

data = json.loads(result["content"])
print(data)  # {"name": "John", "age": 30}
```

## Permissions
Grant LLM access:
```yaml
permissions:
  tools:
    - llm/chat
    - llm/embed
    - llm/list_models
  models:
    - gpt-5-mini # Specific models allowed
    - gpt-5.2
    - text-embedding-3-large
```

### Model Restrictions
If an agent tries to use a non-allowed model:
```json
{
  "error": {
    "code": "model_not_allowed",
    "message": "Model 'claude-opus-4-5-20251107' is not in the agent's model allowlist",
    "allowed_models": ["gpt-5-mini", "gpt-5.2"]
  }
}
```
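One way to avoid this error is a pre-flight check against llm/list_models, as the best practices below also suggest. A minimal sketch; `allowed_chat_models` is an illustrative helper operating on the response shape shown above:

```python
def allowed_chat_models(models):
    """Filter an llm/list_models response down to usable chat model ids."""
    return [m["id"] for m in models if m["type"] == "chat" and m["allowed"]]

# In practice: models = client.call_tool(name="llm/list_models", arguments={})["models"]
models = [
    {"id": "gpt-5-mini", "type": "chat", "allowed": True},
    {"id": "text-embedding-3-large", "type": "embedding", "allowed": True},
    {"id": "claude-opus-4-5-20251107", "type": "chat", "allowed": False},
]
print(allowed_chat_models(models))  # ['gpt-5-mini']
```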
## Cost Tracking

All LLM calls include cost information:

```python
result = client.call_tool(name="llm/chat", arguments={...})

print(f"Cost: ${result['cost']:.6f}")
print(f"Tokens: {result['usage']['total_tokens']}")
```

## Best Practices
- Use appropriate models - Match model to task complexity
- Set max_tokens - Prevent unexpectedly long responses
- Lower temperature - For factual/precise tasks
- Check allowed models - Use llm/list_models first
- Monitor costs - Track usage per agent
## Next Steps
- Self-Inspect Tools - Agent introspection
- Model Allowlists - Model restrictions
- Cost Transparency - Cost monitoring