Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Transcript compaction

Long conversations exceed context windows. Compaction is how you keep an agent session viable without losing important context. This chapter covers agentkit-compaction: the trigger, strategy, and pipeline system.

The design

Compaction is optional and host-configured. It has three concerns:

  1. When to compact — the trigger
  2. How to compact — the strategy (or pipeline of strategies)
  3. What to use for semantic summarization — the optional backend
#![allow(unused)]
fn main() {
let agent = Agent::builder()
    .model(adapter)
    .compaction(CompactionConfig::new(trigger, strategy))
    .build()?;
}

Triggers

A CompactionTrigger decides whether compaction should run before a turn:

#![allow(unused)]
fn main() {
pub trait CompactionTrigger {
    fn should_compact(&self, transcript: &[Item], reason: &CompactionReason) -> bool;
}
}

Built-in: ItemCountTrigger::new(12) fires when the transcript exceeds 12 items.

Strategies

A CompactionStrategy transforms the transcript:

#![allow(unused)]
fn main() {
pub trait CompactionStrategy {
    async fn compact(
        &self,
        request: CompactionRequest,
        ctx: &mut CompactionContext,
    ) -> Result<CompactionResult, CompactionError>;
}
}

Built-in strategies:

StrategyDescription
DropReasoningStrategyRemoves reasoning parts from assistant items
DropFailedToolResultsStrategyRemoves tool results where is_error: true
KeepRecentStrategyKeeps the last N non-preserved items
SummarizeOlderStrategySummarizes older items through the backend

Preservation

KeepRecentStrategy supports preservation rules:

#![allow(unused)]
fn main() {
KeepRecentStrategy::new(8)
    .preserve_kind(ItemKind::System)
    .preserve_kind(ItemKind::Context)
}

System and context items are kept regardless of age. Only user/assistant/tool items are subject to trimming.

Pipelines

Multiple strategies compose into a pipeline:

#![allow(unused)]
fn main() {
CompactionPipeline::new()
    .with_strategy(DropReasoningStrategy::new())
    .with_strategy(DropFailedToolResultsStrategy::new())
    .with_strategy(KeepRecentStrategy::new(8)
        .preserve_kind(ItemKind::System)
        .preserve_kind(ItemKind::Context))
}

Strategies execute in order. Each one receives the output of the previous.

Semantic compaction

For summarization, the host injects a CompactionBackend:

#![allow(unused)]
fn main() {
let config = CompactionConfig::new(trigger, strategy).with_backend(my_backend);
}

The backend receives a SummaryRequest and returns a SummaryResult. agentkit does not include a built-in LLM client — the backend is host-provided. The openrouter-compaction-agent example uses a nested agent loop as the summarization backend.

A compaction example

Before and after a compaction pipeline run:

Before (20 items, trigger threshold: 12):

  [0]  System: "You are a coding assistant"           ← preserved
  [1]  Context: "Project uses Rust 2024..."            ← preserved
  [2]  User: "What files are in src/?"
  [3]  Asst: (reasoning) "Let me list the directory"
             (text) "I'll check..."
             (tool_call) fs.list_directory
  [4]  Tool: ["main.rs", "lib.rs", "parser.rs"]
  [5]  Asst: "There are three files..."
  [6]  User: "Read parser.rs"
  [7]  Asst: (tool_call) fs.read_file
  [8]  Tool: "fn parse() { ... }"
  [9]  Asst: "The parser contains..."
  [10] User: "Add error handling"
  [11] Asst: (tool_call) fs.replace_in_file
  [12] Tool: { is_error: true, "search text not found" }  ← failed
  [13] Asst: "Let me try again..."
             (tool_call) fs.replace_in_file
  [14] Tool: "Replacement successful"
  [15] Asst: (tool_call) shell.exec("cargo check")
  [16] Tool: "Compiling... 0 errors"
  [17] Asst: "Done! I added error handling..."
  [18] User: "Now add tests"
  [19] Asst: (thinking about tests...)


Pipeline:
  1. DropReasoningStrategy     → removes reasoning parts from [3], [19]
  2. DropFailedToolResultsStrategy → removes failed result [12]
  3. KeepRecentStrategy(8, preserve System+Context)

After (10 items):

  [0]  System: "You are a coding assistant"            ← preserved
  [1]  Context: "Project uses Rust 2024..."             ← preserved
  [2]  Asst: "Let me try again..."                      ← recent 8 start here
             (tool_call) fs.replace_in_file
  [3]  Tool: "Replacement successful"
  [4]  Asst: (tool_call) shell.exec("cargo check")
  [5]  Tool: "Compiling... 0 errors"
  [6]  Asst: "Done! I added error handling..."
  [7]  User: "Now add tests"
  [8]  Asst: (now without reasoning part)

The model lost the early conversation but retains the system prompt, project context, and the most recent work. This is usually a good trade-off — the model’s attention is strongest on recent items anyway.

Compaction vs prompt caching

Compaction and prompt caching both operate on the turn request, but they optimize for different things:

  • Prompt caching tries to reuse an unchanged serialized prefix from earlier turns
  • Compaction deliberately changes the serialized transcript to make it shorter

That means compaction often invalidates the cache prefix even when the conversation is still logically continuous.

Consider the actual prompt prefix sent to the provider:

Before compaction:

  [system]
  [context]
  [user 1]
  [assistant 1]
  [tool result 1]
  [user 2]
  [assistant 2]
  [user 3]

  cacheable prefix for turn N:
  └───────────────────────────────────────────────┘


After compaction:

  [system]                       ← still present
  [context]                      ← still present
  [compaction summary]           ← new item, replaces older history
  [assistant 2]
  [user 3]

  new cacheable prefix for turn N+1:
  └─────────────────────────────┘

Provider-side caches are keyed on the exact prompt prefix, not the semantic meaning of the conversation. These changes all tend to invalidate an existing cache entry:

  • dropping reasoning parts
  • removing failed tool results
  • trimming old user/assistant/tool items
  • replacing many old items with a single summary item
  • reordering or refreshing context items

What survives compaction

After compaction, only the compacted transcript is part of future conversation history from the model’s perspective.

RetainedDropped
Preserved System itemsReasoning blocks
Preserved Context itemsFailed tool results
Recent user/assistant/tool items that survivedOlder conversation items past the keep window
Summary items from semantic compactionRaw items replaced by a summary

The provider-side cache itself is not conversation history — it is transport state owned by the provider. It can accelerate reuse of a prompt prefix, but it does not extend the model’s memory. If compaction removes or rewrites earlier items, those items are gone from the request even if an older provider cache entry still exists.

The trade-off

Compaction can reduce cache hit rates in exchange for keeping the session under the context window.

That trade-off is often still correct:

  • without compaction, the session may stop fitting at all
  • with compaction, the transcript becomes shorter and cheaper even if an old cache prefix is no longer reusable
  • preserved system/context prefixes still give the cache some stable surface area

In practice:

  • structural compaction usually causes smaller cache disruptions
  • semantic compaction causes larger cache disruptions because it replaces many items with a new summary
  • long-lived context items and stable tool schemas are still good cache anchors

This does not mean all caching efficiency is lost after compaction. The typical sequence:

  1. the old cacheable prefix becomes invalid because the transcript changed
  2. the compacted transcript is sent on the next turn
  3. that new, shorter transcript becomes the new cacheable prefix
  4. subsequent turns reuse the compacted prefix until the next compaction cycle

Compaction behaves like a cache reset followed by a new stable baseline.

turn N-1:
  long history prefix                          ← cached

turn N:
  compaction runs
  compacted transcript sent                    ← old cache no longer matches

turn N+1, N+2, N+3:
  same compacted transcript prefix reused      ← new cache hits accumulate

This is one reason semantic compaction can still be efficient overall. The summary item may replace a large unstable history with a much smaller durable prefix that is cheap to resend and easy to cache for the next several turns.

This is why caching is configured separately from compaction in agentkit. Compaction decides what the transcript should be. Caching then operates on whatever transcript remains. For the cache model itself, see Chapter 15.

Loop integration

When compaction fires:

  1. AgentEvent::CompactionStarted is emitted (with the trigger reason)
  2. The strategy pipeline transforms the transcript
  3. The loop replaces its working transcript with the compacted result
  4. AgentEvent::CompactionFinished is emitted (with before/after item counts)
Turn lifecycle with compaction:

  submit_input()
       │
       ▼
  ┌── compaction check ──┐
  │                      │
  │  trigger fires?      │
  │  yes → run pipeline  │
  │  no  → skip          │
  └──────────┬───────────┘
             │
             ▼
  begin model turn (with post-compaction transcript)

This happens before the model sees the transcript for the next turn. The model never observes raw compaction artifacts — it just sees a shorter transcript.

Compaction is not summarization

Most compaction strategies are structural — they drop parts or trim items without understanding semantics. DropReasoningStrategy removes reasoning blocks because they’re verbose and not needed for future turns. KeepRecentStrategy drops old items because the model’s attention is weakest on them.

Only SummarizeOlderStrategy (with a CompactionBackend) does semantic work — it summarizes old items into a shorter form. This requires an LLM call, which adds latency and cost. The openrouter-compaction-agent example uses a nested agent loop as the summarization backend.

Example: openrouter-compaction-agent demonstrates all three types: structural (drop reasoning), hybrid (keep recent + summarize older), and semantic (nested-agent summarization backend).

Crate: agentkit-compaction — depends on agentkit-core. The loop integration is in agentkit-loop.