Transcript compaction

Long conversations eventually exceed the model's context window. Compaction is how you keep an agent session viable without losing important context. This chapter covers agentkit-compaction: the trigger, strategy, and pipeline system.

The design

Compaction is optional and host-configured. It plugs into the loop through the generic LoopMutator seam — a Compactor is just a mutator that decides “should I rewrite the transcript right now?” and, if yes, swaps it out. There are three concerns:

When to compact — the trigger closure
How to compact — the strategy (or pipeline of strategies)
What to use for semantic summarization — the optional backend

let compactor = StrategyCompactor::builder()
    .item_count_trigger(12)
    .strategy(strategy)
    .build()?;

let agent = Agent::builder()
    .model(adapter)
    .compactor(compactor) // AgentBuilderCompactorExt
    .build()?;

Triggers

A trigger is a closure with the shape Fn(&[Item], MutationPoint) -> Option<CompactionReason> + Send + Sync (aliased as TriggerFn). Returning Some(reason) fires compaction; None skips. Built-in helpers:

item_count_trigger(max_items) — fires when the transcript grows beyond max_items
context_window_trigger(window, percent) — fires when the latest item’s reported input_tokens reaches window * percent / 100 (only at AfterTurnEnded)
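
As a rough illustration of how these two helpers might decide, here is a self-contained sketch that uses only the standard library. The `Item`, `MutationPoint`, and `CompactionReason` types are simplified stand-ins, and the `ContextWindowNearlyFull` variant is a hypothetical name; the real agentkit definitions carry more data and may differ.

```rust
/// Simplified stand-in for the loop's mutation points.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MutationPoint {
    AfterToolResult,
    AfterTurnEnded,
}

/// Simplified stand-in; `ContextWindowNearlyFull` is a hypothetical variant.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CompactionReason {
    TranscriptTooLong,
    ContextWindowNearlyFull,
}

/// Simplified item: only the token count a provider might report.
#[derive(Debug, Clone, Default)]
pub struct Item {
    pub input_tokens: Option<u64>,
}

pub type TriggerFn =
    Box<dyn Fn(&[Item], MutationPoint) -> Option<CompactionReason> + Send + Sync>;

/// Fires once the transcript holds more than `max_items` items.
pub fn item_count_trigger(max_items: usize) -> TriggerFn {
    Box::new(move |transcript: &[Item], _point: MutationPoint| {
        (transcript.len() > max_items).then_some(CompactionReason::TranscriptTooLong)
    })
}

/// Fires at `AfterTurnEnded` once the latest item's reported token count
/// reaches `window * percent / 100`.
pub fn context_window_trigger(window: u64, percent: u64) -> TriggerFn {
    let threshold = window * percent / 100;
    Box::new(move |transcript: &[Item], point: MutationPoint| {
        if point != MutationPoint::AfterTurnEnded {
            return None;
        }
        // No items, or no reported usage, means there is nothing to compare.
        let used = transcript.last()?.input_tokens?;
        (used >= threshold).then_some(CompactionReason::ContextWindowNearlyFull)
    })
}
```

With a 100,000-token window and `percent = 80`, the threshold in this sketch is 80,000 tokens, so a latest item reporting 80,000 input tokens fires at `AfterTurnEnded` and is ignored at `AfterToolResult`.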

Custom triggers are plain closures:

StrategyCompactor::builder()
    .trigger(Box::new(|transcript, point| {
        (point == MutationPoint::AfterTurnEnded && transcript.len() > 20)
            .then_some(CompactionReason::TranscriptTooLong)
    }))
    .strategy(strategy)
    .build()?;

Strategies

A CompactionStrategy transforms the transcript:

pub trait CompactionStrategy {
    async fn compact(
        &self,
        request: CompactionRequest,
        ctx: &mut CompactionContext,
    ) -> Result<CompactionResult, CompactionError>;
}

Built-in strategies:

| Strategy | Description |
| --- | --- |
| `DropReasoningStrategy` | Removes reasoning parts from assistant items |
| `DropFailedToolResultsStrategy` | Removes tool results where `is_error: true` |
| `KeepRecentStrategy` | Keeps the last N non-preserved items |
| `SummarizeOlderStrategy` | Summarizes older items through the backend |

Preservation

KeepRecentStrategy supports preservation rules:

KeepRecentStrategy::new(8)
    .preserve_kind(ItemKind::System)
    .preserve_kind(ItemKind::Context)

System and context items are kept regardless of age. Only user/assistant/tool items are subject to trimming.
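
The preservation rule can be sketched in a few lines of standard-library Rust. This is only an approximation of the idea, not the crate's implementation: `Item`, `ItemKind`, and the free function `keep_recent` are hypothetical stand-ins chosen for the example.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ItemKind {
    System,
    Context,
    User,
    Assistant,
    Tool,
}

/// Simplified item; real transcript items hold structured parts.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Item {
    pub kind: ItemKind,
    pub text: String,
}

/// Keep every item whose kind is listed in `preserved`, plus the last `keep`
/// items of any other kind. The original order is retained.
pub fn keep_recent(transcript: &[Item], keep: usize, preserved: &[ItemKind]) -> Vec<Item> {
    // Count the items that are eligible for trimming.
    let trimmable = transcript
        .iter()
        .filter(|item| !preserved.contains(&item.kind))
        .count();
    // The oldest `to_drop` trimmable items fall outside the keep window.
    let mut to_drop = trimmable.saturating_sub(keep);

    transcript
        .iter()
        .filter(|item| {
            if preserved.contains(&item.kind) {
                return true; // preserved kinds are kept regardless of age
            }
            if to_drop > 0 {
                to_drop -= 1;
                false
            } else {
                true
            }
        })
        .cloned()
        .collect()
}
```

Note that preserved items do not count against the window: with `keep = 2`, a transcript holding one system item, one context item, and three conversational items comes back with four items.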

Pipelines

Multiple strategies compose into a pipeline:

CompactionPipeline::new()
    .with_strategy(DropReasoningStrategy::new())
    .with_strategy(DropFailedToolResultsStrategy::new())
    .with_strategy(KeepRecentStrategy::new(8)
        .preserve_kind(ItemKind::System)
        .preserve_kind(ItemKind::Context))

Strategies execute in order. Each one receives the output of the previous.
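
Conceptually, running a pipeline is a fold over its stages. The sketch below models that with a synchronous, hypothetical `Strategy` trait and a simplified `Item` enum; the real `CompactionStrategy` is async and takes a request and context, so treat this as an illustration of the ordering only.

```rust
/// Simplified transcript item; real agentkit items are richer.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Item {
    User(String),
    Assistant { reasoning: Option<String>, text: String },
    ToolResult { is_error: bool, output: String },
}

/// Hypothetical synchronous stand-in for a compaction strategy.
pub trait Strategy {
    fn apply(&self, transcript: Vec<Item>) -> Vec<Item>;
}

pub struct DropReasoning;
impl Strategy for DropReasoning {
    fn apply(&self, mut transcript: Vec<Item>) -> Vec<Item> {
        for item in &mut transcript {
            if let Item::Assistant { reasoning, .. } = item {
                *reasoning = None; // strip the reasoning, keep the item
            }
        }
        transcript
    }
}

pub struct DropFailedToolResults;
impl Strategy for DropFailedToolResults {
    fn apply(&self, mut transcript: Vec<Item>) -> Vec<Item> {
        transcript.retain(|item| !matches!(item, Item::ToolResult { is_error: true, .. }));
        transcript
    }
}

/// Keeps the last N items (no preservation rules in this sketch).
pub struct KeepRecent(pub usize);
impl Strategy for KeepRecent {
    fn apply(&self, mut transcript: Vec<Item>) -> Vec<Item> {
        let excess = transcript.len().saturating_sub(self.0);
        transcript.split_off(excess) // returns the tail starting at `excess`
    }
}

#[derive(Default)]
pub struct Pipeline {
    stages: Vec<Box<dyn Strategy>>,
}

impl Pipeline {
    pub fn with_strategy(mut self, strategy: impl Strategy + 'static) -> Self {
        self.stages.push(Box::new(strategy));
        self
    }

    /// Each stage receives the output of the previous one.
    pub fn run(&self, transcript: Vec<Item>) -> Vec<Item> {
        self.stages.iter().fold(transcript, |acc, stage| stage.apply(acc))
    }
}
```

Placing the trimming stage last means its budget is spent only on items that survived the earlier stages, which is one reason to order the cheap structural drops first.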

Semantic compaction

For summarization, the host injects a CompactionBackend:

let compactor = StrategyCompactor::builder()
    .trigger(trigger)
    .strategy(strategy) // e.g. SummarizeOlderStrategy
    .backend(my_backend)
    .build()?;

The backend receives a SummaryRequest and returns a SummaryResult. agentkit ships AgentCompactor, a backend that runs a nested loop over a sub-agent; you can use it directly or write your own backend. The openrouter-compaction-agent example wires AgentCompactor into the pipeline.

A compaction example

Before and after a compaction pipeline run:

Before (20 items, trigger threshold: 12):

  [0]  System: "You are a coding assistant"           ← preserved
  [1]  Context: "Project uses Rust 2024..."            ← preserved
  [2]  User: "What files are in src/?"
  [3]  Asst: (reasoning) "Let me list the directory"
             (text) "I'll check..."
             (tool_call) fs_list_directory
  [4]  Tool: ["main.rs", "lib.rs", "parser.rs"]
  [5]  Asst: "There are three files..."
  [6]  User: "Read parser.rs"
  [7]  Asst: (tool_call) fs_read_file
  [8]  Tool: "fn parse() { ... }"
  [9]  Asst: "The parser contains..."
  [10] User: "Add error handling"
  [11] Asst: (tool_call) fs_replace_in_file
  [12] Tool: { is_error: true, "search text not found" }  ← failed
  [13] Asst: "Let me try again..."
             (tool_call) fs_replace_in_file
  [14] Tool: "Replacement successful"
  [15] Asst: (tool_call) shell_exec("cargo check")
  [16] Tool: "Compiling... 0 errors"
  [17] Asst: "Done! I added error handling..."
  [18] User: "Now add tests"
  [19] Asst: (thinking about tests...)


Pipeline:
  1. DropReasoningStrategy     → removes reasoning parts from [3], [19]
  2. DropFailedToolResultsStrategy → removes failed result [12]
  3. KeepRecentStrategy(8, preserve System+Context)

After (9 items):

  [0]  System: "You are a coding assistant"            ← preserved
  [1]  Context: "Project uses Rust 2024..."             ← preserved
  [2]  Asst: "Let me try again..."                      ← recent 8 start here
             (tool_call) fs_replace_in_file
  [3]  Tool: "Replacement successful"
  [4]  Asst: (tool_call) shell_exec("cargo check")
  [5]  Tool: "Compiling... 0 errors"
  [6]  Asst: "Done! I added error handling..."
  [7]  User: "Now add tests"
  [8]  Asst: (now without reasoning part)

The model loses the early conversation but retains the system prompt, project context, and the most recent work. This is usually a good trade-off, because the most recent items tend to matter most for the next turn.

Compaction vs prompt caching

Compaction and prompt caching both operate on the turn request, but they optimize for different things:

Prompt caching tries to reuse an unchanged serialized prefix from earlier turns
Compaction deliberately changes the serialized transcript to make it shorter

That means compaction often invalidates the cache prefix even when the conversation is still logically continuous.

Consider the actual prompt prefix sent to the provider:

Before compaction:

  [system]
  [context]
  [user 1]
  [assistant 1]
  [tool result 1]
  [user 2]
  [assistant 2]
  [user 3]

  cacheable prefix for turn N:
  └───────────────────────────────────────────────┘


After compaction:

  [system]                       ← still present
  [context]                      ← still present
  [compaction summary]           ← new item, replaces older history
  [assistant 2]
  [user 3]

  new cacheable prefix for turn N+1:
  └─────────────────────────────┘

Provider-side caches are keyed on the exact prompt prefix, not the semantic meaning of the conversation. These changes all tend to invalidate an existing cache entry:

dropping reasoning parts
removing failed tool results
trimming old user/assistant/tool items
replacing many old items with a single summary item
reordering or refreshing context items
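
To make the effect concrete, the following sketch measures how much of a prompt prefix two transcripts share. It compares whole items as strings, which is a simplification: real provider caches match on the serialized prompt, and the function name `shared_prefix_len` is made up for this example.

```rust
/// Number of leading items that are identical in both transcripts.
/// Anything after the first difference cannot be served from a prefix cache.
pub fn shared_prefix_len(before: &[&str], after: &[&str]) -> usize {
    before
        .iter()
        .zip(after.iter())
        .take_while(|(a, b)| a == b)
        .count()
}
```

Applied to the two transcripts in the diagram above, only `[system]` and `[context]` line up, so the shared prefix is 2 items out of the original 8. Every item after the summary has to be processed fresh on the first turn after compaction.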

What survives compaction

After compaction, only the compacted transcript is part of future conversation history from the model’s perspective.

| Retained | Dropped |
| --- | --- |
| Preserved `System` items | Reasoning blocks |
| Preserved `Context` items | Failed tool results |
| Recent user/assistant/tool items that survived | Older conversation items past the keep window |
| Summary items from semantic compaction | Raw items replaced by a summary |

The provider-side cache itself is not conversation history — it is transport state owned by the provider. It can accelerate reuse of a prompt prefix, but it does not extend the model’s memory. If compaction removes or rewrites earlier items, those items are gone from the request even if an older provider cache entry still exists.

The trade-off

Compaction can reduce cache hit rates in exchange for keeping the session under the context window.

That trade-off is often still correct:

without compaction, the session may stop fitting at all
with compaction, the transcript becomes shorter and cheaper even if an old cache prefix is no longer reusable
preserved system/context prefixes still give the cache some stable surface area

In practice:

structural compaction usually causes smaller cache disruptions
semantic compaction causes larger cache disruptions because it replaces many items with a new summary
long-lived context items and stable tool schemas are still good cache anchors

This does not mean all caching efficiency is lost after compaction. The typical sequence:

the old cacheable prefix becomes invalid because the transcript changed
the compacted transcript is sent on the next turn
that new, shorter transcript becomes the new cacheable prefix
subsequent turns reuse the compacted prefix until the next compaction cycle

Compaction behaves like a cache reset followed by a new stable baseline.

turn N-1:
  long history prefix                          ← cached

turn N:
  compaction runs
  compacted transcript sent                    ← old cache no longer matches

turn N+1, N+2, N+3:
  same compacted transcript prefix reused      ← new cache hits accumulate

This is one reason semantic compaction can still be efficient overall. The summary item may replace a large unstable history with a much smaller durable prefix that is cheap to resend and easy to cache for the next several turns.

This is why caching is configured separately from compaction in agentkit. Compaction decides what the transcript should be. Caching then operates on whatever transcript remains. For the cache model itself, see Chapter 15.

Loop integration

Compactors register as LoopMutators. The loop runs every registered mutator at each MutationPoint — AfterToolResult (between tool results and the next inference call) and AfterTurnEnded (after the assistant final, interrupt, or cancellation). The trigger decides which points are relevant.

When a compactor fires:

The compactor emits AgentEvent::MutationStarted { mutator, point, .. } with a stable label it chose
The strategy pipeline transforms the transcript through the cursor
The loop validates transcript invariants (tool_use ↔ tool_result pairing) and hard-fails with LoopError::Mutator on a protocol violation
The compactor emits AgentEvent::MutationFinished { mutator, dirty, metadata, .. }
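
The pairing invariant in the third step can be pictured as a single pass over the transcript. The sketch below is an assumption about what such a check could look like, not the loop's actual validator: `Item`, `PairingError`, and `validate_pairing` are hypothetical, and it treats any tool use without a later result as a violation.

```rust
use std::collections::HashSet;

/// Simplified item: only what the pairing check needs.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Item {
    ToolUse { id: String },
    ToolResult { id: String },
    Other,
}

#[derive(Debug, PartialEq, Eq)]
pub enum PairingError {
    /// A tool result appeared without an earlier matching tool use.
    OrphanResult(String),
    /// A tool use was never followed by a matching tool result.
    UnansweredUse(String),
}

pub fn validate_pairing(transcript: &[Item]) -> Result<(), PairingError> {
    // Ids of tool uses that are still waiting for their result.
    let mut open: HashSet<&str> = HashSet::new();
    for item in transcript {
        match item {
            Item::ToolUse { id } => {
                open.insert(id.as_str());
            }
            Item::ToolResult { id } => {
                if !open.remove(id.as_str()) {
                    return Err(PairingError::OrphanResult(id.clone()));
                }
            }
            Item::Other => {}
        }
    }
    // Report the smallest remaining id so the error is deterministic.
    match open.into_iter().min() {
        Some(id) => Err(PairingError::UnansweredUse(id.to_string())),
        None => Ok(()),
    }
}
```

A strategy that removes one half of a pair, for example a trimming step whose window boundary falls between a tool use and its result, would fail a check of this kind. That is the sort of protocol violation the loop is described as rejecting.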

Turn lifecycle with a registered compactor:

  next() → merge pending input
       │
       ▼
  begin model turn
       │
       ▼
  tool_result? ─► run mutators at AfterToolResult ─► continue turn
       │
       ▼
  turn ends ─► run mutators at AfterTurnEnded
       │
       ▼
  next turn sees the post-mutation transcript

The model never observes raw compaction artifacts — it just sees the post-mutation transcript.

Compaction is not summarization

Most compaction strategies are structural: they drop parts or trim items without understanding semantics. DropReasoningStrategy removes reasoning blocks because they are verbose and rarely needed for future turns. KeepRecentStrategy drops old items because they are usually the least relevant to the next turn.

Only SummarizeOlderStrategy (with a CompactionBackend) does semantic work — it summarizes old items into a shorter form. This requires an LLM call, which adds latency and cost. The openrouter-compaction-agent example uses a nested agent loop as the summarization backend.

Example: openrouter-compaction-agent demonstrates all three types: structural (drop reasoning), hybrid (keep recent + summarize older), and semantic (nested-agent summarization backend).

Crate: agentkit-compaction — depends on agentkit-core. The loop integration is in agentkit-loop.
