Transcript compaction
Long conversations exceed context windows. Compaction is how you keep an agent session viable without losing important context. This chapter covers agentkit-compaction: the trigger, strategy, and pipeline system.
The design
Compaction is optional and host-configured. It has three concerns:
- When to compact — the trigger
- How to compact — the strategy (or pipeline of strategies)
- What to use for semantic summarization — the optional backend
```rust
let agent = Agent::builder()
    .model(adapter)
    .compaction(CompactionConfig::new(trigger, strategy))
    .build()?;
```
Triggers
A CompactionTrigger decides whether compaction should run before a turn:
```rust
pub trait CompactionTrigger {
    fn should_compact(&self, transcript: &[Item], reason: &CompactionReason) -> bool;
}
```
Built-in: ItemCountTrigger::new(12) fires when the transcript exceeds 12 items.
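Hosts can also supply their own trigger. The sketch below implements the trait shape shown above for a rough token budget instead of an item count — note that `Item` and `CompactionReason` here are simplified stand-ins for the real agentkit types, and `ApproxTokenTrigger` is a hypothetical name:

```rust
// Stand-ins for the real agentkit types, just enough to show the shape.
struct Item { text: String }
struct CompactionReason;

trait CompactionTrigger {
    fn should_compact(&self, transcript: &[Item], reason: &CompactionReason) -> bool;
}

/// Hypothetical trigger: fires when a rough token estimate (chars / 4)
/// exceeds a budget, rather than counting items.
struct ApproxTokenTrigger { budget_tokens: usize }

impl CompactionTrigger for ApproxTokenTrigger {
    fn should_compact(&self, transcript: &[Item], _reason: &CompactionReason) -> bool {
        let chars: usize = transcript.iter().map(|i| i.text.len()).sum();
        chars / 4 > self.budget_tokens
    }
}

fn main() {
    let trigger = ApproxTokenTrigger { budget_tokens: 20 };
    let transcript = vec![Item { text: "x".repeat(100) }];
    // 100 chars / 4 = 25 estimated tokens, which exceeds the budget of 20.
    println!("{}", trigger.should_compact(&transcript, &CompactionReason));
}
```

A character-based estimate is crude, but it fires closer to the actual context-window pressure than a flat item count.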
Strategies
A CompactionStrategy transforms the transcript:
```rust
pub trait CompactionStrategy {
    async fn compact(
        &self,
        request: CompactionRequest,
        ctx: &mut CompactionContext,
    ) -> Result<CompactionResult, CompactionError>;
}
```
Built-in strategies:
| Strategy | Description |
|---|---|
| DropReasoningStrategy | Removes reasoning parts from assistant items |
| DropFailedToolResultsStrategy | Removes tool results where is_error: true |
| KeepRecentStrategy | Keeps the last N non-preserved items |
| SummarizeOlderStrategy | Summarizes older items through the backend |
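To make the trim behavior concrete, here is an illustrative sketch of the core of a KeepRecent-style strategy over plain stand-in types (the real strategy works on agentkit's `Item` type and runs through the async `compact` method above):

```rust
// Simplified stand-ins for agentkit's item model.
#[derive(Clone, Debug, PartialEq)]
enum ItemKind { System, Context, User, Assistant, Tool }

#[derive(Clone, Debug)]
struct Item { kind: ItemKind, text: String }

/// Keep every item of a preserved kind, plus the last `n` remaining items.
/// Older non-preserved items are dropped first.
fn keep_recent(transcript: &[Item], n: usize, preserved: &[ItemKind]) -> Vec<Item> {
    let trimmable = transcript.iter().filter(|i| !preserved.contains(&i.kind)).count();
    let to_drop = trimmable.saturating_sub(n);
    let mut dropped = 0;
    transcript
        .iter()
        .filter(|i| {
            if preserved.contains(&i.kind) {
                true // preserved kinds survive regardless of age
            } else if dropped < to_drop {
                dropped += 1; // trim the oldest non-preserved items first
                false
            } else {
                true
            }
        })
        .cloned()
        .collect()
}

fn main() {
    let mut items = vec![
        Item { kind: ItemKind::System, text: "prompt".into() },
        Item { kind: ItemKind::Context, text: "project".into() },
    ];
    for i in 0..5 {
        items.push(Item { kind: ItemKind::User, text: format!("msg {i}") });
    }
    let kept = keep_recent(&items, 2, &[ItemKind::System, ItemKind::Context]);
    println!("{}", kept.len()); // 2 preserved + last 2 users = 4
}
```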
Preservation
KeepRecentStrategy supports preservation rules:
```rust
KeepRecentStrategy::new(8)
    .preserve_kind(ItemKind::System)
    .preserve_kind(ItemKind::Context)
```
System and context items are kept regardless of age. Only user/assistant/tool items are subject to trimming.
Pipelines
Multiple strategies compose into a pipeline:
```rust
CompactionPipeline::new()
    .with_strategy(DropReasoningStrategy::new())
    .with_strategy(DropFailedToolResultsStrategy::new())
    .with_strategy(
        KeepRecentStrategy::new(8)
            .preserve_kind(ItemKind::System)
            .preserve_kind(ItemKind::Context),
    )
```
Strategies execute in order. Each one receives the output of the previous.
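The ordering rule is effectively a left fold: the transcript threads through each strategy in turn. A minimal sketch with plain functions over `Vec<String>` (not the real agentkit types) makes this concrete:

```rust
// Toy stand-in: strategies as plain functions from transcript to transcript.
type Strategy = fn(Vec<String>) -> Vec<String>;

fn drop_empty(items: Vec<String>) -> Vec<String> {
    items.into_iter().filter(|s| !s.is_empty()).collect()
}

fn keep_last_two(mut items: Vec<String>) -> Vec<String> {
    let cut = items.len().saturating_sub(2);
    items.split_off(cut)
}

/// Run each strategy on the previous strategy's output — a left fold.
fn run_pipeline(items: Vec<String>, strategies: &[Strategy]) -> Vec<String> {
    strategies.iter().fold(items, |acc, s| s(acc))
}

fn main() {
    let items = vec!["a".into(), "".into(), "b".into(), "c".into()];
    let out = run_pipeline(items, &[drop_empty, keep_last_two]);
    println!("{out:?}"); // ["b", "c"]
}
```

Order matters: running keep_last_two first would count the empty item toward the keep window before drop_empty could remove it.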
Semantic compaction
For summarization, the host injects a CompactionBackend:
```rust
let config = CompactionConfig::new(trigger, strategy).with_backend(my_backend);
```
The backend receives a SummaryRequest and returns a SummaryResult. agentkit does not include a built-in LLM client — the backend is host-provided. The openrouter-compaction-agent example uses a nested agent loop as the summarization backend.
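As a rough sketch of where the host's LLM call slots in, here is a simplified synchronous stand-in — the real trait is async and works with SummaryRequest/SummaryResult, and `StubBackend` is a hypothetical name:

```rust
// Simplified stand-in for the host-provided backend trait.
trait CompactionBackend {
    fn summarize(&self, older_items: &[String]) -> String;
}

/// Toy backend. A real host would send `older_items` to a model here —
/// for example, by running a nested agent loop.
struct StubBackend;

impl CompactionBackend for StubBackend {
    fn summarize(&self, older_items: &[String]) -> String {
        format!("[summary of {} earlier items]", older_items.len())
    }
}

fn main() {
    let backend = StubBackend;
    let older = vec!["user: hi".to_string(), "asst: hello".to_string()];
    println!("{}", backend.summarize(&older));
}
```

Keeping the backend behind a trait is what lets agentkit avoid bundling an LLM client: the summarizer is just another capability the host injects.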
A compaction example
Before and after a compaction pipeline run:
Before (20 items, trigger threshold: 12):
[0] System: "You are a coding assistant" ← preserved
[1] Context: "Project uses Rust 2024..." ← preserved
[2] User: "What files are in src/?"
[3] Asst: (reasoning) "Let me list the directory"
(text) "I'll check..."
(tool_call) fs.list_directory
[4] Tool: ["main.rs", "lib.rs", "parser.rs"]
[5] Asst: "There are three files..."
[6] User: "Read parser.rs"
[7] Asst: (tool_call) fs.read_file
[8] Tool: "fn parse() { ... }"
[9] Asst: "The parser contains..."
[10] User: "Add error handling"
[11] Asst: (tool_call) fs.replace_in_file
[12] Tool: { is_error: true, "search text not found" } ← failed
[13] Asst: "Let me try again..."
(tool_call) fs.replace_in_file
[14] Tool: "Replacement successful"
[15] Asst: (tool_call) shell.exec("cargo check")
[16] Tool: "Compiling... 0 errors"
[17] Asst: "Done! I added error handling..."
[18] User: "Now add tests"
[19] Asst: (thinking about tests...)
Pipeline:
1. DropReasoningStrategy → removes reasoning parts from [3], [19]
2. DropFailedToolResultsStrategy → removes failed result [12]
3. KeepRecentStrategy(8, preserve System+Context)
After (10 items):
[0] System: "You are a coding assistant" ← preserved
[1] Context: "Project uses Rust 2024..." ← preserved
[2] Asst: (tool_call) fs.replace_in_file ← recent 8 start here; its failed result was dropped
[3] Asst: "Let me try again..."
    (tool_call) fs.replace_in_file
[4] Tool: "Replacement successful"
[5] Asst: (tool_call) shell.exec("cargo check")
[6] Tool: "Compiling... 0 errors"
[7] Asst: "Done! I added error handling..."
[8] User: "Now add tests"
[9] Asst: (now without reasoning part)
The model lost the early conversation but retains the system prompt, project context, and the most recent work. This is usually a good trade-off — the model’s attention is strongest on recent items anyway.
Compaction vs prompt caching
Compaction and prompt caching both operate on the turn request, but they optimize for different things:
- Prompt caching tries to reuse an unchanged serialized prefix from earlier turns
- Compaction deliberately changes the serialized transcript to make it shorter
That means compaction often invalidates the cache prefix even when the conversation is still logically continuous.
Consider the actual prompt prefix sent to the provider:
Before compaction:
[system]
[context]
[user 1]
[assistant 1]
[tool result 1]
[user 2]
[assistant 2]
[user 3]
cacheable prefix for turn N:
└───────────────────────────────────────────────┘
After compaction:
[system] ← still present
[context] ← still present
[compaction summary] ← new item, replaces older history
[assistant 2]
[user 3]
new cacheable prefix for turn N+1:
└─────────────────────────────┘
Provider-side caches are keyed on the exact prompt prefix, not the semantic meaning of the conversation. These changes all tend to invalidate an existing cache entry:
- dropping reasoning parts
- removing failed tool results
- trimming old user/assistant/tool items
- replacing many old items with a single summary item
- reordering or refreshing context items
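Since caches match on the exact serialized prefix, the effect is easy to quantify. This toy comparison mirrors the diagram above and shows how much of the prefix still matches after compaction:

```rust
/// Length of the shared prefix between two serialized transcripts — a
/// stand-in for how a provider-side cache decides what it can reuse.
fn shared_prefix_len(old: &[&str], new: &[&str]) -> usize {
    old.iter().zip(new.iter()).take_while(|(a, b)| a == b).count()
}

fn main() {
    let before = [
        "system", "context", "user 1", "assistant 1",
        "tool result 1", "user 2", "assistant 2", "user 3",
    ];
    let after = [
        "system", "context", "compaction summary", "assistant 2", "user 3",
    ];
    // Only the preserved system/context prefix still matches: 2 items.
    println!("{}", shared_prefix_len(&before, &after));
}
```

Even though "assistant 2" and "user 3" survive compaction verbatim, they sit after the new summary item, so prefix-keyed caches cannot reuse them.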
What survives compaction
After compaction, only the compacted transcript is part of future conversation history from the model’s perspective.
| Retained | Dropped |
|---|---|
| Preserved System items | Reasoning blocks |
| Preserved Context items | Failed tool results |
| Recent user/assistant/tool items that survived | Older conversation items past the keep window |
| Summary items from semantic compaction | Raw items replaced by a summary |
The provider-side cache itself is not conversation history — it is transport state owned by the provider. It can accelerate reuse of a prompt prefix, but it does not extend the model’s memory. If compaction removes or rewrites earlier items, those items are gone from the request even if an older provider cache entry still exists.
The trade-off
Compaction can reduce cache hit rates in exchange for keeping the session under the context window.
That trade-off is often still correct:
- without compaction, the session may stop fitting at all
- with compaction, the transcript becomes shorter and cheaper even if an old cache prefix is no longer reusable
- preserved system/context prefixes still give the cache some stable surface area
In practice:
- structural compaction usually causes smaller cache disruptions
- semantic compaction causes larger cache disruptions because it replaces many items with a new summary
- long-lived context items and stable tool schemas are still good cache anchors
This does not mean all caching efficiency is lost after compaction. The typical sequence:
- the old cacheable prefix becomes invalid because the transcript changed
- the compacted transcript is sent on the next turn
- that new, shorter transcript becomes the new cacheable prefix
- subsequent turns reuse the compacted prefix until the next compaction cycle
Compaction behaves like a cache reset followed by a new stable baseline.
turn N-1:
long history prefix ← cached
turn N:
compaction runs
compacted transcript sent ← old cache no longer matches
turn N+1, N+2, N+3:
same compacted transcript prefix reused ← new cache hits accumulate
This is one reason semantic compaction can still be efficient overall. The summary item may replace a large unstable history with a much smaller durable prefix that is cheap to resend and easy to cache for the next several turns.
This is why caching is configured separately from compaction in agentkit. Compaction decides what the transcript should be. Caching then operates on whatever transcript remains. For the cache model itself, see Chapter 15.
Loop integration
When compaction fires:
1. AgentEvent::CompactionStarted is emitted (with the trigger reason)
2. The strategy pipeline transforms the transcript
3. The loop replaces its working transcript with the compacted result
4. AgentEvent::CompactionFinished is emitted (with before/after item counts)
Turn lifecycle with compaction:
submit_input()
│
▼
┌── compaction check ──┐
│ │
│ trigger fires? │
│ yes → run pipeline │
│ no → skip │
└──────────┬───────────┘
│
▼
begin model turn (with post-compaction transcript)
This happens before the model sees the transcript for the next turn. The model never observes raw compaction artifacts — it just sees a shorter transcript.
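A host observing the event stream can surface compaction to users or logs. The sketch below assumes simplified variant payloads — check the real AgentEvent definition for the actual shapes:

```rust
// Assumed, simplified shapes of the two compaction events.
enum AgentEvent {
    CompactionStarted { reason: String },
    CompactionFinished { items_before: usize, items_after: usize },
}

/// Render a compaction event as a log line.
fn describe(ev: &AgentEvent) -> String {
    match ev {
        AgentEvent::CompactionStarted { reason } => {
            format!("compaction started ({reason})")
        }
        AgentEvent::CompactionFinished { items_before, items_after } => {
            format!("compacted {items_before} -> {items_after} items")
        }
    }
}

fn main() {
    let events = [
        AgentEvent::CompactionStarted { reason: "item count > 12".into() },
        AgentEvent::CompactionFinished { items_before: 20, items_after: 10 },
    ];
    for ev in &events {
        println!("{}", describe(ev));
    }
}
```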
Compaction is not summarization
Most compaction strategies are structural — they drop parts or trim items without understanding semantics. DropReasoningStrategy removes reasoning blocks because they’re verbose and not needed for future turns. KeepRecentStrategy drops old items because the model’s attention is weakest on them.
Only SummarizeOlderStrategy (with a CompactionBackend) does semantic work — it summarizes old items into a shorter form. This requires an LLM call, which adds latency and cost. The openrouter-compaction-agent example uses a nested agent loop as the summarization backend.
Example: openrouter-compaction-agent demonstrates all three types: structural (drop reasoning), hybrid (keep recent + summarize older), and semantic (nested-agent summarization backend).

Crate: agentkit-compaction — depends on agentkit-core. The loop integration is in agentkit-loop.