Prompt caching
Prompt caching reduces cost and latency by reusing stable prefixes of a turn request. This chapter covers the cache model in agentkit-loop: what the host configures, what the loop passes to providers, and how adapters translate that into provider-specific behavior.
Why caching lives at the request level
Caching is a transport optimization, not transcript semantics. The transcript is the conversation itself: system prompts, user messages, tool calls, tool results, and context items. Caching is applied when a turn is sent to a provider.
That distinction is why agentkit models caching on SessionConfig and TurnRequest, not on Item or Part.
pub struct SessionConfig {
    pub session_id: SessionId,
    pub metadata: MetadataMap,
    pub cache: Option<PromptCacheRequest>,
}

pub struct TurnRequest {
    pub session_id: SessionId,
    pub turn_id: TurnId,
    pub transcript: Vec<Item>,
    pub available_tools: Vec<ToolSpec>,
    pub metadata: MetadataMap,
    pub cache: Option<PromptCacheRequest>,
}
The host sets a session-level default. The loop copies that into each TurnRequest unless the host overrides the next turn explicitly.
The cache request shape
The request is provider-neutral:
pub enum PromptCacheMode {
    Disabled,
    BestEffort,
    Required,
}

pub enum PromptCacheRetention {
    Default,
    Short,
    Extended,
}

pub enum PromptCacheStrategy {
    Automatic,
    Explicit {
        breakpoints: Vec<PromptCacheBreakpoint>,
    },
}

pub enum PromptCacheBreakpoint {
    ToolsEnd,
    TranscriptItemEnd { index: usize },
    TranscriptPartEnd { item_index: usize, part_index: usize },
}

pub struct PromptCacheRequest {
    pub mode: PromptCacheMode,
    pub strategy: PromptCacheStrategy,
    pub retention: Option<PromptCacheRetention>,
    pub key: Option<String>,
}
Field semantics
| Field | Variant | Meaning |
|---|---|---|
| mode | Disabled | Do not send cache hints for this turn |
| mode | BestEffort | Use caching if the provider supports it; degrade silently otherwise |
| mode | Required | Fail the turn if the cache request cannot be honored |
| strategy | Automatic | Let the adapter use native provider behavior, or emulate it internally |
| strategy | Explicit | The host specifies concrete cache boundaries |
| retention | | Provider-neutral hint for short-lived vs extended retention |
| key | | Optional stable cache key for providers that support one |
Session defaults
The simplest place to configure caching is the session:
let mut driver = agent
    .start(SessionConfig {
        session_id: SessionId::new("coding-agent"),
        metadata: MetadataMap::new(),
        cache: Some(PromptCacheRequest {
            mode: PromptCacheMode::BestEffort,
            strategy: PromptCacheStrategy::Automatic,
            retention: Some(PromptCacheRetention::Short),
            key: None,
        }),
    })
    .await?;
This says:
- try to use prompt caching
- let the provider or adapter choose the prefix automatically
- prefer short-lived retention
- do not supply a cache key
None vs Disabled
These have different semantics:
| Value | Meaning |
|---|---|
| cache: None | No cache preference: adapters don't add cache fields; provider-native automatic caching may still happen |
| cache: Some(... { mode: Disabled, .. }) | Explicitly disable cache controls from agentkit for this session or turn |
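The distinction can be made concrete with a small decision function. This is a minimal sketch, not agentkit's implementation: `CacheAction` and `cache_action` are hypothetical names introduced for illustration, and `PromptCacheRequest` is reduced to its `mode` field.

```rust
// Simplified stand-ins for the agentkit types shown earlier; only `mode` is modeled.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PromptCacheMode {
    Disabled,
    BestEffort,
    Required,
}

#[derive(Debug, Clone, PartialEq)]
struct PromptCacheRequest {
    mode: PromptCacheMode,
}

// Hypothetical: the decision an adapter reaches before building the request body.
#[derive(Debug, PartialEq)]
enum CacheAction {
    // No preference: add nothing and leave provider defaults alone.
    LeaveUntouched,
    // The host explicitly asked agentkit not to emit cache controls.
    Suppress,
    // The host wants caching: translate the request into the wire format.
    Apply,
}

fn cache_action(cache: Option<&PromptCacheRequest>) -> CacheAction {
    match cache {
        None => CacheAction::LeaveUntouched,
        Some(req) if req.mode == PromptCacheMode::Disabled => CacheAction::Suppress,
        Some(_) => CacheAction::Apply,
    }
}

fn main() {
    let disabled = PromptCacheRequest { mode: PromptCacheMode::Disabled };
    let best_effort = PromptCacheRequest { mode: PromptCacheMode::BestEffort };
    let required = PromptCacheRequest { mode: PromptCacheMode::Required };
    println!("{:?}", cache_action(None));
    println!("{:?}", cache_action(Some(&disabled)));
    println!("{:?}", cache_action(Some(&best_effort)));
    println!("{:?}", cache_action(Some(&required)));
}
```

Keeping `None` and `Disabled` as separate outcomes lets an adapter tell "the host said nothing" apart from "the host opted out", even when both currently produce a request body without cache fields.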
Automatic strategy
PromptCacheStrategy::Automatic is the recommended default for most applications:
PromptCacheRequest {
    mode: PromptCacheMode::BestEffort,
    strategy: PromptCacheStrategy::Automatic,
    retention: Some(PromptCacheRetention::Short),
    key: None,
}
Why this is the default shape:
- it keeps the host provider-agnostic
- OpenAI-style providers can use native automatic caching
- Anthropic-style providers can be supported by adapters that synthesize explicit cache headers internally
- unsupported providers degrade cleanly in BestEffort mode
In other words: the host chooses the policy, not the provider-specific mechanism.
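To illustrate what "emulating" automatic caching could mean for a provider that only accepts explicit boundaries, here is one possible heuristic. It is an assumption for illustration, not agentkit's actual policy: `synthesize_breakpoints` is a hypothetical helper, and the enum is a reduced copy of `PromptCacheBreakpoint`. The idea is to cache the tool definitions and the transcript up to, but not including, the newest item, since the newest item changes every turn.

```rust
// Reduced copy of the breakpoint enum from this chapter (part-level variant omitted).
#[derive(Debug, Clone, PartialEq)]
enum PromptCacheBreakpoint {
    ToolsEnd,
    TranscriptItemEnd { index: usize },
}

// Hypothetical heuristic: pick boundaries around the parts of the request
// that tend to stay stable from one turn to the next.
fn synthesize_breakpoints(tool_count: usize, transcript_len: usize) -> Vec<PromptCacheBreakpoint> {
    let mut breakpoints = Vec::new();
    if tool_count > 0 {
        // Tool definitions rarely change within a session.
        breakpoints.push(PromptCacheBreakpoint::ToolsEnd);
    }
    if transcript_len >= 2 {
        // Everything before the newest item was already sent on the previous turn.
        breakpoints.push(PromptCacheBreakpoint::TranscriptItemEnd {
            index: transcript_len - 2,
        });
    }
    breakpoints
}

fn main() {
    // Three tools and a five-item transcript: boundaries after the tools and after item 3.
    println!("{:?}", synthesize_breakpoints(3, 5));
    // No tools and a single item: nothing stable to cache yet.
    println!("{:?}", synthesize_breakpoints(0, 1));
}
```

Because the host only asked for `Automatic`, an adapter is free to change a heuristic like this later without any host-side code changing.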
Explicit strategy
When the host knows the desired boundaries, it can specify them directly:
let cache = PromptCacheRequest {
    mode: PromptCacheMode::BestEffort,
    strategy: PromptCacheStrategy::Explicit {
        breakpoints: vec![
            PromptCacheBreakpoint::ToolsEnd,
            PromptCacheBreakpoint::TranscriptItemEnd { index: 3 },
        ],
    },
    retention: Some(PromptCacheRetention::Short),
    key: Some("workspace:agentkit".into()),
};
Breakpoints are expressed in request order:
- tools
- transcript items
- transcript parts within an item
This matters for providers that expose explicit cache boundaries on tools or message blocks.
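The request ordering can be sketched as a sort key plus a range check. This is a minimal illustration under stated assumptions: `request_order_key` and `normalize_breakpoints` are hypothetical helpers, not part of agentkit, and the transcript is represented only by the number of parts in each item.

```rust
#[derive(Debug, Clone, PartialEq)]
enum PromptCacheBreakpoint {
    ToolsEnd,
    TranscriptItemEnd { index: usize },
    TranscriptPartEnd { item_index: usize, part_index: usize },
}

// Sort key that follows request order: tools first, then transcript items,
// with part boundaries inside an item sorting before the end of that item.
fn request_order_key(bp: &PromptCacheBreakpoint) -> (u8, usize, usize) {
    match bp {
        PromptCacheBreakpoint::ToolsEnd => (0, 0, 0),
        PromptCacheBreakpoint::TranscriptPartEnd { item_index, part_index } => {
            (1, *item_index, *part_index)
        }
        PromptCacheBreakpoint::TranscriptItemEnd { index } => (1, *index, usize::MAX),
    }
}

// Hypothetical helper: reject out-of-range indices, then return the breakpoints
// in request order. `parts_per_item[i]` is the number of parts in transcript item `i`.
fn normalize_breakpoints(
    mut breakpoints: Vec<PromptCacheBreakpoint>,
    parts_per_item: &[usize],
) -> Result<Vec<PromptCacheBreakpoint>, String> {
    for bp in &breakpoints {
        let in_range = match bp {
            PromptCacheBreakpoint::ToolsEnd => true,
            PromptCacheBreakpoint::TranscriptItemEnd { index } => *index < parts_per_item.len(),
            PromptCacheBreakpoint::TranscriptPartEnd { item_index, part_index } => parts_per_item
                .get(*item_index)
                .map_or(false, |parts| *part_index < *parts),
        };
        if !in_range {
            return Err(format!("breakpoint out of range: {:?}", bp));
        }
    }
    breakpoints.sort_by_key(request_order_key);
    Ok(breakpoints)
}

fn main() {
    // A four-item transcript whose items have 1, 2, 1, and 3 parts.
    let parts_per_item = [1, 2, 1, 3];
    let ordered = normalize_breakpoints(
        vec![
            PromptCacheBreakpoint::TranscriptItemEnd { index: 3 },
            PromptCacheBreakpoint::ToolsEnd,
            PromptCacheBreakpoint::TranscriptPartEnd { item_index: 1, part_index: 0 },
        ],
        &parts_per_item,
    );
    println!("{:?}", ordered);
}
```

Validating indices before the request is built means a stale breakpoint (for example, one computed before the transcript was compacted) surfaces as an error rather than as a silently misplaced boundary.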
Per-turn overrides
Session defaults are often enough, but the loop also supports per-turn overrides:
driver.set_next_turn_cache(
    PromptCacheRequest::explicit_required([PromptCacheBreakpoint::tools_end()])
        .with_retention(PromptCacheRetention::Extended)
        .with_key("release-planning"),
)?;

// then submit the user message via the next cooperative interrupt:
input_request.submit(&mut driver, vec![user_item])?;
The override applies to the next model turn only. Later turns fall back to the session default. The set_next_turn_cache call is independent of input submission, so it composes with whichever InputRequest / ToolRoundInfo handle is in scope.
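The "next turn only" behavior can be modeled with an `Option` that is taken when the turn is built. The sketch below is an assumption about how such state could be kept, not the driver's actual internals: `CacheState` and `cache_for_next_turn` are hypothetical, `PromptCacheRequest` is reduced to a label, and this stand-in `set_next_turn_cache` returns nothing, unlike the fallible call shown above.

```rust
// Stand-in for the real request type; a label is enough to tell values apart.
#[derive(Debug, Clone, PartialEq)]
struct PromptCacheRequest {
    label: &'static str,
}

// Hypothetical stand-in for the cache-related state a loop driver might hold.
struct CacheState {
    session_default: Option<PromptCacheRequest>,
    next_turn_override: Option<PromptCacheRequest>,
}

impl CacheState {
    fn set_next_turn_cache(&mut self, cache: PromptCacheRequest) {
        self.next_turn_override = Some(cache);
    }

    // Value copied into the next TurnRequest. `take()` clears the override,
    // so it is consumed by exactly one turn; later turns see the session default.
    fn cache_for_next_turn(&mut self) -> Option<PromptCacheRequest> {
        self.next_turn_override
            .take()
            .or_else(|| self.session_default.clone())
    }
}

fn main() {
    let mut state = CacheState {
        session_default: Some(PromptCacheRequest { label: "session" }),
        next_turn_override: None,
    };
    state.set_next_turn_cache(PromptCacheRequest { label: "override" });

    // First turn uses the override, the second falls back to the session default.
    println!("{:?}", state.cache_for_next_turn());
    println!("{:?}", state.cache_for_next_turn());
}
```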
How adapters use it
The loop does not interpret cache semantics itself. It passes the normalized request through to the adapter.
For completions-style providers, the mapping hook is:
fn apply_prompt_cache(
    &self,
    body: &mut serde_json::Map<String, Value>,
    request: &TurnRequest,
) -> Result<(), LoopError>;
That gives adapters three implementation choices:
- use native automatic caching controls
- synthesize explicit cache headers or request fields from the normalized request
- ignore unsupported cache requests in BestEffort mode, or error in Required mode
This is the architectural boundary: agentkit keeps the host-facing API stable while each provider adapter chooses the correct wire format.
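The third choice, ignoring in BestEffort mode and failing in Required mode, might look like the following. This is a std-only sketch under several assumptions: `SketchAdapter` and its `supports_caching` flag are hypothetical, the request body is a `BTreeMap<String, String>` instead of `serde_json::Map`, the hook takes the cache request directly rather than a full `TurnRequest`, `LoopError` is a local stand-in, and `"cache_key"` is a made-up field name rather than any provider's real wire format.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum PromptCacheMode {
    Disabled,
    BestEffort,
    Required,
}

#[derive(Debug, Clone, PartialEq)]
struct PromptCacheRequest {
    mode: PromptCacheMode,
    key: Option<String>,
}

// Local stand-in for the loop's error type.
#[derive(Debug, PartialEq)]
struct LoopError(String);

// Hypothetical adapter: `supports_caching` says whether the provider accepts cache fields.
struct SketchAdapter {
    supports_caching: bool,
}

impl SketchAdapter {
    fn apply_prompt_cache(
        &self,
        body: &mut BTreeMap<String, String>,
        cache: Option<&PromptCacheRequest>,
    ) -> Result<(), LoopError> {
        // No preference or explicitly disabled: leave the body untouched.
        let req = match cache {
            None => return Ok(()),
            Some(req) if req.mode == PromptCacheMode::Disabled => return Ok(()),
            Some(req) => req,
        };
        if self.supports_caching {
            // "cache_key" is an illustrative field name only.
            if let Some(key) = &req.key {
                body.insert("cache_key".to_string(), key.clone());
            }
            return Ok(());
        }
        match req.mode {
            // Required: the turn must fail when the request cannot be honored.
            PromptCacheMode::Required => Err(LoopError(
                "prompt caching required but not supported by this provider".to_string(),
            )),
            // BestEffort: degrade silently.
            _ => Ok(()),
        }
    }
}

fn main() {
    let required = PromptCacheRequest {
        mode: PromptCacheMode::Required,
        key: Some("workspace:agentkit".to_string()),
    };
    let mut body = BTreeMap::new();
    let unsupported = SketchAdapter { supports_caching: false };
    println!("{:?}", unsupported.apply_prompt_cache(&mut body, Some(&required)));
    let supported = SketchAdapter { supports_caching: true };
    println!("{:?}", supported.apply_prompt_cache(&mut body, Some(&required)));
    println!("{:?}", body);
}
```

Returning an error from the hook, rather than logging and continuing, is what lets Required mode fail the turn before any request is sent.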
Reporting cache usage
Providers can report cache reads and writes through normalized usage fields:
pub struct TokenUsage {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub reasoning_tokens: Option<u64>,
    pub cached_input_tokens: Option<u64>,
    pub cache_write_input_tokens: Option<u64>,
}
- cached_input_tokens: input tokens served from cache
- cache_write_input_tokens: input tokens written into cache on this request
This makes caching visible to reporters and host-side cost accounting without exposing provider-specific response formats.
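As an example of host-side accounting, a reporter could derive a cache hit ratio from these fields. This is a sketch with one important assumption: `input_tokens` is taken to include the cached tokens, which the chapter does not state and which may differ between providers. `cache_hit_ratio` is a hypothetical helper, and the struct is a local copy of `TokenUsage`.

```rust
// Local copy of the usage struct shown in this chapter.
#[derive(Debug, Clone, Default)]
struct TokenUsage {
    input_tokens: u64,
    output_tokens: u64,
    reasoning_tokens: Option<u64>,
    cached_input_tokens: Option<u64>,
    cache_write_input_tokens: Option<u64>,
}

// Fraction of input tokens served from cache.
// Returns None when the provider did not report cache reads or there was no input.
// Assumption: `input_tokens` already includes the cached tokens.
fn cache_hit_ratio(usage: &TokenUsage) -> Option<f64> {
    let cached = usage.cached_input_tokens?;
    if usage.input_tokens == 0 {
        return None;
    }
    Some(cached as f64 / usage.input_tokens as f64)
}

fn main() {
    let usage = TokenUsage {
        input_tokens: 1000,
        output_tokens: 50,
        reasoning_tokens: None,
        cached_input_tokens: Some(750),
        cache_write_input_tokens: Some(250),
    };
    println!("usage: {:?}", usage);
    println!("hit ratio: {:?}", cache_hit_ratio(&usage));
}
```

Returning `Option<f64>` keeps "the provider reported zero cached tokens" distinct from "the provider reported nothing", which matters when comparing adapters.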
Practical recommendation
For most hosts, start here:
SessionConfig {
    session_id: SessionId::new("demo"),
    metadata: MetadataMap::new(),
    cache: Some(PromptCacheRequest {
        mode: PromptCacheMode::BestEffort,
        strategy: PromptCacheStrategy::Automatic,
        retention: Some(PromptCacheRetention::Short),
        key: None,
    }),
}
Then reach for explicit breakpoints only when you need to control exact cache boundaries.
Crate: Prompt caching types live in agentkit-core. Session and turn-level cache handling is in agentkit-loop. Provider-specific cache mapping is in each agentkit-provider-* crate.