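Context caching reuses repeated prompt prefixes to cut input cost substantially (up to 90% savings). NexusFlow supports both explicit and implicit caching, and is compatible with both the OpenAI and Anthropic protocols.
Request and response examples
Cache information is returned as part of the normal Chat Completions response. Application code still reads the model output from choices[0].message.content, and reads cache creation and cache hit counts from usage.prompt_tokens_details.
Explicit caching: request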
curl -X POST https://nexusflow.hk/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.5-flash",
    "messages": [
      {
        "role": "system",
        "content": [
          {
            "type": "text",
            "text": "<stable shared prefix, at least 1024 tokens, e.g. a codebase, product manual, or long document>",
            "cache_control": {"type": "ephemeral"}
          }
        ]
      },
      {
        "role": "user",
        "content": "Based on the document above, answer the first question"
      }
    ],
    "temperature": 0,
    "max_tokens": 200
  }'
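The same request body can be assembled programmatically. The following is a minimal Python sketch that mirrors the curl example above; the function names build_explicit_cache_payload and send are illustrative helpers introduced here, not part of any NexusFlow SDK, and only the endpoint, model name, and field names are taken from the curl example.

```python
import json
import urllib.request

# Endpoint copied from the curl example above.
API_URL = "https://nexusflow.hk/v1/chat/completions"


def build_explicit_cache_payload(stable_prefix: str, question: str,
                                 model: str = "qwen3.5-flash") -> dict:
    """Build a request body whose system message carries a cache_control marker.

    Hypothetical helper: the structure is copied from the curl example.
    """
    return {
        "model": model,
        "messages": [
            {
                "role": "system",
                "content": [
                    {
                        "type": "text",
                        "text": stable_prefix,
                        # Same marker as in the curl example.
                        "cache_control": {"type": "ephemeral"},
                    }
                ],
            },
            {"role": "user", "content": question},
        ],
        "temperature": 0,
        "max_tokens": 200,
    }


def send(payload: dict, api_key: str) -> dict:
    """POST the payload and return the parsed JSON response.

    Hypothetical helper; it performs a real network call, so it is
    defined here but not invoked.
    """
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


payload = build_explicit_cache_payload("<stable shared prefix>", "First question")
print(json.dumps(payload, indent=2))
```

Keeping the prefix text byte-for-byte identical across calls and varying only the user message is the intent of this layout, since the cache is keyed on the repeated prefix.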
Explicit caching: first response (cache created)
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The model's normal answer goes here"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 18628,
    "completion_tokens": 344,
    "total_tokens": 18972,
    "prompt_tokens_details": {
      "text_tokens": 18628,
      "cache_creation_input_tokens": 18613,
      "cache_type": "ephemeral",
      "cached_tokens": 0
    }
  }
}
Explicit caching: second response (cache hit)
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The answer to the second request goes here"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 18628,
    "completion_tokens": 445,
    "total_tokens": 19073,
    "prompt_tokens_details": {
      "text_tokens": 18628,
      "cache_creation_input_tokens": 0,
      "cache_type": "ephemeral",
      "cached_tokens": 18613
    }
  }
}
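The two responses above differ only in which counter under usage.prompt_tokens_details is non-zero. As a rough sketch, the Python function below tells the two cases apart and computes what fraction of the prompt was served from cache. The function name explicit_cache_status and the labels "created", "hit", and "none" are assumptions made for this illustration; the field names and numbers come from the two example responses.

```python
def explicit_cache_status(response: dict) -> dict:
    """Classify a response as cache creation, cache hit, or neither.

    Hypothetical helper; reads only fields shown in the example responses.
    """
    usage = response.get("usage") or {}
    details = usage.get("prompt_tokens_details") or {}
    created = details.get("cache_creation_input_tokens") or 0
    cached = details.get("cached_tokens") or 0
    prompt_tokens = usage.get("prompt_tokens") or 0

    if cached > 0:
        status = "hit"
    elif created > 0:
        status = "created"
    else:
        status = "none"

    # Share of prompt tokens that were read from the cache.
    hit_ratio = cached / prompt_tokens if prompt_tokens else 0.0
    return {
        "status": status,
        "created": created,
        "cached": cached,
        "hit_ratio": hit_ratio,
    }


# Usage blocks copied from the two example responses above.
first_response = {
    "usage": {
        "prompt_tokens": 18628,
        "prompt_tokens_details": {
            "cache_creation_input_tokens": 18613,
            "cached_tokens": 0,
        },
    }
}
second_response = {
    "usage": {
        "prompt_tokens": 18628,
        "prompt_tokens_details": {
            "cache_creation_input_tokens": 0,
            "cached_tokens": 18613,
        },
    }
}

print(explicit_cache_status(first_response)["status"])   # created
print(explicit_cache_status(second_response)["status"])  # hit
```

In the second example, 18613 of the 18628 prompt tokens are read from cache, so only the 15 tokens outside the cached prefix are processed as new input.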
Implicit caching: request
curl -X POST https://nexusflow.hk/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.7-max",
    "messages": [
      {
        "role": "system",
        "content": "<stable shared prefix, e.g. a long-lived knowledge base, product description, or code context>"
      },
      {
        "role": "user",
        "content": "Based on the content above, answer a new question"
      }
    ],
    "temperature": 0,
    "max_tokens": 200
  }'
Implicit caching: possible cache-hit response
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "The model's normal answer goes here"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 15365,
    "completion_tokens": 1,
    "total_tokens": 15366,
    "prompt_tokens_details": {
      "cached_tokens": 15232
    }
  }
}
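Implicit caching uses no cache_control marker and does not return cache_type. On a miss, prompt_tokens_details.cached_tokens may be 0 or absent; whether a request hits the cache is decided by the upstream automatic policy.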
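Because cached_tokens may be 0 or missing entirely on a miss, client code should not index into the response unconditionally. A minimal defensive sketch in Python follows; the function name implicit_cached_tokens is hypothetical, and treating a missing or non-integer value as 0 is a design choice made for this example rather than documented behavior.

```python
def implicit_cached_tokens(response: dict) -> int:
    """Return the number of cached prompt tokens, or 0 if none are reported.

    Hypothetical helper: tolerates a missing usage block, a missing
    prompt_tokens_details block, and a missing cached_tokens field.
    """
    usage = response.get("usage") or {}
    details = usage.get("prompt_tokens_details") or {}
    value = details.get("cached_tokens")
    return value if isinstance(value, int) else 0


# Cache hit, using the numbers from the example response above.
hit = {"usage": {"prompt_tokens": 15365,
                 "prompt_tokens_details": {"cached_tokens": 15232}}}
# Miss where the details block is absent.
miss_absent = {"usage": {"prompt_tokens": 15365}}
# Miss where the field is present but zero.
miss_zero = {"usage": {"prompt_tokens": 15365,
                       "prompt_tokens_details": {"cached_tokens": 0}}}

print(implicit_cached_tokens(hit))          # 15232
print(implicit_cached_tokens(miss_absent))  # 0
print(implicit_cached_tokens(miss_zero))    # 0
```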