SMRTR AI• Jun 7, 2026• Hacker News

Deep Dive into LLM Token Cost: How Prompt Caching Works

SMRTR summary

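Prompt caching in Claude works in three distinct phases: a file enters as fresh input at the full input price, gets written to cache one turn later at a 25% premium (1.25x the input price), and then costs just 10% of the input price on each later turn that reads it from cache. In long, active sessions this cuts costs dramatically. A gap of more than five minutes between turns expires the cache, however, so the content must be written again at the premium rate, which makes a resumed turn up to 12 times more expensive than a turn in an active session.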
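The arithmetic behind those three phases can be sketched in a few lines of Python. This is an illustrative model only, not the original article's code: the multipliers (1.00, 1.25, 0.10) and the five-minute expiry are taken from the summary above, while the function name `file_cost_multipliers` and the list-of-gaps input are assumptions made for the example.

```python
# Price multipliers relative to the normal input price, as stated in the summary.
FRESH_INPUT = 1.00   # first turn: content is ordinary, uncached input
CACHE_WRITE = 1.25   # content is written to cache at a 25% premium
CACHE_READ = 0.10    # later turns read the cached content at 10%
CACHE_TTL_SECONDS = 5 * 60  # the cache expires after a gap longer than this


def file_cost_multipliers(gaps_seconds):
    """Return the price multiplier paid for one file on each turn.

    `gaps_seconds[i]` is the idle time before turn i + 1 (hypothetical input
    format). Turn 0 is always fresh input; turn 1 writes the cache; later
    turns read the cache unless the gap exceeded the TTL, in which case the
    content is written to cache again.
    """
    multipliers = [FRESH_INPUT]
    for index, gap in enumerate(gaps_seconds):
        if index == 0 or gap > CACHE_TTL_SECONDS:
            multipliers.append(CACHE_WRITE)
        else:
            multipliers.append(CACHE_READ)
    return multipliers


if __name__ == "__main__":
    active = file_cost_multipliers([30, 30, 30, 30])
    resumed = file_cost_multipliers([30, 30, 30, 900])  # 15-minute break
    print("active session: ", active)
    print("resumed session:", resumed)
    print("last-turn ratio:", round(resumed[-1] / active[-1], 1))
```

Under these assumptions the final turn of the resumed session pays the write rate rather than the read rate, a ratio of 1.25 / 0.10, or roughly the 12x figure quoted in the summary. Multiplying each value by the file's token count and the model's per-token input price gives the cost in currency.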

SMRTR provides this summary for quick context. The original article belongs to Hacker News.


Prompt Caching for Anthropic and OpenAI Models: Building Cost-Efficient AI Systems

Prompt caching has emerged as a crucial optimization technique for production AI systems, allowing repeated prompt segments like system instructions and tool schemas to be reused...


AI• Daily.dev• Dec 28, 2025

Prompt Caching Explained

Prompt caching stores a language model's internal state for unchanging prompt prefixes, allowing subsequent requests to skip reprocessing those tokens and achieve up to 80%...


AI• Hacker News• Apr 13, 2026

Your intuition of LLM token usage might be wrong

A developer discovered that LLM token usage patterns differ drastically from expectations during a 30-minute coding session with GPT-4-mini, where cached reads consumed 10 times...


AI• Hacker News• Mar 1, 2026

Claude Prompt to Find Inefficiencies in LLM Usage

A new Claude prompt helps developers identify where they might be overusing large language models by having coding agents analyze their workflows for inefficiencies. The tool...


AI• Daily.dev• Feb 1, 2026

How to Run Claude Code for Free with Local and Cloud Models from Ollama

Ollama recently announced compatibility with Anthropic's Messages API, allowing developers to run Claude Code using free local models instead of paying $100-200 monthly for...


AI• Daily.dev• Aug 28, 2025

Are OpenAI and Anthropic Really Losing Money on Inference?

Analyzing AI inference costs shows a stark contrast between input and output token economics. Using a 72 H100 GPU cluster at $2/hour per GPU, input processing costs about $0.003...

