Are OpenAI and Anthropic Really Losing Money on Inference?
SMRTR summary
Analyzing AI inference costs reveals a stark contrast between the economics of input and output tokens. On a cluster of 72 H100 GPUs rented at $2 per GPU-hour, processing input costs about $0.003 per million tokens, while generating output costs about $3.08 per million. That is a roughly thousand-fold difference, because input tokens can be processed in parallel while output tokens must be generated one at a time. The article argues that this asymmetry explains why services such as ChatGPT and Claude Code can be profitable despite heavy usage. It also suggests that API businesses run gross margins of 80-95%, which challenges the notion that AI inference is unsustainably expensive. The economics favor applications that process large inputs and produce small outputs. Video generation remains costly because it follows the reverse pattern, turning short prompts into very large outputs.
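The figures above come down to simple arithmetic: a cluster's hourly price divided by the number of tokens it handles per hour. The following Python sketch takes the summary's numbers as given (72 GPUs at $2/hour, $0.003 and $3.08 per million tokens) and works backwards to the throughputs those prices imply. It does not reproduce the original article's throughput model. The helper names and the two example request sizes are hypothetical and chosen only for illustration.

```python
# Minimal sketch of the inference-cost arithmetic, assuming the summary's figures.
# Throughputs are derived from the quoted prices, not measured.

GPUS = 72
PRICE_PER_GPU_HOUR = 2.00                          # USD, from the summary
CLUSTER_COST_PER_HOUR = GPUS * PRICE_PER_GPU_HOUR  # 144.0 USD/hour

INPUT_COST_PER_M = 0.003   # USD per million input tokens (from the summary)
OUTPUT_COST_PER_M = 3.08   # USD per million output tokens (from the summary)


def cost_per_million(cluster_cost_per_hour: float, tokens_per_second: float) -> float:
    """USD to process one million tokens at a sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return cluster_cost_per_hour / tokens_per_hour * 1_000_000


def implied_tokens_per_second(cluster_cost_per_hour: float, usd_per_million: float) -> float:
    """Invert cost_per_million: the throughput that a given price implies."""
    return cluster_cost_per_hour / usd_per_million * 1_000_000 / 3600


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: USD cost of one request under the summary's per-token costs."""
    return (input_tokens * INPUT_COST_PER_M + output_tokens * OUTPUT_COST_PER_M) / 1_000_000


if __name__ == "__main__":
    in_tps = implied_tokens_per_second(CLUSTER_COST_PER_HOUR, INPUT_COST_PER_M)
    out_tps = implied_tokens_per_second(CLUSTER_COST_PER_HOUR, OUTPUT_COST_PER_M)
    print(f"cluster cost: ${CLUSTER_COST_PER_HOUR:.2f}/hour")
    print(f"implied input throughput:  {in_tps:,.0f} tokens/s")
    print(f"implied output throughput: {out_tps:,.0f} tokens/s")
    print(f"output/input cost ratio: {OUTPUT_COST_PER_M / INPUT_COST_PER_M:,.0f}x")

    # Illustrative workloads; the token counts are assumptions, not data.
    input_heavy = request_cost(input_tokens=100_000, output_tokens=2_000)
    output_heavy = request_cost(input_tokens=200, output_tokens=100_000)
    print(f"input-heavy request:  ${input_heavy:.5f}")
    print(f"output-heavy request: ${output_heavy:.5f}")
```

Under these assumptions, a request with 100,000 input tokens and 2,000 output tokens costs well under a cent. A request with a short prompt and 100,000 output tokens costs roughly 48 times more, even though it handles fewer tokens in total. This is the asymmetry the summary attributes to video generation.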
SMRTR provides this summary for quick context. The original article belongs to Daily.dev.