Two different tricks for fast LLM inference
SMRTR summary
Anthropic and OpenAI recently launched competing "fast modes" for their coding models, using completely different technical approaches. Anthropic achieves a 2.5x speedup by serving its full Opus 4.6 model with smaller batch sizes. OpenAI instead partners with Cerebras, whose specialized giant chips deliver 15x faster output, but this requires running a smaller, less capable model, GPT-5.3-Codex-Spark, rather than its premium model.
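Why would smaller batches make a model feel faster? A rough intuition, offered here as a toy model and not as Anthropic's actual serving setup: each decoding step pays a fixed cost (for example, streaming the model weights through the accelerator) that is shared by every request in the batch, plus some extra work per request. Shrinking the batch shortens each step, so each user gets tokens sooner, while the total tokens served per second drops. The cost constants `fixed_ms` and `per_seq_ms` and the function names below are hypothetical, chosen only to show the shape of the trade-off.

```python
# Toy decode-step cost model (hypothetical numbers, for illustration only):
# step time = fixed cost shared by the whole batch + per-request cost.

def step_time_ms(batch_size: int, fixed_ms: float = 20.0, per_seq_ms: float = 0.5) -> float:
    """Time for one decoding step that emits one token for every request in the batch."""
    return fixed_ms + per_seq_ms * batch_size


def per_user_tokens_per_s(batch_size: int, fixed_ms: float = 20.0, per_seq_ms: float = 0.5) -> float:
    """Speed a single user sees: one token per step."""
    return 1000.0 / step_time_ms(batch_size, fixed_ms, per_seq_ms)


def total_tokens_per_s(batch_size: int, fixed_ms: float = 20.0, per_seq_ms: float = 0.5) -> float:
    """Aggregate throughput of the server: batch_size tokens per step."""
    return batch_size * per_user_tokens_per_s(batch_size, fixed_ms, per_seq_ms)


if __name__ == "__main__":
    for b in (1, 4, 16, 64):
        print(
            f"batch={b:3d}  step={step_time_ms(b):5.1f} ms  "
            f"per-user={per_user_tokens_per_s(b):6.2f} tok/s  "
            f"total={total_tokens_per_s(b):8.2f} tok/s"
        )
```

Under these made-up constants, per-user speed rises as the batch shrinks while aggregate throughput falls, which is why a low-batch "fast mode" costs the provider more hardware per user. Real speedups depend on the model, hardware, and scheduler, none of which this sketch captures.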
SMRTR provides this summary for quick context. The original article belongs to Daily.dev.