Llama 3.1 405B now runs at 969 tokens/s on Cerebras Inference
SMRTR summary
Cerebras Inference reached a record speed with Meta's Llama 3.1 405B model, generating 969 tokens per second. That is 12 to 18 times faster than GPT-4 and Claude 3.5 Sonnet, with lower latency and better context-length support.
SMRTR provides this summary for quick context. The original article belongs to Hacker News.