SMRTR AI• Nov 30, 2025• Daily.dev

Why We’ve Been Optimizing the Wrong Thing in LLMs for Years

SMRTR summary

Traditional large language models waste computational resources by spending equal effort predicting common filler words like "the" and "and" as meaningful words, despite filler words comprising over 50% of English text. Meta researchers developed Multi-Token Prediction (MTP), which trains models to predict multiple future tokens simultaneously rather than just the next word. MTP models achieve up to 17% better performance on coding benchmarks and 3x faster inference speeds, with DeepSeek-V3 implementing this approach in production, though the method struggles with knowledge-retrieval tasks.

SMRTR provides this summary for quick context. The original article belongs to Daily.dev.

Read the original article

Why We’ve Been Optimizing the Wrong Thing in LLMs for Years

Get the next batch of curated summaries in your inbox.