SMRTR AI • Aug 12, 2025 • Daily.dev

How to Train Your LLM

SMRTR summary

Clay on a potter's wheel transforms into a masterpiece through careful shaping, just like today's sophisticated AI language models. The process begins with a massive foundation of human text—DeepSeek V3 alone trained on 14.8 trillion tokens, equivalent to 123 million novels, nearly matching Google's estimate of all books ever written.
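As a rough check on that comparison, the arithmetic can be sketched as follows. The two totals come from the summary above; the words-per-token ratio is an assumed rule of thumb for English text, not a figure from the article.

```python
# Figures quoted in the summary.
TOTAL_TOKENS = 14.8e12   # DeepSeek V3 pre-training tokens
NOVEL_COUNT = 123e6      # "123 million novels"

# Implied length of one "novel" in tokens.
tokens_per_novel = TOTAL_TOKENS / NOVEL_COUNT

# Assumption: roughly 0.75 English words per token (a common rule of thumb,
# not stated in the article; the real ratio depends on the tokenizer).
WORDS_PER_TOKEN = 0.75
words_per_novel = tokens_per_novel * WORDS_PER_TOKEN

print(f"tokens per novel: {tokens_per_novel:,.0f}")
print(f"approx. words per novel: {words_per_novel:,.0f}")
```

Under these assumptions the comparison implies a novel of about 120,000 tokens, or roughly 90,000 words, which is a plausible length for a full-length novel.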

"You can't start working on details until you've first shaped the clay," explains the researcher, describing how language models develop in three critical phases.

First comes self-supervised pre-training, where models consume diverse texts to grasp language fundamentals. Next, post-training fine-tuning teaches models to follow instructions through supervised examples and preference learning. Finally, reinforcement learning rewards desired behaviors like accurate math or logical reasoning.
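The three phases can be read as three different training signals. The sketch below is a toy illustration in plain Python, not the method of any particular model: the function names are hypothetical, the pairwise preference loss is one common choice (Bradley-Terry style) assumed here for illustration, and the reward is a simple exact-match check.

```python
import math

def next_token_loss(predicted_probs: dict, actual_next: str) -> float:
    """Pre-training signal: negative log-likelihood of the token that
    actually came next in the text (no human labels needed)."""
    return -math.log(predicted_probs[actual_next])

def preference_loss(score_chosen: float, score_rejected: float) -> float:
    """Preference-learning signal (assumed Bradley-Terry style pairwise loss):
    small when the preferred response scores higher than the rejected one."""
    margin = score_chosen - score_rejected
    return math.log(1.0 + math.exp(-margin))

def math_reward(model_answer: str, reference_answer: str) -> float:
    """Reinforcement-learning signal: 1.0 for a verifiably correct final
    answer, 0.0 otherwise (exact string match, for illustration only)."""
    return 1.0 if model_answer.strip() == reference_answer.strip() else 0.0

# Toy usage with made-up numbers.
print(next_token_loss({"mat": 0.5, "dog": 0.5}, "mat"))
print(preference_loss(score_chosen=2.0, score_rejected=0.0))
print(math_reward(" 42 ", "42"))
```

In a real system each signal would drive gradient updates to the model's weights; here they are only computed so the difference between the phases is visible.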

These models learn tool use and problem-solving through carefully designed training regimens. DeepSeek V3 (671 billion total parameters) and Kimi K2 (about one trillion total parameters), both mixture-of-experts models, represent the cutting edge of this technology.

Despite remarkable progress, challenges remain. AI systems often find correct answers through flawed reasoning or fail at tasks with less quantifiable rewards. The ultimate goal—creating truly intentional, truth-seeking assistants—remains an active frontier for researchers.
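The "correct answer through flawed reasoning" problem is easy to see with a reward that only inspects the final answer. The example below is a hypothetical illustration: the two transcripts and the `outcome_reward` function are invented for this sketch and do not come from the article.

```python
def outcome_reward(final_answer: str, reference: str) -> float:
    """Reward based only on the final answer; the reasoning is never checked."""
    return 1.0 if final_answer.strip() == reference.strip() else 0.0

# Two hypothetical solutions to "simplify 16/64".
sound = {
    "steps": ["16/64 = (16*1)/(16*4)", "divide top and bottom by 16", "1/4"],
    "answer": "1/4",
}
flawed = {
    "steps": ["cancel the 6 in 16 with the 6 in 64", "1/4"],  # invalid step
    "answer": "1/4",
}

reference = "1/4"
print(outcome_reward(sound["answer"], reference))   # sound reasoning
print(outcome_reward(flawed["answer"], reference))  # flawed reasoning, same reward
```

Both transcripts earn the same reward, so training on this signal alone gives the model no reason to prefer the valid derivation over the lucky shortcut.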

SMRTR provides this summary for quick context. The original article belongs to Daily.dev.
