Google's Gemma 4 open AI models use "speculative decoding" to get up to 3x faster
SMRTR summary
Google's Gemma 4 AI models just got significantly faster through a technique called speculative decoding, which uses small "drafter" models to predict upcoming tokens while the main model is still processing. The result is up to 3x faster generation on consumer hardware, where memory speed is a common bottleneck.
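To make the idea concrete, here is a minimal Python sketch of the greedy form of speculative decoding. It is an illustration under stated assumptions, not Google's implementation: `target_model` and `draft_model` are hypothetical toy functions over integer tokens, and `k` (the number of tokens drafted per round) is an arbitrary choice. In a real system the verification step runs as one batched pass of the large model, which is where the savings come from when memory bandwidth is the bottleneck. The sketch only counts those passes.

```python
from typing import Callable, List, Tuple

# A "model" here is any function mapping a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def target_model(prefix: List[int]) -> int:
    # Hypothetical stand-in for the large model: sum of the last two tokens, mod 10.
    return (prefix[-1] + prefix[-2]) % 10


def draft_model(prefix: List[int]) -> int:
    # Hypothetical stand-in for the small drafter: matches the target,
    # except that it guesses wrong whenever the last token is 5.
    guess = (prefix[-1] + prefix[-2]) % 10
    return (guess + 1) % 10 if prefix[-1] == 5 else guess


def plain_decode(target: Model, prompt: List[int], num_new: int) -> List[int]:
    # Baseline: one target call per generated token.
    tokens = list(prompt)
    for _ in range(num_new):
        tokens.append(target(tokens))
    return tokens


def speculative_decode(
    target: Model, draft: Model, prompt: List[int], num_new: int, k: int = 4
) -> Tuple[List[int], int]:
    tokens = list(prompt)
    goal = len(prompt) + num_new
    target_passes = 0  # verification rounds (one batched pass each in a real system)

    while len(tokens) < goal:
        # 1. The drafter cheaply proposes k tokens, one after another.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2. The target checks every proposed position. Simulated sequentially here.
        target_passes += 1
        accepted: List[int] = []
        for guess in proposal:
            expected = target(tokens + accepted)
            if guess != expected:
                # First mismatch: keep the target's token and discard the rest.
                accepted.append(expected)
                break
            accepted.append(guess)
        else:
            # Every draft token was accepted, so the same pass yields one extra token.
            accepted.append(target(tokens + accepted))

        tokens.extend(accepted)

    return tokens[:goal], target_passes


if __name__ == "__main__":
    out, passes = speculative_decode(target_model, draft_model, [1, 1], 12, k=4)
    print(out == plain_decode(target_model, [1, 1], 12), passes)
```

Because every accepted token is checked against the target model, the output matches what the target would have produced on its own. The speedup depends on how often the drafter guesses right: each round costs one target pass and yields between 1 and `k + 1` tokens.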
SMRTR provides this summary for quick context. The original article belongs to Ars Technica.