Orthrus: Fast, lossless LLM inference via dual-view diffusion decoding
SMRTR summary
Orthrus speeds up large language model text generation by up to 7.8×. It combines autoregressive decoding with diffusion-style parallel decoding over a shared memory cache, and it outperforms methods such as EAGLE-3 at long context lengths.
SMRTR provides this summary for quick context. The original article belongs to Daily.dev.