Qwen3.6-35B-A3B speculative decoding is net-negative on RTX 3090
SMRTR summary
Speculative decoding slows Qwen3.6-35B-A3B down by 3-12% on an RTX 3090. Because the model is a mixture-of-experts, each token can activate a different set of experts, so checking drafted tokens pulls in additional expert weights and adds memory overhead. Standard (non-speculative) decoding is fastest, at 135.7 tokens per second.
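To put the percentages in absolute terms, here is a rough arithmetic sketch. It assumes the 3-12% slowdown is measured against the 135.7 tokens per second baseline, which the summary implies but does not state outright; the function name is purely illustrative.

```python
# Baseline throughput for standard decoding, taken from the summary above.
BASELINE_TOK_PER_S = 135.7


def throughput_after_slowdown(baseline: float, slowdown_pct: float) -> float:
    """Return tokens/s after applying a percentage slowdown to the baseline.

    Assumption: the slowdown is relative to the baseline figure.
    """
    return baseline * (1.0 - slowdown_pct / 100.0)


# Smallest and largest slowdowns quoted in the summary.
best_case = throughput_after_slowdown(BASELINE_TOK_PER_S, 3.0)
worst_case = throughput_after_slowdown(BASELINE_TOK_PER_S, 12.0)

print(f"{best_case:.1f} to {worst_case:.1f} tokens/s")
```

Under that assumption, speculative decoding would land at roughly 119 to 132 tokens per second instead of 135.7.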
SMRTR provides this summary for quick context. The original article is from Hacker News.