New gpt-oss model from NVIDIA and OpenAI hits record 1.5M tokens per second
SMRTR summary
OpenAI and NVIDIA have released two open-weight language models, gpt-oss-120b and gpt-oss-20b, that reach a record inference speed of 1.5 million tokens per second on NVIDIA hardware. Both models are licensed under Apache 2.0 and offer reasoning capabilities comparable to proprietary systems. The larger model matches OpenAI's o4-mini, while the smaller one runs efficiently on devices with just 16GB of memory, putting advanced AI within reach of developers worldwide.
SMRTR provides this summary for quick context. The original article belongs to Interesting Engineering.