SMRTR Programming | Aug 10, 2025 | Hacker News

GPT-OSS vs. Qwen3 and a detailed look how things evolved since GPT-2

SMRTR summary

The clicks of mechanical keyboards echo across Silicon Valley as OpenAI unveils its first open-weight language models since 2019. Released this week, gpt-oss-120b and gpt-oss-20b mark a significant departure from the company's closed-source approach of recent years.

"We still have not found anything better than the transformer architecture," notes the technical overview, explaining the models' conventional foundation beneath sophisticated optimizations.

What makes these models particularly notable is that they can run locally. With MXFP4 quantization, the 20B model fits on consumer GPUs with 16GB of memory, while the 120B version fits on a single H100 with 80GB.
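The memory figures above can be sanity-checked with some rough arithmetic. The sketch below assumes the MXFP4 block format (4-bit elements plus one shared 8-bit scale per block of 32 values, about 4.25 bits per weight) and, as a simplification, treats every parameter as quantized; the nominal parameter counts (20e9 and 120e9) and the function name are illustrative assumptions, and activations and the KV cache are ignored.

```python
# Rough weight-memory estimate under MXFP4 (illustrative sketch, not an official figure).
# Assumptions: 4-bit elements, one 8-bit shared scale per block of 32 elements,
# and all parameters quantized (real checkpoints may keep some tensors in higher precision).

ELEMENT_BITS = 4
SCALE_BITS = 8
BLOCK_SIZE = 32


def estimate_weight_gb(num_params: float) -> float:
    """Return the approximate weight size in GB (1 GB = 1e9 bytes)."""
    bits_per_weight = ELEMENT_BITS + SCALE_BITS / BLOCK_SIZE  # 4.25 bits
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    for name, params in [("20B (nominal)", 20e9), ("120B (nominal)", 120e9)]:
        print(f"{name}: ~{estimate_weight_gb(params):.1f} GB of weights")
```

Under these assumptions the weights alone stay below the 16GB and 80GB budgets mentioned above, leaving the remainder for activations and the KV cache.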

Architecture-wise, both models combine several modern enhancements over GPT-2: rotary position embeddings (RoPE) replace absolute positional embeddings; SwiGLU activations replace GELU; and sliding-window attention layers alternate with full-context attention layers.
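To make the three components concrete, here is a minimal pure-Python sketch of each. It is not the gpt-oss implementation: the function names, the interleaved pairing used for RoPE, the base of 10000, the window size, and the choice of which layers use the sliding window are all assumptions made for illustration.

```python
import math


def rope(vec, pos, base=10000.0):
    """Rotary position embedding: rotate consecutive pairs of `vec` by an
    angle that depends on the token position (interleaved-pair variant)."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # lower dimensions rotate faster
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out


def silu(x):
    """SiLU (swish) activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def matvec(matrix, v):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in matrix]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward block: w_down @ (silu(w_gate @ x) * (w_up @ x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    return matvec(w_down, [g * u for g, u in zip(gate, up)])


def attention_mask(seq_len, window=None):
    """Causal mask; with `window` set, each token sees only itself and the
    previous `window - 1` tokens instead of the full context."""
    return [
        [j <= i and (window is None or i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def layer_masks(n_layers, seq_len, window):
    """Alternate sliding-window and full-context layers.
    Assumption: even-indexed layers use the sliding window."""
    return [
        attention_mask(seq_len, window if layer % 2 == 0 else None)
        for layer in range(n_layers)
    ]
```

A useful property to check with `rope` is that the dot product between a rotated query and a rotated key depends only on the distance between their positions, which is why no separate absolute position embedding is needed.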

Early benchmarks suggest these open models perform comparably to proprietary alternatives. The 120B version even approaches GPT-5 on reasoning tasks despite being roughly half the size of competing open models such as the largest Qwen3 variant.

For developers and researchers seeking accessible yet powerful language models, these releases arrive as a welcome gift from an unexpected source.

SMRTR provides this summary for quick context. The original article belongs to Hacker News.
