SMRTR AI | Oct 5, 2025 | Hacker News

What GPT-OSS leaks about OpenAI's training data

SMRTR summary

OpenAI's open-weight GPT-OSS models leak information about their training data through "glitch tokens" that reveal phrases from adult websites, gambling sites, and spam content scraped from GitHub repositories. This shows how releasing open-weight models can expose the composition of sensitive training data.

SMRTR provides this summary for quick context. The original article belongs to Hacker News.

