What GPT-OSS leaks about OpenAI's training data
SMRTR summary
OpenAI's GPT-OSS models leak information about their training data through "glitch tokens" that reveal phrases from adult websites, gambling sites, and spam content scraped from GitHub repositories. This shows how releasing open weights can expose the composition of a model's training data, including sensitive sources.
SMRTR provides this summary for quick context. The original article belongs to Hacker News.