SMRTR AI • Apr 20, 2026 • Hacker News

Even 'uncensored' models can't say what they want

SMRTR summary

Researchers found that even "uncensored" AI models quietly avoid certain charged words by lowering their probability during text generation, a phenomenon they call "flinch." They tested seven models from five major labs on 4,442 contexts spanning categories such as political terms, slurs, and violence. Safety-filtered pretrained models consistently steered away from controversial words without triggering obvious refusals, assigning them probabilities up to 16,000 times lower than unfiltered models would.
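The core measurement here is a probability gap: how much less likely a safety-filtered model makes a word than an unfiltered model does, given the same context. The following is a minimal sketch of that comparison under stated assumptions, not the researchers' code. The helper names (`token_prob`, `word_logprob`, `flinch_ratio`) and the toy logit tables are hypothetical, and the sketch assumes you already have next-token logits from each model for the same context. A word that spans several tokens gets its log-probability by summing the per-token conditional log-probabilities (the chain rule).

```python
import math

def token_prob(logits: dict[str, float], token: str) -> float:
    """Softmax probability of `token` under a next-token logit table."""
    m = max(logits.values())  # subtract the max logit for numerical stability
    z = sum(math.exp(v - m) for v in logits.values())
    return math.exp(logits[token] - m) / z

def word_logprob(step_logits: list[dict[str, float]], tokens: list[str]) -> float:
    """Log-probability of a (possibly multi-token) word: sum of per-step log-probs."""
    if len(step_logits) != len(tokens):
        raise ValueError("need one logit table per token")
    return sum(math.log(token_prob(lg, t)) for lg, t in zip(step_logits, tokens))

def flinch_ratio(logp_unfiltered: float, logp_filtered: float) -> float:
    """How many times less likely the filtered model makes the word."""
    return math.exp(logp_unfiltered - logp_filtered)

# Hypothetical single-step logits for one context; all numbers are invented.
unfiltered = {"charged_word": 2.0, "alt_1": 2.5, "alt_2": 1.0}
filtered = {"charged_word": -4.0, "alt_1": 2.5, "alt_2": 1.0}

lp_u = word_logprob([unfiltered], ["charged_word"])
lp_f = word_logprob([filtered], ["charged_word"])
print(f"{flinch_ratio(lp_u, lp_f):.1f}x less likely under the filtered model")
```

Working in log space avoids underflow when word probabilities get very small. By the same arithmetic, a 16,000x ratio corresponds to a log-probability gap of ln 16000 ≈ 9.68 nats.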

