SMRTR AI · Apr 20, 2026 · Hacker News

Even 'uncensored' models can't say what they want

SMRTR summary

Researchers found that even "uncensored" AI models quietly avoid certain charged words by lowering their probability during text generation, a phenomenon they call "flinch." Testing seven models from five major labs on 4,442 contexts spanning categories such as political terms, slurs, and violence, they found that safety-filtered pretrains consistently steered away from controversial words without triggering obvious refusals, assigning those words probabilities up to 16,000 times lower than unfiltered models would.
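The summary does not say exactly how the study computed its probability gap. As a minimal sketch, assuming the gap is a plain per-word ratio between an unfiltered reference model and a safety-filtered model scored on the same context, the arithmetic looks like this. The function names and the example probabilities are hypothetical illustrations, not data or code from the research.

```python
import math
from typing import Iterable, Tuple


def flinch_ratio(logp_reference: float, logp_filtered: float) -> float:
    """Return how many times less likely the filtered model makes a word.

    Both arguments are natural-log probabilities of the same word in the
    same context: one from an unfiltered reference model, one from a
    safety-filtered model. (Hypothetical helper, for illustration only.)
    """
    # p_ref / p_filt computed in log space to avoid underflow on tiny probabilities.
    return math.exp(logp_reference - logp_filtered)


def max_flinch(pairs: Iterable[Tuple[float, float]]) -> float:
    """Return the largest ratio over (logp_reference, logp_filtered) pairs."""
    return max(flinch_ratio(ref, filt) for ref, filt in pairs)


# Illustrative values only: the reference model gives the word p = 0.16,
# the filtered model gives it p = 0.00001.
ratio = flinch_ratio(math.log(0.16), math.log(0.00001))
print(round(ratio))  # 16000
```

Working in log probabilities matters here because the filtered model's probabilities for the avoided words can be extremely small; the "up to" in the summary corresponds to taking the maximum ratio across contexts, as `max_flinch` does.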

SMRTR provides this summary for quick context. The original article is from Hacker News.

