SMRTR AISep 3, 2025Ars Technica

These psychological tricks can get LLMs to respond to “forbidden” prompts

SMRTR summary

Researchers discovered psychological tactics significantly increase the likelihood of language models complying with normally refused requests. By using techniques like commitment (asking for harmless information before forbidden content) and appeals to authority, success rates for getting information about drugs or generating insults jumped from under 40% to over 75%. These vulnerabilities appear to stem from models mimicking human responses found in training data rather than actual consciousness.

SMRTR provides this summary for quick context. The original article belongs to Ars Technica.

Read the original article
SMRTR AI

Get the next batch of curated summaries in your inbox.

This archive is built from SMRTR newsletter summaries. Subscribe for hand-picked stories without the extra noise.