SMRTR AIJun 22, 2026Hacker News

A Theory of Why Prompt Injection Works

SMRTR summary

LLMs process everything — user prompts, system instructions, webpage data — as one continuous stream of text, relying on role tags to distinguish commands from data. Researchers found that LLMs identify roles by writing style rather than actual tags, making them vulnerable to "CoT Forgery" attacks that mimic reasoning style to hijack model behavior, pushing jailbreak success rates from near-zero to 60%.

SMRTR provides this summary for quick context. The original article belongs to Hacker News.

Read the original article
SMRTR AI

Get the next batch of curated summaries in your inbox.

This archive is built from SMRTR newsletter summaries. Subscribe for hand-picked stories without the extra noise.