The Hidden Attack Surface in Every LLM: How Special Tokens Enable 96% Jailbreak Success Rates
SMRTR summary
Security researchers discovered that attackers can exploit special tokens in large language models to achieve a 96% jailbreak success rate by injecting control sequences like <|im_start|> directly into user input. These tokens, designed to structure conversations and mark role boundaries, are treated as authoritative commands rather than regular text, allowing attackers to hijack system instructions and bypass safety measures. The vulnerability affects major models including GPT-3.5, GPT-4, and LLaMA, with techniques ranging from direct role-switching to invisible Unicode payload injection that evades detection systems.
SMRTR provides this summary for quick context. The original article belongs to Daily.dev.
Read the original article