SMRTR TechFeb 4, 2025ZDNet

Jailbreak Anthropic's new AI safety system for a $15,000 reward

SMRTR summary

Anthropic's new Constitutional Classifiers AI safety system uses one AI to monitor another, guided by principles. In testing, 183 people spent 3,000+ hours trying to jailbreak Claude 3.5 Sonnet without success. The system blocked over 95% of synthetic jailbreak attempts, versus 14% by Claude alone. Anthropic offers up to $15,000 for successful jailbreaks until February 10.

SMRTR provides this summary for quick context. The original article belongs to ZDNet.

Read the original article
SMRTR Tech

Get the next batch of curated summaries in your inbox.

This archive is built from SMRTR newsletter summaries. Subscribe for hand-picked stories without the extra noise.