SMRTR Tech• Feb 4, 2025• ZDNet

Jailbreak Anthropic's new AI safety system for a $15,000 reward

SMRTR summary

Anthropic's new Constitutional Classifiers AI safety system uses one AI to monitor another, guided by principles. In testing, 183 people spent 3,000+ hours trying to jailbreak Claude 3.5 Sonnet without success. The system blocked over 95% of synthetic jailbreak attempts, versus 14% by Claude alone. Anthropic offers up to $15,000 for successful jailbreaks until February 10.

SMRTR provides this summary for quick context. The original article belongs to ZDNet.

Read the original article

Jailbreak Anthropic's new AI safety system for a $15,000 reward

Get the next batch of curated summaries in your inbox.