Jailbreak Anthropic's new AI safety system for a $15,000 reward
SMRTR summary
Anthropic's new Constitutional Classifiers AI safety system uses one AI to monitor another, guided by principles. In testing, 183 people spent 3,000+ hours trying to jailbreak Claude 3.5 Sonnet without success. The system blocked over 95% of synthetic jailbreak attempts, versus 14% by Claude alone. Anthropic offers up to $15,000 for successful jailbreaks until February 10.
SMRTR provides this summary for quick context. The original article belongs to ZDNet.
Read the original article