'Humanity's Last Exam' benchmark is stumping top AI models - can you do any better?
SMRTR summary
A new benchmark, Humanity's Last Exam (HLE), probes the limits of AI knowledge with 3,000 expert-level questions across a range of subjects. Created by Scale AI and the Center for AI Safety, HLE is a response to "benchmark saturation": as AI models improve, they score so highly on existing tests that those tests no longer measure progress. Current AI models scored below 10% on HLE, compared with over 90% on some existing benchmarks. Researchers will use HLE to study AI systems and their limitations, and there are plans to release it publicly.
SMRTR provides this summary for quick context. The original article belongs to ZDNet.