GLM-5.2 ranks better than GPT-5.5 in new agentic knowledge work eval
SMRTR summary
Claude Fable 5 tops AA-Briefcase, a new AI benchmark that tests models on real-world tasks, such as financial modeling and strategy work, that involve thousands of messy files. GLM-5.2 ranks third overall but leads the open-weight models, beating GPT-5.5 while costing less than 25% of Claude Opus 4.8's price.
SMRTR provides this summary for quick context. The original article belongs to Hacker News.