NoLiMA: GPT-4o achieves 99.3% accuracy in short contexts (<1K tokens), but performance degrades to 69.7% at 32K tokens.
SMRTR summary
GPT-4o's accuracy on the NoLiMA benchmark declines from 99.3% for short contexts (<1K tokens) to 69.7% at 32K tokens, a drop of 29.6 percentage points. The result highlights that large language models still struggle with long-context reasoning despite larger context windows.
SMRTR provides this summary for quick context. The original article belongs to Dev.to.