Are LLMs not getting better?
SMRTR summary
Research comparing LLM-written code that merely passes tests with code that maintainers would actually approve finds that programming ability has plateaued since early 2024, contradicting claims of continuous improvement. In a statistical analysis using Brier scores, models that assume constant performance predict outcomes better than models that assume a linear growth trend. This indicates that LLMs have not improved at writing mergeable code for over a year, despite industry hype.
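To make the Brier-score comparison concrete, here is a minimal Python sketch. It is not the original article's analysis: the hold-out setup (fit on earlier periods, score on later ones), the function names, and the toy data are all assumptions chosen for illustration. The Brier score is the mean squared difference between a predicted probability and the 0/1 outcome, so a lower score means better predictions.

```python
from typing import List, Sequence, Tuple


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def fit_constant(outcomes: Sequence[int]) -> float:
    """Constant-performance model: predict the historical success rate."""
    return sum(outcomes) / len(outcomes)


def fit_linear(times: Sequence[float], outcomes: Sequence[int]) -> Tuple[float, float]:
    """Linear-trend model: ordinary least squares of outcome on time."""
    n = len(times)
    mean_t = sum(times) / n
    mean_o = sum(outcomes) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    cov = sum((t - mean_t) * (o - mean_o) for t, o in zip(times, outcomes))
    slope = cov / var_t if var_t else 0.0
    return slope, mean_o - slope * mean_t


def predict_linear(slope: float, intercept: float, times: Sequence[float]) -> List[float]:
    """Evaluate the fitted line, clipped to the valid probability range [0, 1]."""
    return [min(1.0, max(0.0, intercept + slope * t)) for t in times]


# Hypothetical toy data, NOT taken from the article:
# 1 = the change would be merged, 0 = it would not.
train_times, train_outcomes = [0, 1, 2, 3], [0, 0, 1, 1]   # apparent early improvement
test_times, test_outcomes = [4, 5, 6, 7], [1, 0, 1, 0]     # later periods, flat at 50%

rate = fit_constant(train_outcomes)
slope, intercept = fit_linear(train_times, train_outcomes)

constant_brier = brier_score([rate] * len(test_times), test_outcomes)
linear_brier = brier_score(predict_linear(slope, intercept, test_times), test_outcomes)

print(f"constant model Brier: {constant_brier:.3f}")
print(f"linear model Brier:   {linear_brier:.3f}")
```

On this toy data the linear model extrapolates the early rise and becomes overconfident on the later periods, so the constant model scores lower. That is the pattern the summary describes, though the real study's data and fitting procedure may differ.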
SMRTR provides this summary for quick context. The original article belongs to Daily.dev.