SMRTR AI • Aug 23, 2025 • 9to5Mac

Apple trained a large language model to efficiently understand long-form video

SMRTR summary

Apple researchers developed SlowFast-LLaVA-1.5, a more efficient video-understanding language model that outperforms larger competitors. The model uses a two-stream approach: one stream analyzes a small number of frames in detail, while the other tracks movement across a larger number of frames. This lets it process long videos without overwhelming its context window. Although it handles at most 128 frames per video, it sets new benchmark results for video analysis while maintaining strong image-understanding capabilities.
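To make the two-stream idea concrete, here is a minimal Python sketch of how a frame-sampling plan and its token cost could be worked out. It is an illustration only, not Apple's implementation: the function names, the uniform sampling rule, and every number passed in (frame counts and tokens per frame) are hypothetical assumptions, since the summary does not state them.

```python
def uniform_indices(total_frames: int, num_samples: int) -> list[int]:
    """Pick num_samples frame indices spread evenly across the video.

    Hypothetical helper: uniform sampling is an assumption of this sketch.
    """
    if num_samples >= total_frames:
        return list(range(total_frames))
    step = total_frames / num_samples
    # Take the frame at the middle of each equal-length segment.
    return [int(step * i + step / 2) for i in range(num_samples)]


def two_stream_plan(
    total_frames: int,
    slow_frames: int,
    fast_frames: int,
    slow_tokens_per_frame: int,
    fast_tokens_per_frame: int,
) -> dict:
    """Build a sampling plan for a 'slow' and a 'fast' stream.

    Slow stream: few frames, many tokens each (fine spatial detail).
    Fast stream: many frames, few tokens each (motion over time).
    All parameter values are illustrative assumptions.
    """
    slow = uniform_indices(total_frames, slow_frames)
    fast = uniform_indices(total_frames, fast_frames)
    tokens = (
        len(slow) * slow_tokens_per_frame
        + len(fast) * fast_tokens_per_frame
    )
    return {"slow": slow, "fast": fast, "tokens": tokens}


if __name__ == "__main__":
    # Hypothetical numbers chosen only to show the arithmetic.
    plan = two_stream_plan(
        total_frames=1000,
        slow_frames=4,
        fast_frames=16,
        slow_tokens_per_frame=100,
        fast_tokens_per_frame=10,
    )
    print(plan["slow"])
    print(plan["tokens"])
```

With these made-up numbers, the fast stream covers four times as many frames as the slow stream yet costs fewer tokens in total, which is the trade-off that keeps a long video inside a fixed context window.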

SMRTR provides this summary for quick context. The original article belongs to 9to5Mac.
