Should LLMs just treat text content as an image?
SMRTR summary
DeepSeek research reports that a language model can recover roughly 10 text tokens from a single image token with near-perfect accuracy, which would make image representations about 10 times more token-efficient than text. This "optical compression" technique renders text as an image before the model processes it, so far more text fits in the same context window. That could change how AI systems handle large volumes of textual data.
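To make the ratio concrete, here is a minimal back-of-the-envelope sketch. It assumes the roughly 10:1 text-to-image token ratio quoted above holds uniformly, which is a simplification. The function names and the default ratio are illustrative assumptions, not part of any DeepSeek API.

```python
import math


def vision_tokens_needed(text_tokens: int, compression_ratio: float = 10.0) -> int:
    """Estimate how many image tokens would carry `text_tokens` text tokens.

    `compression_ratio` is the assumed number of text tokens recovered per
    image token (about 10 in the summary above).
    """
    if text_tokens < 0:
        raise ValueError("text_tokens must be non-negative")
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    # Round up: a partially filled image token still costs one token.
    return math.ceil(text_tokens / compression_ratio)


def effective_text_capacity(context_window: int, compression_ratio: float = 10.0) -> int:
    """Estimate how many text tokens fit in a window filled with image tokens."""
    if context_window < 0:
        raise ValueError("context_window must be non-negative")
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return int(context_window * compression_ratio)


if __name__ == "__main__":
    # A 100,000-token document at the assumed 10:1 ratio.
    print(vision_tokens_needed(100_000))
    # Text capacity of a hypothetical 8,192-token window under the same ratio.
    print(effective_text_capacity(8_192))
```

The sketch only counts tokens. It ignores the cost of rendering text to an image and any loss of accuracy at other compression ratios.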
SMRTR provides this summary for quick context. The original article belongs to Daily.dev.