DeepSeek releases ‘sparse attention’ model that cuts API costs in half
SMRTR summary
DeepSeek has released an experimental AI model, V3.2-exp, that uses "sparse attention" to sharply reduce server costs for long-context operations. A "lightning indexer" and a token selection system let the model process only the most relevant information instead of the full context. Initial testing shows the approach can cut API costs in half for long-context tasks, which could give AI providers struggling with expensive inference a useful cost-saving technique.
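To make the idea concrete, here is a minimal Python sketch of the general pattern the summary describes: a cheap scoring pass ranks context tokens, and full attention runs only over the top-scoring ones. It does not reproduce DeepSeek's implementation. The function names, the `index_dims` shortcut (scoring on only the first few vector components) and the `k` parameter are all assumptions made for illustration.

```python
import math

def index_scores(query, keys, index_dims):
    # Hypothetical stand-in for an indexer: a cheap relevance score
    # computed from only the first `index_dims` components of each vector.
    return [sum(q * c for q, c in zip(query[:index_dims], key[:index_dims]))
            for key in keys]

def select_top_k(scores, k):
    # Keep the positions of the k highest scores, returned in original order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

def sparse_attention(query, keys, values, k, index_dims=1):
    if k < 1:
        raise ValueError("k must be at least 1")
    chosen = select_top_k(index_scores(query, keys, index_dims), k)
    # Full scaled dot-product attention, but only over the selected tokens.
    scale = math.sqrt(len(query))
    logits = [sum(q * c for q, c in zip(query, keys[i])) / scale for i in chosen]
    peak = max(logits)  # subtract the max for a numerically stable softmax
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    dim = len(values[0])
    output = [sum(w / total * values[i][d] for w, i in zip(weights, chosen))
              for d in range(dim)]
    return output, chosen
```

In this sketch the expensive attention step touches `k` tokens rather than the whole context, which is the kind of saving that would matter most when the context is long. Setting `k` to the context length falls back to ordinary dense attention.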
SMRTR provides this summary for quick context. The original article belongs to TechCrunch.