SMRTR AI • Mar 24, 2026 • Hacker News

hypura -- Run a 1T-parameter model on a 32GB Mac by streaming tensors from NVMe

SMRTR summary

Hypura runs large AI models on memory-limited Macs by streaming tensors from NVMe SSD storage instead of holding the whole model in RAM. It runs a 40GB Llama 70B model on a 32GB Mac at 0.3 tokens/second.
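The general idea of streaming weights from disk can be illustrated with a small sketch. This is not Hypura's code: the file layout, the `write_layers` and `stream_layers` helpers, and the toy "layers" are all hypothetical, and the example only shows the broad pattern of memory-mapping a weights file and pulling one tensor at a time into memory.

```python
import mmap
import os
import struct
import tempfile


def write_layers(path, layers):
    """Write each layer as raw little-endian float32 values.

    Returns a list of (byte_offset, value_count) pairs, one per layer.
    The layout is a made-up format for illustration only.
    """
    index = []
    with open(path, "wb") as f:
        for values in layers:
            index.append((f.tell(), len(values)))
            f.write(struct.pack(f"<{len(values)}f", *values))
    return index


def stream_layers(path, index):
    """Yield one layer at a time from a memory-mapped weights file."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for offset, count in index:
                # Only this layer's values are decoded into Python objects;
                # the rest of the file stays on disk until it is touched.
                yield struct.unpack_from(f"<{count}f", mm, offset)


def run_demo():
    """Stand-in for inference: process each layer as it is streamed in."""
    layers = [[1.0, 2.0], [3.0, 4.0, 5.0], [6.0]]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "weights.bin")
        index = write_layers(path, layers)
        return [sum(layer) for layer in stream_layers(path, index)]


if __name__ == "__main__":
    print(run_demo())
```

In a sketch like this, peak memory depends on the largest single layer rather than the total file size, which is the trade the summary describes: a model larger than RAM can run, at the cost of disk-bound throughput.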

SMRTR provides this summary for quick context. The original article belongs to Hacker News.

