hypura -- Run a 1T parameter model on a 32GB Mac by streaming tensors from NVMe
SMRTR summary
Hypura runs large AI models on memory-limited Macs by streaming model tensors from SSD storage as they are needed instead of holding the whole model in RAM. It runs a 40GB Llama 70B model on a 32GB Mac at 0.3 tokens/second.
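The general idea of streaming weights from disk can be illustrated with a minimal sketch. This is not Hypura's code: the file layout, the toy layer size, and every function name below (`write_toy_model`, `stream_layers`, `run_toy_forward`, `demo`) are assumptions made for illustration. It memory-maps a weights file and copies only one layer's bytes into RAM at a time, so peak memory depends on the layer size rather than the model size.

```python
import mmap
import os
import struct
import tempfile

FLOATS_PER_LAYER = 4                 # toy size; real layers hold millions of weights
LAYER_BYTES = FLOATS_PER_LAYER * 4   # float32 = 4 bytes each


def write_toy_model(path, num_layers):
    """Write num_layers layers of float32 weights back to back (hypothetical layout)."""
    with open(path, "wb") as f:
        for layer in range(num_layers):
            weights = [float(layer)] * FLOATS_PER_LAYER
            f.write(struct.pack(f"<{FLOATS_PER_LAYER}f", *weights))


def stream_layers(path, num_layers):
    """Yield one layer's weights at a time, read from disk on demand."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        for layer in range(num_layers):
            start = layer * LAYER_BYTES
            chunk = mm[start:start + LAYER_BYTES]  # only this slice is copied into RAM
            yield struct.unpack(f"<{FLOATS_PER_LAYER}f", chunk)


def run_toy_forward(path, num_layers, x):
    """Stand-in for a forward pass: add each layer's weight sum to x."""
    for weights in stream_layers(path, num_layers):
        x += sum(weights)
    return x


def demo():
    """Build a 3-layer toy model in a temp directory and stream it once."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy_model.bin")
        write_toy_model(path, num_layers=3)
        return run_toy_forward(path, num_layers=3, x=0.0)


if __name__ == "__main__":
    print(demo())
```

The trade-off is the one the summary reports: every token has to pull weights from storage again, so throughput is bounded by disk read speed rather than memory bandwidth.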
SMRTR provides this summary for quick context. The original article belongs to Hacker News.