Running Google Gemma 4 Locally with LM Studio's New Headless CLI and Claude Code
SMRTR summary
Google's new Gemma 4 model is turning laptops into surprisingly powerful AI workstations. The 26-billion parameter model uses a clever mixture-of-experts architecture that activates only 4 billion parameters per task, letting it run at 51 tokens per second on a MacBook Pro while delivering performance that rivals models hundreds of times larger.
This efficiency breakthrough means developers can now run sophisticated AI entirely offline. No API costs, no data leaving your machine, and no rate limits. The model fits comfortably in 48 GB of memory and handles everything from code review to image analysis locally.
LM Studio's new headless architecture makes this practical for real workflows. The latest version strips away the GUI requirement, letting you run everything from the command line. You can even configure it to work with existing tools like Claude Code, creating a fully offline coding assistant that costs nothing per use.
The performance numbers tell the story: Gemma 4 scores competitively against models requiring 400 billion parameters while running on consumer hardware. During testing, the system pushed memory to 46 GB with GPU utilization at 90%, but stayed responsive throughout. For developers tired of cloud API limitations, this represents a genuine shift toward practical local AI inference.
SMRTR provides this summary for quick context. The original article belongs to Hacker News.
Read the original article