SMRTR AI | Nov 3, 2024 | Hacker News

Hertz-dev, the first open-source base model for conversational audio

SMRTR summary

Standard Intelligence has open-sourced hertz-dev, an 8.5-billion-parameter audio-only speech generation model. The model consists of three components: hertz-codec, hertz-vae, and a 6.6-billion-parameter transformer stack. Hertz-dev has a latency of 120 ms on an RTX 4090, about twice as fast as other public models. As a base model, it can be fine-tuned for various tasks and represents a step toward real-time voice interaction. The company is developing a more advanced version of Hertz with improved capabilities and coherence.

SMRTR provides this summary for quick context. The original article belongs to Hacker News.
