SMRTR AI • Nov 3, 2024 • Hacker News

Hertz-dev, the first open-source base model for conversational audio

SMRTR summary

Standard Intelligence has open-sourced hertz-dev, an 8.5-billion-parameter, audio-only speech generation model. The model consists of three components: hertz-codec, hertz-vae, and a 6.6-billion-parameter transformer stack. Hertz-dev achieves a latency of 120 ms on an RTX 4090, roughly half the latency of other public models. As a base model, it can be fine-tuned for a variety of tasks, and it represents a step toward real-time voice interaction. The company is currently developing a more advanced version of Hertz with stronger capabilities and better coherence.
