Hertz-dev, the first open-source base model for conversational audio
SMRTR summary
Standard Intelligence has open-sourced hertz-dev, an 8.5-billion-parameter audio-only speech generation model. The model consists of three components: hertz-codec, hertz-vae, and a 6.6-billion-parameter transformer stack. Hertz-dev runs at a latency of 120 ms on an RTX 4090, roughly half that of other public models. As a base model, it can be fine-tuned for a range of downstream tasks, and it represents a step toward real-time voice interaction. The company is currently developing a more advanced version of Hertz with improved capabilities and coherence.
SMRTR provides this summary for quick context. The original article belongs to Hacker News.