Sopro TTS: A 169M model with zero-shot voice cloning that runs on the CPU
SMRTR summary
Sopro TTS is a lightweight, 169-million-parameter English text-to-speech model that performs zero-shot voice cloning from just 3 to 12 seconds of reference audio and runs efficiently on standard CPUs. The model achieves a real-time factor of 0.25 on an M3 CPU, generating 30 seconds of audio in 7.5 seconds, and uses dilated convolutions instead of a Transformer architecture. Sopro supports streaming generation and offers adjustable parameters for controlling voice similarity, making advanced voice synthesis accessible without specialized hardware.
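The real-time factor quoted above is the ratio of synthesis time to the duration of the audio produced, so values below 1 mean faster-than-real-time generation. As a minimal sketch of that arithmetic (the function name `real_time_factor` is hypothetical and not part of Sopro; only the 7.5 s and 30 s figures come from the summary):

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return synthesis time divided by the duration of the generated audio.

    Hypothetical helper for illustration only; it is not Sopro's API.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


# Figures from the summary: 30 seconds of audio generated in 7.5 seconds.
rtf = real_time_factor(synthesis_seconds=7.5, audio_seconds=30.0)
speedup = 1.0 / rtf  # how many times faster than real time

print(f"RTF = {rtf:.2f}, i.e. {speedup:.0f}x faster than real time")
# prints "RTF = 0.25, i.e. 4x faster than real time"
```

At this rate each second of speech takes about a quarter of a second to synthesize, which leaves headroom for streaming playback on the same CPU.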
SMRTR provides this summary for quick context. The original article belongs to Hacker News.