CosyVoice – Multilingual large voice generation model with full-stack inference, training, and deployment capabilities.
SMRTR summary
CosyVoice is an open-source text-to-speech system that generates human-like speech in 27 languages and dialects, including Chinese, English, Japanese, and Korean. The latest version, Fun-CosyVoice 3.0, is reported to achieve state-of-the-art voice cloning and naturalness, and it supports streaming output with latency as low as 150 ms for real-time applications.
SMRTR provides this summary for quick context. The original article belongs to GitHub.