Crossing the uncanny valley of conversational voice
SMRTR summary
Sesame is developing "voice presence" technology for more natural AI voice assistants. Its Conversational Speech Model uses transformers to generate contextual speech in real time. While promising, the system still lags behind humans in prosody. Sesame plans to scale up the model, expand language support, and explore duplex models for conversational dynamics. The company is open-sourcing key research components to encourage collaboration.
SMRTR provides this summary for quick context. The original article belongs to Daily.dev.