Transcribe speech 100x faster and 100x cheaper with open models
SMRTR summary
NVIDIA and Kyutai have released open-source speech recognition models accurate enough to rival proprietary APIs. These models, including NVIDIA's Parakeet and Canary, combine low word error rates with high processing speeds across multiple languages.
Modal built a batch transcription service on these models that runs more than 100x faster or more than 100x cheaper than a popular proprietary API. The service can transcribe one week of audio in one minute for $1, a substantial saving in both cost and time for large-scale audio processing.
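To put the quoted figures in perspective, here is a back-of-the-envelope sketch in Python. It takes the summary's numbers at face value (one week of audio, one minute of wall-clock time, $1); the variable names are illustrative and nothing here comes from Modal's actual code.

```python
# Rough arithmetic on the figures quoted in the summary.
# Assumption: "one week of audio" means 7 * 24 hours of continuous audio.
AUDIO_HOURS = 7 * 24        # one week of audio, in hours
WALL_CLOCK_MINUTES = 1      # quoted processing time
COST_USD = 1.00             # quoted cost

audio_minutes = AUDIO_HOURS * 60

# How many minutes of audio are processed per minute of wall-clock time.
realtime_factor = audio_minutes / WALL_CLOCK_MINUTES

# Cost of transcribing one hour of audio at the quoted rate.
cost_per_audio_hour = COST_USD / AUDIO_HOURS

print(f"audio minutes: {audio_minutes}")
print(f"real-time factor: {realtime_factor:.0f}x")
print(f"cost per audio hour: ${cost_per_audio_hour:.4f}")
```

Under these assumptions the service processes roughly 10,000 minutes of audio per minute of wall-clock time, at well under one cent per hour of audio.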
SMRTR provides this summary for quick context. The original article belongs to Daily.dev.