Cohere launches a practical, open voice model for transcription
Cohere has released a new open-source voice model specifically aimed at transcription use cases. At a relatively light 2 billion parameters, the model is designed to run on consumer-grade GPUs, making it practical for individuals and organizations that want to self-host speech-to-text capabilities without relying on cloud APIs.
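To make the consumer-GPU claim concrete, the following back-of-the-envelope sketch estimates how much memory the weights alone of a 2-billion-parameter model would occupy at a few common numeric precisions. The precisions listed are illustrative assumptions, not formats the article confirms Cohere ships, and the figures exclude activations, audio buffers, and framework overhead, so real usage would be higher.

```python
# Rough estimate of weight memory for a 2B-parameter model.
# Assumption: the precisions below are illustrative; the article does not
# state which numeric format the released checkpoint uses.

PARAMS = 2_000_000_000  # "2 billion parameters", as stated in the article

# Bytes needed to store one parameter at each (assumed) precision.
PRECISIONS = {
    "float32": 4,
    "float16/bfloat16": 2,
    "int8": 1,
}


def weight_memory_gib(num_params: int, bytes_per_param: float) -> float:
    """Return the size of the weights alone, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


def estimate_all(num_params: int = PARAMS) -> dict:
    """Map each assumed precision to its weight footprint, rounded to 2 places."""
    return {
        name: round(weight_memory_gib(num_params, nbytes), 2)
        for name, nbytes in PRECISIONS.items()
    }


if __name__ == "__main__":
    for name, gib in estimate_all().items():
        print(f"{name:>18}: {gib:.2f} GiB")
```

Under these assumptions, the weights come to roughly 3.7 GiB at 16-bit precision, which is the kind of footprint that leaves headroom on a consumer card with 8 GB or more of memory.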
The model currently supports 14 languages, bringing accurate, private transcription within reach of a broad set of users. By targeting a compact footprint, Cohere has prioritized accessibility and affordability, allowing independent developers, journalists, educators, and small businesses to deploy transcription locally on modest hardware.
Open-sourcing the model invites the community to contribute improvements, build custom fine-tuned variants, and integrate the technology into accessibility tools like captioning and assistive apps. The ability to self-host also addresses privacy and compliance concerns for sensitive audio data, while lowering operating costs compared with some cloud transcription services.
Overall, this release is a practical win for democratizing speech-to-text: it combines multilingual support, manageable compute requirements, and an open development model to accelerate real-world transcription use cases across many sectors.