Zac Liu provides a tutorial on how to use the A360 AI Platform to run OpenAI’s Whisper model without installing it yourself.
OpenAI has released Whisper, their state-of-the-art speech recognition (speech-to-text) model. OpenAI states that Whisper approaches human-level robustness and accuracy on English speech recognition. The Whisper model is open source and freely available for anyone to use.
Installing the Whisper model in your local environment is not entirely straightforward. It requires Python 3.7+, a recent version of PyTorch, and FFmpeg, an audio processing library. Here we provide an easy way to run Whisper in a cloud environment, with no installation process, by using the A360 AI Platform Community Edition.
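For comparison, a typical local setup looks something like the following sketch (package manager commands vary by OS; the `apt-get` line assumes a Debian/Ubuntu system):

```shell
# Install Whisper from PyPI; this also pulls in PyTorch as a dependency.
pip install -U openai-whisper

# Whisper relies on FFmpeg for decoding audio files.
# On Debian/Ubuntu (other platforms: brew, choco, etc.):
sudo apt-get install -y ffmpeg
```

The Community Edition workspace image described below comes with all of this preinstalled, so you can skip these steps entirely.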
The A360 AI Community Edition is freely available to data scientists. You can sign up here.
A360 AI Platform
Once you have access to the A360 AI Platform, navigate to Project and create a project. Then go to Workspace and create a new workspace. Select our custom “openai-whisper-cpu” image to start a JupyterLab workspace; the recommended compute configuration for running the Whisper model is 2 CPUs / 4 GB of memory.
Once your Jupyter server is configured, you will be able to use the starter notebook to begin transcribing your audio files into text!
Note: Since the Community Edition provides CPU compute only, transcribing a 10-second audio file takes about 5–7 seconds. If you are interested in using GPU compute on the A360 AI Platform, please reach out to us here.
- OpenAI Whisper public release.
- Official Whisper Github repository.
- Deep-dive analysis into Whisper’s accuracy, inference time, and cost-to-run.