Discord STT bot using Mozilla DeepSpeech

Ilya Nevolin
2 min read · Apr 1, 2020

While researching Speech-to-Text solutions for my Discord bots, I stumbled upon Mozilla’s open-source DeepSpeech project. I was a bit afraid getting it up and running would take a long time, but it only took me a few minutes. The upside is that it’s a bit quicker at transcribing since it runs locally; the downside is that its accuracy is lower than my current provider’s (WitAI). DeepSpeech officially supports only English, and its accuracy there is pretty okay.

Installing and using DeepSpeech is pretty straightforward.

1. Use npm to install the DeepSpeech package:

npm i deepspeech

2. Download the latest model from: https://github.com/mozilla/DeepSpeech/releases

3. Extract the model archive.

4. Add the code below to one of my bots, and adjust the model paths if necessary.

5. Replace the call to “transcribe_witai” with “transcribe_deepspeech”.
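Step 5 can also be kept flexible with a tiny dispatcher, so switching back to WitAI later is a one-word change. A hypothetical sketch — the provider functions below are stubs standing in for the real transcribe_witai / transcribe_deepspeech implementations:

```javascript
// Stub providers; in the bot these would be the real transcription functions.
const providers = {
  witai: async (file) => `witai transcript of ${file}`,           // stub
  deepspeech: async (file) => `deepspeech transcript of ${file}`, // stub
};

// Single entry point: the rest of the bot never cares which backend runs.
async function transcribe(file, provider = 'deepspeech') {
  return providers[provider](file);
}
```

This way the provider becomes configuration instead of a hard-coded call site.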

const fs = require('fs');
const DeepSpeech = require('deepspeech');

// Decoder hyperparameters recommended for the released English model
const LM_ALPHA = 0.75;
const LM_BETA = 1.85;
const BEAM_WIDTH = 1024;

// Paths to the extracted model files; adjust if you unpacked them elsewhere
let modelPath = './models/output_graph.pbmm';
let lmPath = './models/lm.binary';
let triePath = './models/trie';

let model = new DeepSpeech.Model(modelPath, BEAM_WIDTH);
let desiredSampleRate = model.sampleRate(); // 16000 Hz for the official model
model.enableDecoderWithLM(lmPath, triePath, LM_ALPHA, LM_BETA);

// `file` should contain 16-bit, 16 kHz mono PCM audio
async function transcribe_deepspeech(file) {
  const audioBuffer = fs.readFileSync(file);
  // Each 16-bit sample is 2 bytes, so duration = (bytes / 2) / sampleRate
  const audioLength = (audioBuffer.length / 2) * (1 / desiredSampleRate);
  console.log('audio length', audioLength);
  let result = model.stt(audioBuffer.slice(0, audioBuffer.length / 2));
  console.log('result:', result);
  return result;
}
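One practical wrinkle: DeepSpeech’s released model expects 16-bit, 16 kHz mono PCM, while Discord voice typically decodes to 48 kHz stereo PCM. If your recordings are in that format (an assumption about your capture pipeline), a crude conversion — channel downmix plus naive decimation; a proper resampler such as ffmpeg or sox will sound better — could look like:

```javascript
// Convert 48 kHz stereo 16-bit PCM to 16 kHz mono by averaging the two
// channels and keeping every 3rd frame (48000 / 16000 = 3).
function stereo48kToMono16k(buf) {
  const samples = new Int16Array(buf.buffer, buf.byteOffset, buf.length / 2);
  const frames = samples.length / 2;        // stereo: 2 samples per frame
  const outFrames = Math.floor(frames / 3); // decimate 48 kHz -> 16 kHz
  const out = new Int16Array(outFrames);
  for (let i = 0; i < outFrames; i++) {
    const left = samples[i * 6];            // every 3rd stereo frame
    const right = samples[i * 6 + 1];
    out[i] = (left + right) >> 1;           // downmix to mono
  }
  return Buffer.from(out.buffer, out.byteOffset, out.byteLength);
}
```

Dropping samples without a low-pass filter can alias, so for anything beyond a quick test it’s worth piping the audio through a real resampler first.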

For more info visit DeepSpeech’s Github: https://github.com/mozilla/DeepSpeech

My Github: https://github.com/healzer


Become a rockstar programmer and try to reach genius status on codr https://nevolin.be/codr/