- Overview
"DeepSpeech is an open source Speech-To-Text engine, using a model trained by machine learning techniques based on Baidu’s Deep Speech research paper." DeepSpeech provides APIs for several languages, including Python, JavaScript, and C, which makes it easy to integrate into applications.
- Install DeepSpeech
Follow the instructions in the DeepSpeech user guide:
```shell
# Create and activate a virtualenv
virtualenv -p python3 $HOME/tmp/deepspeech-venv/
source $HOME/tmp/deepspeech-venv/bin/activate
# Install DeepSpeech (pin a 0.7.x release so the package matches the 0.7.0 model files)
pip3 install deepspeech==0.7.1
# Download pre-trained English model files
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.7.0/deepspeech-0.7.0-models.pbmm
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.7.0/deepspeech-0.7.0-models.scorer
# Download example audio files
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.7.0/audio-0.7.0.tar.gz
tar xvf audio-0.7.0.tar.gz
```
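Before running inference, it can help to confirm that the sample audio is in the format the pre-trained English model was trained on, which we assume here to be 16 kHz, mono, 16-bit PCM. The sketch below uses only Python's standard `wave` module; the helper name `describe_wav` is hypothetical, and the `audio/*.wav` glob assumes the archive was extracted into the current directory.

```python
import wave


def describe_wav(path):
    """Summarize the format of a WAV file (hypothetical helper for checking the sample audio)."""
    with wave.open(path, "rb") as wf:
        channels = wf.getnchannels()
        sample_width_bits = wf.getsampwidth() * 8
        sample_rate = wf.getframerate()
        duration_s = wf.getnframes() / sample_rate
    return {
        "channels": channels,
        "sample_width_bits": sample_width_bits,
        "sample_rate": sample_rate,
        "duration_s": duration_s,
        # Assumed target format for the pre-trained English model: mono, 16-bit, 16 kHz
        "matches_model": (channels, sample_width_bits, sample_rate) == (1, 16, 16000),
    }


if __name__ == "__main__":
    import glob

    # Print a summary for every extracted sample file
    for wav_path in sorted(glob.glob("audio/*.wav")):
        print(wav_path, describe_wav(wav_path))
```

Files whose `matches_model` is `False` would need converting (for example, resampling to 16 kHz) before being passed to the model.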
- Demo
Use the command-line tool to run inference on an audio file:
```shell
$> deepspeech --model deepspeech-0.7.0-models.pbmm --scorer deepspeech-0.7.0-models.scorer --audio audio/2830-3980-0043.wav
```

Output:

```
Loading model from file deepspeech-0.7.0-models.pbmm
TensorFlow: v1.15.0-24-gceb46aa
DeepSpeech: v0.7.1-0-g2e9c281
Loaded model in 0.0093s.
Loading scorer from files deepspeech-0.7.0-models.scorer
Loaded scorer in 0.00023s.
Running inference.
experience proves this
Inference took 1.480s for 1.975s audio file.
```
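The last line of the output reports both the inference time and the length of the audio. Their ratio, commonly called the real-time factor, shows whether transcription runs faster than real time (a value below 1.0). The sketch below computes it from that log line; the regular expression assumes the exact wording printed in this demo, and `real_time_factor` is a hypothetical helper, not part of DeepSpeech.

```python
import re

# Assumed format of the timing line printed by the deepspeech CLI in this demo
TIMING_RE = re.compile(r"Inference took ([\d.]+)s for ([\d.]+)s audio file\.")


def real_time_factor(log_line):
    """Return inference time divided by audio duration (< 1.0 means faster than real time)."""
    match = TIMING_RE.search(log_line)
    if match is None:
        raise ValueError(f"not a timing line: {log_line!r}")
    inference_s, audio_s = (float(g) for g in match.groups())
    return inference_s / audio_s


print(round(real_time_factor("Inference took 1.480s for 1.975s audio file."), 3))  # 0.749
```

For this run, 1.480 s / 1.975 s ≈ 0.749, so the model transcribed the clip in roughly three quarters of its playback time.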
The line "experience proves this" is the transcription of the input audio.

Use the Python API to run inference on an audio file:
```python
import wave

import numpy as np
from deepspeech import Model

# Input sound data; the spoken text is "your power is sufficient i said"
data = "audio/8455-210777-0068.wav"

# Use the wave module to read the WAV file
wf = wave.open(data, "rb")
frames = wf.getnframes()
pcm_data = wf.readframes(frames)
wf.close()

# Convert the raw bytes to 16-bit signed integers, the sample format the model expects
audio = np.frombuffer(pcm_data, dtype=np.int16)

# Load the pre-trained model
ds = Model("./deepspeech-0.7.0-models.pbmm")

# Run inference
output = ds.stt(audio)

# Print the transcribed text
print(output)
```

Output:

```
your power is sufficient i said
```
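The Python example above runs without the external scorer, while the command-line demo loads one. The sketch below wraps the same steps in a function that optionally enables the scorer and rejects audio whose sample rate differs from the model's. It assumes the DeepSpeech 0.7.x Python API (`Model(model_path)`, `enableExternalScorer`, `sampleRate`, `stt`); the helper names `read_pcm16_mono` and `transcribe` are hypothetical.

```python
import wave


def read_pcm16_mono(path):
    """Return (raw PCM bytes, sample rate) for a 16-bit mono WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wf:
        # Interpreting the bytes as np.int16 only makes sense for 16-bit mono samples
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError(f"{path}: expected 16-bit mono PCM audio")
        return wf.readframes(wf.getnframes()), wf.getframerate()


def transcribe(wav_path, model_path, scorer_path=None):
    """Transcribe a WAV file with a DeepSpeech 0.7.x model, optionally using a scorer."""
    # Imported lazily so read_pcm16_mono() works without numpy/DeepSpeech installed
    import numpy as np
    from deepspeech import Model

    ds = Model(model_path)
    if scorer_path is not None:
        # Same external scorer file that the command-line demo loads
        ds.enableExternalScorer(scorer_path)

    pcm, rate = read_pcm16_mono(wav_path)
    if rate != ds.sampleRate():
        # stt() takes raw samples as-is, so reject mismatched audio up front
        raise ValueError(f"{wav_path}: {rate} Hz audio, model expects {ds.sampleRate()} Hz")
    return ds.stt(np.frombuffer(pcm, dtype=np.int16))
```

With the files from the install step, a call would look like `transcribe("audio/8455-210777-0068.wav", "deepspeech-0.7.0-models.pbmm", "deepspeech-0.7.0-models.scorer")`. Keeping the file reading separate from the model call makes the format check reusable for any input before the model is loaded.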