To reproduce our Wall Street Journal (WSJ) experiments, please follow the
instructions below. All the steps should be done at the kaldi WSJ recipe
directory
(or you can add symlinks to all the files (local
, steps
, utils
, etc)
as some people do). In order to perform steps 1, 2, 5 you should source
path.sh
file from the recipe before you sourced $LVSR/env.sh
.
Check that $LVSR
environment variable points to this repository and
$LD_LIBRARY_PATH
includes a path to openfst library.
-
Compile a Fuel-compatible dataset file in HDF5 format. This step requires kaldi and kaldi-python.
First, run prepare data part from the WSJ recipe. You'll get all
*.scp
files which link waves and text.Then, run
$LVSR/exp/wsj/write_hdf_dataset.sh
The resulting file
wsj.h5
should be put to $FUEL_DATA_PATH folder. -
Compile language model FST's from ARPA-format language models provided with WSJ. This step requires kaldi.
$LVSR/exp/wsj/make_all_wsj_graphs.sh <lmfile> <lmsdir>
where
<lmfile>
is the arpa languge model which goes with WSJ dataset. (we placed it to$FUEL_DATA_PATH/WSJ/lm_bg.arpa.gz
) and<lmsdir>
is a directry to place FST language models (we usedata/lms
). -
Train the model. You don't need kaldi for training and it doesn't use any scripts from the recipe.
For the End-to-End Attention-based Large Vocabulary Speech Recognition use the
model=wsj_paper7
and for the Task Loss Estimation for Sequence Prediction usemodel=wsj_reward6
in the following script:$LVSR/bin/run.py train <model> $LVSR/exp/wsj/configs/<model>.yaml
This will start training and save the model to the folder
<model>
. -
Decode the model on the validation and training datasets.
$LVSR/exp/wsj/decode.sh wsj_paper7 <part> <beam-size>
or
$LVSR/exp/wsj/decode_tle.sh wsj_reward6 <part> <beam-size>
for the task loss estimation model.
We typically use beam size 200 to get the best performance. However, with beam size the scores are typically only 10% worse and decoding is much faster. You can see reports at
wsj_paper6/reports
directory. For every language model option, beam size and subset there are alignment images andreport.txt
which contains transcripts, approximate CER and WER and other auxiliary information. -
Score the recognized transcripts:
$LVSR/exp/wsj/score.sh <part> <model>/reports/valid_trigram_200/
This script produces
<part>-text.wer
and<part>-text.errs
files in the corresponding directory. The first one contains WER and the second one the report ofcompute-wer
kadli script.