Add instructions for dataset preparation
robinhad authored Feb 15, 2021
1 parent 32bfbfb commit ee0e5c6
Showing 1 changed file with 13 additions and 0 deletions.
13 changes: 13 additions & 0 deletions scripts/README.md
@@ -0,0 +1,13 @@
# How to prepare the dataset for training

1. Download the Ukrainian dataset from [https://github.com/egorsmkv/speech-recognition-uk](https://github.com/egorsmkv/speech-recognition-uk).
2. Delete the Common Voice folder from the downloaded dataset.
3. Download [import_ukrainian.py](scripts/import_ukrainian.py) and put it into the `DeepSpeech/bin` folder.
4. Run the import script.
5. Download the Common Voice 6.1 Ukrainian dataset.
6. Convert it to the DeepSpeech format.
7. Merge the `train.csv` from the dataset and the one from the DeepSpeech conversion into one file.
8. Put the CV (Common Voice) files into the dataset's files folder.
9. Put `dev.csv` and `test.csv` into the folder.

You have a reproducible dataset!
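
The merge in step 7 could be sketched as follows. This is a minimal illustration, not part of the original scripts: it assumes both files use the standard DeepSpeech CSV columns (`wav_filename`, `wav_filesize`, `transcript`), and the helper name `merge_csvs` is hypothetical.

```python
import csv

# Column layout assumed for DeepSpeech training CSVs.
FIELDNAMES = ["wav_filename", "wav_filesize", "transcript"]


def merge_csvs(inputs, output):
    """Concatenate DeepSpeech-style CSVs into one file, writing the header once.

    Returns the number of data rows written.
    """
    rows_written = 0
    with open(output, "w", newline="", encoding="utf-8") as out_file:
        writer = csv.DictWriter(out_file, fieldnames=FIELDNAMES)
        writer.writeheader()
        for path in inputs:
            with open(path, newline="", encoding="utf-8") as in_file:
                reader = csv.DictReader(in_file)
                # Refuse files whose header differs from the assumed layout.
                if reader.fieldnames != FIELDNAMES:
                    raise ValueError(
                        f"unexpected columns in {path}: {reader.fieldnames}"
                    )
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

For example, `merge_csvs(["dataset/train.csv", "cv/train.csv"], "train_merged.csv")` would combine the two files; these paths are placeholders. The sketch copies `wav_filename` values unchanged, so the audio files still have to sit where those paths point (step 8).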
