This repo supports the following speech datasets:
- LJ Speech (Public Domain)
- Blizzard 2012 (Creative Commons Attribution Share-Alike)
You can use any other dataset if you write a preprocessor for it.
Each training example consists of:
- The text that was spoken
- A mel-scale spectrogram of the audio
- A linear-scale spectrogram of the audio
The preprocessor is responsible for generating these. See ljspeech.py for a commented example.
For each training example, a preprocessor should:
- Load the audio file:
    wav = audio.load_wav(wav_path)
- Compute linear-scale and mel-scale spectrograms (float32 numpy arrays):
    spectrogram = audio.spectrogram(wav).astype(np.float32)
    mel_spectrogram = audio.melspectrogram(wav).astype(np.float32)
- Save the spectrograms to disk:
    np.save(os.path.join(out_dir, spectrogram_filename), spectrogram.T, allow_pickle=False)
    np.save(os.path.join(out_dir, mel_spectrogram_filename), mel_spectrogram.T, allow_pickle=False)
  Note that the transpose of the matrix returned by audio.spectrogram is saved so that it's in time-major format.
- Generate a tuple (spectrogram_filename, mel_spectrogram_filename, n_frames, text) to write to train.txt. n_frames is just the length of the time axis of the spectrogram.
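The save and tuple steps above can be sketched without the repo's audio module by substituting placeholder arrays for the output of audio.spectrogram and audio.melspectrogram. This is only an illustration: the helper name save_example, the filename scheme, and the bin counts (1025 and 80) are assumptions, and the arrays are taken to be frequency-major, as the transpose note implies.

```python
import os
import tempfile

import numpy as np


def save_example(out_dir, index, spectrogram, mel_spectrogram, text):
    """Save one example's spectrograms and return its train.txt tuple.

    Both arrays are assumed to be frequency-major, i.e. (n_bins, n_frames).
    """
    # Hypothetical filename scheme; any unique names would do.
    spectrogram_filename = 'example-spec-%05d.npy' % index
    mel_spectrogram_filename = 'example-mel-%05d.npy' % index

    # Transpose so the saved arrays are time-major: (n_frames, n_bins).
    np.save(os.path.join(out_dir, spectrogram_filename),
            spectrogram.T, allow_pickle=False)
    np.save(os.path.join(out_dir, mel_spectrogram_filename),
            mel_spectrogram.T, allow_pickle=False)

    # Before the transpose, the time axis is axis 1.
    n_frames = spectrogram.shape[1]
    return (spectrogram_filename, mel_spectrogram_filename, n_frames, text)


# Placeholder data standing in for real spectrograms (assumed shapes).
out_dir = tempfile.mkdtemp()
fake_spectrogram = np.zeros((1025, 120), dtype=np.float32)
fake_mel = np.zeros((80, 120), dtype=np.float32)
entry = save_example(out_dir, 1, fake_spectrogram, fake_mel, 'Hello world.')
```

Loading a saved file back with np.load and checking that its first dimension equals n_frames is a quick way to confirm the time-major layout.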
After you've written your preprocessor, you can add it to preprocess.py by following the example of the other preprocessors in that file.
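When wiring a preprocessor in, the tuples it returns still have to be written to train.txt. The sketch below shows one way to do that, assuming one pipe-delimited line per example. The delimiter and the helper name write_metadata are assumptions here, so check preprocess.py for the format the training code actually expects.

```python
import os
import tempfile


def write_metadata(metadata, out_dir, filename='train.txt'):
    """Write one delimited line per (spec_file, mel_file, n_frames, text) tuple."""
    path = os.path.join(out_dir, filename)
    with open(path, 'w', encoding='utf-8') as f:
        for entry in metadata:
            # '|' is an assumed delimiter; it must not occur in the text field.
            f.write('|'.join(str(field) for field in entry) + '\n')
    return path


# Illustrative tuples in the shape described above.
metadata = [
    ('spec-00001.npy', 'mel-00001.npy', 120, 'Hello world.'),
    ('spec-00002.npy', 'mel-00002.npy', 95, 'Second sentence.'),
]
train_path = write_metadata(metadata, tempfile.mkdtemp())
```

Opening the file with an explicit UTF-8 encoding matters for non-English transcripts, since the platform default encoding may not cover them.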
If your training data is in a language other than English, you will probably want to change the text cleaners by setting the cleaners hyperparameter.
- If your text is in a Latin script or can be transliterated to ASCII using the Unidecode library, you can use the transliteration cleaners by setting the hyperparameter cleaners=transliteration_cleaners.
- If you don't want to transliterate, you can define a custom character set. This allows you to train directly on the character set used in your data.
  To do so, edit symbols.py and change the _characters variable to be a string containing the UTF-8 characters in your data. Then set the hyperparameter cleaners=basic_cleaners.
- If you're not sure which option to use, you can evaluate the transliteration cleaners like this:
    from text import cleaners
    cleaners.transliteration_cleaners('Здравствуйте')  # Replace with the text you want to try
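For the custom character set option, it can help to derive the string for _characters from your transcripts instead of typing it by hand. The sketch below uses a hypothetical helper, collect_characters, that is not part of this repo. Whether spaces and punctuation belong in _characters depends on how symbols.py defines its other symbols, so treat the result as a starting point.

```python
def collect_characters(transcripts):
    """Return the unique characters in the transcripts as one sorted string."""
    chars = set()
    for line in transcripts:
        chars.update(line)
    # Newlines are line terminators, not part of the spoken text.
    chars.discard('\n')
    # Sorting by code point gives a stable, reproducible ordering.
    return ''.join(sorted(chars))


# Illustrative transcripts; replace with the text column of your dataset.
sample = ['Привет мир', 'мир']
characters = collect_characters(sample)
```

If the cleaner you choose changes the text (for example by altering case), run the same transformation on the transcripts before collecting characters so that the set matches what the model will actually see.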