Align Open.Bible data

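| Language | Passing | Failing | Unknown | Notes | Aligned Sample |
|---|---|---|---|---|---|
| Yoruba | 💚 | | | | Psalm 119 |
| Ewe | 💚 | | | | Psalm 119 |
| Lingala | 💚 | | | | Psalm 119 |
| Asante Twi | 💚 | | | | |
| Akuapem Twi | 💚 | | | | |
| Chichewa | ❤️‍🩹 | | | Passing with bad alignments | Psalm 119 |
| Hausa | | 💔 | | | |
| Luo | | 💔 | | | |
| Luganda | | 💔 | | | |
| Kikuyu | | 💔 | | | |
| Arabic | | | | | |
| Kurdi Sorani | | | | | |
| Polish | | | | | |
| Vietnamese | | | | | |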

Clone this repo

$ git clone https://github.com/coqui-ai/open-bible-scripts.git

Alignment Approach 1: Use the Montreal Forced Aligner

The first alignment approach uses the Montreal Forced Aligner (MFA) to train a new acoustic model from scratch and align the data with it.

Dependencies

You need to install the following yourself:

- gnu-parallel
- covo
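A quick way to confirm the tools are on your PATH is a small bash function like the one below. It is a hypothetical helper, not part of the repository, and it assumes GNU Parallel installs a `parallel` command and covo installs a `covo` command.

```bash
# Hypothetical helper: report every command that is not on PATH.
# Returns 0 when all commands are found, 1 otherwise.
check_deps() {
  local cmd missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_deps parallel covo docker` also checks for Docker, which the alignment step below uses to run MFA. `command -v` only checks that something named `parallel` exists; moreutils ships an unrelated `parallel`, so `parallel --version` should report GNU parallel.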

Start with the run script for pre-processing

Pass two arguments: the language name as defined in open-bible-scripts/data/*.txt, and the language code expected by covo.

For example, use yoruba and yo for Yoruba, ewe and ee for Ewe, luganda and lg for Luganda, and so on.

$ cd open-bible-scripts
open-bible-scripts$ ./run-pre-alignment.sh yoruba yo
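To pre-process several languages in one go, you could wrap the script in a loop. This is a hypothetical sketch, not part of the repository: `lang_code` only knows the three name/code pairs listed above, so you would add further pairs yourself. Run it from the repository root.

```bash
# Hypothetical helper: map repo language names to covo language codes.
# Only the example pairs from this README are included.
lang_code() {
  case "$1" in
    yoruba)  echo yo ;;
    ewe)     echo ee ;;
    luganda) echo lg ;;
    *) echo "unknown language: $1" >&2; return 1 ;;
  esac
}

# Run pre-processing for each language name given as an argument,
# stopping at the first unknown name or failed run.
pre_align_all() {
  local lang code
  for lang in "$@"; do
    code="$(lang_code "$lang")" || return 1
    ./run-pre-alignment.sh "$lang" "$code" || return 1
  done
}
```

For example, `pre_align_all yoruba ewe` runs the pre-processing script for Yoruba and then for Ewe.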

Generate alignments with mfa train
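
Start a container from the MFA Docker image with your clone bind-mounted at /mnt (change src to the path of your clone), activate the aligner conda environment, and launch mfa train in the background:
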
$ docker run -it --mount "type=bind,src=/home/ubuntu/open-bible-scripts,dst=/mnt" mmcauliffe/montreal-forced-aligner
(base) root@d8095c794d5f:/# conda activate aligner
(aligner) root@d8095c794d5f:/# mfa train --clean --num_jobs `nproc` --temp_directory /mnt/yoruba/data/mfa-tmp-dir --config_path /mnt/MFA_CONFIG /mnt/yoruba/data /mnt/yoruba/dict.txt /mnt/yoruba/data/mfa-output &> /mnt/yoruba/data/LOG &

# This will take a while, so you may want to detach from the Docker
# container with Ctrl-P followed by Ctrl-Q (reattach with `docker attach`).
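If you align more than one language, a hypothetical bash function like the one below, pasted into the container's shell after `conda activate aligner`, builds the same `mfa train` invocation for any language directory under /mnt. It uses only the flags from the command above; `DRY_RUN` is an illustrative variable of this sketch, not an MFA option.

```bash
# Hypothetical helper: run the mfa train command above for language "$1".
# With DRY_RUN=1 it only prints the command instead of running it.
mfa_train_lang() {
  local lang="$1"
  local data="/mnt/${lang}/data"
  local cmd=(mfa train --clean --num_jobs "$(nproc)"
    --temp_directory "${data}/mfa-tmp-dir"
    --config_path /mnt/MFA_CONFIG
    "$data" "/mnt/${lang}/dict.txt" "${data}/mfa-output")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "${cmd[*]}"
  else
    # Same as above: log to data/LOG and run in the background.
    "${cmd[@]}" &> "${data}/LOG" &
  fi
}
```

For example, `DRY_RUN=1 mfa_train_lang ewe` prints the command for Ewe, and `mfa_train_lang ewe` starts it. Because the repository is bind-mounted at /mnt, the log is also visible on the host (e.g. /home/ubuntu/open-bible-scripts/ewe/data/LOG), so you can follow it there with `tail -f` after detaching.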

Finish with the run script for post-processing

Pass the language name as defined in open-bible-scripts/data/*.txt. The example below also passes the covo language code, as in pre-processing.

For example, use yoruba for Yoruba, ewe for Ewe, luganda for Luganda, and so on.

$ cd open-bible-scripts
open-bible-scripts$ ./run-post-alignment.sh yoruba yo

Alignment Approach 2: Use timing files from Biblica

This approach works only for Lingala, Akuapem Twi, and Asante Twi.

Split using timing file

Install SoX with MP3 support on your OS. On Debian/Ubuntu Linux:

sudo apt-get install sox
sudo apt-get install libsox-fmt-mp3
sox --version

Then create a Python virtual environment and install pandas:

python3 -m venv venv
source venv/bin/activate
pip install -U pip
pip install pandas

From the repository root, run the run-biblica-splits-*.sh script for your language. For example, for Lingala:

./run-biblica-splits-lingala.sh