- Initial code release with training and inference pipelines.
- Checkpoint release.
Install the dependencies from `requirements.txt`:

```bash
pip install -r requirements.txt
```
- Noise Dataset for WavLM-based Augmentation: The noise dataset for the WavLM noise augmentation is sourced from the DNS Challenge. You can use the following script to download it:

  ```bash
  bash download-dns-challenge-3.sh
  ```

  Then untar `datasets_fullband/datasets_fullband.noise_fullband.tar.bz2` (a short extraction sketch follows this list).
- Generated Datasets: The other data used for training SYLBER are generated using the SDHuBERT repository. Please follow the instructions there for data preparation.
- Checkpoints: Pretrained model checkpoints for SYLBER are available on Google Drive: link
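As a minimal sketch of the untar step for the noise dataset above, the snippet below uses Python's standard `tarfile` module. The archive path matches the file named in the list; the extraction directory is an assumption and should point wherever your setup expects the noise files.

```python
# Minimal sketch: extract the DNS Challenge noise archive downloaded above.
# The archive path matches the file named in this README; the output directory
# ("datasets_fullband") is an assumption -- change it to fit your data layout.
import tarfile

archive = "datasets_fullband/datasets_fullband.noise_fullband.tar.bz2"
with tarfile.open(archive, "r:bz2") as tar:
    tar.extractall(path="datasets_fullband")
```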
The training is split into two stages:

```bash
python train.py --config-name=sylber_base
python train.py --config-name=sylber_base_stage2
```

Make sure to review the configurations in the `configs/` directory for detailed settings.
For running inference to obtain segmentations and visualizing the results, please refer to `demo.ipynb`.
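As a rough illustration of that workflow, here is a self-contained sketch of plotting syllable segment boundaries over a waveform. Both the waveform and the `segments` list are placeholder data, and this code is not taken from `demo.ipynb`; the actual segmentation API and outputs are shown in the notebook itself.

```python
# Illustrative sketch only: plot placeholder syllable segments over a waveform.
# Both the waveform and the (start_sec, end_sec) segments below are dummy data;
# in practice they would come from the SYLBER segmenter as shown in demo.ipynb.
import numpy as np
import matplotlib.pyplot as plt

sr = 16000
wav = 0.05 * np.random.randn(2 * sr)                    # placeholder 2-second waveform
segments = [(0.10, 0.32), (0.32, 0.61), (0.70, 0.95)]   # placeholder segments in seconds

t = np.arange(len(wav)) / sr
plt.figure(figsize=(10, 3))
plt.plot(t, wav, linewidth=0.5)
for start, end in segments:
    plt.axvspan(start, end, alpha=0.2)                  # shade each syllable span
plt.xlabel("Time (s)")
plt.ylabel("Amplitude")
plt.title("Syllable segments (placeholder data)")
plt.tight_layout()
plt.show()
```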
For using SPARC, refer to the Speech-Articulatory-Coding repository for installation and usage instructions.
Website adapted from: https://github.com/BytedanceSpeech/bytedancespeech.github.io
If you use this work, please cite our paper:
```bibtex
@article{cho2024sylber,
  title={Sylber: Syllabic Embedding Representation of Speech from Raw Audio},
  author={Cho, Cheol Jun and Lee, Nicholas and Gupta, Akshat and Agarwal, Dhruv and Chen, Ethan and Black, Alan W and Anumanchipalli, Gopala K},
  journal={arXiv preprint arXiv:2410.07168},
  year={2024}
}
```