Code for collecting and generating text for TTS dataset creation.
Developers:
- Anders Jess Pedersen ([email protected])
- Dan Saattrup Nielsen ([email protected])
The quickest way to build the dataset is using Docker. With Docker installed, simply run `make docker`, and the final dataset will be built in the `data/processed` directory, with the individual datasets in `data/raw`.
To install the project for further development, run the following steps:

- Run `make install`, which installs Poetry (if it isn't already installed) and sets up a virtual environment with all Python dependencies therein.
- Run `source .venv/bin/activate` to activate the virtual environment.
With the project installed, you can build the dataset by running:

`python src/scripts/build_tts_dataset.py`
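The build script's exact behavior isn't documented here, but conceptually it combines the per-category text generators in `src/tts_text` (dates, times, bus stops and stations) into one dataset. A minimal sketch of that pattern, with hypothetical generator functions and file names standing in for the real modules:

```python
from pathlib import Path

# Hypothetical stand-ins for the generators in src/tts_text
# (the real modules are dates.py, times.py and bus_stops_and_stations.py).
def build_date_texts() -> list[str]:
    return ["den 1. januar 2024", "den 17. maj 2024"]

def build_time_texts() -> list[str]:
    return ["klokken 12.30", "klokken 8.15"]

def main() -> None:
    raw_dir = Path("data/raw")
    processed_dir = Path("data/processed")
    raw_dir.mkdir(parents=True, exist_ok=True)
    processed_dir.mkdir(parents=True, exist_ok=True)

    # Write each category to its own raw file, then merge everything into
    # one processed file, mirroring the data/raw vs. data/processed split.
    categories = {"dates": build_date_texts(), "times": build_time_texts()}
    all_texts: list[str] = []
    for name, texts in categories.items():
        (raw_dir / f"{name}.txt").write_text("\n".join(texts), encoding="utf-8")
        all_texts.extend(texts)
    (processed_dir / "tts_text.txt").write_text("\n".join(all_texts), encoding="utf-8")

if __name__ == "__main__":
    main()
```

This is only a sketch of the raw-then-processed layout described above, not the actual implementation of `build_tts_dataset.py`.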
NB: Running the above script on a machine running macOS may result in an `urllib.error.URLError` exception being thrown, in which case one should follow the steps described here.
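The macOS error typically stems from Python not finding the system's SSL root certificates. A common workaround (assuming a python.org build of Python; the version directory below is an assumption, adjust it to your installed version) is to run the certificate installer bundled with Python:

```shell
# Locate and run the certificate installer that ships with python.org
# builds of Python on macOS. The version directory (3.11 here) is an
# assumption; change it to match your installed Python version.
CERT_SCRIPT="/Applications/Python 3.11/Install Certificates.command"
if [ -e "$CERT_SCRIPT" ]; then
    # Installs the certifi certificate bundle for this Python install.
    /bin/bash "$CERT_SCRIPT"
else
    echo "Certificate installer not found at: $CERT_SCRIPT"
fi
```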
```
.
├── .devcontainer
│   └── devcontainer.json
├── .github
│   └── workflows
│       ├── ci.yaml
│       └── docs.yaml
├── .gitignore
├── .pre-commit-config.yaml
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── Dockerfile
├── LICENSE
├── README.md
├── config
│   ├── __init__.py
│   ├── config.yaml
│   └── hydra
│       └── job_logging
│           └── custom.yaml
├── data
│   ├── final
│   │   └── .gitkeep
│   ├── processed
│   │   └── .gitkeep
│   └── raw
│       └── .gitkeep
├── docs
│   └── .gitkeep
├── gfx
│   ├── .gitkeep
│   └── alexandra_logo.png
├── makefile
├── models
│   └── .gitkeep
├── notebooks
│   └── .gitkeep
├── poetry.lock
├── poetry.toml
├── pyproject.toml
├── src
│   ├── scripts
│   │   ├── build_tts_dataset.py
│   │   └── fix_dot_env_file.py
│   └── tts_text
│       ├── __init__.py
│       ├── __pycache__
│       ├── bus_stops_and_stations.py
│       ├── dates.py
│       ├── times.py
│       └── utils.py
└── tests
    ├── __init__.py
    ├── __pycache__
    └── test_dummy.py
```
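As an illustration of the kind of text the `dates.py` and `times.py` modules generate, here is a small, self-contained sketch that renders a date in written-out Danish. The function name and output format are hypothetical illustrations, not the modules' actual API:

```python
import datetime

# Danish month names, indexed from 1 (an assumption: the real module
# may use a different representation).
DANISH_MONTHS = [
    None, "januar", "februar", "marts", "april", "maj", "juni",
    "juli", "august", "september", "oktober", "november", "december",
]

def date_to_danish_text(date: datetime.date) -> str:
    """Render a date as written-out Danish, e.g. 'den 3. maj 2024'.

    A hypothetical helper, not the API of src/tts_text/dates.py.
    """
    return f"den {date.day}. {DANISH_MONTHS[date.month]} {date.year}"

print(date_to_danish_text(datetime.date(2024, 5, 3)))  # den 3. maj 2024
```

Spelling out dates and times like this gives a TTS model unambiguous, pronounceable text rather than numeric formats such as `03-05-2024`.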