# Training with `pyannote-audio-train` command line tool

This tutorial is a very preliminary draft. Expect hiccups and missing details.

pyannote.audio provides a command line tool called `pyannote-audio-train` that lets you train models directly from your terminal. It relies on extra dependencies installed using the `[cli]` suffix:

```bash
pip install pyannote-audio[cli]
```
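
Once installed, you can check that the tool is available and list its options by printing its help message (`pyannote-audio-train` is a Hydra application, as explained below, so `--help` comes for free):

```bash
pyannote-audio-train --help
```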

## TL;DR

Calling the following command would train a PyanNet model for voice activity detection on the AMI corpus...

```bash
pyannote-audio-train \
    model=PyanNet \
    task=VoiceActivityDetection \
    protocol=AMI.SpeakerDiarization.only_words
```

... which is more or less equivalent to running the following Python script:

```python
from pyannote.audio.tasks import VoiceActivityDetection
from pyannote.audio.models.segmentation import PyanNet
from pyannote.database import get_protocol
from pytorch_lightning import Trainer

# load the AMI protocol (train/development/test file lists)
protocol = get_protocol("AMI.SpeakerDiarization.only_words")

# define a voice activity detection task on this protocol
task = VoiceActivityDetection(protocol)

# instantiate a PyanNet segmentation model for this task
model = PyanNet(task=task)

# train with pytorch-lightning default settings
trainer = Trainer()
trainer.fit(model)
```

## Hydra-based configuration

`pyannote-audio-train` relies on Hydra to configure the training process. Adding the `--cfg job` option to the previous command prints the actual configuration used for training:

```bash
pyannote-audio-train --cfg job \
    model=PyanNet \
    task=VoiceActivityDetection \
    protocol=AMI.SpeakerDiarization.only_words
```

```yaml
task:
  _target_: pyannote.audio.tasks.VoiceActivityDetection
  duration: 3.0
  warm_up: 0.0
  balance: null
  weight: null
  batch_size: 32
  num_workers: null
  pin_memory: false
[...]
```
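
If you only care about one part of the configuration, recent Hydra versions let you combine `--cfg` with the `--package` (or `-p`) option to print a single section (shown here, as an example, for the `task` section; check your Hydra version if the option is not recognized):

```bash
pyannote-audio-train --cfg job --package task \
    model=PyanNet \
    task=VoiceActivityDetection \
    protocol=AMI.SpeakerDiarization.only_words
```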

To change the duration of audio chunks used for training to 2 seconds, you would do

```bash
pyannote-audio-train \
    model=PyanNet \
    task=VoiceActivityDetection task.duration=2.0 \
    protocol=AMI.SpeakerDiarization.only_words
```

You get the idea...
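
Any key shown in the configuration dump can be overridden the same way. For instance, since `batch_size` appears in the `task` section printed above, you could train on 2-second chunks with larger batches:

```bash
pyannote-audio-train \
    model=PyanNet \
    task=VoiceActivityDetection task.duration=2.0 task.batch_size=64 \
    protocol=AMI.SpeakerDiarization.only_words
```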

## Configuring data augmentation

Create a YAML file that can be loaded by `torch_audiomentations`'s `from_dict` utility function:

```bash
cat /path/to/custom_config/augmentation/background_noise.yaml
```

```yaml
# @package _group_
transform: Compose
params:
  shuffle: False
  transforms:
    - transform: AddBackgroundNoise
      params:
        background_paths: /path/to/directory/containing/background/audio
        min_snr_in_db: 5.
        max_snr_in_db: 15.
        mode: per_example
        p: 0.9
```
Then add the directory containing this file to Hydra's search path with `--config-dir` and enable the augmentation with the `+augmentation` override:

```bash
pyannote-audio-train \
    --config-dir /path/to/custom_config \
    model=PyanNet \
    task=VoiceActivityDetection task.duration=2.0 \
    protocol=AMI.SpeakerDiarization.only_words \
    +augmentation=background_noise
```
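
As before, adding `--cfg job` to this command is an easy way to check that the custom `augmentation` section was actually merged into the training configuration:

```bash
pyannote-audio-train --cfg job \
    --config-dir /path/to/custom_config \
    model=PyanNet \
    task=VoiceActivityDetection task.duration=2.0 \
    protocol=AMI.SpeakerDiarization.only_words \
    +augmentation=background_noise
```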

## Training on a Slurm cluster

As described in Hydra's documentation, the hydra-submitit-launcher plugin makes it possible to launch multiple jobs at once (e.g. to perform a grid search over hyper-parameters).

```bash
pip install hydra-submitit-launcher --upgrade
```

Here, we launch a grid of (3 x 2 =) six different jobs:

* 2, 3, or 4 LSTM layers
* mono-directional or bidirectional LSTMs

```bash
pyannote-audio-train \
    --multirun hydra/launcher=submitit_slurm \
    model=PyanNet +model.lstm.num_layers=2,3,4 +model.lstm.bidirectional=true,false \
    task=VoiceActivityDetection \
    protocol=AMI.SpeakerDiarization.only_words
```

Known bugs: pytorch-lightning + hydra-submitit + multi-GPU do not play well together (here, here, and here)

## Training on Jean Zay cluster

On the Jean Zay cluster, the following additional launcher overrides come in handy:

```bash
+hydra.launcher.additional_parameters.account=eie@gpu   # --account option
hydra.launcher.qos=qos_gpu-dev                          # QOS
hydra.launcher.gpus_per_task=1                          # number of GPUs
hydra.launcher.cpus_per_gpu=10                          # number of CPUs per GPU
hydra.launcher.timeout_min=120                          # --time option (in minutes)
task.duration=2,5,10 hydra.sweep.subdir=\${task.duration}s_chunks   # sweep over chunk duration, one subdirectory per value
```
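
Putting it all together, a complete multirun command on Jean Zay could look like the following sketch (the `eie@gpu` account and all values are placeholders to adapt to your own allocation):

```bash
pyannote-audio-train \
    --multirun hydra/launcher=submitit_slurm \
    +hydra.launcher.additional_parameters.account=eie@gpu \
    hydra.launcher.qos=qos_gpu-dev \
    hydra.launcher.gpus_per_task=1 \
    hydra.launcher.cpus_per_gpu=10 \
    hydra.launcher.timeout_min=120 \
    model=PyanNet \
    task=VoiceActivityDetection task.duration=2,5,10 \
    hydra.sweep.subdir=\${task.duration}s_chunks \
    protocol=AMI.SpeakerDiarization.only_words
```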