# Translation

## Feature Extraction

Once you have obtained a pretrained model and prepared your data as outlined in DATASETS.md, you are ready for feature extraction. Below is an example of how to run it.

`num_items_per_shard` is tuned to the size of the dataset (fewer items per shard result in a larger SLURM array). `max_batch_size=2` reflects our setup with fp32 on 32GB V100 GPUs. When using a CLIP checkpoint with a SignHiera vision tower, pass `from_clip=true`. If you are not using a SLURM cluster, you can pass `launcher.cluster=local` to run feature extraction on the local machine.

For additional parameters, please refer to the `FeatureExtractionConfig` dataclass in `launch_feature_extraction.py`.

```bash
python launch_feature_extraction.py \
  data_dir=/path/to/how2sign \
  output_dir=/path/to/empty/folder/for/features \
  from_clip=true \
  video_backend=pyav \
  model_name=hiera_base_128x224 \
  pretrained_model_path=/path/to/pretrained/model/checkpoint.pth \
  epochs=1 \
  split=test \
  num_items_per_shard=50 \
  num_frames=128 \
  sampling_rate=2 \
  target_fps=25 \
  max_batch_size=2 \
  launcher.cluster=slurm
```
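
Each video's features are written as a standalone `.pt` file. As a quick sanity check on one output file, you can load it with PyTorch. This is an illustrative sketch only: the path is a placeholder following the layout shown below, and the assumption that each file holds a single tensor whose first dimension is the sequence length matches the `length` column described in the next paragraph.

```python
# Sanity-check sketch (not part of the repo): load one extracted feature file
# and print its shape. The path is a placeholder; we assume each .pt file
# stores a single tensor whose first dimension is the temporal length.
import torch

feats = torch.load(
    "/path/to/features/how2sign/features/0/00000/00000-abcdef.pt",
    map_location="cpu",
)
print(type(feats), getattr(feats, "shape", None))  # expect a torch.Tensor, e.g. (length, dim)
```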

After extracting the features, make sure they follow the nested folder structure expected by our `SignFeaturesDataset`, shown below. The manifest `.tsv` files should have the columns `["video_name", "length", "label"]`, where `video_name` uniquely identifies a video in the dataset, `length` denotes the length of the features tensor, and `label` is the translation. You can obtain manifest files with the `length` column populated from the extracted features via our provided `get_feature_lengths.py` script; a minimal sketch of the same idea follows the layout below.

```
├── DATASET_NAME/
    ├── manifests/
        ├── train_features.tsv
        ├── val_features.tsv
        └── test_features.tsv
    └── features/
        ├── 0 (epoch)
            ├── 00000 (prefix)
                ├── 00000-abcdef.pt (video_name.pt)
                └── ...
            ├── 00001
                ├── 00001-cdefgh.pt
                └── ...
        └── ...  (if extracted more than 1 epoch)
```
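
As a rough illustration of what `get_feature_lengths.py` does (this is a hedged sketch, not the provided script), the `length` column can be filled by loading each feature tensor. The input manifest name, its existing `video_name` and `label` columns, the five-character prefix scheme, and reading from epoch `0` are assumptions based on the layout above.

```python
# Sketch only (use the provided get_feature_lengths.py for real runs): fill the
# "length" column of a manifest by loading each extracted feature tensor.
# Assumptions: the input TSV already has "video_name" and "label" columns, the
# shard prefix is the first five characters of video_name (as in the layout
# above), and features are read from epoch 0.
import csv
import os

import torch

features_root = "/path/to/features/how2sign/features/0"        # epoch 0
manifest_in = "/path/to/features/how2sign/manifests/test.tsv"  # manifest without lengths
manifest_out = "/path/to/features/how2sign/manifests/test_features.tsv"

with open(manifest_in, newline="") as f_in, open(manifest_out, "w", newline="") as f_out:
    reader = csv.DictReader(f_in, delimiter="\t")
    writer = csv.DictWriter(f_out, fieldnames=["video_name", "length", "label"], delimiter="\t")
    writer.writeheader()
    for row in reader:
        video_name = row["video_name"]
        feats = torch.load(
            os.path.join(features_root, video_name[:5], f"{video_name}.pt"),
            map_location="cpu",
        )
        writer.writerow(
            {"video_name": video_name, "length": feats.shape[0], "label": row["label"]}
        )
```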

## Training

After you have successfully extracted the features, you can simply run the translation pipeline as follows.

For T5:

```bash
# sweep over 5 random seeds
python run_translation.py \
    --config-name slt_t5 \
    --multirun \
    run=64gpu \
    hydra.sweep.dir=sweep_t5 \
    'data.train_data_dirs="/path/to/features/how2sign,/path/to/features/youtube-asl"' \
    data.val_data_dir=/path/to/features/how2sign \
    optim.train_batch_size=4 \
    optim.gradient_accumulation_steps=1 \
    common.eval_steps=500 \
    common.seed=1,2,3,4,5 \
    wandb.project=t5-sweep
```

For Tarrés et al. (2023):

```bash
# sweep over 5 random seeds
python run_translation.py \
    --config-name slt_tarres_fairseq \
    --multirun \
    run=1gpu \
    hydra.sweep.dir=sweep_tf \
    model.name_or_path=/path/to/folder/w/tarres-tokenizer \
    data.train_data_dirs=/path/to/features/how2sign \
    data.val_data_dir=/path/to/features/how2sign \
    common.seed=1,2,3,4,5 \
    wandb.project=tf-sweep \
    common.fp16=true
```

Adjust the number of GPUs and gradient accumulation steps based on your setup; these settings assume fp32 training on 32GB V100 GPUs. You can also reduce the maximum sequence length (`data.max_source_positions`) to lower the memory footprint. While we recommend `common.fp16=true` for BART training and the `slt_tarres_fairseq` configuration, fp16 does not work with T5. We have not tried bfloat16 training but encourage adding support for it.
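
For example, under the usual data-parallel convention the effective batch size is per-GPU batch size × number of GPUs × gradient accumulation steps, so the T5 recipe above trains on 64 × 4 × 1 = 256 sequences per step. The sketch below shows how to keep that effective batch size on a smaller machine; the 8-GPU figure is purely an assumption about your hardware.

```python
# Back-of-the-envelope sketch (assumes standard data-parallel semantics):
# keep the effective batch size of the 64-GPU T5 recipe (64 * 4 * 1 = 256)
# on a hypothetical 8-GPU machine by raising gradient accumulation.
target_effective_batch = 64 * 4 * 1   # run=64gpu, optim.train_batch_size=4, grad_accum=1
num_gpus = 8                          # assumption about your setup
per_gpu_batch = 4                     # optim.train_batch_size
grad_accum = target_effective_batch // (num_gpus * per_gpu_batch)
print(grad_accum)                     # -> 8, i.e. optim.gradient_accumulation_steps=8
```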

For all additional training and evaluation parameters, refer to the dataclasses in `run_translation.py`.

## Evaluation

You can run the translation pipeline in eval mode to perform evaluation only on the validation and test sets. To do this, pass the appropriate overrides, e.g.:

```bash
python run_translation.py \
    --config-name slt_t5 \
    hydra.sweep.dir=sweep_eval_t5 \
    'data.train_data_dirs="/path/to/features/how2sign,/path/to/features/youtube-asl"' \
    data.val_data_dir=/path/to/features/how2sign \
    common.eval=true \
    common.load_model=/path/to/finetuned-t5/best_model.pth
```

## Feature Extraction Demo

To help you get started with the feature extraction process, we've provided a Colab notebook that demonstrates the key steps. The notebook walks through the workflow, allowing you to interactively extract features from your data using our tools.

You can access the demo here: Feature Extraction Demo on Colab

Simply open the notebook, connect to a runtime, and follow the instructions. The example shows how to use our feature extraction pipeline effectively.