T5 experiments

The T5 experiments are based on the excellent nanoT5 repository; see LICENSE.

Environment setup

Following nanoT5's setup:

    conda create -n ntng_t5 python=3.8
    conda activate ntng_t5
    pip install -r requirements.txt

As of 24.07.2023, these commands result in the following pip freeze. We also include our lscpu and nvidia-smi output.

Commands for each experiment

By default, each experiment runs for 24 hours. For more details, see the default config with all hyperparameters here. We include the RST measurements here.
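All of the pre-training commands in this section call the same `t5.train` entry point and differ only in their `key=value` overrides (Hydra-style, as in nanoT5). If you want to script them, a minimal Python sketch follows. It uses only the overrides listed in this README; `EXPERIMENTS` and `build_command` are hypothetical helper names, not part of the repository.

```python
import shlex

# Overrides copied verbatim from the per-experiment commands in this README.
EXPERIMENTS = {
    "baseline": ["stacking.typ=none"],
    "stacking": ["stacking.typ=stack"],
    "layer_dropping": ["stacking.typ=drop", "optim.base_lr=1e-2", "stacking.gamma_factor=20"],
    "sophia": ["stacking.typ=none", "optim.name=sophia", "optim.rho=1e-2",
               "optim.base_lr=1e-3", "sophia_freq=10"],
    "lion": ["stacking.typ=none", "optim.name=lion", "optim.base_lr=7.5e-4"],
}


def build_command(name, extra_overrides=()):
    """Return the argv list for one experiment; extra_overrides are appended as-is."""
    return ["python", "-m", "t5.train", *EXPERIMENTS[name], *extra_overrides]


if __name__ == "__main__":
    for name in EXPERIMENTS:
        # shlex.join (Python 3.8+) quotes arguments only where a shell would need it.
        print(shlex.join(build_command(name)))
```

To launch a run instead of printing it, the argv list can be passed to `subprocess.run(build_command("baseline"), check=True)` from the repository root.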

Baseline

    python -m t5.train stacking.typ=none

Stacking

    python -m t5.train stacking.typ=stack
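This README does not spell out how `stacking.typ=stack` grows the model. To illustrate the general idea behind progressive stacking (train a shallow model, then increase depth by copying already-trained layers on top), here is a framework-free sketch in which layers are plain Python objects. `stack_layers` is a hypothetical name, and the actual schedule and copying rule in `t5.train` may differ.

```python
import copy


def stack_layers(layers, target_depth):
    """Grow a list of trained layers to target_depth by appending deep copies of the stack.

    Illustrative only: real stacking operates on model modules (e.g. a block list)
    and also has to decide what to do with optimizer state for the new parameters.
    """
    if not layers:
        raise ValueError("need at least one trained layer to stack")
    grown = list(layers)  # new list; the original layer objects are reused at the bottom
    while len(grown) < target_depth:
        # Copy the current stack on top of itself (2 -> 4 -> 8 ...), truncated at target_depth.
        needed = target_depth - len(grown)
        grown.extend(copy.deepcopy(layer) for layer in grown[:needed])
    return grown
```

Deep copies matter here: the new upper layers start from the trained weights but must be updated independently of the layers they were copied from.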

Layer Dropping

    python -m t5.train stacking.typ=drop optim.base_lr=1e-2 stacking.gamma_factor=20
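If layer dropping here follows progressive layer dropping (PLD), each layer is skipped stochastically, with a keep probability that decays over training and shrinks with depth: theta(t) = (1 - theta_bar) * exp(-gamma * t) + theta_bar and p_l = 1 - (l / L) * (1 - theta(t)). The sketch below implements that schedule. Two points are assumptions and are not taken from this repository: that `stacking.gamma_factor=20` enters as gamma = gamma_factor / total_steps, and that theta_bar = 0.5. All function names are hypothetical.

```python
import math
import random


def global_keep_prob(step, total_steps, gamma_factor=20.0, theta_bar=0.5):
    """theta(t) = (1 - theta_bar) * exp(-gamma * t) + theta_bar, decaying from 1 toward theta_bar.

    ASSUMPTION: gamma = gamma_factor / total_steps; this reading of the README's
    stacking.gamma_factor is hypothetical, as is the default theta_bar.
    """
    gamma = gamma_factor / total_steps
    return (1.0 - theta_bar) * math.exp(-gamma * step) + theta_bar


def layer_keep_prob(layer_idx, num_layers, theta):
    """p_l = 1 - (l / L) * (1 - theta) for 1-based l: deeper layers are dropped more often."""
    return 1.0 - (layer_idx / num_layers) * (1.0 - theta)


def sample_active_layers(num_layers, theta, rng):
    """Return one decision per layer (True = run the layer this step, False = skip it)."""
    return [rng.random() < layer_keep_prob(l, num_layers, theta) for l in range(1, num_layers + 1)]


if __name__ == "__main__":
    rng = random.Random(0)
    total = 10_000
    for step in (0, 2_500, 10_000):
        theta = global_keep_prob(step, total)
        active = sample_active_layers(12, theta, rng)
        print(f"step={step} theta={theta:.3f} active_layers={sum(active)}/12")
```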

Sophia

    python -m t5.train stacking.typ=none optim.name=sophia optim.rho=1e-2 optim.base_lr=1e-3 sophia_freq=10
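Sophia keeps an exponential moving average m of the gradients and a diagonal Hessian estimate h that is refreshed only every k steps. It then takes a clipped, curvature-scaled step: theta <- theta - lr * clip(m / max(rho * h, eps), 1). The sketch below applies that published update rule to plain lists of floats. It assumes that `optim.rho` plays the role of rho and that `sophia_freq` sets k. It omits weight decay, leaves the Hessian estimator to a caller-supplied function, and uses illustrative beta values. `sophia_update` is a hypothetical name, not the repository's optimizer.

```python
def sophia_update(params, grads, state, step, hess_diag_fn, lr=1e-3, rho=1e-2,
                  beta1=0.965, beta2=0.99, eps=1e-12, hess_freq=10):
    """One simplified Sophia step on lists of floats (no weight decay).

    hess_diag_fn(params) is a hypothetical callable returning a diagonal Hessian
    estimate; the Sophia paper uses Hutchinson or Gauss-Newton-Bartlett estimators.
    `state` carries the moving averages m and h between calls.
    """
    m = state.setdefault("m", [0.0] * len(params))
    h = state.setdefault("h", [0.0] * len(params))
    # Refresh the (expensive) curvature EMA only every hess_freq steps.
    if step % hess_freq == 0:
        for i, h_hat in enumerate(hess_diag_fn(params)):
            h[i] = beta2 * h[i] + (1 - beta2) * h_hat
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        m[i] = beta1 * m[i] + (1 - beta1) * g
        # Curvature-scaled step, clipped element-wise to [-1, 1]; eps guards tiny or negative h.
        ratio = m[i] / max(rho * h[i], eps)
        ratio = max(-1.0, min(1.0, ratio))
        new_params.append(p - lr * ratio)
    return new_params
```

The clipping bounds every coordinate's step by lr, so coordinates with near-zero curvature estimates fall back to a sign-momentum step rather than blowing up.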

Lion

    python -m t5.train stacking.typ=none optim.name=lion optim.base_lr=7.5e-4
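Lion (EvoLved Sign Momentum) updates each parameter by the sign of an interpolation between its momentum and the current gradient. The per-coordinate step size is therefore set by lr alone, not by the gradient's magnitude. Below is a plain-Python sketch of the published update rule, with `lr` defaulting to the `optim.base_lr=7.5e-4` used above and the commonly used beta1 = 0.9, beta2 = 0.99. `lion_update` is a hypothetical helper, not the repository's implementation.

```python
def _sign(x):
    """Return 1, -1 or 0 depending on the sign of x."""
    return (x > 0) - (x < 0)


def lion_update(params, grads, state, lr=7.5e-4, beta1=0.9, beta2=0.99, weight_decay=0.0):
    """One Lion step on lists of floats; `state` holds the momentum between calls."""
    m = state.setdefault("m", [0.0] * len(params))
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        # Interpolate momentum and the fresh gradient with beta1, then keep only the sign.
        c = beta1 * m[i] + (1 - beta1) * g
        # Sign update plus decoupled weight decay, both scaled by lr.
        new_params.append(p - lr * (_sign(c) + weight_decay * p))
        # The stored momentum is updated afterwards with a different coefficient, beta2.
        m[i] = beta2 * m[i] + (1 - beta2) * g
    return new_params
```

Lion stores only one moving average per parameter (AdamW stores two), which is part of its appeal for large-model pre-training.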

Fine-Tuning

We fine-tune the models in the original nanoT5 repository using the following command:


    python -m nanoT5.main task=ft model.name=google/t5-v1_1-base model.random_init=false model.checkpoint_path="/path/to/pytorch_model.bin"

None of our models modify the original T5 architecture, so all checkpoints trained in this repository are compatible with the original nanoT5 repository.
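Because the architecture is unchanged, a checkpoint's parameter names and shapes should match those of a stock T5 model of the same size. The sketch below is a quick pre-flight check to run before fine-tuning. `compare_shapes` (hypothetical) compares two `{name: shape}` mappings in pure Python, and the commented lines show one way to build those mappings with PyTorch and Hugging Face `transformers`, assuming both are installed.

```python
def compare_shapes(ckpt_shapes, ref_shapes):
    """Compare {param_name: shape} mappings from a checkpoint and a reference model."""
    missing = sorted(set(ref_shapes) - set(ckpt_shapes))      # in the model, not the checkpoint
    unexpected = sorted(set(ckpt_shapes) - set(ref_shapes))   # in the checkpoint, not the model
    mismatched = sorted(
        name for name in set(ref_shapes) & set(ckpt_shapes)
        if tuple(ref_shapes[name]) != tuple(ckpt_shapes[name])
    )
    return {"missing": missing, "unexpected": unexpected, "mismatched": mismatched}


# Example (requires torch and transformers; not executed by this sketch):
# import torch
# from transformers import T5Config, T5ForConditionalGeneration
# ckpt = torch.load("/path/to/pytorch_model.bin", map_location="cpu")
# ref = T5ForConditionalGeneration(T5Config.from_pretrained("google/t5-v1_1-base")).state_dict()
# report = compare_shapes({k: tuple(v.shape) for k, v in ckpt.items()},
#                         {k: tuple(v.shape) for k, v in ref.items()})
```

An all-empty report suggests the checkpoint will load with a strict `load_state_dict`. Any non-empty list is worth inspecting, since the way a checkpoint was saved (for example, its handling of tied embeddings) can affect which keys are present.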
