Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023)

JeanKaddour/NoTrainNoGain

No Train No Gain

Code for the paper "No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models" by Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, and Matt J. Kusner.

Running the code

See the README for the:

Citation and license

We use two excellent open source codebases to implement our experiments:

  • The BERT experiments are forked from Cramming
  • The T5 experiments are forked from NanoT5

If you find this repository useful, please consider citing both our work and these original codebases.

To cite our work, we suggest the following BibTeX:

@misc{kaddourNoTrainNo2023,
	title = {No {Train} {No} {Gain}: {Revisiting} {Efficient} {Training} {Algorithms} {For} {Transformer}-based {Language} {Models}},
	url = {http://arxiv.org/abs/2307.06440},
	doi = {10.48550/arXiv.2307.06440},
	urldate = {2023-07-17},
	publisher = {arXiv},
	author = {Kaddour, Jean and Key, Oscar and Nawrot, Piotr and Minervini, Pasquale and Kusner, Matt J.},
	month = jul,
	year = {2023},
	note = {arXiv:2307.06440 [cs]},
}

We provide separate licenses for the BERT experiments and the T5 experiments.

Contact

Feel free to open an issue, or email us, with any questions.
