Adding GPU automatic mixed precision training #200
Automatic Mixed Precision (AMP) training on GPU for TensorFlow was recently introduced:
https://medium.com/tensorflow/automatic-mixed-precision-in-tensorflow-for-faster-ai-training-on-nvidia-gpus-6033234b2540
Automatic mixed precision training uses both FP32 and FP16 precisions where appropriate. FP16 operations can leverage the Tensor Cores on NVIDIA GPUs (Volta, Turing, or newer architectures) for much improved throughput.
This PR adds GPU automatic mixed precision training to the xlnet classification fine-tuning task, enabled by setting the flag --gpu_auto_mixed_precision=True. Here's a Colab notebook demonstrating automatic mixed precision on the xlnet classifier fine-tuning task: https://colab.research.google.com/drive/1dPX1L8MAHvLxBAcwgRLHmh9TdIB3bQjx
On a V100 GPU, this results in about 50% higher throughput.
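For reference, here is a minimal sketch of how a flag like this is typically wired into TensorFlow 1.x training code. The function name and flag plumbing below are illustrative assumptions rather than the exact code in this PR; the graph-rewrite call itself is the standard TensorFlow entry point for automatic mixed precision.

```python
import tensorflow as tf

def get_train_op(loss, learning_rate, gpu_auto_mixed_precision=False):
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
    if gpu_auto_mixed_precision:
        # Wrap the optimizer so TensorFlow rewrites the graph to run
        # eligible ops in FP16, with automatic (dynamic) loss scaling.
        optimizer = tf.train.experimental.enable_mixed_precision_graph_rewrite(
            optimizer)
    return optimizer.minimize(
        loss, global_step=tf.train.get_or_create_global_step())
```

The same rewrite can also be enabled without code changes by setting the environment variable TF_ENABLE_AUTO_MIXED_PRECISION=1 on a supported TensorFlow build (1.14+ or NVIDIA's NGC containers).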
How mixed precision works
Mixed precision is the use of both float16 and float32 data types when training a model.
Performing arithmetic operations in float16 takes advantage of specialized processing units such as the Tensor Cores on NVIDIA GPUs. Due to the smaller representable range of float16, however, performing all of training in the float16 data type can result in gradient underflow, leading to convergence or model quality issues.
Performing only select arithmetic operations in float16, while keeping numerically sensitive ones in float32, still captures the performance gains of compatible hardware accelerators, decreasing training time and reducing memory usage, typically without sacrificing model performance.
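As a quick standalone illustration of the underflow concern (not code from this PR), a gradient value that float32 represents comfortably simply rounds to zero in float16:

```python
import numpy as np

# A small gradient value that float32 can represent.
grad_fp32 = np.float32(1e-8)
print(grad_fp32)   # 1e-08

# Cast to float16: the smallest positive float16 value is roughly 6e-08,
# so this gradient underflows to zero.
grad_fp16 = np.float16(grad_fp32)
print(grad_fp16)   # 0.0
```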
To learn more about mixed precision and how it works:
- Overview of Automatic Mixed Precision for Deep Learning
- NVIDIA Mixed Precision Training Documentation
- NVIDIA Deep Learning Performance Guide