Merge branch 'NVIDIA:master' into fix_nnunet_lowres_axis

Showing 856 changed files with 129,882 additions and 31,155 deletions.
# Image Classification

Image classification is the task of assigning an image to one of several predefined classes, often along with a probability that the input belongs to a certain class. This task is crucial for understanding and analyzing images, and it comes almost effortlessly to humans thanks to our complex visual system. Many of today's most powerful image classification models are built on some form of convolutional neural network (CNN), which is also the backbone of many other computer vision tasks.

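As a rough illustration of how a CNN turns an image into class probabilities, the sketch below runs one convolution, a ReLU, global average pooling, a dense layer, and a softmax in JAX. It is a minimal, untrained toy under stated assumptions: the layer sizes and the helper names `init_params` and `classify` are hypothetical, and it does not reproduce any NVIDIA model.

```python
import jax
import jax.numpy as jnp


def init_params(key, in_channels=3, num_filters=8, num_classes=10):
    """Randomly initialize one conv layer and one dense layer (toy sizes)."""
    k_conv, k_dense = jax.random.split(key)
    return {
        # Conv kernel in OIHW layout: (out_channels, in_channels, height, width).
        "conv": 0.1 * jax.random.normal(k_conv, (num_filters, in_channels, 3, 3)),
        "dense_w": 0.1 * jax.random.normal(k_dense, (num_filters, num_classes)),
        "dense_b": jnp.zeros((num_classes,)),
    }


def classify(params, images):
    """Map a batch of NCHW images to a probability for each class."""
    # 3x3 convolution with stride 1; "SAME" padding keeps the spatial size unchanged.
    features = jax.lax.conv(images, params["conv"], window_strides=(1, 1), padding="SAME")
    features = jax.nn.relu(features)
    # Global average pooling over height and width -> (batch, num_filters).
    pooled = jnp.mean(features, axis=(2, 3))
    logits = pooled @ params["dense_w"] + params["dense_b"]
    # Softmax turns the logits into class probabilities that sum to 1 per image.
    return jax.nn.softmax(logits, axis=-1)


params = init_params(jax.random.PRNGKey(0))
images = jax.random.uniform(jax.random.PRNGKey(1), (4, 3, 32, 32))  # 4 random "images"
probs = classify(params, images)  # shape (4, 10)
predicted_class = jnp.argmax(probs, axis=-1)
```

Production classifiers stack many convolutional blocks and learn their weights from labeled data; because the weights here are random, the predicted classes are meaningless and only the data flow is illustrated.
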
![What is Image Classification?](../../PyTorch/Classification/img/1_image-classification-figure-1.PNG)

[Source](https://github.com/NVlabs/stylegan)

In this overview, we will cover:
- Types of image classification
- How is the performance evaluated?
- Use cases and applications
- Getting started

---
## Types of image classification
Image classification problems can be broadly divided into binary and multi-class problems, depending on the number of categories. Binary image classification entails predicting one of two classes, for example, predicting whether or not an image shows a dog. A subtly different problem is single-class (one-vs-all) classification, where the goal is to recognize data from one class and reject all others. This is beneficial when there is an overabundance of data from one of the classes, a situation known as class imbalance.

![Input and Outputs for Image Classification](../../PyTorch/Classification/img/1_image-classification-figure-2.PNG)

In multi-class classification problems, models assign each instance to one of three or more categories. Multi-class models often also return confidence scores (or probabilities) for an image belonging to each of the possible classes. This should not be confused with multi-label classification, where a model can assign multiple labels to a single instance.

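The difference between these problem types shows up mainly in how a model's final raw scores (logits) are turned into predictions. The minimal JAX sketch below contrasts the three cases; the function names, the example logits, and the 0.5 threshold are illustrative assumptions rather than part of any particular library.

```python
import jax
import jax.numpy as jnp


def binary_prob(logit):
    """Binary (or one-vs-all): a single logit -> P(positive class), e.g. P(dog)."""
    return jax.nn.sigmoid(logit)


def multiclass_probs(logits):
    """Multi-class: exactly one class is correct, so softmax makes scores sum to 1."""
    return jax.nn.softmax(logits)


def multilabel_probs(logits, threshold=0.5):
    """Multi-label: each label is an independent yes/no decision."""
    probs = jax.nn.sigmoid(logits)
    return probs, probs > threshold


# Hypothetical logits for one image over three classes (or labels).
logits = jnp.array([2.0, -1.0, 0.5])

p_dog = binary_prob(jnp.array(0.0))             # 0.5: the model is undecided
mc = multiclass_probs(logits)                   # one distribution; argmax is class 0
ml_probs, ml_labels = multilabel_probs(logits)  # labels 0 and 2 are both switched on
```

Softmax couples the classes, so raising one probability lowers the others, which matches the "exactly one class" assumption of multi-class problems. Independent sigmoids do not couple the outputs, which is why they suit multi-label problems.
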
---
## How is the performance evaluated?
Image classification performance is often reported as top-1 or top-5 accuracy. For the top-1 score, a classification is considered correct if the top predicted class (the one with the highest predicted probability) matches the true class for a given instance. For the top-5 score, it is considered correct if any of the five highest-ranked predictions matches the true class. In both cases, the score is the number of correct predictions divided by the total number of instances evaluated.

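A minimal sketch of top-k accuracy in JAX, assuming the model's scores for each instance are stored as rows of a `(num_examples, num_classes)` array and the true labels as integer class indices. The function name `top_k_accuracy` is hypothetical, and because the toy data has only three classes, top-2 stands in for top-5.

```python
import jax
import jax.numpy as jnp


def top_k_accuracy(scores, labels, k):
    """Fraction of instances whose true label is among the k highest-scoring classes."""
    # jax.lax.top_k operates on the last axis and returns (values, indices).
    _, top_indices = jax.lax.top_k(scores, k)
    hits = jnp.any(top_indices == labels[:, None], axis=-1)
    # Score = number of correct predictions / total number of instances.
    return jnp.mean(hits.astype(jnp.float32))


scores = jnp.array([
    [0.1, 0.7, 0.2],  # ranking: class 1, 2, 0
    [0.5, 0.3, 0.2],  # ranking: class 0, 1, 2
    [0.2, 0.3, 0.5],  # ranking: class 2, 1, 0
])
labels = jnp.array([1, 1, 0])

top1 = top_k_accuracy(scores, labels, k=1)  # 1/3: only the first row is right
top2 = top_k_accuracy(scores, labels, k=2)  # 2/3: the second row now counts too
```

Top-k accuracy can only stay the same or rise as k grows, which is why top-5 scores are always at least as high as top-1 scores on the same data.
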
---
## Use cases and applications
### Categorizing Images in Large Visual Databases
Businesses with visual databases may accumulate large numbers of images with missing tags or metadata. Unless there is an effective way to organize such images, they may be of little use; worse, they may take up precious storage space. Automated image classification algorithms can sort such untagged images into predefined categories, letting businesses avoid expensive manual labor.

A related task is image organization on smart devices such as mobile phones. With image classification techniques, images and videos can be organized for easier access.

### Visual Search
Visual search, or image-based search, has risen in popularity in recent years. Many prominent search engines already provide this feature, letting users search for visual content similar to a provided image. This has many applications in e-commerce and retail, where users can snap and upload a photo of a product they are interested in purchasing. This makes shopping more efficient for customers and can increase sales for businesses.

### Healthcare
Medical imaging is the creation of visual images of internal body parts for clinical purposes, including health monitoring, medical diagnosis, treatment, and keeping organized records. Image classification algorithms can play a crucial role in medical imaging by helping medical professionals detect the presence of illness and by making clinical diagnoses more consistent.

---
## Getting started
NVIDIA provides examples for JAX models on [Rosetta](https://github.com/NVIDIA/JAX-Toolbox/tree/main/rosetta/rosetta/projects). These examples provide easy-to-consume, highly optimized scripts for both training and inference. The quick start guide in our GitHub repository will help you set up the environment using NGC Docker images, download pre-trained models from NGC, and adapt model training and inference to your application or use case.

These models are tested and maintained by NVIDIA, leveraging mixed precision on Tensor Cores in our latest GPUs for faster training while maintaining accuracy.
# ViT on GPUs
Please refer to [Rosetta ViT](https://github.com/NVIDIA/JAX-Toolbox/tree/main/rosetta/rosetta/projects/vit), NVIDIA's project that enables seamless training of LLMs, CV models, and multimodal models in JAX, for information about running Vision Transformer models and experiments on GPUs.

Paxml (aka Pax) is a framework for training LLMs. It allows for advanced and configurable experimentation and parallelization. It is based on [JAX](https://github.com/google/jax) and [Praxis](https://github.com/google/praxis).

# PAXML on GPUs
Please refer to [Rosetta PAXML](https://github.com/NVIDIA/JAX-Toolbox/tree/main/rosetta/rosetta/projects/pax), NVIDIA's project that enables seamless training of LLMs, CV models, and multimodal models in JAX, for information about running models and experiments on GPUs in PAXML.
# Language Modeling

Language modeling (LM) is a natural language processing (NLP) task that assigns a probability to a given sequence of words, such as a sentence.

In an era where computers, smartphones, and other electronic devices increasingly need to interact with humans, language modeling has become an indispensable technique for teaching devices to communicate in natural languages in human-like ways.

But how does language modeling work? And what can you build with it? What are the different approaches, what are its potential benefits and limitations, and how might you use it in your business?

In this guide, you’ll find answers to all of those questions and more. Whether you’re an experienced machine learning engineer considering implementation, a developer wanting to learn more, or a product manager looking to explore what’s possible with natural language processing and language modeling, this guide is for you.

Here’s a look at what we’ll cover:

- Language modeling – the basics
- How does language modeling work?
- Use cases and applications
- Getting started

## Language modeling – the basics

### What is language modeling?

"*Language modeling is the task of assigning a probability to sentences in a language. […]
Besides assigning a probability to each sequence of words, the language models also assign a
probability for the likelihood of a given word (or a sequence of words) to follow a sequence
of words.*" Source: Page 105, [Neural Network Methods in Natural Language Processing](http://amzn.to/2wt1nzv), 2017.

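The two quantities in this definition are linked by the chain rule of probability: the probability of a word sequence $w_1, \dots, w_T$ factors into a product of next-word probabilities.

```math
P(w_1, \dots, w_T) = \prod_{t=1}^{T} P(w_t \mid w_1, \dots, w_{t-1})
```

An n-gram model approximates each factor by conditioning only on the previous $n-1$ words; a bigram model, for example, uses $P(w_t \mid w_1, \dots, w_{t-1}) \approx P(w_t \mid w_{t-1})$.
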
### Types of language models

There are primarily two types of language models:

- Statistical language models: These models use traditional statistical techniques such as n-grams, hidden Markov models (HMMs), and certain linguistic rules to learn the probability distribution of words.
- Neural language models: These models use different kinds of neural networks to model language and have surpassed statistical language models in effectiveness.

"*We provide ample empirical evidence to suggest that connectionist language models are
superior to standard n-gram techniques, except their high computational (training)
complexity.*" Source: [Recurrent neural network based language model](http://www.fit.vutbr.cz/research/groups/speech/publi/2010/mikolov_interspeech2010_IS100722.pdf), 2010.

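To make the statistical n-gram approach concrete, here is a minimal bigram (2-gram) model sketched in JAX. It estimates P(next word | previous word) from word-pair counts, adds one to every count (Laplace smoothing) so unseen pairs keep a non-zero probability, and scores a sentence by summing log-probabilities. The three-sentence corpus, the `<s>`/`</s>` boundary markers, and all function names are illustrative assumptions, and words outside the vocabulary are not handled.

```python
import jax.numpy as jnp


def build_vocab(sentences):
    """Map every word, plus sentence-boundary markers, to an integer id."""
    words = {"<s>", "</s>"}
    for s in sentences:
        words.update(s.lower().split())
    return {w: i for i, w in enumerate(sorted(words))}


def to_ids(sentence, vocab):
    """Wrap a sentence in boundary markers and convert it to ids."""
    return [vocab["<s>"]] + [vocab[w] for w in sentence.lower().split()] + [vocab["</s>"]]


def train_bigram(sentences, vocab, alpha=1.0):
    """Estimate P(next | prev) from pair counts with add-alpha smoothing."""
    size = len(vocab)
    counts = jnp.zeros((size, size))
    for s in sentences:
        ids = jnp.array(to_ids(s, vocab))
        # Scatter-add one count per (prev, next) pair; repeated pairs accumulate.
        counts = counts.at[ids[:-1], ids[1:]].add(1.0)
    smoothed = counts + alpha
    # Normalize each row so that P(. | prev) sums to 1.
    return smoothed / smoothed.sum(axis=1, keepdims=True)


def sentence_log_prob(sentence, probs, vocab):
    """log P(sentence) as the sum of log P(w_t | w_{t-1}) over the sentence."""
    ids = jnp.array(to_ids(sentence, vocab))
    return jnp.sum(jnp.log(probs[ids[:-1], ids[1:]]))


corpus = ["i am driving", "i am reading", "you are driving"]
vocab = build_vocab(corpus)
probs = train_bigram(corpus, vocab)

fluent = sentence_log_prob("i am driving", probs, vocab)
scrambled = sentence_log_prob("am i driving", probs, vocab)  # contains pairs never seen in training
```

Here `fluent` comes out higher than `scrambled`, because every word pair in "i am driving" occurs in the corpus while "am i driving" relies on smoothed counts for unseen pairs. Without smoothing, those unseen pairs would get probability zero, an instance of the sparsity problem that limits simple statistical models.
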
Given the superior performance of neural language models, we include two popular state-of-the-art neural language models in the container: BERT and Transformer-XL.

### Why is language modeling important?

Language modeling is fundamental to modern NLP applications. It enables machines to understand qualitative information, and it enables people to communicate with machines in the natural languages that humans use to communicate with each other.

Language modeling is used directly in a variety of industries, including tech, finance, healthcare, transportation, legal, military, and government. In fact, you have probably interacted with a language model today, whether through Google Search, a voice assistant, or text autocomplete features.

## How does language modeling work?

The roots of modern language modeling can be traced back to 1948, when Claude Shannon published a paper titled "A Mathematical Theory of Communication", laying the foundation for information theory and language modeling. In the paper, Shannon described the use of a stochastic model called a Markov chain to create a statistical model of the sequences of letters in English text. Markov models, along with n-gram models, are still among the most popular statistical language models today.

However, simple statistical language models have serious drawbacks in scalability and fluency because of their sparse representation of language. Neural language models overcome this problem by representing language units (e.g., words, characters) as non-linear, distributed combinations of weights in a continuous space, which lets them generalize to rare or unseen words instead of being misled by them.

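The following minimal JAX sketch illustrates the idea with a tiny neural bigram model: each word id is mapped to a dense embedding vector, passed through a non-linearity, and projected to a softmax over the vocabulary, and the weights are fit by plain gradient descent. It is an illustrative assumption, far smaller than BERT or Transformer-XL; the vocabulary size, embedding width, learning rate, training pairs, and function names are all hypothetical.

```python
import jax
import jax.numpy as jnp


def init_params(key, vocab_size, embed_dim=8):
    """Small random embeddings plus an output projection back to the vocabulary."""
    k_emb, k_out = jax.random.split(key)
    return {
        "embed": 0.1 * jax.random.normal(k_emb, (vocab_size, embed_dim)),
        "out_w": 0.1 * jax.random.normal(k_out, (embed_dim, vocab_size)),
        "out_b": jnp.zeros((vocab_size,)),
    }


def next_word_log_probs(params, prev_ids):
    """log P(next word | previous word) for a batch of previous-word ids."""
    hidden = jnp.tanh(params["embed"][prev_ids])  # continuous, non-linear representation
    logits = hidden @ params["out_w"] + params["out_b"]
    return jax.nn.log_softmax(logits, axis=-1)


def loss(params, prev_ids, next_ids):
    """Average negative log-likelihood of the observed next words."""
    log_probs = next_word_log_probs(params, prev_ids)
    return -jnp.mean(jnp.take_along_axis(log_probs, next_ids[:, None], axis=-1))


@jax.jit
def sgd_step(params, prev_ids, next_ids):
    """One step of plain gradient descent with a fixed learning rate of 0.1."""
    grads = jax.grad(loss)(params, prev_ids, next_ids)
    return jax.tree_util.tree_map(lambda p, g: p - 0.1 * g, params, grads)


# Hypothetical training pairs (previous word id -> next word id), vocabulary of 5.
prev_ids = jnp.array([0, 1, 2, 0])
next_ids = jnp.array([1, 2, 3, 1])

params = init_params(jax.random.PRNGKey(0), vocab_size=5)
loss_before = loss(params, prev_ids, next_ids)
for _ in range(300):
    params = sgd_step(params, prev_ids, next_ids)
loss_after = loss(params, prev_ids, next_ids)

# Softmax never assigns exactly zero, so pairs absent from training keep some probability.
all_probs = jnp.exp(next_word_log_probs(params, jnp.arange(5)))
```

Because words used in similar contexts can end up with nearby embeddings, what the model learns about one word can carry over to others, which is one way neural models soften the sparsity problem of count-based n-grams. This sketch conditions on a single previous word only; practical neural language models use much longer context.
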
Therefore, as mentioned above, we introduce two popular state-of-the-art neural language models, BERT and Transformer-XL, in TensorFlow and PyTorch. More details can be found in the [NVIDIA Deep Learning Examples GitHub repository](https://github.com/NVIDIA/DeepLearningExamples).

## Use cases and applications

### Speech Recognition

Imagine speaking a phrase to your phone and expecting it to convert the speech to text. How does it know whether you said "recognize speech" or "wreck a nice beach"? Language models help figure this out from context, enabling machines to process and make sense of speech audio.

### Spelling Correction

Spellcheckers enabled by language models can point out spelling errors and possibly suggest alternatives.

### Machine Translation

Imagine you are translating the Chinese sentence "我在开车" into English. Your translation system gives you several choices:

- I at open car
- me at open car
- I at drive
- me at drive
- I am driving
- me am driving

A language model tells you which translation sounds the most natural.

## Getting started
NVIDIA provides examples for JAX models on [Rosetta](https://github.com/NVIDIA/JAX-Toolbox/tree/main/rosetta/rosetta/projects). These examples provide easy-to-consume, highly optimized scripts for both training and inference. The quick start guide in our GitHub repository will help you set up the environment using NGC Docker images, download pre-trained models from NGC, and adapt model training and inference to your application or use case.

These models are tested and maintained by NVIDIA, leveraging mixed precision on Tensor Cores in our latest GPUs for faster training while maintaining accuracy.

T5X is a framework for training, evaluation, and inference of sequence models (starting with language). It is based on [JAX](https://github.com/google/jax) and [Flax](https://github.com/google/flax). To learn more, see the [T5X Paper](https://arxiv.org/abs/2203.17189).

# T5X on GPUs

Please refer to [Rosetta T5X](https://github.com/NVIDIA/JAX-Toolbox/tree/main/rosetta/rosetta/projects/t5x), NVIDIA's project that enables seamless training of LLMs, CV models, and multimodal models in JAX, for information about running models and experiments on GPUs in T5X.
# Imagen on GPUs
Please refer to [Rosetta Imagen](https://github.com/NVIDIA/JAX-Toolbox/tree/main/rosetta/rosetta/projects/imagen), NVIDIA's project that enables seamless training of LLMs, CV models, and multimodal models in JAX, for information about running Imagen models and experiments on GPUs.