
Commit

Fix reference spacing
profvjreddi committed Dec 12, 2023
1 parent b93ea02 commit baf0888
Showing 1 changed file with 2 additions and 2 deletions.
contents/benchmarking/benchmarking.qmd (4 changes: 2 additions & 2 deletions)
@@ -563,7 +563,7 @@ The [Common Objects in Context (COCO) dataset][https://cocodataset.org/](@lin201

#### GPT-3 (2020)

-While the above examples primarily focus on image datasets, there have been significant developments in text datasets as well. One notable example is GPT-3[@brown2020language], developed by OpenAI. GPT-3 is a language model trained on a diverse range of internet text. Although the dataset used to train GPT-3 is not publicly available, the model itself, consisting of 175 billion parameters, is a testament to the scale and complexity of modern machine learning datasets and models.
+While the above examples primarily focus on image datasets, there have been significant developments in text datasets as well. One notable example is GPT-3 [@brown2020language], developed by OpenAI. GPT-3 is a language model trained on a diverse range of internet text. Although the dataset used to train GPT-3 is not publicly available, the model itself, consisting of 175 billion parameters, is a testament to the scale and complexity of modern machine learning datasets and models.

#### Present and Future

@@ -667,7 +667,7 @@ As machine learning models become more sophisticated, so do the benchmarks requi

**Out-of-Distribution Generalization**: Testing how well models perform on data that is different from the original training distribution. This evaluates the model's ability to generalize to new, unseen data. Example benchmarks are Wilds [@koh2021wilds], RxRx, and ANC-Bench.

-**Adversarial Robustness:** Evaluating model performance under adversarial attacks or perturbations to the input data. This tests the model's robustness. Example benchmarks are ImageNet-A[@hendrycks2021natural], ImageNet-C[@xie2020adversarial], and CIFAR-10.1.
+**Adversarial Robustness:** Evaluating model performance under adversarial attacks or perturbations to the input data. This tests the model's robustness. Example benchmarks are ImageNet-A [@hendrycks2021natural], ImageNet-C [@xie2020adversarial], and CIFAR-10.1.

**Real-World Performance:** Testing models on real-world datasets that closely match end tasks, rather than just canned benchmark datasets. Examples are medical imaging datasets for healthcare tasks or actual customer support chat logs for dialogue systems.

