From 8551d634a8daac3572ab7e9e9c60fa2ec2afcb8c Mon Sep 17 00:00:00 2001
From: Stella Biderman
Date: Sun, 31 Dec 2023 16:57:20 -0500
Subject: [PATCH] Add HParam spreadsheet (#9)

* Add HParam spreadsheet

* Update README.md

* Update README.md

* Update README.md
---
 README.md | 64 ++++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 47 insertions(+), 17 deletions(-)

diff --git a/README.md b/README.md
index a446178..e42dc0a 100644
--- a/README.md
+++ b/README.md
@@ -1,50 +1,76 @@
 # The Cookbook
-Deep learning for dummies, by Quentin Anthony et al.
-
-All the practical details that go into working with real models!
+Deep learning for dummies, by Quentin Anthony, Hailey Schoelkopf, and Stella Biderman
+All the practical details that go into working with real models! If you're just getting started, we recommend jumping ahead to [Basics](#basics) for some introductory resources on transformers.
 
 ## Table of Contents
-### Calculations
+- [The Cookbook](#the-cookbook)
+  * [Calculations](#calculations)
+  * [Benchmarks](#benchmarks)
+  * [Reading List](#reading-list)
+    + [Basics](#basics)
+    + [How to do LLM Calculations](#how-to-do-llm-calculations)
+    + [Distributed Deep Learning](#distributed-deep-learning)
+    + [Best Practices](#best-practices)
+  * [Minimal Repositories for Educational Purposes](#minimal-repositories-for-educational-purposes)
+  * [Contributing](#contributing)
+
+## Calculations
 For training/inference calculations (e.g. FLOPs, memory overhead, and parameter count)
 - **[calc](./calc/)**
 
-### Benchmarks
+Useful external calculators include:
+
+[Cerebras Model Lab](https://www.cerebras.net/model-lab/). A user-friendly tool to apply Chinchilla scaling laws.
+
+[Transformer Training and Inference VRAM Estimator](https://vram.asmirnov.xyz/) by Alexander Smirnov. A user-friendly tool to estimate VRAM overhead.
+
+## Benchmarks
 For benchmarks (e.g. communication)
 - **[benchmarks](./benchmarks/)**
 
+## Reading List
 
-## Reading List and Similar Resources
-
-[Transformers Math 101](https://blog.eleuther.ai/transformer-math/). A blog post from EleutherAI on training/inference memory estimations, parallelism, FLOP calculations, and deep learning datatypes
+### Basics
 
 [LLM Visualizations](https://bbycroft.net/llm). Clear LLM visualizations and animations for basic transformer understanding.
 
-[Transformer Inference Arithmetic](https://kipp.ly/transformer-inference-arithmetic/). A breakdown on the memory overhead, FLOPs, and latency of transformer inference
+[Annotated PyTorch Paper Implementations](https://nn.labml.ai/)
 
-[ML-Engineering Repository](https://github.com/stas00/ml-engineering). Containing community notes and practical details of everything deep learning training led by Stas Bekman
+[Jay Alammar's blog](https://jalammar.github.io/blog) contains many blog posts pitched to be accessible to a wide range of backgrounds. We recommend his posts [the Illustrated Transformer](https://jalammar.github.io/illustrated-transformer/) and [the Illustrated GPT-2](https://jalammar.github.io/illustrated-gpt2/) in particular.
+
+[The Annotated Transformer](https://nlp.seas.harvard.edu/annotated-transformer/) by Sasha Rush, Austin Huang, Suraj Subramanian, Jonathan Sum, Khalid Almubarak, and Stella Biderman. A walkthrough of the seminal paper "Attention is All You Need" along with in-line implementations in PyTorch.
+
+### How to do LLM Calculations
+
+[Transformers Math 101](https://blog.eleuther.ai/transformer-math/). A blog post from EleutherAI on training/inference memory estimations, parallelism, FLOP calculations, and deep learning datatypes.
+
+[Transformer Inference Arithmetic](https://kipp.ly/transformer-inference-arithmetic/). A breakdown of the memory overhead, FLOPs, and latency of transformer inference.
 
 [LLM Finetuning Memory Requirements](https://blog.scottlogic.com/2023/11/24/llm-mem.html) by Alex Birch. A practical guide on the memory overhead of finetuning models.
 
-[Annotated PyTorch Paper Implementations](https://nn.labml.ai/)
+### Distributed Deep Learning
 
 [Everything about Distributed Training and Efficient Finetuning](https://sumanthrh.com/post/distributed-and-efficient-finetuning/) by Sumanth R Hegde. High-level descriptions and links on parallelism and efficient finetuning.
 
-[Transformer Training and Inference VRAM Estimator](https://vram.asmirnov.xyz/) by Alexander Smirnov. User-friendly tool to estimate VRAM overhead.
+[Efficient Training on Multiple GPUs](https://huggingface.co/docs/transformers/main/en/perf_train_gpu_many) by Hugging Face. Contains a detailed walk-through of model, tensor, and data parallelism along with the ZeRO optimizer.
 
-[Cerebras Model Lab](https://www.cerebras.net/model-lab/). User-friendly tool to apply Chinchilla scaling laws.
+### Best Practices
+
+[ML-Engineering Repository](https://github.com/stas00/ml-engineering). Community notes and practical details covering every aspect of deep learning training, led by Stas Bekman.
+
+[Common HParam Settings](https://docs.google.com/spreadsheets/d/14vbBbuRMEHoqeuMHkTfw3uiZVmyXNuoSp8s-aHvfvZk/edit?usp=sharing) by Stella Biderman. Records common settings for model training hyperparameters and her current recommendations for training new models.
+
+## Minimal Repositories for Educational Purposes
 
-## Minimal Repositories for Understanding
 
+Large language models are frequently trained using very complex codebases due to the need to optimize them to run at scale and to support a wide variety of configurable options. This can make them less useful as pedagogical tools, so some people have developed stripped-down "minimal implementations" that are sufficient for smaller-scale work and more pedagogically useful.
 
 GPT Inference
 - https://github.com/pytorch-labs/gpt-fast/tree/main
 
-[RWKV](https://www.rwkv.com/)
-- https://github.com/Hannibal046/nanoRWKV/tree/main
-
 GPT Training
 - https://github.com/karpathy/minGPT
@@ -53,6 +79,10 @@ Architecture-Specific Examples
 - https://github.com/zphang/minimal-llama
 - https://github.com/zphang/minimal-opt
 
+[RWKV](https://www.rwkv.com/)
+- https://github.com/Hannibal046/nanoRWKV/tree/main
+
+
 ## Contributing
 If you found a bug, typo, or would like to propose an improvement please don't hesitate to open an [Issue](https://github.com/EleutherAI/cookbook/issues) or contribute a [PR](https://github.com/EleutherAI/cookbook/pulls).
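As a rough illustrative sketch of the back-of-envelope math that the calculators and the "How to do LLM Calculations" resources above automate (this is not the cookbook's own [calc](./calc/) code, and the constants are common approximations rather than exact values): training compute is roughly C ≈ 6ND FLOPs for N parameters and D tokens, Chinchilla-optimal training uses roughly 20 tokens per parameter, mixed-precision Adam training needs on the order of 16 bytes per parameter for weights, gradients, and optimizer states, and fp16/bf16 inference weights take about 2 bytes per parameter:

```python
# Back-of-envelope LLM sizing using common rules of thumb (approximations only):
#   training compute   C ≈ 6 * N * D        (FLOPs; N = params, D = training tokens)
#   Chinchilla-optimal D ≈ 20 * N            (tokens)
#   training memory    ≈ 16 bytes / param    (fp16 weights + grads + fp32 Adam states, no sharding)
#   inference weights  ≈ 2 bytes / param     (fp16/bf16)
# Activations, KV cache, and parallelism strategy change the real numbers.

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training compute in FLOPs (C ≈ 6ND)."""
    return 6 * n_params * n_tokens

def chinchilla_tokens(n_params: float) -> float:
    """Approximate compute-optimal token count (~20 tokens per parameter)."""
    return 20 * n_params

def training_memory_gb(n_params: float, bytes_per_param: float = 16) -> float:
    """Approximate per-replica memory for weights, gradients, and Adam states."""
    return n_params * bytes_per_param / 1e9

def inference_weight_memory_gb(n_params: float, bytes_per_param: float = 2) -> float:
    """Approximate memory just to hold the weights at inference time."""
    return n_params * bytes_per_param / 1e9

if __name__ == "__main__":
    n_params = 7e9  # a hypothetical 7B-parameter model
    d = chinchilla_tokens(n_params)
    print(f"Chinchilla-optimal tokens: {d:.3g}")
    print(f"Training compute:          {training_flops(n_params, d):.3g} FLOPs")
    print(f"Training memory (naive):   {training_memory_gb(n_params):.0f} GB")
    print(f"Inference weight memory:   {inference_weight_memory_gb(n_params):.0f} GB")
```

For the hypothetical 7B-parameter model this gives roughly 140B tokens, about 5.9e21 training FLOPs, around 112 GB of weight/gradient/optimizer memory per replica before any sharding, and about 14 GB of fp16 weights for inference; the linked posts and calculators refine these estimates with activation memory, KV cache, and parallelism overheads.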