Update README and add model directory site (#12)
* Update README and add llm directory site source

* Add model directory source

* Create static.yml (#13)
Quentin-Anthony authored Dec 31, 2023
1 parent d482435 commit 26ec83c
Showing 4 changed files with 1,943 additions and 4 deletions.
43 changes: 43 additions & 0 deletions .github/workflows/static.yml
@@ -0,0 +1,43 @@
# Simple workflow for deploying static content to GitHub Pages
name: Deploy static content to Pages

on:
  # Runs on pushes targeting the default branch
  push:
    branches: ["main"]

  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

# Sets permissions of the GITHUB_TOKEN to allow deployment to GitHub Pages
permissions:
  contents: read
  pages: write
  id-token: write

# Allow only one concurrent deployment, skipping runs queued between the run in-progress and latest queued.
# However, do NOT cancel in-progress runs as we want to allow these production deployments to complete.
concurrency:
  group: "pages"
  cancel-in-progress: false

jobs:
  # Single deploy job since we're just deploying
  deploy:
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Setup Pages
        uses: actions/configure-pages@v4
      - name: Upload artifact
        uses: actions/upload-pages-artifact@v3
        with:
          # Upload only the model-directory site
          path: './model-directory'
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4
25 changes: 21 additions & 4 deletions README.md
@@ -6,17 +6,21 @@ All the practical details and utilities that go into working with real models! I
## Table of Contents

- [The Cookbook](#the-cookbook)
* [Calculations](#calculations)
* [Benchmarks](#benchmarks)
* [Utilities](#utilities)
+ [Calculations](#calculations)
+ [Benchmarks](#benchmarks)
* [Reading List](#reading-list)
+ [Basics](#basics)
+ [How to do LLM Calculations](#how-to-do-llm-calculations)
+ [Distributed Deep Learning](#distributed-deep-learning)
+ [Best Practices](#best-practices)
+ [Data/Model Directories](#data-and-model-directories)
* [Minimal Repositories for Educational Purposes](#minimal-repositories-for-educational-purposes)
* [Contributing](#contributing)

## Calculations
## Utilities

### Calculations

For training/inference calculations (e.g. FLOPs, memory overhead, and parameter count)
- **[calc](./calc/)**
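
As a hedged back-of-the-envelope sketch of what such calculations involve (the function names and constants here are illustrative, not the `calc/` API): a decoder-only transformer has roughly `12 * n_layers * d_model^2` parameters plus embeddings, and training costs roughly `C ≈ 6 * N * D` FLOPs for `N` parameters and `D` tokens.

```python
# Illustrative sizing helpers; the constants are the standard approximations,
# not the exact formulas used by the calc/ scripts.

def approx_params(n_layers: int, d_model: int, vocab_size: int) -> int:
    per_layer = 12 * d_model**2        # ~4d^2 attention (Q,K,V,O) + ~8d^2 MLP, biases ignored
    embeddings = vocab_size * d_model  # token embeddings (LM head is often tied)
    return n_layers * per_layer + embeddings

def approx_train_flops(n_params: int, n_tokens: int) -> float:
    return 6 * n_params * n_tokens     # ~2ND forward + ~4ND backward

n = approx_params(n_layers=32, d_model=4096, vocab_size=50_000)
print(f"params ~ {n / 1e9:.1f}B")                                # ~6.6B
print(f"train FLOPs ~ {approx_train_flops(n, int(300e9)):.2e}")  # 300B tokens -> ~1.2e22
```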
Expand All @@ -27,7 +31,7 @@ Useful external calculators include

[Transformer Training and Inference VRAM Estimator](https://vram.asmirnov.xyz/) by Alexander Smirnov. A user-friendly tool to estimate VRAM overhead.
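
A similarly hedged memory sketch: assuming mixed-precision Adam (2-byte fp16 weights and gradients plus 12 bytes of fp32 optimizer state per parameter, the breakdown given in the ZeRO paper) and ignoring activations, model states alone need about 16 bytes per parameter.

```python
# Rough lower bound on training memory from model states alone, assuming
# mixed-precision Adam (2 B weights + 2 B grads + 12 B fp32 optimizer state).
# Activations are excluded; they depend on batch size and checkpointing.

def train_vram_gib(n_params: float, bytes_per_param: int = 16) -> float:
    return n_params * bytes_per_param / 2**30

print(f"{train_vram_gib(7e9):.0f} GiB")  # ~104 GiB for a 7B model, before activations
```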

## Benchmarks
### Benchmarks

For benchmarks (e.g. communication)
- **[benchmarks](./benchmarks/)**
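
As a minimal sketch of what a communication benchmark measures (the message size, iteration count, and script name are illustrative assumptions; the `benchmarks/` scripts are more thorough), here is an all-reduce timing loop under NCCL, launched with e.g. `torchrun --nproc_per_node=8 allreduce_bench.py`:

```python
import time
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")           # torchrun provides rank/world size
rank, world = dist.get_rank(), dist.get_world_size()
torch.cuda.set_device(rank % torch.cuda.device_count())
x = torch.randn(256 * 2**20 // 4, device="cuda")  # 256 MiB of fp32

for _ in range(5):                                # warm up NCCL communicators
    dist.all_reduce(x)
torch.cuda.synchronize()

iters = 20
start = time.perf_counter()
for _ in range(iters):
    dist.all_reduce(x)
torch.cuda.synchronize()
t = (time.perf_counter() - start) / iters

if rank == 0:
    gib = x.numel() * 4 / 2**30
    bus_bw = 2 * (world - 1) / world * gib / t    # nccl-tests bus-bandwidth convention
    print(f"all-reduce of {gib:.2f} GiB: {t * 1e3:.2f} ms, ~{bus_bw:.1f} GiB/s")
dist.destroy_process_group()
```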
@@ -58,12 +62,25 @@

[Efficient Training on Multiple GPUs](https://huggingface.co/docs/transformers/main/en/perf_train_gpu_many) by Hugging Face. Contains a detailed walk-through of model, tensor, and data parallelism along with the ZeRO optimizer.
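
For a concrete feel of the data-parallel baseline those guides start from, a minimal PyTorch DDP sketch with a toy model and random data (launched via `torchrun`); ZeRO and FSDP then shard the optimizer state, gradients, and parameters that plain DDP replicates on every rank:

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

model = DDP(torch.nn.Linear(1024, 1024).cuda(), device_ids=[local_rank])
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

for step in range(10):
    x = torch.randn(32, 1024, device="cuda")  # each rank draws its own batch shard
    loss = model(x).square().mean()           # toy objective
    loss.backward()                           # DDP all-reduces gradients here
    opt.step()
    opt.zero_grad()

dist.destroy_process_group()
```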

Papers
- [Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM](https://arxiv.org/abs/2104.04473)
- [Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis](https://arxiv.org/abs/1802.09941)
- [ZeRO: Memory Optimizations Toward Training Trillion Parameter Models](https://arxiv.org/abs/1910.02054)
- [PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel](https://arxiv.org/abs/2304.11277)
- [PyTorch Distributed: Experiences on Accelerating Data Parallel Training](https://arxiv.org/abs/2006.15704)

### Best Practices

[ML-Engineering Repository](https://github.com/stas00/ml-engineering) by Stas Bekman. Contains community notes and practical details on every aspect of deep learning training.

[Common HParam Settings](https://docs.google.com/spreadsheets/d/14vbBbuRMEHoqeuMHkTfw3uiZVmyXNuoSp8s-aHvfvZk/edit?usp=sharing) by Stella Biderman. Records common settings for model training hyperparameters and her current recommendations for training new models.

### Data and Model Directories

[Directory of LLMs](https://docs.google.com/spreadsheets/d/1gc6yse74XCwBx028HV_cvdxwXkmXejVjkO-Mz2uwE0k/edit?usp=sharing) by Stella Biderman. Records details of trained LLMs including license, architecture type, and dataset.

[Data Provenance Explorer](https://dataprovenance.org/). A tool for tracing and filtering on data provenance for the most popular open-source finetuning data collections.

## Minimal Repositories for Educational Purposes

Large language models are frequently trained using very complex codebases, since production training must be optimized to work at scale and support a wide variety of configurable options. This can make them less useful as pedagogical tools, so some people have developed stripped-down "Minimal Implementations" that are sufficient for smaller-scale work and more pedagogically useful.