diff --git a/.github/workflows/static.yml b/.github/workflows/static.yml new file mode 100644 index 0000000..059326d --- /dev/null +++ b/.github/workflows/static.yml @@ -0,0 +1,43 @@ +# Simple workflow for deploying static content to GitHub Pages +name: Deploy static content to Pages + +on: + # Runs on pushes targeting the default branch + push: + branches: ["main"] + + # Allows you to run this workflow manually from the Actions tab + workflow_dispatch: + +# Sets permissions of the GITHUB_TOKEN to allow deployment to GitHub Pages +permissions: + contents: read + pages: write + id-token: write + +# Allow only one concurrent deployment, skipping runs queued between the run in-progress and latest queued. +# However, do NOT cancel in-progress runs as we want to allow these production deployments to complete. +concurrency: + group: "pages" + cancel-in-progress: false + +jobs: + # Single deploy job since we're just deploying + deploy: + environment: + name: github-pages + url: ${{ steps.deployment.outputs.page_url }} + runs-on: ubuntu-latest + steps: + - name: Checkout + uses: actions/checkout@v4 + - name: Setup Pages + uses: actions/configure-pages@v4 + - name: Upload artifact + uses: actions/upload-pages-artifact@v3 + with: + # Upload only the model-directory folder + path: './model-directory' + - name: Deploy to GitHub Pages + id: deployment + uses: actions/deploy-pages@v4 diff --git a/README.md b/README.md index 935e048..a4e1b84 100644 --- a/README.md +++ b/README.md @@ -6,17 +6,21 @@ All the practical details and utilities that go into working with real models! 
## Table of Contents - [The Cookbook](#the-cookbook) - * [Calculations](#calculations) - * [Benchmarks](#benchmarks) + * [Utilities](#utilities) + + [Calculations](#calculations) + + [Benchmarks](#benchmarks) * [Reading List](#reading-list) + [Basics](#basics) + [How to do LLM Calculations](#how-to-do-llm-calculations) + [Distributed Deep Learning](#distributed-deep-learning) + [Best Practices](#best-practices) + + [Data and Model Directories](#data-and-model-directories) * [Minimal Repositories for Educational Purposes](#minimal-repositories-for-educational-purposes) * [Contributing](#contributing) -## Calculations +## Utilities + +### Calculations For training/inference calculations (e.g. FLOPs, memory overhead, and parameter count) - **[calc](./calc/)** @@ -27,7 +31,7 @@ Useful external calculators include [Transformer Training and Inference VRAM Estimator](https://vram.asmirnov.xyz/) by Alexander Smirnov. A user-friendly tool to estimate VRAM overhead. -## Benchmarks +### Benchmarks For benchmarks (e.g. communication) - **[benchmarks](./benchmarks/)** @@ -58,12 +62,25 @@ For benchmarks (e.g. communication) [Efficient Training on Multiple GPUs](https://huggingface.co/docs/transformers/main/en/perf_train_gpu_many) by Hugging Face. Contains a detailed walk-through of model, tensor, and data parallelism along with the ZeRO optimizer. 
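The synchronous data-parallel pattern these guides walk through (every replica computes gradients on its own shard of the batch, then an all-reduce averages them so all replicas take the identical optimizer step) can be sketched in plain Python. The one-parameter linear model, shard count, and learning rate below are illustrative stand-ins, not any framework's API:

```python
# Toy sketch of synchronous data parallelism: each "worker" holds a shard
# of the batch, computes a local gradient, and an all-reduce averages the
# gradients so every replica takes the same SGD step.

def local_grad(w, shard):
    # d/dw of mean((w*x - y)^2) over this worker's shard
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

data = [(x / 10, 3.0 * x / 10) for x in range(1, 41)]  # y = 3x, true w = 3
workers = [data[i::4] for i in range(4)]               # 4 equal shards

w = 0.0
for _ in range(200):
    grads = [local_grad(w, shard) for shard in workers]
    w -= 0.05 * sum(grads) / len(grads)                # all-reduce = average

print(round(w, 4))  # → 3.0
```

ZeRO keeps this same gradient-averaging step but additionally partitions the optimizer state, gradients, and parameters across workers to reduce per-device memory.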
+Papers +- [Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM](https://arxiv.org/abs/2104.04473) +- [Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis](https://arxiv.org/abs/1802.09941) +- [ZeRO: Memory Optimizations Toward Training Trillion Parameter Models](https://arxiv.org/abs/1910.02054) +- [PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel](https://arxiv.org/abs/2304.11277) +- [PyTorch Distributed: Experiences on Accelerating Data Parallel Training](https://arxiv.org/abs/2006.15704) + ### Best Practices [ML-Engineering Repository](https://github.com/stas00/ml-engineering). Contains community notes and practical details of everything that goes into deep learning training, led by Stas Bekman. [Common HParam Settings](https://docs.google.com/spreadsheets/d/14vbBbuRMEHoqeuMHkTfw3uiZVmyXNuoSp8s-aHvfvZk/edit?usp=sharing) by Stella Biderman. Records common settings for model training hyperparameters and her current recommendations for training new models. +### Data and Model Directories + +[Directory of LLMs](https://docs.google.com/spreadsheets/d/1gc6yse74XCwBx028HV_cvdxwXkmXejVjkO-Mz2uwE0k/edit?usp=sharing) by Stella Biderman. Records details of trained LLMs including license, architecture type, and dataset. + +[Data Provenance Explorer](https://dataprovenance.org/). A tool for tracing and filtering on data provenance for the most popular open source finetuning data collections. + ## Minimal Repositories for Educational Purposes Large language models are frequently trained using very complex codebases due to the need to optimize things to work at scale and support a wide variety of configurable options. This can make them less useful pedagogical tools, so some people have developed stripped-down, so-called "Minimal Implementations" that are sufficient for smaller-scale work and more pedagogically useful. 
diff --git a/model-directory/index.html b/model-directory/index.html new file mode 100644 index 0000000..8ffc96f --- /dev/null +++ b/model-directory/index.html @@ -0,0 +1,1878 @@ + +
Model | +Date | +Parameters | +Organization | +Organization Type | +Author Location | +Language | +Model Accessibility | +Data Accessibility | +Architecture | +Sources | +
---|---|---|---|---|---|---|---|---|---|---|
GPT-2 | +February 14, 2019 | +1.5B | +OpenAI | +Company | +USA | +English | +Open (MIT) | +Externally Replicated | +Decoder-only | ++ |
Megatron-BERT | +September 17, 2019 | +3.9B | +NVIDIA | +Company | +USA | +English | +Closed | +Closed | +Encoder-only | +Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | +
Megatron-LM | +September 17, 2019 | +8.3B | +NVIDIA | +Company | +USA | +English | +Closed | +Closed | +Decoder-only | +Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | +
T5 | +October 23, 2019 | +11.0B | +Company | +USA | +English | +Open (Apache 2.0) | +Open (C4) | +Encoder-Decoder | ++ | |
Meena | +January 27, 2020 | +2.6B | +Company | +USA | +English | +Closed | +Closed | +Encoder-Decoder (Evolved) | ++ | |
Turing NLG | +February 13, 2020 | +17.2B | +Microsoft | +Company | +USA | +English | +Closed | +Closed | +Decoder-only | +Turing-NLG: A 17-billion-parameter language model by Microsoft | +
GPT-3 | +May 28, 2020 | +175.0B | +OpenAI | +Company | +USA | +English | +API | +Closed | +Decoder-only | ++ |
DeBERTa | +June 5, 2020 | +1.5B | +Microsoft | +Company | +USA | +English | +Open (MIT) | +Closed + OWT + BC | +Encoder-only | +DeBERTa: Decoding-enhanced BERT with Disentangled Attention | +
mT5 | +October 22, 2020 | +13.0B | +Company | +USA | +Multilingual | +Open (Apache 2.0) | +Open (mC4) | +Encoder-Decoder | +mT5: A massively multilingual pre-trained text-to-text transformer | +|
AlphaFold2 | +November 30, 2020 | +93.0M | +DeepMind | +Company | +USA | +Protein Structure | +Open (Apache 2.0) | +Open | +Custom | ++ |
CPM | +December 1, 2020 | +2.6B | +Tsinghua University | +Non-profit | +China | +Chinese | +Non-Commercial | +Closed | +Decoder-only | +CPM: A Large-scale Generative Chinese Pre-trained Language Model | +
AraGPT2 | +December 31, 2020 | +1.5B | +American University of Beirut | +Non-profit | +Lebanon | +Arabic | +Open (Custom) | +Partially Released (OSCAR) | +Decoder-only | +AraGPT2: Pre-Trained Transformer for Arabic Language Generation, Checkpoint | +
Switch-C | +January 11, 2021 | +1.5T | +Company | +USA | +English | +Open (Apache 2.0) | +Open (C4) | +Encoder-Decoder MoE | +Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity | +|
Wu Dao 1.0 | +March 20, 2021 | +2.6B | +BAAI | +Company | +China | +Chinese, English | +API | +Closed | +Decoder-only MoE | ++ |
GPT-Neo | +March 22, 2021 | +2.7B | +EleutherAI | +Non-profit | +Germany, Canada, USA | +English | +Open (MIT) | +Open (Pile) | +Decoder-only | ++ |
PanGu-α | +April 26, 2021 | +200.0B | +Huawei | +Company | +China | +Chinese | +Closed | +Closed | +Decoder-only | +PanGu-$α$: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation | +
ByT5 | +May 28, 2021 | +13.0B | +Company | +USA | +Multilingual | +Open (Apache 2.0) | +Open (C4) | +Encoder-Decoder | +ByT5: Towards a token-free future with pre-trained byte-to-byte models | +|
Wu Dao 2.0 | +June 1, 2021 | +1.8T | +BAAI | +Company | +China | +Chinese, English | +API | +Closed + Pile | +Decoder-only MoE | +Chinese AI lab challenges Google, OpenAI with a model of 1.75 trillion parameters - PingWest | +
GPT-J | +June 8, 2021 | +6.0B | +EleutherAI | +Non-profit | +Australia, USA | +English | +Open (Apache 2.0) | +Open (Pile) | +Decoder-only | ++ |
CPM-2.1 | +June 20, 2021 | +11.0B | +Tsinghua University | +Non-profit | +China | +Chinese | +Non-Commercial | +Closed | +Decoder-only | ++ |
Codex | +August 10, 2021 | +Undisclosed | +OpenAI | +Company | +USA | +English | +API | +Closed | +Decoder-only | ++ |
Jurassic-1 | +August 11, 2021 | +178.0B | +AI21 Labs | +Company | +Israel | +English | +API | +Open (Pile) | +Decoder-only | +White paper, data discovered via adversarial attacks and confirmed via personal communication | +
Wu Dao-GLM-XXL | +August 24, 2021 | +10.0B | +BAAI | +Company | +China | +Chinese | +Closed | +Closed | +Encoder-Decoder (Autoregressive) | ++ |
Wu Dao-GLM-XXL | +August 24, 2021 | +10.0B | +BAAI | +Company | +China | +English | +Closed | +Closed | +Encoder-Decoder (Autoregressive) | ++ |
CodeT5 | +September 2, 2021 | +16.0B | +Salesforce | +Company | +USA | +Code, English | +Open (BSD 3-Clause) | +Closed | +Encoder-Decoder | +arXiv | +
HyperCLOVA | +September 10, 2021 | +204.0B | +NAVER | +Company | +Korea | +Korean | +Closed | +Closed | +Decoder-only | +arXiv | +
NTT Dialog | +September 17, 2021 | +1.6B | +NTT | +Company | +Japan | +English, Chinese | +Non-Commercial | +Closed | +Encoder-Decoder | ++ |
Plato-XL | +September 20, 2021 | +11.0B | +Baidu | +Company | +China | +Chinese, English | +Open (Apache 2.0) | +Closed | +Decoder-only PrefixLM | ++ |
Zidong Taichu | +September 27, 2021 | +100.0B | +Chinese Academy of Sciences | +Non-profit | +China | ++ | Closed | +Closed | +Undisclosed | +https://www.lwxsd.com/pcen/info_view.php?tab=mynews&VID=39572 | +
Yuan 1.0 | +October 10, 2021 | +245.0B | +Inspur AI Research | +Company | +China | +Chinese | +Limited Access | +Partially Released | +Decoder-only | +arXiv | +
Megatron-Turing | +October 11, 2021 | +530.0B | +Microsoft, NVIDIA | +Company | +USA | +English | +Closed | +Closed + Pile | +Decoder-only | ++ |
PAGnol | +October 16, 2021 | +1.5B | +LightOn | +Company | +France | +French | +API | +Closed + OSCAR | +Decoder-only | +PAGnol: An Extra-Large French Generative Model | +
Anthropic LM | +December 1, 2021 | +52.0B | +Anthropic | +Company | +USA | +English | +Closed | +Closed | +Decoder-only | ++ |
ERNIE 3.0 | +December 8, 2021 | +260.0B | +Baidu | +Company | +China | +Chinese, English | +Closed | +Closed | +Encoder-Decoder w/ KGs | ++ |
Gopher | +December 8, 2021 | +280.0B | +DeepMind | +Company | +USA | +English | +Closed | +Closed + C4 | +Decoder-only | ++ |
GLaM | +December 13, 2021 | +1.2T | +Company | +USA | +English | +Closed | +Closed | +Decoder-only MoE | +GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | +|
Exaone | +December 14, 2021 | +300.0B | +LG | +Company | +Korea | +English, Korean | +ChatBot | +Closed | +Undisclosed | ++ |
XGLM | +December 20, 2021 | +7.5B | +Meta | +Company | +USA, UK, Germany | +Multilingual | +Closed | +Closed | +Decoder-only | +https://arxiv.org/abs/2112.10668 | +
FairSeq Dense | +December 20, 2021 | +13.0B | +Meta | +Company | +USA, UK, Germany | +English | +Open (Ambiguous) | +Closed | +Decoder-only | ++ |
LaMDA | +January 20, 2022 | +137.0B | +Company | +USA | +English | +Closed | +Closed + C4 | +Decoder-only | +LaMDA: Language Models for Dialog Applications | +|
GPT-NeoX-20B | +February 2, 2022 | +20.0B | +EleutherAI | +Non-profit | +Multinational | +English | +Open (Apache 2.0) | +Open (Pile) | +Decoder-only | ++ |
GPT-SW3 | +February 15, 2022 | +3.5B | +AI Sweden | +Non-profit | +Sweden | +Swedish | +Limited Access | +Closed + Pile + OSCAR + mC4 | +Decoder-only | +GPT-SW3: An Autoregressive Language Model for the Nordic Languages | +
PolyCoder | +February 26, 2022 | +2.7B | +Carnegie Mellon University | +Non-profit | +USA | +Code | +Released (Unlicensed) | +Partially Released | +Decoder-only | +NinedayWang/PolyCoder-2.7B · Hugging Face | +
CodeGen | +March 25, 2022 | +16.1B | +Salesforce | +Company | +USA | +Code, English | +Open (BSD 3-Clause) | +Closed + Pile | +Decoder-only | +arXiv , https://huggingface.co/Salesforce/codegen-16B-multi | +
Chinchilla | +March 29, 2022 | +70.0B | +DeepMind | +Company | +USA | +English | +Closed | +Closed + C4 | +Decoder-only | +Training Compute-Optimal Large Language Models | +
PaLM | +April 4, 2022 | +540.0B | +Company | +USA | +English | +Closed | +Closed | +Decoder-only | +PaLM: Scaling Language Modeling with Pathways | +|
NOOR | +April 11, 2022 | +13.0B | +Technology Innovation Institute, LightOn | +Multilateral collaboration | +UAE, France | +Arabic | +Closed | +Closed | +Decoder-only | +https://aclanthology.org/2022.bigscience-1.8.pdf | +
InCoder-6B | +April 12, 2022 | +6.0B | +Meta | +Company | +USA | +Code | +Non-Commercial | +Closed | +Decoder-only (Autoregressive Span Corruption) | ++ |
Lyra-fr | +April 12, 2022 | +10.0B | +LightOn | +Company | +France | +French | +API | +Closed | +Decoder-only | ++ |
OPT | +May 2, 2022 | +175.0B | +Meta | +Company | +USA | +English | +Non-Commercial | +Closed + Pile | +Decoder-only | +OPT: Open Pre-trained Transformer Language Models | +
UL2 | +May 10, 2022 | +20.0B | +Company | +USA | +English | +Open (Apache 2.0) | +Open (C4) | +Encoder-Decoder | ++ | |
YaLM | +May 23, 2022 | +100.0B | +Yandex | +Company | +Russia | +Russian, English | +Open (Apache 2.0) | +Closed + Pile | +Decoder-only | ++ |
BLOOM | +May 26, 2022 | +176.0B | +BigScience | +Multilateral collaboration | +Multinational | +Multilingual | +Commercial Use | +Limited Access + S2ORC + OSCAR | +Decoder-only | ++ |
NeMo Megatron | +September 14, 2022 | +20.0B | +NVIDIA | +Company | +USA | +English | +Open (CC-BY-4.0) | +Open (Pile) | +Decoder-only | ++ |
GLM-130B | +October 5, 2022 | +130.0B | +Tsinghua University | +Non-profit | +China | +English | +Non-Commercial | +Closed + Pile | +Decoder-only (Autoregressive Span Corruption) | ++ |
GenSLMs | +October 11, 2022 | +25.0B | +Argonne National Lab | +Non-profit | +USA | +Genomics | +Open (MIT) | +Partially Released | +Custom | +GenSLMs: Genome-scale language models reveal SARS-CoV-2 evolutionary dynamics | bioRxiv | +
Galactica | +November 16, 2022 | +120.0B | +Meta | +Company | +USA | +English | +Non-Commercial | +Closed + S2ORC | +Decoder-only | ++ |
GPT-3.5 | +November 19, 2022 | +Undisclosed | +OpenAI | +Company | +USA | +Multilingual | +API | +Closed | +Undisclosed | ++ |
OpenFold | +November 24, 2022 | +93.0M | +Multilateral Collaboration | +Multilateral collaboration | +USA | +Protein Structure | +Open (Apache 2.0) | +Open | +Custom | ++ |
Pythia | +December 10, 2022 | +12.0B | +EleutherAI | +Non-profit | +Multinational | +English | +Open (Apache 2.0) | +Open (Pile) | +Decoder-only | ++ |
Pythia-deduped | +December 10, 2022 | +12.0B | +EleutherAI | +Non-profit | +Multinational | +English | +Open (Apache 2.0) | +Open (Pile) | +Decoder-only | ++ |
BioMedGPT | +December 15, 2022 | +2.7B | +Stanford CRFM | +Non-profit | +USA | +English (Medical Texts) | +Commercial Use | +Open (Pile) | +Decoder-only | ++ |
Palmyra | +January 5, 2023 | +20.0B | +Writer | +Company | +USA | +English | +Open (Apache 2.0) | +Closed | +Decoder-only | ++ |
SantaCoder | +January 9, 2023 | +1.1B | +BigCode Project | +Company | +Multinational | +Code | +Commercial Use | +Open (Stack) | +Decoder-only (Autoregressive Span Corruption) | ++ |
MusicLM | +January 26, 2023 | +1.4B | +Company | +USA | +Music | +Closed | +Partially Released (FMA) | +Custom | +https://arxiv.org/abs/2301.11325 | +|
Anthropic 175B LM | +February 15, 2023 | +175.0B | +Anthropic | +Company | +USA | ++ | Closed | +Closed | +Undisclosed | ++ |
GPT-3-Finnish | +February 16, 2023 | +13.0B | +University of Turku | +Non-profit | +Finland | +Finnish | +Open (Apache 2.0) | +Reproducible + ROOTS | +Decoder-only | ++ |
LLaMA | +February 24, 2023 | +65.0B | +Meta | +Company | +USA | +English | +Non-Commercial | +Externally Replicated | +Decoder-only | ++ |
GPT-SW3 40B | +March 1, 2023 | +40.0B | +AI Sweden, RISE | +Non-profit | +Sweden | +Nordic Languages and English | +Limited Access | +Closed + Pile + mC4 + OSCAR | +Decoder-only | ++ |
Jurassic-2 | +March 9, 2023 | +Undisclosed | +AI21 | +Company | +Israel | +Multilingual (7) | +API | +Closed | +Undisclosed | ++ |
ChatGLM | +March 13, 2023 | +6.0B | +Tsinghua University | +Non-profit | +China | +English, Chinese | +Commercial Use | +Closed | +Decoder-only (Autoregressive Span Corruption) | ++ |
Claude-1 | +March 14, 2023 | +Undisclosed | +Anthropic | +Company | +USA | +Multilingual | +ChatBot | +Closed | +Undisclosed | ++ |
GPT-4 | +March 14, 2023 | +Undisclosed | +OpenAI | +Company | +USA | +Multilingual | +ChatBot | +Closed | +Undisclosed | ++ |
ESM-2 | +March 16, 2023 | +15.0B | +Meta | +Company | +USA | +Protein | +Open (MIT) | +Open | +Encoder-only | ++ |
CerebrasGPT | +March 28, 2023 | +13.0B | +Cerebras | +Company | +USA, Canada | +English | +Open (Apache 2.0) | +Open (Pile) | +Decoder-only | ++ |
BloombergGPT | +March 30, 2023 | +50.6B | +Bloomberg | +Company | +USA | +English | +Closed | +Closed + Pile + C4 | +Decoder-only | +BloombergGPT: A Large Language Model for Finance | +
RedPajamas-INCITE | +April 17, 2023 | +7.0B | +TogetherAI | +Company | +USA | +English | +Open (Apache 2.0) | +Open (RP) | +Decoder-only | ++ |
StableLM | +April 19, 2023 | +7.0B | +Stability AI | +Company | +USA | +English | +Open (CC-BY-SA-4.0) | +Closed + Pile | +Decoder-only | ++ |
CodeGen2 | +May 3, 2023 | +16.0B | +Salesforce | +Company | +USA | +Code | +Open (Apache 2.0) | +Open (Stack) | +Decoder-only (Autoregressive Span Corruption) | ++ |
MPT-7B | +May 5, 2023 | +6.7B | +MosaicML | +Company | +USA | +English | +Open (Apache 2.0) | +Open (C4, RP, S2ORC, Stack, Pile) | +Decoder-only | ++ |
StarCoder | +May 9, 2023 | +15.0B | +BigCode Project | +Company | +Multinational | +Code | +Commercial Use | +Open (Stack) | +Decoder-only (Autoregressive Span Corruption) | ++ |
PaLM-2 | +May 10, 2023 | +Undisclosed | +Company | +USA | +Multilingual | +ChatBot | +Closed | +Decoder-only | ++ | |
CodeT5+ | +May 13, 2023 | +16.0B | +Salesforce | +Company | +USA | +Code, English | +Open (BSD 3-Clause) | +Closed | +Encoder-Decoder | ++ |
OpenCALM | +May 15, 2023 | +7.0B | +CyberAgent | +Company | +Japan | +Japanese | +Open (CC BY-SA 4.0) | +Closed | +Decoder-only | ++ |
RWKV v4 | +May 22, 2023 | +13.0B | +EleutherAI and GenAI Commons | +Non-profit | +Multinational | +English | +Open (Apache 2.0) | +Open (Pile) | +RNN | ++ |
Falcon-40B | +May 25, 2023 | +40.0B | +Technology Innovation Institute, LightOn | +Multilateral collaboration | +UAE, France | +English | +Open (Apache 2.0) | +Partially Released | +Decoder-only | ++ |
StellarX | +May 27, 2023 | +4.0B | +Arkane Industries | +Company | +Undisclosed | ++ | CC-BY-NC-SA-4.0 | +Open (RP) | +Decoder-only | +Dampish/StellarX-4B-V0.2 | +
Polyglot-Ko | +June 4, 2023 | +12.8B | +EleutherAI | +Non-profit | +Korea | +Korean | +Open (Apache 2.0) | +Closed | +Decoder-only | ++ |
PULI-GPTrio | +June 8, 2023 | +7.7B | +Hungarian Research Centre for Linguistics | +Non-profit | +Hungary | +English, Hungarian, Chinese | +CC-BY-NC | +Closed | +Decoder-only | +NYTK/PULI-GPTrio | +
Baichuan-1 7B | +June 15, 2023 | +7.0B | +Baichuan AI | +Company | +China | +Chinese, English | +Commercial Use | +Closed | +Decoder-only | +baichuan-inc/Baichuan-7B | +
Zidong Taichu 2.0 | +June 16, 2023 | +100.0B | +Chinese Academy of Sciences | +Non-profit | +China | ++ | Closed | +Closed | +Undisclosed | ++ |
Phi-1 | +June 20, 2023 | +1.3B | +Microsoft | +Company | +USA | +English | +Non-Commercial | +Closed | +Decoder-only | ++ |
MPT-30B | +June 22, 2023 | +30.0B | +MosaicML | +Company | +USA | +English | +Open (Apache 2.0) | +Open (C4, RP, S2ORC, Stack, Pile) | +Decoder-only | ++ |
Inflection-1 | +June 23, 2023 | +Undisclosed | +Inflection | +Company | +USA | ++ | ChatBot | +Closed | +Decoder-only | ++ |
ChatGLM2 | +June 25, 2023 | +6.0B | +Tsinghua University | +Non-profit | +China | +English, Chinese | +Commercial Use | +Closed | +Decoder-only (Autoregressive Span Corruption) | ++ |
OpenLLaMA | +June 30, 2023 | +13.0B | +UC Berkeley | +Non-profit | +USA | +English | +Open (Apache 2.0) | +Open (RP) | +Decoder-only | ++ |
Codegen2.5 | +July 6, 2023 | +7.0B | +Salesforce | +Company | +USA | +Code | +Open (Apache 2.0) | +Open (Stack) | +Decoder-only (Autoregressive Span Corruption) | +CodeGen2.5: Small, but mighty | +
Baichuan-1 13B | +July 10, 2023 | +13.0B | +Baichuan AI | +Company | +China | +Chinese, English | +Commercial Use | +Closed | +Decoder-only | +https://huggingface.co/baichuan-inc/Baichuan-13B | +
Claude-2 | +July 11, 2023 | +Undisclosed | +Anthropic | +Company | +USA | +Multilingual | +ChatBot | +Closed | +Undisclosed | ++ |
LLaMA 2 | +July 18, 2023 | +70.0B | +Meta | +Company | +USA | +English | +Commercial Use | +Closed | +Decoder-only | ++ |
Exaone | +July 19, 2023 | +Undisclosed | +LG | +Company | +Korea | +English, Korean | +ChatBot | +Closed | +Undisclosed | ++ |
BTLM-3B-8k-base | +July 24, 2023 | +3.0B | +Cerebras | +Company | +USA, Canada, UAE | +English | +Open (Apache 2.0) | +Open (RP) | +Decoder-only | ++ |
bilingual-gpt-neox-4b | +July 30, 2023 | +3.8B | +Rinna | +Company | +Japan | +Japanese | +Open (MIT) | +Open (Pile + mC4 + RP) | +Decoder-only | ++ |
Jiang | +August 1, 2023 | +30.0B | +KDF | +Company | +China | +Chinese | +Open (Apache 2.0) | +Closed | +Decoder-only | ++ |
Baichuan-1 53B | +August 8, 2023 | +53.0B | +Baichuan AI | +Company | +China | +Chinese, English | +Commercial Use | +Closed | +Decoder-only | +百川智能发布旗下第三代大模型产品Baichuan-53B | +
StableLM Japanese | +August 10, 2023 | +7.0B | +Stability AI | +Company | +USA, Japan | +Japanese | +Open (Apache 2.0) | +Open (Composite) | +Decoder-only | ++ |
DeciCoder | +August 16, 2023 | +1.0B | +Deci | +Company | +Israel | +Code | +Open (Apache 2.0) | +Open (StarCoder) | +Decoder-only | +Deci/DeciCoder-1b · Hugging Face | +
Dou Bao | +August 18, 2023 | +Undisclosed | +ByteDance | +Company | +China | ++ | API | +Closed | +Undisclosed | ++ |
Jais | +August 30, 2023 | +13.0B | +Technology Innovation Institute, LightOn | +Multilateral collaboration | +USA, Canada, UAE | +Arabic | +Open (Apache 2.0) | +Closed + Pile + mC4 | +Decoder-only | ++ |
Falcon-180B | +September 6, 2023 | +180.0B | +Technology Innovation Institute, LightOn | +Multilateral collaboration | +UAE, France | +English | +Commercial Use | +Partially Released | +Decoder-only | ++ |
Persimmon-8B | +September 7, 2023 | +8.0B | +Adept AI | +Company | +USA | +English | +Open (Apache 2.0) | +Closed | +Decoder-only | +Releasing Persimmon-8B | +
XGen | +September 7, 2023 | +7.0B | +Salesforce | +Company | +USA | +English, Code | +Open (Apache 2.0) | +Closed + RP + Stack | +Decoder-only | +Salesforce/xgen-7b-8k-base · Hugging Face | +
Phi-1.5 | +September 11, 2023 | +1.3B | +Microsoft | +Company | +USA | +English | +Non-Commercial | +Closed | +Decoder-only | ++ |
FLM-101B | +September 17, 2023 | +101.0B | +Multilateral, led by BAAI | +Multilateral collaboration | +China | +Chinese, English | +Open (Apache 2.0) | +Closed | +Decoder-only | +FLM-101B: An Open LLM and How to Train It with $100K Budget | +
Baichuan-2 | +September 19, 2023 | +13.0B | +Baichuan AI | +Company | +China | +English, Chinese | +Commercial Use | +Closed | +Decoder-only | +Baichuan 2: Open Large-scale Language Models | +
InternLM | +September 20, 2023 | +20.0B | +Shanghai AI Laboratory | +Non-profit | +China | +Chinese, English | +Commercial Use | +Closed | +Decoder-only | +https://huggingface.co/internlm/internlm-20b | +
Unnamed Model | +September 26, 2023 | +3.6B | +Line Corporation | +Company | +Japan | +Japanese | +Open (Apache 2.0) | +Closed | +Decoder-only | ++ |
Mistral-7B-v0.1 | +September 27, 2023 | +7.0B | +Mistral AI | +Company | +France | +English | +Open (Apache 2.0) | +Closed | +Decoder-only | +https://mistral.ai/news/announcing-mistral-7b/ , https://arxiv.org/abs/2310.06825 | +
RWKV v4 World | +October 10, 2023 | +7.0B | +EleutherAI and GenAI Commons | +Non-profit | +Multinational | +Multilingual | +Open (Apache 2.0) | +Open (incl. Pile) | +RNN | ++ |
CodeFuse | +October 10, 2023 | +13.0B | +Ant Group | +Company | +China | +Chinese, English, Code | +Open (Custom) | +Closed | +Decoder-only | ++ |
Qwen-14B | +October 13, 2023 | +14.0B | +Alibaba Group | +Company | +China | +Multilingual | +Commercial Use | +Closed | +Decoder-only | +arXiv , https://huggingface.co/Qwen/Qwen-14B/blob/main/LICENSE | +
ChatGLM3 | +October 27, 2023 | +6.0B | +Tsinghua University | +Non-profit | +China | +English, Chinese | +Commercial Use | +Closed | +Decoder-only (Autoregressive Span Corruption) | ++ |
Skywork | +October 30, 2023 | +13.0B | +Kunlun | +Company | +China | +English, Chinese | +Commercial Use | +Partially Released | +Decoder-only | ++ |
DeepSeek-Coder | +November 2, 2023 | +33.0B | +DeepSeek | +Company | +China | +Code, English, Chinese | +Commercial Use | +Closed | +Decoder-only | +DeepSeek Coder | +
FinGPT-3 | +November 3, 2023 | +13.3B | +The University of Turku, Hugging Face, the Finnish National Library, and AMD | +Multilateral collaboration | +Finland, USA | +Finnish | +Open (Apache 2.0) | +Closed + mC4 | +Decoder-only | ++ |
Grok-0 | +November 5, 2023 | +33.0B | +xAI | +Company | +USA | +English | +Closed | +Closed | +Undisclosed | +https://x.ai/ | +
Grok-1 | +November 5, 2023 | +Undisclosed | +xAI | +Company | +USA | +English | +Closed | +Closed | +Undisclosed | +https://x.ai/ | +
Yi | +November 5, 2023 | +34.0B | +01.AI | +Company | +China | +English, Chinese | +Non-Commercial | +Closed | +Decoder-only | +https://huggingface.co/01-ai/Yi-34B | +
XVERSE | +November 8, 2023 | +65.0B | +Shenzhen Yuanxiang Technology | +Company | +China | +Multilingual (40) | +Downloadable (Unknown) | +Closed | +Decoder-only | +https://huggingface.co/xverse/XVERSE-65B | +
FORGE | +November 12, 2023 | +26.0B | +Oak Ridge National Lab | +Non-profit | +USA | +English | +Closed | +Open | +Decoder-only | ++ |
RWKV v5 | +November 14, 2023 | +3.0B | +EleutherAI and GenAI Commons | +Non-profit | +Multinational | +English | +Open (Apache 2.0) | +Open (Pile) | +RNN | ++ |
Phi-2 | +November 16, 2023 | +2.7B | +Microsoft | +Company | +USA | +English | +Non-Commercial | +Closed | +Decoder-only | ++ |
Inflection-2 | +November 22, 2023 | +Undisclosed | +Inflection | +Company | +USA | +English | +ChatBot | +Closed | +Undisclosed | ++ |
Qwen-72B | +November 30, 2023 | +72.0B | +Alibaba | +Company | +China | +English, Chinese | +Commercial Use | +Closed | +Decoder-only | ++ |
Mamba | +December 1, 2023 | +2.8B | +Carnegie Mellon University and Princeton | +Non-profit | +USA | +English | +Open (Apache 2.0) | +Open (Pile) | +State Space Model | ++ |
Gemini | +December 6, 2023 | +Undisclosed | +Company | +USA | +Multilingual | +ChatBot | +Closed | +Decoder-only | ++ | |
Mixtral | +December 11, 2023 | +12.0B | +Mistral AI | +Company | +France | +Multilingual (5) + Code | +Open (Apache 2.0) | +Closed | +Decoder-only MoE 8x7B | ++ |
Amber | +December 11, 2023 | +7.0B | +Petuum, MBZUAI, USC, CMU, UIUC, and UCSD | +Multilateral collaboration | +USA and UAE | +English | +Open (Apache 2.0) | +Open (RW, StarCoder, RP-1) | +Decoder-only | ++ |
CrystalCoder | +December 11, 2023 | +7.0B | +Petuum, MBZUAI, USC, CMU, UIUC, and UCSD | +Multilateral collaboration | +USA and UAE | +English, Code | +Open (Apache 2.0) | +Open (SlimPJ, StarCoder) | +Decoder-only | ++ |
SHAI | +December 25, 2023 | +10.0B | ++ | + | + | + | + | + | Decoder-only | ++ |
YAYI | +December 25, 2023 | +30.0B | +Wenge Research | +Company | +China | +Multilingual | +Non-Commercial | ++ | Decoder-only | ++ |