Merge pull request #145 from stanford-crfm/jonathan/1124-weekly-assets
weekly update
Showing 8 changed files with 243 additions and 11 deletions.
```diff
@@ -1,22 +1,22 @@
 ---
 - type: model
   name: Yi
-  organization: 01.AI
-  description: Yi is a LLM that can accept input/outputs in both English and Chinese.
+  organization: 01 AI
+  description: The Yi series models are large language models trained from scratch by developers at 01 AI.
   created_date: 2023-11-02
   url: https://github.com/01-ai/Yi
   model_card: https://huggingface.co/01-ai/Yi-34B
   modality: text; text
-  analysis: Evaluated on common sense reasoning and reading comprehension, analogous to LLaMA 2's analysis.
+  analysis: Evaluated on standard language benchmarks, common sense reasoning, and reading comprehension in comparison to SoTA LLMs.
   size: 34B parameters (dense)
   dependencies: []
   training_emissions: unknown
   training_time: unknown
   training_hardware: unknown
-  quality_control: ''
+  quality_control: Model underwent supervised fine-tuning, leading to a greater diversity of responses.
   access: open
   license: Apache 2.0
-  intended_uses: Academic research and free commercial usage
-  prohibited_uses: ''
-  monitoring: none
+  intended_uses: ''
+  prohibited_uses: none
+  monitoring: unknown
   feedback: https://huggingface.co/01-ai/Yi-34B/discussions
```
```diff
@@ -0,0 +1,22 @@
+---
+- type: model
+  name: Deepseek
+  organization: Deepseek AI
+  description: Deepseek is a 67B parameter model with Grouped-Query Attention trained on 2 trillion tokens from scratch.
+  created_date: 2023-11-29
+  url: https://github.com/deepseek-ai/DeepSeek-LLM
+  model_card: https://huggingface.co/deepseek-ai/deepseek-llm-67b-base
+  modality: text; text
+  analysis: Deepseek and baseline models (for comparison) evaluated on a series of representative benchmarks, both in English and Chinese.
+  size: 67B parameters (dense)
+  dependencies: []
+  training_emissions: unknown
+  training_time: unknown
+  training_hardware: unknown
+  quality_control: Training dataset comprised of diverse data composition and pruned and deduplicated.
+  access: open
+  license: MIT
+  intended_uses: ''
+  prohibited_uses: none
+  monitoring: unknown
+  feedback: https://huggingface.co/deepseek-ai/deepseek-llm-67b-base/discussions
```
```diff
@@ -0,0 +1,22 @@
+---
+- type: model
+  name: Qwen
+  organization: Qwen AI
+  description: Qwen is a Transformer-based large language model, which is pretrained on a large volume of data, including web texts, books, codes, etc.
+  created_date: 2023-11-26
+  url: https://arxiv.org/pdf/2309.16609.pdf
+  model_card: https://huggingface.co/Qwen/Qwen-72B
+  modality: text; text
+  analysis: Evaluated on MMLU, C-Eval, GSM8K, MATH, HumanEval, MBPP, BBH, CMMLU, which are currently popular benchmarks, to test the model's Chinese and English knowledge capabilities, translation, mathematical reasoning, coding and other capabilities.
+  size: 72B parameters (dense)
+  dependencies: []
+  training_emissions: unknown
+  training_time: unknown
+  training_hardware: unknown
+  quality_control: none
+  access: open
+  license: Apache 2.0
+  intended_uses: ''
+  prohibited_uses: none
+  monitoring: unknown
+  feedback: https://huggingface.co/Qwen/Qwen-72B/discussions
```
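All three entries share the same fixed set of metadata fields. A minimal sketch of how such an entry could be sanity-checked with PyYAML; the required-field list below is inferred from the diffs in this commit, not taken from the repository's actual schema, and `ENTRY` is a deliberately incomplete example:

```python
import yaml  # PyYAML

# Field names observed in the asset entries above; treating them all as
# required is an assumption for illustration.
REQUIRED_FIELDS = {
    "type", "name", "organization", "description", "created_date",
    "url", "model_card", "modality", "analysis", "size", "dependencies",
    "training_emissions", "training_time", "training_hardware",
    "quality_control", "access", "license", "intended_uses",
    "prohibited_uses", "monitoring", "feedback",
}

# A truncated entry, for demonstration only.
ENTRY = """
- type: model
  name: Qwen
  organization: Qwen AI
  created_date: 2023-11-26
  size: 72B parameters (dense)
  license: Apache 2.0
"""

def missing_fields(entry: dict) -> set:
    """Return the assumed-required fields absent from one asset entry."""
    return REQUIRED_FIELDS - entry.keys()

if __name__ == "__main__":
    for asset in yaml.safe_load(ENTRY):
        gaps = missing_fields(asset)
        if gaps:
            print(f"{asset['name']}: missing {sorted(gaps)}")
```

A check like this could run in CI before a weekly asset update is merged, flagging entries that drop a field rather than setting it to `''`, `none`, or `unknown`.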