Merge pull request #96 from stanford-crfm/dev
Release v1.0
Showing 35 changed files with 1,117 additions and 58 deletions.
@@ -0,0 +1,39 @@

name: Run Tests
on: [push]
jobs:
  Run-Mistral-Tests:
    runs-on: self-hosted
    steps:
      - run: echo "🎉 The job was automatically triggered by a ${{ github.event_name }} event."
      - run: echo "🐧 This job is now running on a ${{ runner.os }} server hosted by GitHub!"
      - run: echo "🔎 The name of your branch is ${{ github.ref }} and your repository is ${{ github.repository }}."
      - name: Check out repository code
        uses: actions/checkout@v2
      - run: echo "💡 The ${{ github.repository }} repository has been cloned to the runner."
      - run: echo "🖥️ The workflow is now ready to test your code on the runner."
      - name: Setup
        run: |
          cp -r /home/stanzabuild/mistral/wandb .
          wandb offline
      - name: Tests for arguments (single node/single GPU)
        if: always()
        run: |
          cd tests
          CUDA_VISIBLE_DEVICES=0 pytest test_args.py
      - name: Tests for checkpoints (single node/single GPU)
        if: always()
        run: |
          cd tests
          CUDA_VISIBLE_DEVICES=0 pytest test_checkpoint.py
      - name: Tests for upcasting (single node/single GPU)
        if: always()
        run: |
          cd tests
          CUDA_VISIBLE_DEVICES=0 pytest test_fp.py
      - name: Tests for random seed (single node/single GPU)
        if: always()
        run: |
          cd tests
          CUDA_VISIBLE_DEVICES=0 pytest test_seed.py
      - run: echo "🍏 This job's status is ${{ job.status }}."
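
For local debugging, the four test steps in this workflow can be approximated outside GitHub Actions. The sketch below is not part of the commit: `TEST_FILES`, `build_env`, `run_all`, and the injectable `runner` parameter are hypothetical names chosen for illustration. It assumes `pytest` is on the `PATH` and does not reproduce the `Setup` step (copying the `wandb` directory and running `wandb offline`). Like the `if: always()` conditions, it runs every test file even when an earlier one fails.

```python
import os
import subprocess

# Test files named in the workflow, in the order its steps run them.
TEST_FILES = ["test_args.py", "test_checkpoint.py", "test_fp.py", "test_seed.py"]


def build_env(gpu="0", base=None):
    """Copy the environment and pin the visible GPU (CUDA_VISIBLE_DEVICES=0 in the workflow)."""
    env = dict(os.environ if base is None else base)
    env["CUDA_VISIBLE_DEVICES"] = gpu
    return env


def run_all(test_dir="tests", runner=subprocess.run):
    """Run every test file from `test_dir`, continuing after failures."""
    return_codes = {}
    for name in TEST_FILES:
        # Equivalent of `cd tests` followed by `CUDA_VISIBLE_DEVICES=0 pytest <file>`.
        proc = runner(["pytest", name], cwd=test_dir, env=build_env())
        return_codes[name] = proc.returncode
    return return_codes
```

Calling `run_all()` from the repository root would return a dictionary mapping each test file to its pytest exit code, so a non-zero value identifies which suite failed.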
@@ -0,0 +1,36 @@
Differences between Mistral and Hugging Face
============================================

Mistral is not a replacement for Hugging Face. Rather, we extend the current functionality in Hugging Face
by fixing stability issues with GPT training, adding evaluation scripts, and supporting distributed training
with the DeepSpeed optimization library.


**Stability**

When training GPT-2 Small models with Hugging Face, some of the models crashed due to numerical instability.
We fixed this issue by rearranging the order of operations in the scaled dot-product attention computation
and upcasting to FP32. We also scaled down the weights by dividing by the layer number to prevent overflow.
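
To illustrate why the order of operations matters, here is a minimal pure-Python sketch. It is an assumption about what the fix amounts to, not Mistral's actual code: `fp16`, `dot_fp16`, `naive_score`, and `stable_score` are hypothetical helpers, `layer_idx` is assumed to be a 1-based layer number, and Python's 64-bit floats stand in for the FP32 upcast. The naive path accumulates the query-key dot product in half precision and scales afterwards; the stable path scales the query first and accumulates in higher precision.

```python
import math
import struct


def fp16(x):
    """Round x to the nearest IEEE half-precision value; overflow becomes +/-inf."""
    try:
        return struct.unpack("e", struct.pack("e", x))[0]
    except (OverflowError, struct.error):
        return math.copysign(math.inf, x)


def dot_fp16(q, k):
    """Dot product with every product and partial sum rounded to half precision."""
    acc = 0.0
    for a, b in zip(q, k):
        acc = fp16(acc + fp16(a * b))
    return acc


def naive_score(q, k):
    """Attention score computed in FP16, with the 1/sqrt(d) scaling applied last."""
    return fp16(dot_fp16(q, k) / math.sqrt(len(q)))


def stable_score(q, k, layer_idx):
    """Scale before multiplying and accumulate in higher precision.

    Dividing by layer_idx (assumed 1-based) mirrors the extra down-scaling
    by the layer number described in the text.
    """
    scale = math.sqrt(len(q)) * layer_idx
    return sum((a / scale) * b for a, b in zip(q, k))
```

With 64-dimensional vectors whose entries are all 40.0, the unscaled dot product is 102400, which exceeds the largest finite FP16 value (65504), so `naive_score` overflows to infinity while `stable_score` stays finite.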


**Evaluation**

We added online evaluation so we can compute perplexity (PPL) on arbitrary datasets while training.
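
For reference, perplexity is the exponential of the mean negative log-likelihood per token. The sketch below shows that definition only; `perplexity` is a hypothetical helper, it assumes natural-log token probabilities are already available, and it says nothing about how Mistral's evaluation loop gathers them.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) for natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)
```

A model that assigns probability 0.25 to every token has a perplexity of about 4, which matches the intuition of choosing uniformly among four options.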


**Parallelism**

We noticed that integrating parallelism (e.g. tensor model-parallelism and pipelining) breaks the current
Hugging Face APIs.


**Distributed Training**

We provide ready-to-use scripts and configuration files to run distributed training with DeepSpeed,
Google Cloud Platform and Kubernetes.


**Future**

We are working closely with folks from Hugging Face. We plan to integrate Mistral into the Hugging Face library
in the future.
@@ -0,0 +1,31 @@
[mypy]
disable_error_code=override

# do not follow imports (except for ones found in typeshed)
ignore_missing_imports = True
#Ignore errors for third parties
ignore_errors = True
follow_imports = silent

# treat Optional per PEP 484
strict_optional = False

warn_unused_configs = True
warn_redundant_casts = True
# ensure all execution paths are returning
warn_no_return= True
warn_unreachable = True
allow_redefinition = True

show_error_codes = True
check_untyped_defs = True


files=
    src,
    tests,
    train.py
python_version = 3.6

[mypy-src.*]
ignore_errors = False
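
The interplay between the global `ignore_errors = True` and the `[mypy-src.*]` override can be sketched as follows. This is a simplified model written for illustration, not mypy's own resolution logic: `effective_ignore_errors` and `CONFIG` are hypothetical names, and the matching rule assumes that a pattern such as `src.*` covers `src` itself and all of its submodules.

```python
import configparser
from fnmatch import fnmatchcase

# Only the two settings relevant to the override, copied from the config above.
CONFIG = """
[mypy]
ignore_errors = True

[mypy-src.*]
ignore_errors = False
"""


def effective_ignore_errors(module, config_text=CONFIG):
    """Resolve ignore_errors for a module: global value first, then per-module sections."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    value = parser.getboolean("mypy", "ignore_errors", fallback=False)
    for section in parser.sections():
        if not section.startswith("mypy-"):
            continue
        pattern = section[len("mypy-"):]
        # Assumed rule: "pkg.*" matches "pkg" and anything below it.
        is_package_itself = pattern.endswith(".*") and module == pattern[:-2]
        if is_package_itself or fnmatchcase(module, pattern):
            value = parser.getboolean(section, "ignore_errors", fallback=value)
    return value
```

Under this reading, errors are reported only for modules under `src`; files outside it that are listed under `files` are still checked, but their errors are suppressed by the global setting.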