Merge branch 'main' into auto-itercount
AI-WAIFU committed Sep 24, 2024
2 parents a74bde0 + 1bce90c commit 4022bd9
Showing 28 changed files with 1,768 additions and 201 deletions.
29 changes: 17 additions & 12 deletions README.md
@@ -15,7 +15,7 @@ GPT-NeoX leverages many of the same features and technologies as the popular Meg
* Cutting-edge architectural innovations including rotary and ALiBi positional embeddings, parallel feedforward attention layers, and flash attention.
* Predefined configurations for popular architectures including Pythia, PaLM, Falcon, and LLaMA 1 \& 2
* Curriculum Learning
* Easy connections with the open source ecosystem, including Hugging Face's [tokenizers](https://github.com/huggingface/tokenizers) and [transformers](https://github.com/huggingface/transformers/) libraries, logging via [WandB](https://wandb.ai/site), and evaluation via our [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness).
* Easy connections with the open source ecosystem, including Hugging Face's [tokenizers](https://github.com/huggingface/tokenizers) and [transformers](https://github.com/huggingface/transformers/) libraries, experiment monitoring via [WandB](https://wandb.ai/site)/[Comet](https://www.comet.com/site/)/TensorBoard, and evaluation via our [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness).

## News
**[9/9/2024]** We now support preference learning via [DPO](https://arxiv.org/abs/2305.18290), [KTO](https://arxiv.org/abs/2402.01306), and reward modeling.
@@ -108,6 +108,7 @@ To install the remaining basic dependencies, run:
pip install -r requirements/requirements.txt
pip install -r requirements/requirements-wandb.txt # optional, if logging using WandB
pip install -r requirements/requirements-tensorboard.txt # optional, if logging via tensorboard
pip install -r requirements/requirements-comet.txt # optional, if logging via Comet
```

from the repository root.
@@ -306,7 +307,7 @@ You can then run any job you want from inside the container.
Concerns when running for a long time or in detached mode include
- You will have to terminate the container manually when you are no longer using it
- If you want processes to continue running when your shell session ends, you will need to background them.
- If you then want logging, you will have to make sure to pipe logs to disk or set up wandb.
- If you then want logging, you will have to make sure to pipe logs to disk and set up wandb and/or Comet logging; a minimal sketch of the log-piping part follows this list.
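
For example, one way to keep a run going after the shell session ends is to background the training command inside the container and pipe its output to a file. This is only a sketch: the log path is a placeholder, and the train command reuses the example config invocation shown later in this README.

```bash
# Minimal sketch: background a training run and keep its logs on disk.
# /workspace/train.log is a placeholder path -- point it at a mounted volume you control.
nohup python ./deepy.py train.py -d configs 125M.yml local_setup.yml > /workspace/train.log 2>&1 &

# Check on the run later without reattaching to the original shell:
tail -f /workspace/train.log
```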

If you prefer to run the prebuilt container image from dockerhub, you can run the docker compose commands with ```-f docker-compose-dockerhub.yml``` instead, e.g.,

@@ -528,7 +529,7 @@ You can pass in an arbitrary number of configs which will all be merged at runti

You can also optionally pass in a config prefix, which will assume all your configs are in the same folder and append that prefix to their path.

E.G:
For example:

```bash
python ./deepy.py train.py -d configs 125M.yml local_setup.yml
@@ -645,24 +646,28 @@ To convert from a Hugging Face model into a NeoX-loadable, run `tools/ckpts/conv
# Monitoring
In addition to storing logs locally, we provide built-in support for two popular experiment monitoring frameworks: [Weights & Biases](https://wandb.ai/site) and [TensorBoard](https://www.tensorflow.org/tensorboard/)
In addition to storing logs locally, we provide built-in support for three popular experiment monitoring frameworks: [Weights & Biases](https://wandb.ai/site), [TensorBoard](https://www.tensorflow.org/tensorboard/), and [Comet](https://www.comet.com/site).
## Weights and Biases
EleutherAI is currently using [Weights & Biases to record our experiments](https://wandb.ai/eleutherai/neox). If you are logged into Weights & Biases on your machine&mdash;you can do this by executing `wandb login`&mdash;your runs will automatically be recorded. There are two optional fields associated with Weights & Biases: <code><var>wandb_group</var></code> allows you to name the run group and <code><var>wandb_team</var></code> allows you to assign your runs to an organization or team account.
[Weights & Biases](https://wandb.ai/site) is a machine learning monitoring platform, which EleutherAI uses to [record our experiments](https://wandb.ai/eleutherai/neox). To use wandb to monitor your gpt-neox experiments:
1. Create an account at https://wandb.ai/site to generate your API key.
2. Log into Weights & Biases on your machine by executing `wandb login`; your runs will then be recorded automatically.
3. Dependencies required for wandb monitoring can be found in and installed from `./requirements/requirements-wandb.txt`.
4. There are two optional fields associated with Weights & Biases: <code><var>wandb_group</var></code> allows you to name the run group and <code><var>wandb_team</var></code> allows you to assign your runs to an organization or team account. An example config is provided in `./configs/local_setup_wandb.yml`, and a brief sketch of these fields follows below.
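
As a rough sketch (the values here are placeholders, not recommendations), the wandb-related entries in a config look like this; `./configs/local_setup_wandb.yml` shows them in context:

```yaml
  # Sketch of the Weights & Biases settings (placeholder values):
  "use_wandb": True,
  "wandb_host": "https://api.wandb.ai",
  "wandb_project": "neox",
  # Optional fields described in step 4 above:
  "wandb_group": "my-run-group",   # placeholder run-group name
  "wandb_team": "my-team",         # placeholder team/organization account
```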
## TensorBoard
We also support using TensorBoard via the <code><var>tensorboard-dir</var></code> field. Dependencies required for TensorBoard monitoring can be found in and installed from `./requirements/requirements-tensorboard.txt`.
We support using TensorBoard via the <code><var>tensorboard-dir</var></code> field. Dependencies required for TensorBoard monitoring can be found in and installed from `./requirements/requirements-tensorboard.txt`.
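
For instance, the sample configs in this commit point TensorBoard output and the local log directory at relative paths:

```yaml
  "tensorboard_dir": "tensorboard",
  "log_dir": "logs",
```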
## Comet ML
## Comet
[Comet ML](https://www.comet.com/) is a machine learning monitoring platform. To use comet to monitor your gpt-neox experiments:
1. Create an account at https://www.comet.com/login to generate your API key. Either create a workspace or pass your default workspace in your gpt-neox config under the `comet_workspace` config arg.
2. Once generated, link your API key at runtime by passing `export COMET_API_KEY=<your-key-here>`
[Comet](https://www.comet.com/site) is a machine learning monitoring platform. To use comet to monitor your gpt-neox experiments:
1. Create an account at https://www.comet.com/login to generate your API key.
2. Once generated, link your API key at runtime by running `comet login` or passing `export COMET_API_KEY=<your-key-here>`
3. Install `comet_ml` and any dependency libraries via `pip install -r requirements/requirements-comet.txt`
4. Pass `use_comet: True` and your workspace name under `comet_workspace` in your config. A full example config with comet enabled is provided in `configs/local_setup_comet.yml`
5. Run your experiment, and monitor in comet workspace that you passed!
4. Enable Comet with `use_comet: True`. You can also customize where data is being logged with `comet_workspace` and `comet_project`. A full example config with comet enabled is provided in `configs/local_setup_comet.yml`, and a brief sketch follows below.
5. Run your experiment, and monitor metrics in the Comet workspace that you passed!
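
For reference, here is a sketch of the Comet-related entries from step 4 (placeholder values; `configs/local_setup_comet.yml` is the full example):

```yaml
  # Sketch of the Comet settings (placeholder values):
  "use_comet": True,
  "comet_workspace": "test_workspace",        # optional
  "comet_project": "test_project",            # optional
  "comet_experiment_name": "test_experiment",
  "comet_tags": ["test_tag1", "test_tag2"],
```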
# Running on multi-node
27 changes: 26 additions & 1 deletion configs/README.md
@@ -235,7 +235,32 @@ Additional DeepSpeed settings besides those mentioned above should be wrapped in
"eval_iters": 10,
```

However, if you want to use DPO style training you'll need to set pos/neg data paths instead of a single one, e.g.
For KTO-style training, you'll need to add the reward & label data paths, e.g.:

```yaml
"data_impl": "mmap",
# Suggested data paths when using GPT-NeoX locally
"train_data_path": "data/enwik8/enwik8_text_document",
"train_label_data_path": "data/enwik8/enwik8_text_label_document",
"train_reward_data_path": "data/enwik8/enwik8_text_reward_document",
"test_data_path": "data/enwik8/enwik8_text_document",
"test_label_data_path": "data/enwik8/enwik8_text_label_document",
"test_reward_data_path": "data/enwik8/enwik8_text_reward_document",
"valid_data_path": "data/enwik8/enwik8_text_document",
"valid_label_data_path": "data/enwik8/enwik8_text_label_document",
"valid_reward_data_path": "data/enwik8/enwik8_text_reward_document",
"vocab_file": "data/gpt2-vocab.json",
"merge_file": "data/gpt2-merges.txt",
"save": "checkpoints",
"load": "checkpoints",
"tensorboard_dir": "tensorboard",
"log_dir": "logs",
"checkpoint_factor": 10000,
"eval_interval": 1000,
"eval_iters": 10,
```

For DPO-style training, you'll need to set pos/neg data paths instead of a single one, e.g.:

```yaml
"dataset_impl": "pairwise",
3 changes: 0 additions & 3 deletions configs/local_setup.yml
@@ -24,7 +24,4 @@

"tensorboard_dir": "tensorboard",
"log_dir": "logs",
"use_wandb": True,
"wandb_host": "https://api.wandb.ai",
"wandb_project": "neox"
}
2 changes: 1 addition & 1 deletion configs/local_setup_comet.yml
@@ -25,7 +25,7 @@
"tensorboard_dir": "tensorboard",
"log_dir": "logs",
"use_comet": True,
"comet_workspace": "test_workspace", # CHANGE ME
# "comet_workspace": "test_workspace", # CHANGE ME
"comet_project": "test_project",
"comet_experiment_name": "test_experiment",
"comet_tags": ["test_tag1", "test_tag2"],
30 changes: 30 additions & 0 deletions configs/local_setup_wandb.yml
@@ -0,0 +1,30 @@
# Suggested data paths when using GPT-NeoX locally
{
"data_path": "data/enwik8/enwik8_text_document",

# or for weighted datasets:
# "train-data-paths": ["data/enwik8/enwik8_text_document", "data/enwik8/enwik8_text_document"],
# "test-data-paths": ["data/enwik8/enwik8_text_document", "data/enwik8/enwik8_text_document"],
# "valid-data-paths": ["data/enwik8/enwik8_text_document", "data/enwik8/enwik8_text_document"],
# "train-data-weights": [1., 2.],
# "test-data-weights": [2., 1.],
# "valid-data-weights": [0.5, 0.4],

# If weight_by_num_documents is True, dataset weights are built from a multinomial distribution over groups of data according to the number of documents in each group.
# WARNING: setting this to True will override any user provided weights
# "weight_by_num_documents": false,
# "weighted_sampler_alpha": 0.3,

"vocab_file": "data/gpt2-vocab.json",
"merge_file": "data/gpt2-merges.txt",

"save": "checkpoints",
"load": "checkpoints",
"checkpoint_validation_with_forward_pass": False,

"tensorboard_dir": "tensorboard",
"log_dir": "logs",
"use_wandb": True,
"wandb_host": "https://api.wandb.ai",
"wandb_project": "neox"
}
