From 42c477494ba0296201c14f5ae67bbc1cc6989d86 Mon Sep 17 00:00:00 2001
From: Quentin Anthony <qganthony@yahoo.com>
Date: Mon, 2 Oct 2023 18:41:02 -0400
Subject: [PATCH] flesh out ckpts README

---
 tools/ckpts/README.md | 127 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 118 insertions(+), 9 deletions(-)

diff --git a/tools/ckpts/README.md b/tools/ckpts/README.md
index 0558897bd..acae069b1 100644
--- a/tools/ckpts/README.md
+++ b/tools/ckpts/README.md
@@ -1,24 +1,133 @@
 # Checkpoint Scripts
 
 
-# Utilities
+## Utilities
 
-* `inspect_checkpoints.py` reports information about a saved checkpoint.
-* `merge_mp_partitions.py` reduce model (aka tensor) parallelism of a saved checkpoint.
+### `inspect_checkpoints.py`
+Reports information about a saved checkpoint.
+```
+usage: inspect_checkpoints.py [-h] [--attributes [ATTRIBUTES ...]] [--interactive] [--compare] [--diff] dir
 
+positional arguments:
+  dir                   The checkpoint dir to inspect. Must be either: - a directory containing pickle binaries saved with 'torch.save' ending in .pt or .ckpt - a single path to a .pt or .ckpt file - two comma separated directories -
+                        in which case the script will *compare* the two checkpoints
+
+options:
+  -h, --help            show this help message and exit
+  --attributes [ATTRIBUTES ...]
+                        Name of one or several attributes to query. To access an attribute within a nested structure, use '/' as separator.
+  --interactive, -i     Drops into interactive shell after printing the summary.
+  --compare, -c         If true, script will compare two directories separated by commas
+  --diff, -d            In compare mode, only print diffs
+```
 
 ## HuggingFace Scripts
 
-* `convert_hf_to_sequential.py` converts a HuggingFace model to a NeoX compatible format
-* `convert_module_to_hf.py` converts a NeoX model with pipeline parallelism greater than 1 to a HuggingFace transformers `GPTNeoXForCausalLM` model
-* `convert_sequential_to_hf.py` converts a NeoX model without pipeline parallelism to a HuggingFace transformers `GPTNeoXForCausalLM` model.
-* `upload.py` uploads a _converted_ checkpoint to the HuggingFace hub.
+### `convert_hf_to_sequential.py`
+```
+A script for converting publicly available HuggingFace (HF) checkpoints to NeoX format.
+
+Note that this script requires access to the NeoX config files that correspond to the HuggingFace models being converted.
+
+Example usage: (Converts the 70M Pythia model to NeoX format)
+================================================================
+OMPI_COMM_WORLD_RANK=0 CUDA_VISIBLE_DEVICES=0 python tools/ckpts/convert_hf_to_sequential.py \
+    --hf-model-name pythia-70m-v0 \
+    --revision 143000 \
+    --output-dir checkpoints/neox_converted/pythia/70m \
+    --cache-dir checkpoints/HF \
+    --config configs/pythia/70M.yml configs/local_setup.yml \
+    --test
+
+
+For multi-GPU support, DeepSpeed must be initialized:
+NOTE: This requires manually changing the arguments below.
+================================================================
+CUDA_VISIBLE_DEVICES=0,1,2,3 python ./deepy.py tools/ckpts/convert_hf_to_sequential.py \
+    -d configs pythia/70M.yml local_setup.yml
+```
+### `convert_module_to_hf.py`
+Converts a NeoX model with pipeline parallelism greater than 1 to a HuggingFace transformers `GPTNeoXForCausalLM` model.
+
+Note that this script does not support all NeoX features.
+Check carefully that your model uses only features supported by the `GPTNeoXForCausalLM` class in HF.
+
+(For example, position embeddings such as ALiBi may not be supported by HuggingFace's GPT-NeoX architecture.)
+
+```
+usage: convert_module_to_hf.py [-h] [--input_dir INPUT_DIR] [--config_file CONFIG_FILE] [--output_dir OUTPUT_DIR] [--upload]
+
+Merge MP partitions and convert to HF Model.
 
+options:
+  -h, --help            show this help message and exit
+  --input_dir INPUT_DIR
+                        Path to NeoX checkpoint, e.g. /path/to/model/global_step143000
+  --config_file CONFIG_FILE
+                        Path to config file for the input NeoX checkpoint.
+  --output_dir OUTPUT_DIR
+                        Output dir, where to save the HF Model, tokenizer, and configs
+  --upload              Set to true in order to upload to the HF Hub directly.
+```
 
+### `convert_sequential_to_hf.py`
+Converts a NeoX model without pipeline parallelism to a HuggingFace transformers `GPTNeoXForCausalLM` model.
+
+```
+usage: convert_sequential_to_hf.py [-h] [--input_dir INPUT_DIR] [--config_file CONFIG_FILE] [--output_dir OUTPUT_DIR] [--upload]
+
+Merge MP partitions and convert to HF Model.
+
+options:
+  -h, --help            show this help message and exit
+  --input_dir INPUT_DIR
+                        Path to NeoX checkpoint, e.g. /path/to/model/global_step143000
+  --config_file CONFIG_FILE
+                        Path to config file for the input NeoX checkpoint.
+  --output_dir OUTPUT_DIR
+                        Output dir, where to save the HF Model, tokenizer, and configs
+  --upload              Set to true in order to upload to the HF Hub directly.
+```
+### `upload.py`
+Uploads a _converted_ checkpoint to the HuggingFace hub.
+
+```
+python upload.py <converted-ckpt-dir> <repo-name> <branch-name>
+```
 ## NeoX-20B Scripts
 
-* `merge20b.py` reduces model and pipeline parallelism of a 20B checkpoint to 1 and 1.
+### `merge20b.py`
+Reduces both the model parallelism and the pipeline parallelism of a 20B checkpoint to 1.
+
+```
+usage: merge20b.py [-h] [--input_dir INPUT_DIR] [--output_dir OUTPUT_DIR]
 
+Merge 20B checkpoint.
+
+options:
+  -h, --help            show this help message and exit
+  --input_dir INPUT_DIR
+                        Checkpoint dir, which should contain (e.g. a folder named "global_step150000")
+  --output_dir OUTPUT_DIR
+                        Output dir, to save the 1-GPU weights configs
+```
 ## Llama Scripts
 
-* `convert_raw_llama_weights_to_neox.py` takes a Llama checkpoint and puts it into a NeoX-compatible format.
+### `convert_raw_llama_weights_to_neox.py`
+Takes a Llama checkpoint and puts it into a NeoX-compatible format.
+
+```
+usage: convert_raw_llama_weights_to_neox.py [-h] [--input_dir INPUT_DIR] [--model_size {7B,13B,30B,65B,tokenizer_only}] [--output_dir OUTPUT_DIR] [--num_output_shards NUM_OUTPUT_SHARDS] [--pipeline_parallel]
+
+Convert raw LLaMA checkpoints to GPT-NeoX format.
+
+options:
+  -h, --help            show this help message and exit
+  --input_dir INPUT_DIR
+                        Location of LLaMA weights, which contains tokenizer.model and model folders
+  --model_size {7B,13B,30B,65B,tokenizer_only}
+  --output_dir OUTPUT_DIR
+                        Location to write GPT-NeoX mode
+  --num_output_shards NUM_OUTPUT_SHARDS
+  --pipeline_parallel   Only use if PP>1
+```