Merge pull request #218 from kelbrown20/update-train-readme
Docs: Update training README
mergify[bot] authored Sep 26, 2024
2 parents dfe4cc3 + d5b1ee3 commit 7bc49bb
![Release](https://img.shields.io/github/v/release/instructlab/training)
![License](https://img.shields.io/github/license/instructlab/training)

- [Installing](#installing-the-library)
- [Additional NVIDIA packages](#additional-nvidia-packages)
- [Using the library](#using-the-library)
- [Learning about the training arguments](#learning-about-training-arguments)
- [`TrainingArgs`](#trainingargs)
- [`DeepSpeedOptions`](#deepspeedoptions)
- [`FSDPOptions`](#fsdpoptions)
- [`loraOptions`](#loraoptions)
- [Learning about `TorchrunArgs` arguments](#learning-about-torchrunargs-arguments)
- [Example training run with arguments](#example-training-run-with-arguments)

To simplify the process of fine-tuning models with the [LAB
method](https://arxiv.org/abs/2403.01081), this library provides a simple training interface.

## Installing the library

To get started with the library, install it via `pip`. For development, clone this repository and install it in editable mode.

Install the library:

```bash
pip install instructlab-training
```
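
To verify the install, you can check that the package's main entry points import cleanly (an optional smoke test):

```py
# optional smoke test: confirm the library's entry points import as expected
from instructlab.training import TrainingArgs, TorchrunArgs, run_training

print("instructlab-training imported successfully")
```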

You can then install the library for development:

```bash
# clone the repo, then install the library in editable mode
git clone https://github.com/instructlab/training
pip install -e ./training
```

### Additional NVIDIA packages

This library uses the `flash-attn` package, along with other packages that rely on NVIDIA-specific CUDA tooling being installed.
If you are using NVIDIA hardware with CUDA, you need to install the following additional dependencies.

Basic install

```bash
pip install .[cuda]
```

Editable install (development)

```bash
pip install -e .[cuda]
```
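
Before launching a training run, it can also help to confirm that PyTorch (a dependency of this library) can see your CUDA devices:

```py
# quick sanity check that CUDA devices are visible to PyTorch
import torch

print(torch.cuda.is_available())  # True if CUDA is usable
print(torch.cuda.device_count())  # number of visible GPUs
```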

## Using the library

You can use this training library by importing the necessary items:

```py
from instructlab.training import (
    run_training,
    TorchrunArgs,
    TrainingArgs,
)
```

You can then define the training arguments that will serve as the parameters for your training run. See:

- [Learning about the training arguments](#learning-about-training-arguments)
- [Example training run with arguments](#example-training-run-with-arguments)

## Learning about training arguments

The `TrainingArgs` class provides most of the customization options for training jobs.
There are a number of options you can specify, such as setting `DeepSpeed` config values
or running a `LoRA` training job instead of a full fine-tune.

### `TrainingArgs`

| Field | Description |
| --- | --- |
| distributed_backend | Specifies which distributed training backend to use. Supported options are "fsdp" and "deepspeed". |
| disable_flash_attn | Disables flash attention when set to true. This allows for training on older devices. |
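
For example, the last two options above might be set like this (a partial sketch showing only these fields; the remaining arguments are covered in the [example training run](#example-training-run-with-arguments) below):

```py
# partial sketch: only the backend and flash-attention fields are shown here
training_args = TrainingArgs(
    # ... other arguments as in the example training run below ...
    distributed_backend = "fsdp",  # or "deepspeed"
    disable_flash_attn = False,    # set to True to train on older devices without flash attention
)
```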

### `DeepSpeedOptions`

This library currently supports only a few options in `DeepSpeedOptions`.
The default is to run with DeepSpeed, so these options currently only allow you to
customize aspects of the ZeRO stage 2 optimizer.

| Field | Description |
| --- | --- |
| cpu_offload_optimizer_pin_memory | If true, offload to page-locked CPU memory. This could boost throughput at the cost of extra memory overhead. |
| save_samples | The number of samples to see before saving a DeepSpeed checkpoint. |

For more information about DeepSpeed, see [deepspeed.ai](https://www.deepspeed.ai/).
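
As a rough sketch, these options could be passed to `TrainingArgs` along the following lines; the `deepspeed_options` field name and the `DeepSpeedOptions` import location are assumptions here, so check the library's definitions for the exact names:

```py
# hypothetical sketch: customizing ZeRO stage 2 optimizer offloading
from instructlab.training import DeepSpeedOptions, TrainingArgs  # assumed import location

training_args = TrainingArgs(
    # ... other arguments as in the example training run below ...
    distributed_backend = "deepspeed",
    deepspeed_options = DeepSpeedOptions(          # assumed field name on TrainingArgs
        cpu_offload_optimizer_pin_memory = True,   # offload to page-locked CPU memory
        save_samples = 250000,                     # samples to see before saving a DeepSpeed checkpoint
    ),
)
```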

### `FSDPOptions`

Like DeepSpeed, we expose only a limited number of parameters for you to modify with FSDP.
They are listed below:

> [!NOTE]
> For `sharding_strategy` - Only `SHARD_GRAD_OP` has been extensively tested and is actively supported by this library.
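
A comparable, hypothetical sketch for FSDP is shown below; the `fsdp_options` field name, the `FSDPOptions` import location, and the string form of the strategy are assumptions:

```py
# hypothetical sketch: selecting FSDP with the SHARD_GRAD_OP sharding strategy
from instructlab.training import FSDPOptions, TrainingArgs  # assumed import location

training_args = TrainingArgs(
    # ... other arguments as in the example training run below ...
    distributed_backend = "fsdp",
    fsdp_options = FSDPOptions(               # assumed field name on TrainingArgs
        sharding_strategy = "SHARD_GRAD_OP",  # the only strategy extensively tested by this library
    ),
)
```
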
### `loraOptions`

LoRA options currently supported:

| Field | Description |
| --- | --- |
| rank | The rank parameter for LoRA training. |
| alpha | The alpha parameter for LoRA training. |
| dropout | The dropout rate for LoRA training. |
| target_modules | The list of target modules for LoRA training. |
| quantize_data_type | The data type for quantization in LoRA training. Valid options are `None` and `"nf4"` |

#### Example run with LoRA options

If you'd like to run a LoRA training job, you can specify LoRA options to `TrainingArgs`
via the `LoraOptions` object. The field name and values below are illustrative:

```py
from instructlab.training import LoraOptions  # assumed import location

training_args = TrainingArgs(
    # ... other arguments as in the example training run below ...
    lora = LoraOptions(        # assumed TrainingArgs field name
        rank = 4,
        alpha = 32,
        dropout = 0.1,
    ),
)
```

## Learning about `TorchrunArgs` arguments

When running the training script, we always invoke `torchrun`.

If you are running a single-GPU system or something that doesn't
otherwise require distributed training configuration, you can create a default object:

```python
run_training(
    torchrun_args=TorchrunArgs(),   # default settings are fine for a single-GPU run
    training_args=training_args,
)

However, if you want to specify a more complex configuration,
the library currently supports all the options that [torchrun accepts
today](https://pytorch.org/docs/stable/elastic/run.html#definitions).

> [!NOTE]
> For more information about the `torchrun` arguments, please consult the [torchrun documentation](https://pytorch.org/docs/stable/elastic/run.html#definitions).

### Example training run with `TorchrunArgs` arguments

For example, in an 8-GPU, 2-machine system, we would specify the following torchrun config:

```python
# illustrative values: 2 machines with 4 GPUs each; adjust for your cluster
torchrun_args = TorchrunArgs(
    nnodes = 2,                           # number of machines
    nproc_per_node = 4,                   # num GPUs per machine
    node_rank = 0,                        # rank of this machine within the group
    rdzv_id = 123,
    rdzv_endpoint = '<node0-host>:12345'  # rendezvous endpoint reachable from every node (placeholder)
)

run_training(
    torchrun_args=torchrun_args,
    train_args=training_args
)
```
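
On the second machine, the same settings would be reused with only the node rank changed. This is a hypothetical sketch based on how `torchrun` rendezvous works, not an additional API:

```py
# hypothetical second-node config: identical to node 0 except for node_rank
torchrun_args = TorchrunArgs(
    nnodes = 2,                           # same total number of machines
    nproc_per_node = 4,                   # same GPUs per machine
    node_rank = 1,                        # this machine is rank 1 in the 2-node group
    rdzv_id = 123,                        # same rendezvous id on every node
    rdzv_endpoint = '<node0-host>:12345'  # same endpoint, reachable from every node (placeholder)
)
```

You would then call `run_training` on each machine exactly as shown above.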

## Example training run with arguments

Define the training arguments which will serve as the
parameters for our training run:

```py
# define training-specific arguments
training_args = TrainingArgs(
# define data-specific arguments
model_path = "ibm-granite/granite-7b-base",
data_path = "path/to/dataset.jsonl",
ckpt_output_dir = "data/saved_checkpoints",
data_output_dir = "data/outputs",

# define model-training parameters
max_seq_len = 4096,
max_batch_len = 60000,
num_epochs = 10,
effective_batch_size = 3840,
save_samples = 250000,
learning_rate = 2e-6,
warmup_steps = 800,
is_padding_free = True, # set this to true when using Granite-based models
random_seed = 42,
)
```

We'll also need to define the settings for running a multi-process job
via `torchrun`. To do this, create a `TorchrunArgs` object.

> [!TIP]
> Note: for single-GPU jobs, you can simply set `nnodes = 1` and `nproc_per_node = 1`.

```py
torchrun_args = TorchrunArgs(
nnodes = 1, # number of machines
nproc_per_node = 8, # num GPUs per machine
node_rank = 0, # node rank for this machine
rdzv_id = 123,
rdzv_endpoint = '127.0.0.1:12345'
)
```

Finally, you can just call `run_training` and this library will handle
the rest 🙂.

```py
run_training(
torchrun_args=torchrun_args,
training_args=training_args,
)
```