Various bugs with ORPO #2105

Open · 6 of 8 tasks
ccdv-ai opened this issue Nov 26, 2024 · 2 comments
Labels: bug (Something isn't working)

ccdv-ai commented Nov 26, 2024

Please check that this issue hasn't been reported before.

  • I searched previous Bug Reports and didn't find any similar reports.

Expected Behavior

Training a model using ORPO should proceed normally; instead, I found some strange behaviors (listed below).

Current behavior

  1. Mapping RL Dataset is slow and runs multiple times (once per device). At some point the model is loaded while another mapping is still in progress:

    Mapping RL Dataset:  12%|█▏        | 61/500 [00:01<00:08, 51.17 examples/s]
    Mapping RL Dataset:  13%|█▎        | 67/500 [00:01<00:08, 50.86 examples/s]
    Loading checkpoint shards:  25%|██▌       | 1/4 [00:01<00:03,  1.27s/it]
    Mapping RL Dataset:  15%|█▍        | 73/500 [00:01<00:08, 49.66 examples/s]
    Mapping RL Dataset:  16%|█▌        | 79/500 [00:01<00:08, 49.02 examples/s]

  2. The same duplication also shows up with LoRA adapters, and a max_steps warning is logged even though max_steps is empty in the config:

    [2024-11-26 10:26:55,772] [INFO] [peft.tuners.tuners_utils.__init__:171] [PID:2760612] Already found a `peft_config` attribute in the model. This will lead to having multiple adapters in the model. Make sure to know what you are doing!
    max_steps is given, it will override any value given in num_train_epochs

  3. The trainer fails to estimate the number of tokens:

    Could not estimate the number of tokens of the input, floating-point operations will not be computed

  4. The warmup_ratio param doesn't work; I have to rely on warmup_steps instead.
  5. The system prompt is not used.
  6. Logging happens every step, even though logging_steps: 5 is set.

Steps to reproduce

base_model: Qwen/Qwen2.5-7B-Instruct
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code: false
special_tokens:
  eos_token: "<|im_end|>"
  pad_token: "<|endoftext|>"

neftune_noise_alpha: 
gradient_checkpointing: unsloth

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true
liger_cross_entropy: false

resume_from_checkpoint:
# If resume_from_checkpoint isn't set and you simply want it to start where it left off.
# Be careful with this being turned on between different models.
auto_resume_from_checkpoints: true

save_steps: 25
save_total_limit: 1
# Maximum number of iterations to train for. It precedes num_epochs which means that
# if both are set, num_epochs will not be guaranteed.
# e.g., when 1 epoch is 1000 steps => `num_epochs: 2` and `max_steps: 100` will train for 100 steps
max_steps: 

load_in_8bit: false
load_in_4bit: true
strict: true

# Saves the desired chat template to the tokenizer_config.json for easier inferencing
# Currently supports chatml and inst (mistral/mixtral)
chat_template: 

rl: orpo
orpo_alpha: 0.1
remove_unused_columns: false
dataset_num_proc: 8
datasets:
  - path: custom_dataset
    type: chat_template.argilla

# Changes the default system message
default_system_message: "You are a helpful assistant."

shuffle_merged_datasets: true
dataset_prepared_path:
val_set_size: 0.0
output_dir: qwen-7b-orpo/

sequence_len: 2048
sample_packing: false
pad_to_sequence_len: false
eval_sample_packing: false

adapter: lora
#loraplus_lr_ratio: 4
lora_r: 64
lora_alpha: 64
lora_dropout: 0.00
peft_use_rslora: false
#lora_target_modules: ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
lora_target_linear: true


gradient_accumulation_steps: 4
micro_batch_size: 4
num_epochs: 1
optimizer: adamw_torch
lr_scheduler: cosine
cosine_min_lr_ratio: 0.5
learning_rate: 0.00001
max_grad_norm: 1

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 5
xformers_attention:
flash_attention: true

#warmup_ratio: 0.05
warmup_steps: 20
debug:

deepspeed: deepspeed_configs/ds_1.json
weight_decay: 0.0

Config yaml

No response

Possible solution

No response

Which Operating Systems are you using?

  • Linux
  • macOS
  • Windows

Python Version

3.11

axolotl branch-commit

latest

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this bug has not been reported yet.
  • I am using the latest version of axolotl.
  • I have provided enough information for the maintainers to reproduce and diagnose the issue.
ccdv-ai added the bug label on Nov 26, 2024
ccdv-ai (Author) commented Nov 27, 2024

To improve the mapping speed (bug 1), .map() can be passed the num_proc arg, changing this:

def map_dataset(cfg, data_set, ds_transform_fn, tokenizer):
    sig = inspect.signature(ds_transform_fn)
    if "tokenizer" in sig.parameters:
        if not tokenizer:
            tokenizer = load_tokenizer(cfg)
        ds_transform_fn = partial(ds_transform_fn, tokenizer=tokenizer)
    if isinstance(data_set, DatasetDict):
        data_set = data_set["train"]
    data_set = data_set.map(
        ds_transform_fn,
        desc="Mapping RL Dataset",
    )
    return data_set

to:

data_set = data_set.map(
    ds_transform_fn,
    desc="Mapping RL Dataset",
    num_proc=os.cpu_count(),  # or whatever value
)

However, this change requires installing addict==2.3.0 instead of 2.4.0:
pip install addict==2.3.0
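
For reference, a hedged variant that reads the worker count from the config rather than hardcoding os.cpu_count() (dataset_processes is assumed to be the relevant config key here, per the note in the next comment; this is illustrative wiring, not the actual axolotl code):

    import os

    # fall back to all CPUs when the config doesn't specify a worker count
    num_proc = getattr(cfg, "dataset_processes", None) or os.cpu_count()
    data_set = data_set.map(
        ds_transform_fn,
        desc="Mapping RL Dataset",
        num_proc=num_proc,
    )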

NanoCode012 (Collaborator) commented

Hey, thanks for pointing these out.

  1. Mapping RL Dataset: I just looked, and as you said, we forgot to pass num_proc there:

    data_set = data_set.map(
        ds_transform_fn,
        desc="Mapping RL Dataset",
    )

  2. a. I'm unsure about the peft config issue. Will look into this.
    b. We set max_steps as shown below, so it should not cut your training short. Because total_num_steps is used whenever max_steps is empty, the HF Trainer always sees a positive max_steps, which would explain the "max_steps is given" message you saw:

    max_steps=self.cfg.max_steps or total_num_steps,

  3. Will need to look into this.

  4. We didn't configure warmup_ratio for the HF RL trainer (see the sketch after this list):

    def build_training_arguments(self, total_num_steps):
  5. For this: you did set default_system_message, but we don't handle it properly. I'm not sure how we can apply it to every template. Currently, we handle just chatml, by hardcoding the replacement (a template-agnostic alternative is sketched after this list):

    if cfg.default_system_message and cfg.chat_template == "chatml":
        chat_template_string = chat_template_string.replace(
            "You are a helpful assistant.", cfg.default_system_message
        )

  6. Logging: the interval seems to be hardcoded to 1 instead of reading logging_steps from the cfg (also covered in the sketch below).
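
A minimal sketch of what the fixes for 4. and 6. could look like, assuming build_training_arguments constructs a transformers TrainingArguments object from the cfg (the surrounding field names are illustrative, not the actual axolotl code):

    from transformers import TrainingArguments

    def build_training_arguments(self, total_num_steps):
        return TrainingArguments(
            output_dir=self.cfg.output_dir,
            max_steps=self.cfg.max_steps or total_num_steps,
            # pass warmup settings through instead of dropping warmup_ratio;
            # HF prefers warmup_steps over warmup_ratio when both are set
            warmup_steps=self.cfg.warmup_steps or 0,
            warmup_ratio=self.cfg.warmup_ratio or 0.0,
            # read the logging interval from cfg instead of hardcoding 1
            logging_steps=self.cfg.logging_steps or 1,
        )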

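For 5., a hedged, template-agnostic alternative to string-replacing the template: inject the configured system message as a regular conversation turn before rendering, so any chat template that supports a system role picks it up (a sketch, not what axolotl currently does):

    def with_default_system(messages, default_system_message):
        # prepend the configured system prompt only when the
        # conversation doesn't already start with one
        if default_system_message and (not messages or messages[0]["role"] != "system"):
            return [{"role": "system", "content": default_system_message}] + messages
        return messages
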
I'll create a PR to address these.

Btw, the dataset_num_proc: 8 option in your config should be dataset_processes: 8.
