Various bugs with ORPO #2105

Open · 6 of 8 tasks
ccdv-ai opened this issue Nov 26, 2024 · 2 comments
Labels: bug (Something isn't working)

ccdv-ai commented Nov 26, 2024

Please check that this issue hasn't been reported before.

  • I searched previous Bug Reports and didn't find any similar reports.

Expected Behavior

Training a model using ORPO should proceed normally; instead, I found some strange behaviors (listed below).

Current behavior

  1. Mapping RL Dataset is slow and runs multiple times (once per device). At some point the model is loaded while another mapping is still in progress:

    Mapping RL Dataset:  12%|█▏        | 61/500 [00:01<00:08, 51.17 examples/s]
    Mapping RL Dataset:  13%|█▎        | 67/500 [00:01<00:08, 50.86 examples/s]
    Loading checkpoint shards:  25%|██▌       | 1/4 [00:01<00:03,  1.27s/it]
    Mapping RL Dataset:  15%|█▍        | 73/500 [00:01<00:08, 49.66 examples/s]
    Mapping RL Dataset:  16%|█▌        | 79/500 [00:01<00:08, 49.02 examples/s]

  2. The same duplication also shows up with LoRA adapters, and a max_steps warning is logged even though max_steps is empty in the config:

    [2024-11-26 10:26:55,772] [INFO] [peft.tuners.tuners_utils.__init__:171] [PID:2760612] Already found a `peft_config` attribute in the model. This will lead to having multiple adapters in the model. Make sure to know what you are doing!
    max_steps is given, it will override any value given in num_train_epochs

  3. The trainer fails to estimate the number of tokens:

    Could not estimate the number of tokens of the input, floating-point operations will not be computed

  4. The warmup_ratio param doesn't work; I have to rely on warmup_steps instead.
  5. The system prompt is not used.
  6. Logging happens every step, even though logging_steps: 5 is set.

Steps to reproduce

base_model: Qwen/Qwen2.5-7B-Instruct
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code: false
special_tokens:
  eos_token: "<|im_end|>"
  pad_token: "<|endoftext|>"

neftune_noise_alpha: 
gradient_checkpointing: unsloth

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true
liger_cross_entropy: false

resume_from_checkpoint:
# If resume_from_checkpoint isn't set and you simply want it to start where it left off.
# Be careful with this being turned on between different models.
auto_resume_from_checkpoints: true

save_steps: 25
save_total_limit: 1
# Maximum number of iterations to train for. It precedes num_epochs which means that
# if both are set, num_epochs will not be guaranteed.
# e.g., when 1 epoch is 1000 steps => `num_epochs: 2` and `max_steps: 100` will train for 100 steps
max_steps: 

load_in_8bit: false
load_in_4bit: true
strict: true

# Saves the desired chat template to the tokenizer_config.json for easier inferencing
# Currently supports chatml and inst (mistral/mixtral)
chat_template: 

rl: orpo
orpo_alpha: 0.1
remove_unused_columns: false
dataset_num_proc: 8
datasets:
  - path: custom_dataset
    type: chat_template.argilla

# Changes the default system message
default_system_message: "You are a helpful assistant."

shuffle_merged_datasets: true
dataset_prepared_path:
val_set_size: 0.0
output_dir: qwen-7b-orpo/

sequence_len: 2048
sample_packing: false
pad_to_sequence_len: false
eval_sample_packing: false

adapter: lora
#loraplus_lr_ratio: 4
lora_r: 64
lora_alpha: 64
lora_dropout: 0.00
peft_use_rslora: false
#lora_target_modules: ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
lora_target_linear: true


gradient_accumulation_steps: 4
micro_batch_size: 4
num_epochs: 1
optimizer: adamw_torch
lr_scheduler: cosine
cosine_min_lr_ratio: 0.5
learning_rate: 0.00001
max_grad_norm: 1

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 5
xformers_attention:
flash_attention: true

#warmup_ratio: 0.05
warmup_steps: 20
debug:

deepspeed: deepspeed_configs/ds_1.json
weight_decay: 0.0

Config yaml

No response

Possible solution

No response

Which Operating Systems are you using?

  • Linux
  • macOS
  • Windows

Python Version

3.11

axolotl branch-commit

latest

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this bug has not been reported yet.
  • I am using the latest version of axolotl.
  • I have provided enough information for the maintainers to reproduce and diagnose the issue.
ccdv-ai added the bug label on Nov 26, 2024
ccdv-ai (Author) commented Nov 27, 2024

To improve the mapping speed (bug 1), .map() can be passed the num_proc arg, changing this:

def map_dataset(cfg, data_set, ds_transform_fn, tokenizer):
    sig = inspect.signature(ds_transform_fn)
    if "tokenizer" in sig.parameters:
        if not tokenizer:
            tokenizer = load_tokenizer(cfg)
        ds_transform_fn = partial(ds_transform_fn, tokenizer=tokenizer)
    if isinstance(data_set, DatasetDict):
        data_set = data_set["train"]
    data_set = data_set.map(
        ds_transform_fn,
        desc="Mapping RL Dataset",
    )
    return data_set

to:

data_set = data_set.map(
    ds_transform_fn,
    desc="Mapping RL Dataset",
    num_proc=os.cpu_count(),  # or whatever value
)

However, this change requires installing addict==2.3.0 instead of 2.4.0:
pip install addict==2.3.0
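
For reference, a hedged variant that reads the worker count from the config rather than hardcoding os.cpu_count() (dataset_processes is assumed to be the relevant config key here, per the note in the next comment; this is illustrative wiring, not the actual axolotl code):

    import os

    # fall back to all CPUs when the config doesn't specify a worker count
    num_proc = getattr(cfg, "dataset_processes", None) or os.cpu_count()
    data_set = data_set.map(
        ds_transform_fn,
        desc="Mapping RL Dataset",
        num_proc=num_proc,
    )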

NanoCode012 (Collaborator) commented

Hey, thanks for pointing these out.

  1. Mapping RL Dataset: I just looked, and as you said, we forgot to pass num_proc there:

    data_set = data_set.map(
        ds_transform_fn,
        desc="Mapping RL Dataset",
    )

  2. a. I'm unsure about the peft config issue. Will look into this.
    b. We set max_steps as shown below, so it should not cut your training short. Because total_num_steps is used whenever max_steps is empty, the HF Trainer always sees a positive max_steps, which would explain the "max_steps is given" message you saw:

    max_steps=self.cfg.max_steps or total_num_steps,

  3. Will need to look into this.

  4. We didn't configure warmup_ratio for the HF RL trainer (see the sketch after this list):

    def build_training_arguments(self, total_num_steps):
  5. For this: you did set default_system_message, but we don't handle it properly. I'm not sure how we can apply it to every template. Currently, we handle just chatml, by hardcoding the replacement (a template-agnostic alternative is sketched after this list):

    if cfg.default_system_message and cfg.chat_template == "chatml":
        chat_template_string = chat_template_string.replace(
            "You are a helpful assistant.", cfg.default_system_message
        )

  6. Logging: the interval seems to be hardcoded to 1 instead of reading logging_steps from the cfg (also covered in the sketch below).
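
A minimal sketch of what the fixes for 4. and 6. could look like, assuming build_training_arguments constructs a transformers TrainingArguments object from the cfg (the surrounding field names are illustrative, not the actual axolotl code):

    from transformers import TrainingArguments

    def build_training_arguments(self, total_num_steps):
        return TrainingArguments(
            output_dir=self.cfg.output_dir,
            max_steps=self.cfg.max_steps or total_num_steps,
            # pass warmup settings through instead of dropping warmup_ratio;
            # HF prefers warmup_steps over warmup_ratio when both are set
            warmup_steps=self.cfg.warmup_steps or 0,
            warmup_ratio=self.cfg.warmup_ratio or 0.0,
            # read the logging interval from cfg instead of hardcoding 1
            logging_steps=self.cfg.logging_steps or 1,
        )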

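For 5., a hedged, template-agnostic alternative to string-replacing the template: inject the configured system message as a regular conversation turn before rendering, so any chat template that supports a system role picks it up (a sketch, not what axolotl currently does):

    def with_default_system(messages, default_system_message):
        # prepend the configured system prompt only when the
        # conversation doesn't already start with one
        if default_system_message and (not messages or messages[0]["role"] != "system"):
            return [{"role": "system", "content": default_system_message}] + messages
        return messages
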
I'll create a PR to address these.

Btw, the dataset_num_proc: 8 option in your config should be dataset_processes: 8.
