
When I used GaLore on ORPO, the learning rate was set to 8e-6, but the logged training learning rate was 0.001 #46

Open
Minami-su opened this issue May 10, 2024 · 1 comment

@Minami-su

# Assuming the TRL implementations here (model, dataset, and the config
# variables such as cutoff_len are defined elsewhere in the script):
from trl import ORPOConfig, ORPOTrainer

trainer = ORPOTrainer(
    model=model,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    # peft_config=peft_config,
    tokenizer=tokenizer,
    args=ORPOConfig(
        max_length=cutoff_len,
        max_prompt_length=cutoff_len // 2,
        beta=0.1,
        per_device_train_batch_size=micro_batch_size,
        gradient_accumulation_steps=gradient_accumulation_steps,
        warmup_steps=0,
        num_train_epochs=num_epochs,
        lr_scheduler_type="cosine",
        learning_rate=8e-6,  # expected LR; the logs below report 0.001 instead
        bf16=True,
        logging_steps=10,
        # layerwise GaLore: one 8-bit AdamW per targeted parameter tensor
        optim="galore_adamw_8bit_layerwise",
        optim_target_modules=[r".*attn.*", r".*mlp.*"],
        optim_args="rank=1024, update_proj_gap=500, scale=0.25",
        evaluation_strategy="steps" if val_set_size > 0 else "no",
        save_strategy="steps",
        eval_steps=100 if val_set_size > 0 else None,
        save_steps=100,
        output_dir=output_dir,
        save_total_limit=2,
        gradient_checkpointing=True,
        gradient_checkpointing_kwargs={"use_reentrant": True},
        load_best_model_at_end=val_set_size > 0,
        ddp_find_unused_parameters=False if ddp else None,
        report_to="wandb" if use_wandb else None,
        run_name=wandb_run_name if use_wandb else None,
        do_train=True,
        remove_unused_columns=False,
    ),
)
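
For context, here is a minimal sketch of how the layerwise GaLore variant works: instead of one global optimizer, each targeted parameter gets its own small optimizer that is stepped from a gradient hook, so the Trainer never performs a global optimizer.step() itself. The helper name attach_layerwise_galore is hypothetical and the grouping is simplified; this is a sketch of the technique, not the transformers internals.

import torch
from galore_torch import GaLoreAdamW8bit  # pip install galore-torch

def attach_layerwise_galore(model, lr=8e-6, rank=1024, update_proj_gap=500, scale=0.25):
    # One small GaLore optimizer per 2D weight matrix. The real integration
    # also routes parameters that don't match optim_target_modules through a
    # plain optimizer group; this sketch skips them entirely.
    optimizer_dict = {}
    for param in model.parameters():
        if not param.requires_grad or param.ndim < 2:
            continue
        optimizer_dict[param] = GaLoreAdamW8bit(
            [{"params": [param], "rank": rank,
              "update_proj_gap": update_proj_gap, "scale": scale}],
            lr=lr,
        )

    def optimizer_hook(param):
        # Fires right after this parameter's gradient is accumulated:
        # step and free the gradient immediately.
        optimizer_dict[param].step()
        optimizer_dict[param].zero_grad()

    for param in optimizer_dict:
        param.register_post_accumulate_grad_hook(optimizer_hook)
    return optimizer_dict

Because every update happens inside these hooks, the learning-rate value the Trainer reports in its progress logs does not come from these per-parameter optimizers.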


Activated GaLoRE fine-tuning, depending on your model size and hardware, the training might take a while before starting. Please be patient !
  0%|                                                                                                                                     | 0/495 [00:00<?, ?it/s]Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3557, 'grad_norm': 0.0, 'learning_rate': 0.001, 'rewards/chosen': -0.015678538009524345, 'rewards/rejected': -0.012379011139273643, 'rewards/accuracies': 0.19999998807907104, 'rewards/margins': -0.003299527335911989, 'logps/rejected': -0.12379010766744614, 'logps/chosen': -0.15678536891937256, 'logits/rejected': 0.7921055555343628, 'logits/chosen': 0.791210412979126, 'nll_loss': 0.2719877064228058, 'log_odds_ratio': -0.8374900817871094, 'log_odds_chosen': -0.25091928243637085, 'epoch': 0.06}
{'loss': 0.2634, 'grad_norm': 0.0, 'learning_rate': 0.001, 'rewards/chosen': -0.012010233476758003, 'rewards/rejected': -0.009977776557207108, 'rewards/accuracies': 0.29999998211860657, 'rewards/margins': -0.0020324576180428267, 'logps/rejected': -0.09977775812149048, 'logps/chosen': -0.12010233104228973, 'logits/rejected': 0.7489851713180542, 'logits/chosen': 0.7482139468193054, 'nll_loss': 0.1832979917526245, 'log_odds_ratio': -0.8010236620903015, 'log_odds_chosen': -0.16869042813777924, 'epoch': 0.12}
{'loss': 0.2482, 'grad_norm': 0.0, 'learning_rate': 0.001, 'rewards/chosen': -0.011346157640218735, 'rewards/rejected': -0.01022450439631939, 'rewards/accuracies': 0.4833333492279053, 'rewards/margins': -0.0011216530110687017, 'logps/rejected': -0.102245032787323, 'logps/chosen': -0.11346157640218735, 'logits/rejected': 0.7105721831321716, 'logits/chosen': 0.7108334898948669, 'nll_loss': 0.17242279648780823, 'log_odds_ratio': -0.7573043704032898, 'log_odds_chosen': -0.08471358567476273, 'epoch': 0.18}
{'loss': 0.2444, 'grad_norm': 0.0, 'learning_rate': 0.001, 'rewards/chosen': -0.012975988909602165, 'rewards/rejected': -0.013058923184871674, 'rewards/accuracies': 0.550000011920929, 'rewards/margins': 8.293241262435913e-05, 'logps/rejected': -0.13058921694755554, 'logps/chosen': -0.12975989282131195, 'logits/rejected': 0.6808757781982422, 'logits/chosen': 0.6832461953163147, 'nll_loss': 0.1756206750869751, 'log_odds_ratio': -0.687309741973877, 'log_odds_chosen': 0.04155167192220688, 'epoch': 0.24}
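
A note on the numbers above: the constant learning_rate of 0.001 and grad_norm of 0.0 are consistent with the layerwise integration in transformers, where the Trainer's top-level optimizer is a placeholder (LayerWiseDummyOptimizer in transformers.trainer_pt_utils) whose default param-group lr is 1e-3. The logs read from that placeholder, the configured 8e-6 lives in the hidden per-parameter optimizers, and the gradients are already zeroed by the hooks when the norm is computed. A hedged way to verify, assuming the placeholder exposes an optimizer_dict attribute as in the transformers source:

# Sanity check (assumption: transformers' layerwise-GaLore placeholder
# optimizer stores the real per-parameter optimizers in optimizer_dict).
trainer.create_optimizer()  # make sure the optimizer has been built
real_opts = getattr(trainer.optimizer, "optimizer_dict", None)
if real_opts:
    first_opt = next(iter(real_opts.values()))
    # Expect the configured 8e-6 here, not the logged 0.001.
    print([group["lr"] for group in first_opt.param_groups])
else:
    print([group["lr"] for group in trainer.optimizer.param_groups])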
@Minami-su (Author)

I guess it's because it's not TRL's ORPO.

@Minami-su reopened this on Jun 29, 2024