Difference in Memory Usage between LLaMA-Factory and Transformers Trainer #6435
System Info
llamafactory version: 0.9.2.dev0

Reproduction
I used LLaMA-Factory and the Transformers Trainer to fine-tune Qwen2.5 with LoRA, using the same configuration on my own dataset.
LLaMA-Factory config:

### model
model_name_or_path: Qwen2.5-7B-Instruct

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all

### dataset
dataset: my_dataset
template: qwen
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: ./
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 2
gradient_accumulation_steps: 8
learning_rate: 1.0e-5
num_train_epochs: 3.0
lr_scheduler_type: cosine
max_grad_norm: 1.0
optim: adamw_torch
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
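For reference, a config like this is typically saved to a YAML file and launched with the standard LLaMA-Factory CLI; the file name below is just a placeholder, not the actual file used:

# Launch the LoRA SFT run from the config above.
llamafactory-cli train qwen2_5_lora_sft.yaml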
Transformers Trainer:
model_inputs = tokenizer(
    prompt,
    max_length=1024,
    truncation=True,
    padding="max_length",
)
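For context, here is a minimal sketch of how this tokenization presumably sits inside the dataset preprocessing; the dataset loading, the "prompt" column name, and the label handling are assumptions, not the exact script:

from datasets import load_dataset
from transformers import AutoTokenizer

# Assumed setup -- the original snippet does not show how the tokenizer and
# dataset are created; the paths and column names here are placeholders.
tokenizer = AutoTokenizer.from_pretrained("Qwen2.5-7B-Instruct")
raw_dataset = load_dataset("json", data_files="my_dataset.json", split="train")

def preprocess(example):
    # Every sample is padded to the full 1024 tokens, so each batch always
    # holds per_device_train_batch_size x 1024 tokens regardless of text length.
    model_inputs = tokenizer(
        example["prompt"],
        max_length=1024,
        truncation=True,
        padding="max_length",
    )
    # Assumption: labels are a copy of input_ids for causal-LM fine-tuning.
    model_inputs["labels"] = model_inputs["input_ids"].copy()
    return model_inputs

tokenized_dataset = raw_dataset.map(preprocess, remove_columns=raw_dataset.column_names)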
peft_config = LoraConfig(
    inference_mode=False,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    r=8,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, peft_config)
model = model.to(torch.bfloat16)
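The base-model loading step isn't shown above; a minimal sketch of what presumably precedes the LoRA wrapping, plus a parameter-count sanity check (both the bf16 loading and the check are my assumptions/additions):

import torch
from peft import get_peft_model
from transformers import AutoModelForCausalLM

# Assumed loading step (not shown in the original snippet): load the base
# weights in bf16, so the explicit .to(torch.bfloat16) cast afterwards is a no-op.
model = AutoModelForCausalLM.from_pretrained("Qwen2.5-7B-Instruct", torch_dtype=torch.bfloat16)
model = get_peft_model(model, peft_config)

# PEFT helper: reports trainable vs. total parameter counts, handy for checking
# that this setup trains the same LoRA weights as the LLaMA-Factory run.
model.print_trainable_parameters()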
TrainingArguments:

training_args = TrainingArguments(
    output_dir=OUTPUT_DIR,
    logging_dir=f"{OUTPUT_DIR}/logs",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    max_grad_norm=1.0,
    lr_scheduler_type="cosine",
    learning_rate=2e-5,
    warmup_ratio=0.1,
    bf16=True,
    save_steps=100,
    logging_steps=50,
    save_strategy="epoch",
    prediction_loss_only=True,
    remove_unused_columns=False,
    optim="adamw_torch",
    save_safetensors=False,
)
Train:

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
)
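The snippet stops at the Trainer construction; presumably the run is started with trainer.train(). Since no data_collator (and no tokenizer) is passed to Trainer, it falls back to default_data_collator, so the explicit equivalent would look roughly like this (my sketch, not the original script):

from transformers import default_data_collator

# Explicit form of the implicit default above: each batch is a fixed block of
# per_device_train_batch_size x 1024 token ids, because every sample was
# already padded to max_length during tokenization.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    data_collator=default_data_collator,
)

# Assumed launch and save steps (not shown in the original snippet).
trainer.train()
trainer.save_model(OUTPUT_DIR)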
I didn't use any acceleration method like sparse-attn (those libraries aren't installed in my environment).
However, when I trained with LLaMA-Factory, the memory usage was about 27 GB, while with the Transformers Trainer it was about 35 GB. The cutoff_len and fine-tuning configuration are the same, so I would just like to know: how does LLaMA-Factory reduce memory usage?
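For reference, a simple way to compare peak GPU memory between the two runs with PyTorch's built-in counters (this measurement snippet is my addition, not part of either training script):

import torch

# Reset the peak-allocation counter before training begins.
torch.cuda.reset_peak_memory_stats()

# ... run trainer.train() here (or the equivalent LLaMA-Factory job) ...

# Peak memory allocated by tensors during the run, in GiB. nvidia-smi shows a
# larger number because it also counts the CUDA context and PyTorch's cache.
peak_gib = torch.cuda.max_memory_allocated() / 1024**3
print(f"peak allocated GPU memory: {peak_gib:.2f} GiB")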
Expected behavior
Similar memory usage between LLaMA-Factory and the Transformers Trainer.
Others
No response