Replies: 1 comment
I managed to solve my problem with generation not stopping in a different way. Despite what the logs showed, I decided to check which token IDs the pipeline was actually using: BOS_TOKEN: 128257. After that, I added the special tokens to the .yml file, and it worked, both during inference and training.
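For reference, that kind of check can be reproduced with the Hugging Face transformers tokenizer; this is a minimal sketch, and the path to the trained model/tokenizer directory is an assumption:

from transformers import AutoTokenizer

# "./outputs/out" is a placeholder for the trained model / tokenizer directory.
tokenizer = AutoTokenizer.from_pretrained("./outputs/out")

for name in ("bos_token", "eos_token", "pad_token"):
    token = getattr(tokenizer, name)
    token_id = tokenizer.convert_tokens_to_ids(token) if token else None
    print(f"{name}: {token!r} -> id {token_id}")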
Hello Axolotl Community,
I am facing an issue with the Llama-3 model after training. While it generates good text, it does so uncontrollably, only stopping generation when the context window runs out.
I have used Axolotl for training several times and have a well-established pipeline, but it has stopped working.
I have checked the data after preprocessing, and the token <|eot_id|> is in place. I have tried training both with the parameter:
special_tokens:
  pad_token: <|end_of_text|>
and without it.
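As a side note on the preprocessing check mentioned above, here is a minimal sketch of decoding the tail of a prepared sample; the glob over the prepared-dataset directory and the tokenizer name (taken from the base model in the config further down) are assumptions:

import glob

from datasets import load_from_disk
from transformers import AutoTokenizer

# Assumption: preprocessing saved the tokenized dataset to a hashed
# subdirectory of dataset_prepared_path (last_run_prepared).
prepared_dir = glob.glob("last_run_prepared/*")[0]
dataset = load_from_disk(prepared_dir)

# Assumption: the tokenizer is the one shipped with the base model.
tokenizer = AutoTokenizer.from_pretrained("mlabonne/NeuralDaredevil-8B-abliterated")

tail = dataset[0]["input_ids"][-5:]
print(tokenizer.decode(tail))  # the last turn should end with <|eot_id|>
print(tail[-1] == tokenizer.convert_tokens_to_ids("<|eot_id|>"))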
I am using a pre-trained (fine-tuned) Model A, which has the following special_tokens_map.json:
{
  "bos_token": {
    "content": "<|begin_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|end_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|end_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
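One detail worth checking against this map: the configured eos_token is <|end_of_text|>, while the llama3 chat template ends each turn with <|eot_id|>, so whatever runs the generation has to treat <|eot_id|> as a stop token. A minimal sketch of that comparison, where the path to Model A is a hypothetical placeholder:

from transformers import AutoTokenizer

# "model_a_dir" is a hypothetical path to the fine-tuned Model A checkpoint.
tokenizer = AutoTokenizer.from_pretrained("model_a_dir")

print("eos_token:", tokenizer.eos_token, "-> id", tokenizer.eos_token_id)
print("<|eot_id|> -> id", tokenizer.convert_tokens_to_ids("<|eot_id|>"))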
I am ruling out inference issues since my pre-trained Model A works as expected. Additionally, there is a fine-tuned Model B on top of Model A, which I trained 14 days ago, and it also works correctly.
Today, I also tried running inference with an open model in the Docker container:
python -m axolotl.cli.inference config.yml --gradio
with the following settings:
base_model: "mlabonne/NeuralDaredevil-8B-abliterated"
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer
load_in_8bit: false
load_in_4bit: false
strict: false
chat_template: llama3
datasets:
  - type: chat_template
    chat_template: llama3
    field_messages: conversation
    message_field_role: role
    message_field_content: content
    roles:
      user:
        - user
      assistant:
        - assistant
dataset_prepared_path: last_run_prepared
val_set_size: 0.05
output_dir: ./outputs/out
sequence_len: 4096
sample_packing: false
pad_to_sequence_len: true
wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
gradient_accumulation_steps: 16
micro_batch_size: 4
num_epochs: 2
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 2e-5
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_steps: 100
evals_per_epoch: 2
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|end_of_text|>
The behavior remains the same: the model generates text until the context window ends.
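As a generation-side sanity check outside of Axolotl, one can pass both terminators as stop tokens to transformers generate; this is a minimal sketch, with the model path and prompt as placeholders, and it assumes the saved tokenizer carries the llama3 chat template:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# "./outputs/out" and the prompt are placeholders.
model_dir = "./outputs/out"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Hello, who are you?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Stop on either the turn terminator or the end-of-text token.
terminators = [
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
    tokenizer.eos_token_id,
]
output = model.generate(input_ids, max_new_tokens=256, eos_token_id=terminators)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))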
I would appreciate any guidance or suggestions to resolve this issue.