Replies: 1 comment
I managed to solve my problem with generation not stopping in a different way. Despite what the logs showed, I decided to check which token IDs the pipeline was actually using: BOS_TOKEN: 128257. After that, I added the special tokens to the .yml file, and it worked, both during inference and training.
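For reference, that kind of check can be reproduced with the Hugging Face transformers tokenizer; this is a minimal sketch, and the path to the trained model/tokenizer directory is an assumption:

from transformers import AutoTokenizer

# "./outputs/out" is a placeholder for the trained model / tokenizer directory.
tokenizer = AutoTokenizer.from_pretrained("./outputs/out")

for name in ("bos_token", "eos_token", "pad_token"):
    token = getattr(tokenizer, name)
    token_id = tokenizer.convert_tokens_to_ids(token) if token else None
    print(f"{name}: {token!r} -> id {token_id}")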
Hello Axolotl Community,
I am facing an issue with the Llama-3 model after training. While it generates good text, it does so uncontrollably, only stopping generation when the context window runs out.
I have used Axolotl for training several times and have a well-established pipeline, but it has stopped working.
I have checked the data after preprocessing, and the token <|eot_id|> is in place. I have tried training both with the parameter:
special_tokens:
  pad_token: <|end_of_text|>
and without it.
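As a side note on the preprocessing check mentioned above, here is a minimal sketch of decoding the tail of a prepared sample; the glob over the prepared-dataset directory and the tokenizer name (taken from the base model in the config further down) are assumptions:

import glob

from datasets import load_from_disk
from transformers import AutoTokenizer

# Assumption: preprocessing saved the tokenized dataset to a hashed
# subdirectory of dataset_prepared_path (last_run_prepared).
prepared_dir = glob.glob("last_run_prepared/*")[0]
dataset = load_from_disk(prepared_dir)

# Assumption: the tokenizer is the one shipped with the base model.
tokenizer = AutoTokenizer.from_pretrained("mlabonne/NeuralDaredevil-8B-abliterated")

tail = dataset[0]["input_ids"][-5:]
print(tokenizer.decode(tail))  # the last turn should end with <|eot_id|>
print(tail[-1] == tokenizer.convert_tokens_to_ids("<|eot_id|>"))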
I am using a pre-trained (fine-tuned) Model A, which has the following special_tokens_map.json:
{
  "bos_token": {
    "content": "<|begin_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|end_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|end_of_text|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
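One detail worth checking against this map: the configured eos_token is <|end_of_text|>, while the llama3 chat template ends each turn with <|eot_id|>, so whatever runs the generation has to treat <|eot_id|> as a stop token. A minimal sketch of that comparison, where the path to Model A is a hypothetical placeholder:

from transformers import AutoTokenizer

# "model_a_dir" is a hypothetical path to the fine-tuned Model A checkpoint.
tokenizer = AutoTokenizer.from_pretrained("model_a_dir")

print("eos_token:", tokenizer.eos_token, "-> id", tokenizer.eos_token_id)
print("<|eot_id|> -> id", tokenizer.convert_tokens_to_ids("<|eot_id|>"))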
I am ruling out inference issues since my pre-trained Model A works as expected. Additionally, there is a fine-tuned Model B on top of Model A, which I trained 14 days ago, and it also works correctly.
Today, I also tried running inference with an open model in the Docker container:
python -m axolotl.cli.inference config.yml --gradio
with the following settings:
base_model: "mlabonne/NeuralDaredevil-8B-abliterated"
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer
load_in_8bit: false
load_in_4bit: false
strict: false
chat_template: llama3
datasets:
  - type: chat_template
    chat_template: llama3
    field_messages: conversation
    message_field_role: role
    message_field_content: content
    roles:
      user:
        - user
      assistant:
        - assistant
dataset_prepared_path: last_run_prepared
val_set_size: 0.05
output_dir: ./outputs/out
sequence_len: 4096
sample_packing: false
pad_to_sequence_len: true
wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
gradient_accumulation_steps: 16
micro_batch_size: 4
num_epochs: 2
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 2e-5
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_steps: 100
evals_per_epoch: 2
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|end_of_text|>
The behavior remains the same: the model generates text until the context window ends.
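As a generation-side sanity check outside of Axolotl, one can pass both terminators as stop tokens to transformers generate; this is a minimal sketch, with the model path and prompt as placeholders, and it assumes the saved tokenizer carries the llama3 chat template:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# "./outputs/out" and the prompt are placeholders.
model_dir = "./outputs/out"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Hello, who are you?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Stop on either the turn terminator or the end-of-text token.
terminators = [
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
    tokenizer.eos_token_id,
]
output = model.generate(input_ids, max_new_tokens=256, eos_token_id=terminators)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))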
I would appreciate any guidance or suggestions to resolve this issue.