Issues on H2O benchmark performance #4

fantasysee · 2024-10-23T00:16:31Z

Hello,

I tried modifying the H2O pipeline_config for the narrativeqa dataset in LongBench, using the Llama-3-8B-Instruct model.

In the first experiment, I added a 1x_heavy.json that forces heavy_ratio = 1.0 and recent_ratio = 0.0, as follows:

{
    "pipeline_params": {
        "method": "h2o_longbench",
        "model_name": "./LLMs/Meta-Llama-3-8B-Instruct",
        "tokenizer_name": "./LLMs/Meta-Llama-3-8B-Instruct",
        "chat_template": "llama3",
        "model_max_len": 7500,
        "use_flash_attn": true,
        "truncation_mode": "middle",
        "batch_size": 1,
        "out_of_max_len_allowed": true,
        "rope_theta_factor": 1.0,
        "heavy_ratio": 1.0,
        "recent_ratio": 0.0
    }
}

In the second experiment, I added a 1x_recent.json that forces heavy_ratio = 0.0 and recent_ratio = 1.0, as follows:

{
    "pipeline_params": {
        "method": "h2o_longbench",
        "model_name": "./LLMs/Meta-Llama-3-8B-Instruct",
        "tokenizer_name": "./LLMs/Meta-Llama-3-8B-Instruct",
        "chat_template": "llama3",
        "model_max_len": 7500,
        "use_flash_attn": true,
        "truncation_mode": "middle",
        "batch_size": 1,
        "out_of_max_len_allowed": true,
        "rope_theta_factor": 1.0,
        "heavy_ratio": 0.0,
        "recent_ratio": 1.0
    }
}

If I understand correctly, these two experiments should give the same result, and both should match the baseline.
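To make that expectation concrete, here is a minimal sketch of an H2O-style eviction rule. It assumes the budgets are computed as int(ratio * seq_len), that the last recent_budget tokens are always kept, and that heavy hitters are the top-scoring tokens (by accumulated attention) among the remaining, non-recent positions. `h2o_keep_mask` is a hypothetical helper for illustration only, not the h2o_longbench code, which may differ in rounding, prefill handling, or per-head selection. Under these assumptions, both configs keep every token.

```python
import numpy as np


def h2o_keep_mask(acc_scores: np.ndarray, heavy_ratio: float, recent_ratio: float) -> np.ndarray:
    """Boolean mask of cached positions kept by a hypothetical H2O-style policy.

    acc_scores: accumulated attention score per cached token, shape (seq_len,).
    Assumption: budgets are int(ratio * seq_len).
    """
    seq_len = acc_scores.shape[0]
    heavy_budget = int(heavy_ratio * seq_len)
    recent_budget = int(recent_ratio * seq_len)

    keep = np.zeros(seq_len, dtype=bool)
    # Recent window: always keep the last `recent_budget` tokens.
    if recent_budget > 0:
        keep[seq_len - recent_budget:] = True

    # Heavy hitters: top-`heavy_budget` tokens by accumulated score,
    # chosen only among positions not already in the recent window.
    candidates = np.flatnonzero(~keep)
    k = min(heavy_budget, candidates.size)
    if k > 0:
        order = np.argsort(acc_scores[candidates])[::-1]
        keep[candidates[order[:k]]] = True
    return keep


# Example with the prompt length capped at model_max_len = 7500.
rng = np.random.default_rng(0)
scores = rng.random(7500)
all_heavy = h2o_keep_mask(scores, heavy_ratio=1.0, recent_ratio=0.0)
all_recent = h2o_keep_mask(scores, heavy_ratio=0.0, recent_ratio=1.0)
print(all_heavy.all(), all_recent.all())  # True True
```

If the real implementation matched this sketch, both runs would attend to the full cache and reproduce the baseline, so a gap like the one below points to some difference between this idealized rule and the actual code path.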

For the baseline we got "qa_f1_score": 21.71. The first experiment also gave "qa_f1_score": 21.71, while the second gave "qa_f1_score": 19.6.

Could you please let me know whether the experiment design is correct, or what might cause this difference?

Regards!
Chao
