
Could not override 'config.task_specific_params.pooling_method' when launching wandb agent #108

Open
Rhett-Ying opened this issue May 24, 2024 · 3 comments


@Rhett-Ying

Hi, I am trying to train models on MIMIC-IV by following the tutorial, but it failed. Please help with this, thanks.

Here are the commands I ran:

  1. Cloned ESGPT and the MIMIC-related repo.
  2. Built the dataset and task dataframes.
  3. Under MIMICIV_FMs_public, ran ./scripts/pretrain.sh +cohort_name=try240521_0927
  4. Ran ./scripts/launch_hyperparameter_tuning.sh cohort_name=try240521_0927. ERROR: https://github.com/mmcdermott/EventStreamGPT/blob/main/scripts/launch_wandb_hp_sweep.py does not exist anymore, so I replaced it with launch_pretraining_wandb_hp_sweep.py.
  5. Ran ./scripts/launch_wandb_agent.sh --project MIMIC_FMs_public --entity xxxx cdby50ub

Step 5 failed with the error below:

Could not override 'config.task_specific_params.pooling_method'.
To append to your config use +config.task_specific_params.pooling_method=max
Key 'pooling_method' is not in struct
    full_key: config.task_specific_params.pooling_method
    object_type=dict

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
2024-05-24 07:44:17,407 - wandb.wandb_agent - INFO - Running runs: ['cdby50ub']
2024-05-24 07:44:17,408 - wandb.wandb_agent - ERROR - Detected 3 failed runs in the first 60 seconds, shutting down.
@mmcdermott
Owner

Thanks @Rhett-Ying -- apologies for the delay in response. I'll look into this soon -- don't hesitate to re-ping here as needed.

@Rhett-Ying
Author

@mmcdermott do you have any findings on this issue? And one more question: how much GPU RAM is required to train or fine-tune the GPT model on my own data? What kind of GPU (an A100?), and how long does it take?

@bnestor
Collaborator

bnestor commented Nov 11, 2024

@Rhett-Ying I used launch_hyperparameter_tuning.sh instead.
In MIMIC_FMs_public/configs/hyperparameter_sweep.yaml I changed:

- hyperparameter_sweep_base
+ pretraining_hyperparameter_sweep_base
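For reference, if that line is an entry in the file's Hydra defaults list (an assumption; I have not checked the file's surrounding contents), the edited section of hyperparameter_sweep.yaml would read roughly:

```yaml
defaults:
  - pretraining_hyperparameter_sweep_base
```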

As you mentioned, I also changed MIMIC_FMs_public/scripts/launch_hyperparameter_tuning.sh:

- $EVENT_STREAM_PATH/scripts/launch_wandb_hp_sweep \
+ $EVENT_STREAM_PATH/scripts/launch_pretraining_wandb_hp_sweep.py \

My pretraining command was then: ./scripts/launch_hyperparameter_tuning.sh cohort_name=mimiciv_esgpt parameters.optimization_config.num_dataloader_workers.value=15
Followed by launching workers (one per 24GB RTX6000 GPU): srun -p <your_partition> --mem=48G --gres=gpu:1 -c 16 ./scripts/launch_wandb_agent.sh healthyml/MIMIC_FMS_public/<your_sweep_id>
Note that, depending on the hyperparameter draw, some runs ran out of memory (OOM) while others were fine. Surprisingly, all runs finished within ~5 hours each (median around 1.5 hours); it will depend on your hyperparameter search config, number of parameters, and learning rate.
