
Could not override 'config.task_specific_params.pooling_method' when launching wandb agent #108

Open
Rhett-Ying opened this issue May 24, 2024 · 3 comments


@Rhett-Ying

Hi, I am trying to train models on MIMIC-IV by following the tutorial, but it failed. Please help with this, thanks.

Here are the commands I ran:

  1. Cloned ESGPT and the MIMIC-related repo.
  2. Built the dataset and task dataframes.
  3. Under MIMICIV_FMs_public, ran ./scripts/pretrain.sh +cohort_name=try240521_0927
  4. Ran ./scripts/launch_hyperparameter_tuning.sh cohort_name=try240521_0927. ERROR: https://github.com/mmcdermott/EventStreamGPT/blob/main/scripts/launch_wandb_hp_sweep.py does not exist anymore, so I replaced it with launch_pretraining_wandb_hp_sweep.py.
  5. Ran ./scripts/launch_wandb_agent.sh --project MIMIC_FMs_public --entity xxxx cdby50ub

Step 5 failed with the error below:

Could not override 'config.task_specific_params.pooling_method'.
To append to your config use +config.task_specific_params.pooling_method=max
Key 'pooling_method' is not in struct
    full_key: config.task_specific_params.pooling_method
    object_type=dict

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
2024-05-24 07:44:17,407 - wandb.wandb_agent - INFO - Running runs: ['cdby50ub']
2024-05-24 07:44:17,408 - wandb.wandb_agent - ERROR - Detected 3 failed runs in the first 60 seconds, shutting down.
@mmcdermott
Owner

Thanks @Rhett-Ying -- apologies for the delay in response. I'll look into this soon -- don't hesitate to re-ping here as needed.

@Rhett-Ying
Author

@mmcdermott do you have any findings on this issue? And one more question: how much GPU RAM is required to train or fine-tune the GPT model on my own data? What kind of GPU (an A100?), and how long does it take?

@bnestor
Collaborator

bnestor commented Nov 11, 2024

@Rhett-Ying I used launch_hyperparameter_tuning.sh instead.
In MIMIC_FMs_public/configs/hyperparameter_sweep.yaml I changed:

- hyperparameter_sweep_base
+ pretraining_hyperparameter_sweep_base
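For reference, if that line is an entry in the file's Hydra defaults list (an assumption; I have not checked the file's surrounding contents), the edited section of hyperparameter_sweep.yaml would read roughly:

```yaml
defaults:
  - pretraining_hyperparameter_sweep_base
```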

As you mentioned, I also changed MIMIC_FMs_public/scripts/launch_hyperparameter_tuning.sh:

- $EVENT_STREAM_PATH/scripts/launch_wandb_hp_sweep \
+ $EVENT_STREAM_PATH/scripts/launch_pretraining_wandb_hp_sweep.py \

My pretraining command was then: ./scripts/launch_hyperparameter_tuning.sh cohort_name=mimiciv_esgpt parameters.optimization_config.num_dataloader_workers.value=15
Followed by launching workers (one per 24GB RTX6000 GPU): srun -p <your_partition> --mem=48G --gres=gpu:1 -c 16 ./scripts/launch_wandb_agent.sh healthyml/MIMIC_FMS_public/<your_sweep_id>
Note that, depending on the hyperparameter draw, some runs ran out of memory (OOM) while others were fine. Surprisingly, all runs finished within ~5 hours each (median around 1.5 hours); it will depend on your hyperparameter search config, number of parameters, and learning rate.
