Could not override 'config.task_specific_params.pooling_method'.
To append to your config use +config.task_specific_params.pooling_method=max
Key 'pooling_method' is not in struct
full_key: config.task_specific_params.pooling_method
object_type=dict
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
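The suggestion in the error relies on Hydra's override prefixes. A bare key=value may only change a key that already exists in the config, because struct mode rejects unknown keys. A +key=value adds a new key. As a rough illustration only, the helper below classifies an override string by its prefix. `hydra_override_mode` is a hypothetical name and is not part of Hydra or EventStreamGPT. Hydra itself does the real parsing and validation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which Hydra override form a command-line argument uses.
# It only inspects the prefix; Hydra itself performs the actual parsing and validation.
hydra_override_mode() {
  case "$1" in
    ++*)  echo "force" ;;     # ++key=value: add the key, or overwrite it if it already exists
    +*)   echo "append" ;;    # +key=value: add a key that is not yet in the config
    "~"*) echo "delete" ;;    # ~key: remove a key from the config
    *)    echo "override" ;;  # key=value: change a key that must already exist
  esac
}

# "Could not override" means the failing argument used the bare form, so the key had to exist:
hydra_override_mode 'config.task_specific_params.pooling_method=max'   # prints "override"
# The form suggested by the error message:
hydra_override_mode '+config.task_specific_params.pooling_method=max'  # prints "append"
```

The `++` check has to come before `+` because `case` uses the first pattern that matches.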
2024-05-24 07:44:17,407 - wandb.wandb_agent - INFO - Running runs: ['cdby50ub']
2024-05-24 07:44:17,408 - wandb.wandb_agent - ERROR - Detected 3 failed runs in the first 60 seconds, shutting down.
@mmcdermott, do you have any findings on this issue? One more question: how much GPU memory is required to train or fine-tune the GPT model myself? What kind of GPU is needed (an A100?), and how long does it take?
My pretraining command was then: ./scripts/launch_hyperparameter_tuning.sh cohort_name=mimiciv_esgpt parameters.optimization_config.num_dataloader_workers.value=15
This was followed by launching workers (one per 24 GB RTX 6000 GPU): srun -p <your_partition> --mem=48G --gres=gpu:1 -c 16 ./scripts/launch_wandb_agent.sh healthyml/MIMIC_FMS_public/<your_sweep_id>
Note that, depending on the hyperparameter configuration, some runs ran out of GPU memory (OOM) while others were fine. Surprisingly, every run finished within ~5 hours (median around 1.5 hours). Runtime will depend on your hyperparameter search config, number of parameters, and learning rate.
Hi, I am trying to train models on MIMIC-IV by following the tutorial, but it failed. Please help. Thanks.
Here are the commands I ran:
./scripts/pretrain.sh +cohort_name=try240521_0927
./scripts/launch_hyperparameter_tuning.sh cohort_name=try240521_0927
Error: https://github.com/mmcdermott/EventStreamGPT/blob/main/scripts/launch_wandb_hp_sweep.py no longer exists, so I replaced it with launch_pretraining_wandb_hp_sweep.py.
../scripts/launch_wandb_agent.sh --project MIMIC_FMs_public --entity xxxx cdby50ub
Step 5 failed with the "Could not override 'config.task_specific_params.pooling_method'" error.