Modeling Configs Should be Re-arranged for Clarity, Better Defaults, and Ease-of-use #61

Open
mmcdermott opened this issue Aug 25, 2023 · 0 comments
Labels: enhancement (New feature or request)

@mmcdermott (Owner) commented:

The problem

Right now, a hydra config for pre-training a simple model has the following shape:

defaults:
  - pretrain_config
  - _self_

# Global parameters, which are a mess of parameters across categories.
do_overwrite: false
# ...

config:
  do_use_learnable_sinusoidal_ATE: false
  # ... other model params
optimization_config:
  init_lr: 1e-3
  # ... other optimization, training, and scheduler parameters
data_config:
  save_dir: ???
  # ... data parameters
pretraining_metrics_config:
  n_auc_thresholds: 50
  # ... pre-training metrics params
final_validation_metrics_config:
  n_auc_thresholds: 25
  # ... final validation metrics parameters
trainer_config:
  accelerator: auto
  # ... lightning trainer parameters, which are not in optimization config

wandb_logger_kwargs:
  project: MIMIC_FMs_public
  # specific wandb logger parameters.

This should be re-organized for clarity as follows. This suggested re-organization is partially inspired by https://github.com/ashleve/lightning-hydra-template/ and conversations with @Guitaricet (though any poor suggestions herein are solely mine).

Necessary Changes

Motivations / Constraints

  1. Configs should be type-safe and compatible with hydra.
  2. Configs should permit easy hyperparameter tuning. This entails:
    a. Having as few inter-parameter dependencies as possible, so that hyperparameter selections can be made semi-independently.
    b. Ensuring that parameters that permit smooth control are specified in a way that allows smooth control. For example, rather than specifying hidden_size, which must equal head_dim * num_attention_heads, it is better to specify head_dim and num_attention_heads separately.
  3. Configs have parameters (specified or derived) that may depend on one another or on dataset-specific parameters. For example, the number of expected training steps depends on the size of the dataset; the total vocabulary size for the model (overall and broken down by measurement type) also depends on the dataset; and normalization parameters for TTE generation layers may depend on dataset statistics (e.g., the log-mean and log-std of observed intra-event times). The config should therefore be broken down into sub-objects that can easily acquire these additional parameters via a set_to_dataset method or equivalent (see the sketch after this list). This implies, for example, that the optimization_config should own the desired number of training epochs, rather than having that parameter live in a separate object.
  4. We may move away from lightning as a trainer framework in the future, so configuration values should ideally be modularized so that such a change would require minimal API changes.
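
For concreteness, here is a minimal Python sketch of constraints (2b) and (3). The class and field names (and a set_to_dataset that takes a raw sample count) are illustrative assumptions only; the actual config classes and dataset interface may differ.

from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelConfig:
    # Constraint 2b: specify head_dim and num_attention_heads independently;
    # hidden_size is derived from them rather than specified directly.
    head_dim: int = 64
    num_attention_heads: int = 8

    @property
    def hidden_size(self) -> int:
        return self.head_dim * self.num_attention_heads

@dataclass
class OptimizationConfig:
    init_lr: float = 1e-3
    batch_size: int = 32
    max_epochs: int = 10
    # Derived rather than specified: filled in from the dataset (constraint 3).
    max_training_steps: Optional[int] = None

    def set_to_dataset(self, n_train_samples: int) -> None:
        """Derive dataset-dependent parameters from the training set size."""
        steps_per_epoch = (n_train_samples + self.batch_size - 1) // self.batch_size
        self.max_training_steps = steps_per_epoch * self.max_epochs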

Structured Config Source Files

Firstly, independent of config content re-organization, the config dataclass structures themselves should be consolidated into a single, clearly organized source file, de-duplicated (e.g., across pre-training, fine-tuning, generation, and embedding), and their documentation should be greatly expanded.
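
As a rough sketch of the de-duplication (field names below are placeholders, not the current code), a single shared module might define something like:

from dataclasses import dataclass

@dataclass
class MetricsConfig:
    """Parameters controlling which metrics are computed and logged.

    Defined once and re-used for both during-training and final-validation
    metrics (and across pre-training, fine-tuning, etc.), rather than being
    duplicated per entry point.
    """
    n_auc_thresholds: int = 50
    include_auroc: bool = True  # placeholder field, for illustration only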

Config Structure Changes

I suggest the following config structure (realized via structured configs and/or yaml files, but shown here with yaml files):

# Hydra stuff first...
defaults:
  - pretrain_config
  - _self_

# Overall, contains flat parameters for experiment configuration---e.g., save directory, seed, overwrite behavior, etc.

data:
  # This contains dataset (not dataloader) parameters. It can remain (nearly) identical to the current data_config contents, just renamed.

model:
  # This contains model specific parameters. It will mostly be identical to the current parameters in the "config" group.

training:
  # This contains parameters for optimization.
  # This has flat parameters for batching (e.g., batch sizes, gradient accumulation), number of dataloader workers, etc.
   
  optimizer:
    # This contains parameters for various base optimizers (e.g., AdamW, etc.). No parameters for other optimization aspects.
  LR_scheduler:
    # This contains parameters for LR scheduler options.
  logger:
    # parameters for logger objects (not necessarily just wandb). See https://github.com/ashleve/lightning-hydra-template/tree/main/configs/logger
  trainer:
    # parameters for trainer objects. See https://github.com/ashleve/lightning-hydra-template/tree/main/configs/trainer though some parameters will be different.
  callbacks:
    # parameters for lightning callbacks. See https://github.com/ashleve/lightning-hydra-template/tree/main/configs/callbacks


evaluation:
  # this contains parameters for metrics tracking:
  training:
    # Metrics during training -- not necessarily just on the train split, but metrics tracked each epoch during training itself.
    # Contents will be similar to existing *metrics_config configs.
  final_validation:
    # Metrics during final validation -- e.g., post training.

Each of these nested sub-layers can be filled in via hydra config groups. E.g., there can be a default config option for a wandb logger or a csv logger, so that on the command line one could simply say logger=wandb or logger=csv. See https://github.com/ashleve/lightning-hydra-template/ for examples of this.
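
As a hedged sketch of how such a group could be registered via structured configs (the group path, class names, and fields here are assumptions, not the current code):

from dataclasses import dataclass
from hydra.core.config_store import ConfigStore

@dataclass
class WandbLoggerConfig:
    project: str = "MIMIC_FMs_public"
    log_model: bool = False

@dataclass
class CSVLoggerConfig:
    save_dir: str = "logs/"

cs = ConfigStore.instance()
# Registering both options under the same group enables `logger=wandb` or
# `logger=csv` overrides on the command line (or `training/logger=...` if nested).
cs.store(group="logger", name="wandb", node=WandbLoggerConfig)
cs.store(group="logger", name="csv", node=CSVLoggerConfig)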

Each of these config groups and options can be specified via yaml base configs or hydra structured config objects, equivalently. The latter will be favored as it permits type safety and integrated documentation. Hyperparameter search configs (specifying parameter distributions over these configs) will still be separate config objects, but documentation will be improved to differentiate hyperparameter tuning configs from single-model-run configs.
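
For example, a minimal illustration of the type-safety benefit (the config class below is hypothetical):

from dataclasses import dataclass
from omegaconf import OmegaConf

@dataclass
class OptimizerConfig:
    """AdamW-style optimizer settings, documented inline via docstrings/comments."""
    init_lr: float = 1e-3
    weight_decay: float = 0.01

schema = OmegaConf.structured(OptimizerConfig)
OmegaConf.merge(schema, {"init_lr": 3e-4})      # fine: types match
# OmegaConf.merge(schema, {"init_lr": "fast"})  # raises a ValidationError at compose time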

@mmcdermott added the enhancement (New feature or request) label on Aug 25, 2023