Add config notes #810

Merged
merged 10 commits into from
Sep 19, 2024

Conversation

shorvath-noaa
Contributor

@shorvath-noaa shorvath-noaa commented Jul 24, 2024

Addition of docstrings for t-route's config module. This is essentially a version of v3_doc.yaml for V4 that lives in the config module.

This should pair nicely with the CLI tool in PR#792.

python gen_config_docs.py troute.config.Config output
log_parameters: LoggingParameters = Field(default_factory=LoggingParameters)
  # Python logging level. Can be either a string or an integer. Optional;
#     defaults to DEBUG (10). All logging statements at or above the level specified will be
#     displayed.
# 
#     To check the contents of the YAML configuration file, set the logging level at or lower than
#     DEBUG. For the most verbose output, set log_level to DEBUG.
  log_level: LogLevel = 'DEBUG'
  # Boolean. If True, a timing summary is provided that reports the total time required for each
#     major process in the simulation sequence. Optional; defaults to False, in which case no timing
#     summary is reported.
  showtiming: bool = False
  # Path to location where a logging file will be saved.
  log_directory: Optional[Path] = None
network_topology_parameters: Optional[NetworkTopologyParameters] = None
  preprocessing_parameters: 'PreprocessingParameters' = Field(default_factory=dict)
    # If True, then network graph objects will be created, saved to disk, and then the execution will stop.
    preprocess_only: bool = False
    # Directory to save preprocessed data to.
#     NOTE: required if preprocess_only = True
    preprocess_output_folder: Optional[DirectoryPath] = None
    # Name to save preprocessed file to (do not include file extension).
    preprocess_output_filename: str = 'preprocess_output'
    # If True, use preprocessed network data instead of reading from geo_file_path.
    use_preprocessed_data: bool = False
    # Filepath of preprocessed data.
#     NOTE: required if use_preprocessed_data = True
    preprocess_source_file: Optional[FilePath] = None
  supernetwork_parameters: 'SupernetworkParameters'
    # Used for simulation identification. Appears in the csv filename, if csv output is used.
#     Otherwise, this variable is of little use.
    title_string: Optional[str] = None
    # Path to the hydrofabric. Currently accepts geopackage (assumes HYFeatures), geojson (assumes HYFeatures), 
#     json (assumes HYFeatures), netcdf (assumes NHD).
    geo_file_path: FilePath
    # Specify if this is an NHD network or a HYFeatures network.
    network_type: Literal['HYFeaturesNetwork', 'NHDNetwork'] = 'HYFeaturesNetwork'
    # File containing dictionary of connections between segment IDs and nexus IDs.
#     NOTE: Only used if using geojson files for hydrofabric.
    flowpath_edge_list: Optional[str] = None
    # File containing channel mask file.
#     NOTE: Not implemented for HYFeatures.
    mask_file_path: Optional[FilePath] = None
    mask_layer_string: str = ''
    mask_driver_string: Optional[str] = None
    mask_key: int = 0
    # Attribute names in channel geometry file.
#     Default values depend on network type.
    columns: Optional['Columns'] = None
      # unique segment identifier
      key: str
      # unique identifier of downstream segment
      downstream: str
      # segment length
      dx: str
      # Manning's roughness of main channel
      n: str
      # Manning's roughness of compound channel
      ncc: str
      # channel slope
      s0: str
      # channel bottom width
      bw: str
      # waterbody identifier
      waterbody: Optional[str]
      # channel top width
      tw: str
      # compound channel top width
      twcc: str
      # channel bottom altitude
      alt: Optional[str]
      # Muskingum K parameter
      musk: str
      # Muskingum X parameter
      musx: str
      # channel sideslope
      cs: str
      # gage ID
      gages: Optional[str]
      # mainstem ID
      mainstem: Optional[str]
    # Synthetic waterbody segment IDs that are used to construct the Great Lakes
#     NOTE: required for CONUS-scale simulations with NWM 2.1 or 3.0 Route_Link.nc data
    synthetic_wb_segments: Optional[List[int]] = Field(default_factory=lambda)
    # Arbitrary large number appended to synthetic_wb_segments in their handling process
    synthetic_wb_id_offset: float = 999000000000.0
    # Coding in channel geometry dataset for segments draining to ocean. A '0' ID indicates there is nothing downstream.
    terminal_code: int = 0
    driver_string: Union[str, Literal['NetCDF']] = 'NetCDF'
    layer_string: int = 0
  waterbody_parameters: 'WaterbodyParameters' = Field(default_factory=dict)
    # If True, waterbodies will be treated as reservoirs. If False, the underlying flowpaths will be used for channel routing.
    break_network_at_waterbodies: bool = False
    level_pool: Optional['LevelPool'] = None
      # Filepath for NetCDF file containing lake parameters (LAKEPARM). Only used for NHD networks.
      level_pool_waterbody_parameter_file_path: Optional[FilePath] = None
      # Column name for waterbody ID.
      level_pool_waterbody_id: Union[str, Literal['lake_id']] = 'lake_id'
    # NULL value to use in flowpath-waterbody crosswalk.
    waterbody_null_code: int = -9999
compute_parameters: ComputeParameters = Field(default_factory=dict)
  # parallel computing scheme used during simulation, options below
#     - "serial": no parallelization
#     - "by-network": parallelization across independent drainage basins
#     - "by-subnetwork-jit": parallelization across subnetworks 
#     - "by-subnetwork-jit-clustered": parallelization across subnetworks, with clustering to optimize scaling
  parallel_compute_method: ParallelComputeMethod = 'by-network'
  # routing engine used for simulation
#     - "V02-structured" - Muskingum Cunge
#     NOTE: There are two other options that were previously being developed for use with the diffusive kernel, 
#     but they are now deprecated:
#     - "diffusive" - Diffusive with adaptive timestepping
#     - "diffusive_cnt" - Diffusive with CNT numerical solution
#     TODO: Remove these additional options? And this parameter altogether as there is only one option?
  compute_kernel: ComputeKernel = 'V02-structured'
  # If True, the short timestep assumption used in WRF-Hydro is applied. If False, the assumption is dropped.
  assume_short_ts: bool = False
  # The target number of segments per subnetwork, only needed for "by-subnetwork..." parallel schemes.
#     The magnitude of this parameter affects parallel scaling. This is to improve efficiency. Default value has 
#     been tested as the fastest for CONUS simulations. For smaller domains this can be reduced.
  subnetwork_target_size: int = 10000
  # Number of CPUs used for parallel computations
#     If parallel_compute_method is anything but 'serial', this determines how many cpus to use for parallel processing.
  cpu_pool: Optional[int] = 1
  # If True, Courant metrics are returned with simulations. This only works for MC simulations.
  return_courant: bool = False
  restart_parameters: 'RestartParameters' = Field(default_factory=dict)
    # Time of model initialization (timestep zero). Datetime format should be %Y-%m-%d_%H:%M, e.g., 2023-04-25_00:00
#     This start time will control which forcing files and TimeSlice files are required for the simulation. 
#     If the start time is entered erroneously, such that there are no available forcing files, then the simulation will fail. 
#     Likewise, if there are no TimeSlice files available, then data assimilation will not occur.
#     NOTE: The default is 'None' because the start date can be determined from restart files
#     such as 'lite_channel_restart_file' or 'wrf_hydro_channel_restart_file'. But if no restart
#     file is provided, this parameter is required.
    start_datetime: Optional[datetime] = None
    # Filepath to a 'lite' channel restart file created by a previous t-route simulation. If a file is specified, then it will be 
#     given preference over WRF restart files for a simulation restart.
    lite_channel_restart_file: Optional[FilePath] = None
    # Filepath to a 'lite' waterbody restart file created by a previous t-route simulation. If a file is specified, then it will be 
#     given preference over WRF restart files for a simulation restart.
    lite_waterbody_restart_file: Optional[FilePath] = None
    # Filepath to WRF Hydro HYDRO_RST file. This file does not need to be timed with start_datetime, which allows initial states
#     from one datetime to initialize a simulation with forcings starting at a different datetime. However, if the start_datetime 
#     parameter is not specified, then the time attribute in the channel restart file will be used as the starting time of the simulation.
    wrf_hydro_channel_restart_file: Optional[FilePath] = None
    # Filepath to channel geometry file.
#     NOTE: if `wrf_hydro_channel_restart_file` is given, `wrf_hydro_channel_ID_crosswalk_file` is required
    wrf_hydro_channel_ID_crosswalk_file: Optional[FilePath] = None
    # Field name of segment IDs in restart file.
    wrf_hydro_channel_ID_crosswalk_file_field_name: Optional[str] = None
    # Field name of upstream flow in restart file.
    wrf_hydro_channel_restart_upstream_flow_field_name: Optional[str] = None
    # Field name of downstream flow in restart file.
    wrf_hydro_channel_restart_downstream_flow_field_name: Optional[str] = None
    # Field name of depth in restart file.
    wrf_hydro_channel_restart_depth_flow_field_name: Optional[str] = None
    # Filepath to waterbody restart file. This is often the same as wrf_hydro_channel_restart_file.
    wrf_hydro_waterbody_restart_file: Optional[FilePath] = None
    # Filepath to lake parameter file.
#     NOTE: required if `wrf_hydro_waterbody_restart_file` is given
    wrf_hydro_waterbody_ID_crosswalk_file: Optional[FilePath] = None
    # Field name of waterbody ID.
    wrf_hydro_waterbody_ID_crosswalk_file_field_name: Optional[str] = None
    # Filepath to channel geometry file.
    wrf_hydro_waterbody_crosswalk_filter_file: Optional[FilePath] = None
    # Fieldname of waterbody IDs in channel geometry file.
    wrf_hydro_waterbody_crosswalk_filter_file_field_name: Optional[str] = None
  hybrid_parameters: 'HybridParameters' = Field(default_factory=dict)
    # Boolean parameter controlling whether hybrid routing is activated. If True, hybrid routing is activated. 
#     If False, MC is solely used for channel flow routing.
#     NOTE: required for hybrid simulations
    run_hybrid_routing: bool = False
    # Filepath to diffusive domain dictionary file. This file can be either JSON or yaml and contain a dictionary
#     of diffusive network segments, organized by tailwater ID (keys). This is a file such as: 
#     https://github.com/NOAA-OWP/t-route/blob/master/test/LowerColorado_TX_v4/domain/coastal_domain_tw.yaml
#     This file defines tailwater and head water flowpath IDs for the diffusive domain. See file for more info.
#     NOTE: required for hybrid simulations
    diffusive_domain: Optional[FilePath] = None
    # Boolean parameter whether or not natural cross section data is used. If it is set to True, diffusive model 
#     uses natural cross section data. If False, diffusive model uses synthetic cross section defined by RouteLink.nc
    use_natl_xsections: bool = False
    # Filepath to topobathy data for channel cross sections. Currently (June 25, 2024), 3D cross section data
#     is contained in a separate file, which this parameter should point to. In the future this data may simply be
#     included in the hydrofabric.
#     Topobathy data of a channel cross section is defined by comid.
#     NOTE: Required for diffusive routing for natural cross sections.
    topobathy_domain: Optional[FilePath] = None
    # Boolean parameter whether or not to run the diffusive module on a refactored network. This was necessary on
#     the NHD network due to short segments causing computational issues. Not needed for HYFeatures.
    run_refactored_network: bool = False
    # A file with refactored flowpaths to eliminate short segments.
#     NOTE: Only needed for NHD network.
    refactored_domain: Optional[FilePath] = None
    # A file with refactored topobathy data.
#     NOTE: Only needed for NHD network.
    refactored_topobathy_domain: Optional[FilePath] = None
    # File containing crosswalk between diffusive tailwater segment IDs and coastal model output node IDs. 
#     This is needed if t-route will use outputs from a coastal model as the downstream boundary condition for
#     the diffusive module. See example:
#     https://github.com/NOAA-OWP/t-route/blob/master/test/LowerColorado_TX_v4/domain/coastal_domain_crosswalk.yaml
#     NOTE: This is related to the ForcingParameters -> coastal_boundary_input_file parameter.
    coastal_boundary_domain: Optional[FilePath] = None
  forcing_parameters: 'ForcingParameters' = Field(default_factory=dict)
    # The number of routing simulation timesteps per qlateral time interval. For example, if dt_qlateral = 3600 secs, 
#     and dt = 300 secs, then qts_subdivisions = 3600/300 = 12
    qts_subdivisions: int = 12
    # Time step size (seconds). Default is 5 minutes.
    dt: int = 300
    qlat_input_folder: Optional[DirectoryPath] = None
    # Number of timesteps. This value, multiplied by 'dt', gives the total simulation time in seconds.
    nts: Optional[int] = 288
    # Value is in hours. To handle memory issues, t-route can divide its simulation time into chunks, reducing the amount 
#     of forcing and data assimilation files it reads into memory at once. This is the size of those time loops.
    max_loop_size: int = 24
    # Name of column containing flowpath/nexus IDs
    qlat_file_index_col: str = 'feature_id'
    # Name of column containing q_lateral data.
    qlat_file_value_col: str = 'q_lateral'
    # Groundwater bucket flux (to channel) variable name in forcing file.
#     NOTE: Only needed if using WRF-Hydro output files (CHRTOUT) as forcing files.
    qlat_file_gw_bucket_flux_col: str = 'qBucket'
    # Surface terrain runoff (to channel) variable name in forcing file.
#     NOTE: Only needed if using WRF-Hydro output files (CHRTOUT) as forcing files.
    qlat_file_terrain_runoff_col: str = 'qSfcLatRunoff'
    # Globbing file pattern to identify q_lateral forcing files.
    qlat_file_pattern_filter: Optional[str] = '*NEXOUT'
    qlat_forcing_sets: Optional[List[QLateralForcingSet]] = None
      # Number of timesteps in loop iteration 1. This corresponds to the number of files listed in qlat_files.
#     This parameter is repeated for as many iterations as are desired.
      nts: 'QLateralFiles'
        # List of forcing file names to be used in a single iteration.
        qlat_files: List[FilePath]
    # Directory to save converted forcing files. Only needed if running t-route as part of ngen suite AND if t-route is having memory issues.
#     NOTE: Explanation: Ngen outputs q_lateral files as 1 file per nexus containing all timesteps. t-route requires 1 file per timestep 
#     containing all locations. If this parameter is omitted or left blank, t-route will simply read in all of ngen's output q_lateral files 
#     into memory and will attempt routing. If the simulation is large (temporally and/or spatially), t-route might crash due to memory issues. 
#     By providing a directory to this parameter, t-route will convert ngen's output q_lateral files into parquet files in the format t-route 
#     needs. Then, during routing, t-route will only read the required parquet files as determined by 'max_loop_size', thus reducing memory.
    binary_nexus_file_folder: Optional[DirectoryPath] = None
    # File containing coastal model output.
#     NOTE: Only used if running diffusive routing.
    coastal_boundary_input_file: Optional[FilePath] = None
  data_assimilation_parameters: 'DataAssimilationParameters' = Field(default_factory=dict)
    # Directory path to usgs timeslice files.
#     NOTE: required for streamflow nudging and/or USGS reservoir DA
    usgs_timeslices_folder: Optional[DirectoryPath] = None
    # Directory path to usace timeslice files.
#     NOTE: required for USACE reservoir DA
    usace_timeslices_folder: Optional[DirectoryPath] = None
    # Directory path to canadian timeslice files. 
#     NOTE: required for Lake Erie DA (and streamflow nudging using Canadian gages, though that has not been 
#     implemented as of June 25, 2024).
    canada_timeslices_folder: Optional[DirectoryPath] = None
    # CSV file containing DA values for Lake Ontario. Needs to be obtained and pre-processed from https://ijc.org/en/loslrb/watershed/flows.
#     NOTE: Required for Lake Ontario DA.
    LakeOntario_outflow: Optional[FilePath] = None
    # Number of hours to look back in time (from simulation time) for USGS, USACE, and Canadian timeslice data assimilation files.
    timeslice_lookback_hours: int = 24
    # Limit on how many missing values can be replaced by linear interpolation from timeslice files.
    interpolation_limit_min: int = 59
    # Lead time of lastobs relative to simulation start time (secs).
#     NOTE: Only relevant if using a WRF-Hydro lastobs restart file.
    wrf_hydro_lastobs_lead_time_relative_to_simulation_start_time: int = 0
    wrf_lastobs_type: str = 'obs-based'
    streamflow_da: StreamflowDA = None
      # Boolean, determines whether or not streamflow nudging is performed.
#     NOTE: Mandatory for streamflow DA
      streamflow_nudging: bool = False
      # File relating stream gage IDs to segment links in the model domain. This is typically the RouteLink file.
#     NOTE: Mandatory for streamflow DA on NHDNetwork. Not necessary on HYFeatures as this information is included
#     in the hydrofabric.
      gage_segID_crosswalk_file: Optional[FilePath] = None
      # Column name for gages in gage_segID_crosswalk_file.
#     NOTE: Not necessary on HYFeatures.
      crosswalk_gage_field: Optional[str] = 'gages'
      # Column name for flowpaths/links in gage_segID_crosswalk_file.
#     NOTE: Not necessary on HYFeatures.
      crosswalk_segID_field: Optional[str] = 'link'
      # File containing information on the last streamflow observations that were assimilated from a previous t-route run. 
#     This is used for a 'warm' restart. Mostly used for operational NWM settings.
      lastobs_file: Optional[FilePath] = None
      # If True, enable streamflow data assimilation in diffusive module. 
#     NOTE: Not yet implemented, leave as False. (June 25, 2024)
      diffusive_streamflow_nudging: bool = False
    reservoir_da: Optional[ReservoirDA] = None
      reservoir_persistence_da: Optional[ReservoirPersistenceDA] = None
        # If True, USGS reservoirs will perform data assimilation.
        reservoir_persistence_usgs: bool = False
        # If True, USACE reservoirs will perform data assimilation.
        reservoir_persistence_usace: bool = False
        # If True, Great Lakes will perform data assimilation.
        reservoir_persistence_greatLake: bool = False
        # Column name designation in files for USGS gages.
        crosswalk_usgs_gage_field: str = 'usgs_gage_id'
        # Column name designation in files for USACE gages.
        crosswalk_usace_gage_field: str = 'usace_gage_id'
        # Column name designation in files for USGS lake IDs.
        crosswalk_usgs_lakeID_field: str = 'usgs_lake_id'
        # Column name designation in files for USACE lake IDs.
        crosswalk_usace_lakeID_field: str = 'usace_lake_id'
      reservoir_rfc_da: Optional[Union[ReservoirRfcParameters, ReservoirRfcParametersDisabled]] = Field(None, discriminator='reservoir_rfc_forecasts')
      # File containing reservoir parameters (e.g., reservoir_index_AnA.nc).
#     NOTE: Needed for NHDNetwork, but not HYFeatures as this information is included in the hydrofabric.
      reservoir_parameter_file: Optional[FilePath] = None
    # Threshold for determining which observations are deemed acceptable for DA and which are not. If the value is set to 1, 
#     then only the very best observations are retained. On the other hand, if the value is set to 0, then all observations will be 
#     used for assimilation, even those marked as very poor quality.
    qc_threshold: float = Field(1, ge=0, le=1)
output_parameters: OutputParameters = Field(default_factory=dict)
  chanobs_output: Optional['ChanobsOutput'] = None
    # Directory to save CHANOBS output files. If this is None, no CHANOBS will be written.
    chanobs_output_directory: Optional[DirectoryPath] = None
    # Filename of CHANOBS output file.
    chanobs_filepath: Optional[Path] = None
  csv_output: Optional['CsvOutput'] = None
    # Directory to save csv output files. If this is None, no csv will be written.
    csv_output_folder: Optional[DirectoryPath] = None
    # Subset of segment IDs to include in the output file.
    csv_output_segments: Optional[List[str]] = None
  chrtout_output: Optional['ChrtoutOutput'] = None
    # Directory to save CHRTOUT files. No files will be written if this is None.
    wrf_hydro_channel_output_source_folder: Optional[DirectoryPath] = None
  lite_restart: Optional['LiteRestart'] = None
    # Directory to save lite_restart files. No files will be written if this is None.
    lite_restart_output_directory: Optional[DirectoryPath] = None
  hydro_rst_output: Optional['HydroRstOutput'] = None
    # Directory to save state files.
    wrf_hydro_restart_dir: Optional[DirectoryPath] = None
    # File pattern for state files.
    wrf_hydro_channel_restart_pattern_filter: str = 'HYDRO_RST.*'
    # DEPRECATED?
    wrf_hydro_channel_restart_source_directory: Optional[DirectoryPath] = None
    # DEPRECATED?
    wrf_hydro_channel_output_source_folder: Optional[DirectoryPath] = None
  wrf_hydro_parity_check: Optional['WrfHydroParityCheck'] = None
    # 
    parity_check_input_folder: Optional[DirectoryPath] = None
    # 
    parity_check_file_index_col: str
    # 
    parity_check_file_value_col: str
    # 
    parity_check_compare_node: str
    parity_check_compare_file_sets: Optional[List['ParityCheckCompareFileSet']] = None
      # 
      validation_files: List[FilePath]
  lakeout_output: Optional[DirectoryPath] = None
  test_output: Optional[Path] = None
  stream_output: Optional['StreamOutput'] = None
    # Directory to save flowveldepth outputs. If this is not None, this form of output will be written.
    stream_output_directory: Optional[DirectoryPath] = None
    # Value is in simulation time hours. This tells t-route how frequently to make output files. '1' would be 1 file per hour 
#     of simulation time.
    stream_output_time: int = 1
    # Output file type.
    stream_output_type: streamOutput_allowedTypes = '.nc'
    # Value is in minutes. This tells t-route the frequency of t-route's timesteps to include in the output file. For instance, 
#     a value of '5' here would output flow, velocity, and depth values every 5 minutes of simulation time. A value of '30' would 
#     output values every 30 minutes of simulation time.
#     NOTE: This value should not be smaller than dt, and should be a multiple of dt (keep in mind dt is in seconds, while this value 
#     is in minutes). So if dt=300(sec), this value cannot be smaller than 5(min) and should be a multiple of 5.
    stream_output_internal_frequency: Annotated[int, Field(strict=True, ge=5)] = 5
  lastobs_output: Optional[DirectoryPath] = None
bmi_parameters: Optional[BMIParameters] = None
  flowpath_columns: Optional[List[str]] = Field(default_factory=lambda)
  attributes_columns: Optional[List[str]] = Field(default_factory=lambda)
  waterbody_columns: Optional[List[str]] = Field(default_factory=lambda)
  network_columns: Optional[List[str]] = Field(default_factory=lambda)
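
The timing notes in the listing above (`start_datetime` format, `nts` times `dt`, `qts_subdivisions`, and the `stream_output_internal_frequency` rule) can be checked with a little arithmetic. The following is a minimal sketch, not code from t-route: the function names are hypothetical, and the requirement that the q_lateral interval be a whole multiple of `dt` is an assumption made for illustration.

```python
from datetime import datetime, timedelta

# Format documented for start_datetime in the listing above.
START_FORMAT = "%Y-%m-%d_%H:%M"


def simulation_window(start: str, dt: int = 300, nts: int = 288):
    """Return (start, end) datetimes; total simulated time is nts * dt seconds."""
    t0 = datetime.strptime(start, START_FORMAT)
    return t0, t0 + timedelta(seconds=nts * dt)


def qts_subdivisions(dt_qlateral: int, dt: int) -> int:
    """Routing timesteps per q_lateral interval, e.g. 3600 / 300 = 12."""
    # Assumption for this sketch: the interval divides evenly by dt.
    if dt_qlateral % dt != 0:
        raise ValueError("dt_qlateral should be a whole multiple of dt")
    return dt_qlateral // dt


def output_frequency_ok(frequency_min: int, dt: int) -> bool:
    """Check a stream_output_internal_frequency (minutes) against dt (seconds)."""
    frequency_sec = frequency_min * 60
    # Must not be smaller than dt and must be a multiple of dt.
    return frequency_sec >= dt and frequency_sec % dt == 0


if __name__ == "__main__":
    start, end = simulation_window("2023-04-25_00:00")
    print(start, "->", end)
    print(qts_subdivisions(3600, 300))
    print(output_frequency_ok(5, 300), output_frequency_ok(7, 300))
```

With the documented defaults (`dt = 300`, `nts = 288`) the simulated window is 24 hours, which matches the default `max_loop_size` of 24.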

Additions

troute.config

  • Docstrings for most parameters. Some that are quite old (and possibly deprecated) don't have any notes.
  • Docstrings for all root_validators in config.py
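
Several notes in the listing above describe conditional requirements, such as `preprocess_output_folder` being required if `preprocess_only = True`. The sketch below illustrates that kind of cross-field check using only the standard library. It is a hypothetical stand-in, not the pydantic `root_validators` in config.py, and `PreprocessingSketch` is an illustrative name; which rules the real validators enforce is not stated in this PR.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class PreprocessingSketch:
    """Hypothetical stand-in for PreprocessingParameters (not the real class)."""

    preprocess_only: bool = False
    preprocess_output_folder: Optional[Path] = None
    use_preprocessed_data: bool = False
    preprocess_source_file: Optional[Path] = None

    def __post_init__(self) -> None:
        # Mirrors the NOTE lines: each flag makes its companion path mandatory.
        if self.preprocess_only and self.preprocess_output_folder is None:
            raise ValueError(
                "preprocess_output_folder is required if preprocess_only = True"
            )
        if self.use_preprocessed_data and self.preprocess_source_file is None:
            raise ValueError(
                "preprocess_source_file is required if use_preprocessed_data = True"
            )


if __name__ == "__main__":
    # Valid: the flag and its companion path are given together.
    print(PreprocessingSketch(preprocess_only=True, preprocess_output_folder=Path("out")))
    try:
        PreprocessingSketch(use_preprocessed_data=True)
    except ValueError as err:
        print("rejected:", err)
```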

Removals

Changes

Testing

Screenshots

Notes

Todos

Checklist

  • PR has an informative and human-readable title
  • Changes are limited to a single goal (no scope creep)
  • Code can be automatically merged (no conflicts)
  • Code follows project standards (link if applicable)
  • Passes all existing automated tests
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future todos are captured in comments
  • Visually tested in supported browsers and devices (see checklist below 👇)
  • Project documentation has been updated (including the "Unreleased" section of the CHANGELOG)
  • Reviewers requested with the Reviewers tool ➡️

Testing checklist

Target Environment support

  • Windows
  • Linux
  • Browser

Accessibility

  • Keyboard friendly
  • Screen reader friendly

Other

  • Is useable without CSS
  • Is useable without JS
  • Flexible from small to large screens
  • No linting errors or warnings
  • JavaScript tests are passing

kumdonoaa
kumdonoaa previously approved these changes Sep 5, 2024
@shorvath-noaa shorvath-noaa merged commit eed1f8a into NOAA-OWP:master Sep 19, 2024
4 checks passed
aaraney pushed a commit to aaraney/t-route that referenced this pull request Jan 7, 2025
* add notes for compute parameters

* add notes for network topology parameters

* add notes for output parameters

* add notes for logging parameters

* add notes for config

* add type hinting for root_validators

* remove v4_config_outline file

* add sample, simple configuration files