
Add StepLR scheduler #109

Open

gitttt-1234 wants to merge 9 commits into main from divya/add-steplr
Conversation

gitttt-1234 (Contributor) commented Oct 28, 2024

Currently, we support ReduceLROnPlateau, which reduces the learning rate by monitoring val_loss. This PR adds support for StepLR, which follows a fixed learning-rate schedule (e.g., reducing the LR by a factor of 0.5 every 30 epochs).
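
To illustrate the difference (a generic PyTorch sketch, not code from this PR): StepLR decays the learning rate on a fixed epoch schedule, whereas ReduceLROnPlateau only reacts to the monitored metric.

# Generic PyTorch illustration (not from this PR): halve the LR every 30 epochs.
import torch

model = torch.nn.Linear(2, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.5)

for epoch in range(90):
    # ... run one epoch of training/validation here ...
    optimizer.step()   # stand-in for the actual optimization steps
    scheduler.step()   # decay depends only on the epoch count, not on val_loss
    if epoch % 30 == 0:
        print(epoch, scheduler.get_last_lr())  # [0.001], [0.0005], [0.00025]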

Summary by CodeRabbit

  • New Features

    • Enhanced configuration options for learning rate scheduling and early stopping mechanisms.
    • Introduced new parameters for data processing, including chunk_size and user_instances_only.
    • Added structured configuration for learning rate adjustments using ReduceLROnPlateau.
  • Bug Fixes

    • Improved error handling for unsupported learning rate schedulers in the training process.
  • Documentation

    • Updated configuration documentation to reflect new parameters and improved clarity.
  • Tests

    • Expanded test suite for ModelTrainer to validate new learning rate scheduler configurations and error handling.

coderabbitai bot (Contributor) commented Oct 28, 2024

Walkthrough

The pull request introduces significant enhancements to the configuration documentation and files related to the sleap_nn.ModelTrainer class. It includes new parameters for learning rate scheduling and early stopping in the documentation, and modifies several YAML configuration files to incorporate structured settings for these features. Additionally, it updates the CenteredInstanceStreamingDataset class to streamline the cropping size calculation and refines the configure_optimizers method in the model_trainer.py file to support various learning rate schedulers.
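
As a rough sketch of how such a scheduler switch can be wired up in configure_optimizers (illustrative only; it assumes the config layout described in this walkthrough, not the PR's exact code):

# Hedged sketch of a LightningModule's configure_optimizers supporting both schedulers.
# Assumes config keys trainer_config.lr_scheduler.{scheduler, step_lr.*, reduce_lr_on_plateau.*}.
import torch

def configure_optimizers(self):
    optimizer = torch.optim.Adam(self.parameters(), lr=self.trainer_config.optimizer.lr)

    cfg = self.trainer_config.lr_scheduler
    if cfg.scheduler is None:
        return {"optimizer": optimizer}

    if cfg.scheduler == "StepLR":
        scheduler = torch.optim.lr_scheduler.StepLR(
            optimizer, step_size=cfg.step_lr.step_size, gamma=cfg.step_lr.gamma
        )
    elif cfg.scheduler == "ReduceLROnPlateau":
        p = cfg.reduce_lr_on_plateau
        scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
            optimizer,
            mode="min",
            threshold=p.threshold,
            threshold_mode=p.threshold_mode,
            cooldown=p.cooldown,
            patience=p.patience,
            factor=p.factor,
            min_lr=p.min_lr,
        )
    else:
        raise ValueError(
            f"{cfg.scheduler} is not a valid scheduler. "
            "Valid schedulers: 'StepLR', 'ReduceLROnPlateau'"
        )

    return {
        "optimizer": optimizer,
        "lr_scheduler": {"scheduler": scheduler, "monitor": "val_loss"},
    }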

Changes

| File Path | Change Summary |
| --- | --- |
| docs/config.md | Added trainer_config.lr_scheduler section with new scheduler parameter. Expanded lr_scheduler with step_lr and reduce_lr_on_plateau subsections, including multiple parameters. Introduced trainer_config.early_stopping section. |
| docs/config_bottomup.yaml | Added chunk_size: 100 in data_config. Updated lr_scheduler to a nested structure under reduce_lr_on_plateau with several parameters. |
| docs/config_centroid.yaml | Added chunk_size: 100 in data_config. Updated lr_scheduler to scheduler: ReduceLROnPlateau with parameters nested under reduce_lr_on_plateau. |
| docs/config_topdown_centered_instance.yaml | Added chunk_size: 100 and min_crop_size in data_config. Updated lr_scheduler to scheduler: ReduceLROnPlateau with parameters nested under reduce_lr_on_plateau. |
| initial_config.yaml | Introduced comprehensive configuration structure with data_config, model_config, and trainer_config sections. |
| sleap_nn/data/streaming_datasets.py | Updated CenteredInstanceStreamingDataset to recalculate crop_hw during initialization based on input_scale. |
| sleap_nn/training/model_trainer.py | Refactored configure_optimizers to support multiple learning rate schedulers, including error handling for invalid configurations. |
| tests/assets/minimal_instance/initial_config.yaml | Added user_instances_only, chunk_size, and min_crop_size in data_config. Introduced bin_files_path in trainer_config. Restructured lr_scheduler configuration. |
| tests/assets/minimal_instance/training_config.yaml | Added user_instances_only, chunk_size, and min_crop_size in data_config. Introduced bin_files_path in trainer_config. Restructured lr_scheduler configuration. |
| tests/assets/minimal_instance_bottomup/initial_config.yaml | Added user_instances_only, chunk_size, and bin_files_path in data_config and trainer_config. Restructured lr_scheduler configuration. |
| tests/assets/minimal_instance_bottomup/training_config.yaml | Added user_instances_only, chunk_size, and bin_files_path in data_config and trainer_config. Restructured lr_scheduler configuration. |
| tests/assets/minimal_instance_centroid/initial_config.yaml | Added user_instances_only, chunk_size, and bin_files_path in data_config and trainer_config. Restructured lr_scheduler configuration. |
| tests/assets/minimal_instance_centroid/training_config.yaml | Added user_instances_only, chunk_size, and bin_files_path in data_config and trainer_config. Restructured lr_scheduler configuration. |
| tests/fixtures/datasets.py | Updated provider to "LabelsReader" and added user_instances_only and chunk_size in data_config. Restructured lr_scheduler configuration. |
| tests/training/test_model_trainer.py | Updated tests to validate new learning rate scheduling configurations and exception handling for invalid configurations. |

Suggested reviewers

  • talmo

@gitttt-1234 gitttt-1234 changed the base branch from main to divya/add-path-bin-files October 28, 2024 22:11
codecov bot commented Oct 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.37%. Comparing base (f093ce2) to head (fc4d20a).
Report is 20 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #109      +/-   ##
==========================================
+ Coverage   96.64%   97.37%   +0.73%     
==========================================
  Files          23       38      +15     
  Lines        1818     3701    +1883     
==========================================
+ Hits         1757     3604    +1847     
- Misses         61       97      +36     

@gitttt-1234 gitttt-1234 marked this pull request as ready for review October 29, 2024 00:39
@gitttt-1234 gitttt-1234 changed the base branch from divya/add-path-bin-files to main October 30, 2024 22:02
@gitttt-1234 gitttt-1234 force-pushed the divya/add-steplr branch 2 times, most recently from d365129 to 2e768aa on October 30, 2024 22:09
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 11

🧹 Outside diff range and nitpick comments (13)
tests/assets/minimal_instance/initial_config.yaml (1)

13-14: Consider parameterizing the crop dimensions.

The crop dimensions are hardcoded to 160x160. Consider making these values configurable through environment variables or command-line arguments for better flexibility.

tests/assets/minimal_instance_bottomup/initial_config.yaml (2)

5-6: Document the new data configuration parameters.

Please add comments explaining:

  • The impact of user_instances_only: True on data processing
  • How chunk_size: 100 affects performance and memory usage

70-70: Document the purpose of bin_files_path.

Please add a comment explaining the purpose of this configuration field and whether an empty value is expected in this test configuration.

docs/config_bottomup.yaml (1)

6-6: Document the purpose of chunk_size parameter.

The newly added chunk_size parameter lacks documentation explaining its purpose and impact on data processing. Consider adding a comment to clarify its usage and recommended values.

docs/config_topdown_centered_instance.yaml (1)

6-6: Document new parameters and provide default value for min_crop_size.

The newly added parameters need documentation:

  1. chunk_size: 100 - What does this value represent and how was it determined?
  2. min_crop_size - This parameter is empty. Please provide a default value or document if it's optional.

Also applies to: 15-15

initial_config.yaml (1)

1-104: Fix file format consistency.

The file uses incorrect newline characters.

Ensure the file uses Unix-style line endings (\n) instead of Windows-style (\r\n).

🧰 Tools
🪛 yamllint

[error] 1-1: wrong new line character: expected \n

(new-lines)

sleap_nn/data/streaming_datasets.py (1)

154-155: Consider adding input validation for input_scale.

To prevent potential issues, consider adding validation for the input_scale parameter in __init__ to ensure it's positive.

 def __init__(
     self,
     confmap_head: DictConfig,
     crop_hw: Tuple[int],
     max_stride: int,
     apply_aug: bool = False,
     augmentation_config: DictConfig = None,
     input_scale: float = 1.0,
     *args,
     **kwargs,
 ):
     """Construct a CenteredInstanceStreamingDataset."""
     super().__init__(*args, **kwargs)
+    if input_scale <= 0:
+        raise ValueError("input_scale must be positive")
 
     self.confmap_head = confmap_head
     self.crop_hw = crop_hw
     self.max_stride = max_stride
     self.apply_aug = apply_aug
     self.aug_config = augmentation_config
     self.input_scale = input_scale
tests/training/test_model_trainer.py (2)

103-110: Enhance test coverage for StepLR scheduler.

While the test configures the StepLR scheduler, it doesn't verify that the learning rate actually decreases according to the schedule. Consider adding assertions to check the learning rate values at different epochs.

Add assertions like this after training:

     trainer = ModelTrainer(config)
     trainer.train()
+    
+    # Verify learning rate schedule
+    df = pd.read_csv(Path(config.trainer_config.save_ckpt_path).joinpath("lightning_logs/version_0/metrics.csv"))
+    initial_lr = config.trainer_config.optimizer.lr
+    # Check that LR is halved every 10 epochs
+    assert abs(df.loc[0, "learning_rate"] - initial_lr) <= 1e-4
+    assert abs(df.loc[10, "learning_rate"] - initial_lr * 0.5) <= 1e-4

337-342: Enhance exception testing for invalid schedulers.

The test only verifies one invalid scheduler case. Consider testing multiple invalid cases and verifying the error message content.

Add more test cases:

     # check exception for lr scheduler
-    OmegaConf.update(config, "trainer_config.lr_scheduler.scheduler", "ReduceLR")
-    with pytest.raises(ValueError):
-        trainer = ModelTrainer(config)
-        trainer.train()
+    invalid_schedulers = ["ReduceLR", "CustomLR", "InvalidScheduler"]
+    for invalid_scheduler in invalid_schedulers:
+        OmegaConf.update(config, "trainer_config.lr_scheduler.scheduler", invalid_scheduler)
+        with pytest.raises(ValueError, match=f"Unsupported scheduler: {invalid_scheduler}"):
+            trainer = ModelTrainer(config)
+            trainer.train()
docs/config.md (3)

179-189: Fix markdown formatting inconsistencies.

The indentation of list items is inconsistent with the rest of the document. Please adjust the indentation to match the document's style:

  • Reduce indentation of scheduler and main sections to 4 spaces
  • Reduce indentation of subsections to 8 spaces

Apply this formatting:

    - `lr_scheduler`
        - `scheduler`: (str) Name of the scheduler to use. Valid schedulers: `"StepLR"`, `"ReduceLROnPlateau"`.
        - `step_lr`:
            - `step_size`: (int) Period of learning rate decay. If `step_size`=10, then every 10 epochs, learning rate will be reduced by a factor of `gamma`.
            - `gamma`: (float) Multiplicative factor of learning rate decay. *Default*: 0.1.
        - `reduce_lr_on_plateau`:
            - `threshold`: (float) Threshold for measuring the new optimum, to only focus on significant changes. *Default*: 1e-4.
            - `threshold_mode`: (str) One of "rel", "abs". In rel mode, dynamic_threshold = best * ( 1 + threshold ) in max mode or best * ( 1 - threshold ) in min mode. In abs mode, dynamic_threshold = best + threshold in max mode or best - threshold in min mode. *Default*: "rel".
            - `cooldown`: (int) Number of epochs to wait before resuming normal operation after lr has been reduced. *Default*: 0
            - `patience`: (int) Number of epochs with no improvement after which learning rate will be reduced. For example, if patience = 2, then we will ignore the first 2 epochs with no improvement, and will only decrease the LR after the third epoch if the loss still hasn't improved then. *Default*: 10.
            - `factor`: (float) Factor by which the learning rate will be reduced. new_lr = lr * factor. *Default*: 0.1.
            - `min_lr`: (float or List[float]) A scalar or a list of scalars. A lower bound on the learning rate of all param groups or each group respectively. *Default*: 0.

185-185: Fix emphasis markers in threshold_mode description.

Remove spaces inside emphasis markers in the description of threshold_mode.

Apply this change:

- In * rel * mode, dynamic_threshold = best * ( 1 + threshold ) in max mode or best * ( 1 - threshold ) in min mode.
+ In *rel* mode, dynamic_threshold = best * ( 1 + threshold ) in max mode or best * ( 1 - threshold ) in min mode.

179-189: Consider adding examples for each scheduler configuration.

To improve documentation clarity, consider adding example configurations for both StepLR and ReduceLROnPlateau schedulers. This would help users understand how to properly configure these schedulers for their use cases.

Would you like me to generate example configurations for both schedulers?
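
For illustration, configurations for the two schedulers could look like the following, built with OmegaConf as used elsewhere in the repo's tests; the values are placeholders mirroring the test configs, not prescriptive defaults.

# Placeholder example configs for both schedulers (illustrative values only).
from omegaconf import OmegaConf

step_lr_cfg = OmegaConf.create(
    """
    lr_scheduler:
      scheduler: StepLR
      step_lr:
        step_size: 30   # decay every 30 epochs
        gamma: 0.5      # multiply the LR by 0.5 at each decay
    """
)

plateau_cfg = OmegaConf.create(
    """
    lr_scheduler:
      scheduler: ReduceLROnPlateau
      reduce_lr_on_plateau:
        threshold: 1.0e-07
        threshold_mode: abs
        cooldown: 3
        patience: 5
        factor: 0.5
        min_lr: 1.0e-08
    """
)

print(OmegaConf.to_yaml(step_lr_cfg))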

sleap_nn/training/model_trainer.py (1)

Line range hint 675-684: Consider simplifying the return statement structure.

The current implementation has two separate return statements with similar structures. This could be simplified to improve readability and maintainability.

-        if self.trainer_config.lr_scheduler.scheduler is None:
-            return {
-                "optimizer": optimizer,
-            }
-
-        return {
-            "optimizer": optimizer,
-            "lr_scheduler": {
-                "scheduler": scheduler,
-                "monitor": "val_loss",
-            },
-        }
+        config = {"optimizer": optimizer}
+        if self.trainer_config.lr_scheduler.scheduler is not None:
+            config["lr_scheduler"] = {
+                "scheduler": scheduler,
+                "monitor": "val_loss",
+            }
+        return config
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 0cf48a8 and fb59c03.

📒 Files selected for processing (15)
  • docs/config.md (1 hunks)
  • docs/config_bottomup.yaml (2 hunks)
  • docs/config_centroid.yaml (2 hunks)
  • docs/config_topdown_centered_instance.yaml (3 hunks)
  • initial_config.yaml (1 hunks)
  • sleap_nn/data/streaming_datasets.py (1 hunks)
  • sleap_nn/training/model_trainer.py (1 hunks)
  • tests/assets/minimal_instance/initial_config.yaml (3 hunks)
  • tests/assets/minimal_instance/training_config.yaml (4 hunks)
  • tests/assets/minimal_instance_bottomup/initial_config.yaml (3 hunks)
  • tests/assets/minimal_instance_bottomup/training_config.yaml (3 hunks)
  • tests/assets/minimal_instance_centroid/initial_config.yaml (3 hunks)
  • tests/assets/minimal_instance_centroid/training_config.yaml (3 hunks)
  • tests/fixtures/datasets.py (2 hunks)
  • tests/training/test_model_trainer.py (5 hunks)

🔇 Additional comments (21)
tests/assets/minimal_instance_centroid/initial_config.yaml (3)

5-6: Verify if data config changes are intended for this PR.

The addition of user_instances_only and chunk_size parameters appears unrelated to the PR's objective of adding StepLR scheduler support. Please confirm if these changes should be part of this PR.

#!/bin/bash
# Check if these parameters are referenced in related changes
echo "Searching for related changes to user_instances_only and chunk_size..."
rg -l "user_instances_only|chunk_size" --type yaml

84-91: ⚠️ Potential issue

Restructure scheduler config to support both ReduceLROnPlateau and StepLR.

The current changes only implement ReduceLROnPlateau configuration, but the PR's objective is to add StepLR support. Consider restructuring to support both scheduler types:

  lr_scheduler:
-    scheduler: ReduceLROnPlateau
-    reduce_lr_on_plateau:
-      threshold: 1.0e-07
-      threshold_mode: abs
-      cooldown: 3
-      patience: 5
-      factor: 0.5
-      min_lr: 1.0e-08
+    type: "reduce_lr_on_plateau"  # or "step_lr"
+    params:
+      # Common parameters
+      factor: 0.5
+      # ReduceLROnPlateau specific
+      threshold: 1.0e-07
+      threshold_mode: abs
+      cooldown: 3
+      patience: 5
+      min_lr: 1.0e-08
+      # StepLR specific (commented until needed)
+      # step_size: 30
+      # gamma: 0.5

This structure:

  1. Makes scheduler type explicit via the type field
  2. Supports both scheduler types in a clean way
  3. Keeps common parameters at the top level
  4. Maintains backward compatibility
#!/bin/bash
# Check for any existing StepLR references or tests
echo "Searching for StepLR implementation..."
rg "StepLR|step_lr" --type python

64-64: Clarify the purpose of bin_files_path.

This new parameter is added without a value and seems unrelated to the StepLR scheduler implementation. Please provide documentation about its purpose or remove it if not needed for this PR.

tests/assets/minimal_instance/initial_config.yaml (3)

89-96: ⚠️ Potential issue

Add support for StepLR configuration.

The PR's objective is to add StepLR support, but the configuration only includes ReduceLROnPlateau. Consider:

  1. Making the scheduler type configurable rather than hardcoded
  2. Adding a parallel configuration section for StepLR parameters

Here's a suggested structure:

 lr_scheduler:
-    scheduler: ReduceLROnPlateau
+    scheduler: ${scheduler_type}  # Can be "ReduceLROnPlateau" or "StepLR"
     reduce_lr_on_plateau:
       threshold: 1.0e-07
       threshold_mode: abs
       cooldown: 3
       patience: 5
       factor: 0.5
       min_lr: 1.0e-08
+    step_lr:
+      step_size: 30
+      gamma: 0.5
+      last_epoch: -1

Let's check for any existing StepLR configurations in the codebase:

#!/bin/bash
# Search for StepLR-related configurations
rg -i "step_lr|steplr|step_size|gamma"

69-69: Specify a value for bin_files_path or document if it's optional.

The bin_files_path parameter is added but left empty. Please document whether this is an optional parameter.

Let's check how this parameter is used:

#!/bin/bash
# Search for bin_files_path usage to determine if it's required
rg -i "bin_files_path"

15-15: Specify a value for min_crop_size or document if it's optional.

The min_crop_size parameter is added but left empty. This could lead to validation issues if the parameter is required.

Let's check if this parameter is used elsewhere in the codebase:

✅ Verification successful

The empty min_crop_size parameter is valid as it's an optional configuration.

Based on the codebase analysis:

  • min_crop_size is an optional parameter that defaults to None or null in YAML configs
  • It's used in find_instance_crop_size() function to set a minimum boundary for automatic crop size calculation
  • The documentation in config.md confirms it's only used when crop_hw is None
  • Other test configurations also set this to null, indicating this is a common pattern
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for min_crop_size usage to determine if it's required
rg -i "min_crop_size"

Length of output: 1636


Script:

#!/bin/bash
# Search for min_crop_size usage with case-insensitive pattern and include context
rg -i "min.?crop.?size" -A 3 -B 3

Length of output: 8970


Script:

#!/bin/bash
# Search for crop-related configurations in YAML files
fd -e yaml -e yml -x grep -i "min.?crop.?size\|crop" {}

Length of output: 269


Script:

#!/bin/bash
# Look for data configuration or preprocessing files that might use crop size
fd -t f -e py | xargs grep -l -i "crop"

Length of output: 883

tests/assets/minimal_instance_bottomup/initial_config.yaml (1)

92-97: Verify the ReduceLROnPlateau parameters.

Please confirm:

  1. Is threshold: 1.0e-07 sensitive enough for your loss function?
  2. Is cooldown: 3 sufficient to observe stable improvements?
  3. Is patience: 5 appropriate given your typical training dynamics?
  4. Will min_lr: 1.0e-08 allow sufficient model refinement?

Let's check for consistency across other configurations:

✅ Verification successful

Based on the code analysis, I can now provide a final response:

The ReduceLROnPlateau parameters appear appropriate for MSE loss training

The codebase reveals:

  1. The model uses MSE loss (nn.MSELoss()) as the primary loss function
  2. The threshold of 1.0e-07 is suitable for MSE loss which typically produces small values
  3. Cooldown of 3 epochs provides sufficient stabilization time between LR adjustments
  4. Patience of 5 epochs allows reasonable time to observe genuine improvements vs fluctuations
  5. The min_lr of 1.0e-08 is appropriate as it's an order of magnitude below the threshold

The parameters are also consistent across all configurations in the codebase, suggesting they've been validated through usage.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Compare lr_scheduler settings across configs
fd -e yaml -e yml . -x grep -A 6 "lr_scheduler:"

Length of output: 1682


Script:

#!/bin/bash
# Let's look for loss function implementations and typical values
rg -A 5 "loss|criterion" --type py

# Also check for any training logs or documentation that might show typical loss values
fd "train|log|readme" -e md -e txt -e log

Length of output: 42950
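
For reference, the reviewed values map onto PyTorch's scheduler constructor roughly as follows (illustrative only; the optimizer here is a stand-in, not the trainer's actual setup):

# Mapping the reviewed YAML values onto torch's ReduceLROnPlateau (illustrative).
import torch

optimizer = torch.optim.Adam(torch.nn.Linear(2, 1).parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer,
    mode="min",          # val_loss should decrease
    threshold=1.0e-07,   # improvements smaller than this are ignored
    threshold_mode="abs",
    cooldown=3,          # wait 3 epochs after each reduction
    patience=5,          # reduce after 5 epochs without improvement
    factor=0.5,          # halve the LR
    min_lr=1.0e-08,
)
# In a training loop, call scheduler.step(val_loss) once per epoch.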

tests/assets/minimal_instance_centroid/training_config.yaml (2)

75-75: LGTM!

The empty bin_files_path is appropriate for a test configuration.


5-6: Verify the data loading parameters.

The new data loading parameters might affect training behavior:

  • user_instances_only: True restricts to user-created instances
  • chunk_size: 100 defines the size of data chunks

Please verify that these values are appropriate for the test configuration.

✅ Verification successful

Data loading parameters are consistent with project standards

The parameters user_instances_only: True and chunk_size: 100 are:

  • Used consistently across all test configurations and documentation
  • Properly documented in docs/config.md with clear default values
  • Correctly implemented in the codebase with proper handling in sleap_nn/training/model_trainer.py and data providers
  • Part of the standard data loading configuration across different model types (centroid, bottomup, etc.)
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for other test configurations to ensure consistency
rg -l "user_instances_only|chunk_size" "tests/assets/**/training_config.yaml"

Length of output: 156


Script:

#!/bin/bash
# Let's first find the correct path to test configurations
fd "training_config.yaml" tests/

# Then search for usage of these parameters in the codebase
rg "user_instances_only|chunk_size" -A 2 -B 2

Length of output: 23238

tests/assets/minimal_instance_bottomup/training_config.yaml (3)

5-6: LGTM: Data configuration parameters are well-structured.

The addition of user_instances_only and chunk_size parameters is clear and follows the existing configuration pattern.


85-85: LGTM: Optional path configuration.

The bin_files_path parameter is appropriately added as an optional configuration field.


105-112: Verify scheduler implementation consistency across the codebase.

Let's ensure that both schedulers are properly supported across all configuration files and the corresponding implementation.

✅ Verification successful

Scheduler implementation is consistent across the codebase

The ModelTrainer implementation properly supports both ReduceLROnPlateau and StepLR schedulers, and all configuration files consistently use the same scheduler configuration structure. The changes in the test configuration align with the implementation.

  • All YAML configs use the same structure for scheduler configuration
  • ModelTrainer explicitly handles both schedulers and validates scheduler names
  • Test suite includes coverage for scheduler configuration validation
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for scheduler-related configurations and implementations

# Check all YAML configs for scheduler configurations
echo "Checking YAML configurations..."
fd -e yaml -e yml | xargs rg -l "scheduler|lr_scheduler"

# Check ModelTrainer implementation for scheduler support
echo -e "\nChecking ModelTrainer implementation..."
rg -A 10 "configure_optimizers|lr_scheduler" src/

Length of output: 777


Script:

#!/bin/bash
# Let's check the content of these config files to understand scheduler configurations
echo "Checking scheduler configurations in YAML files..."
for file in $(fd -e yaml -e yml); do
    echo -e "\n=== $file ==="
    rg -A 5 "scheduler:" "$file" || true
done

# Let's also search for Python files that might contain scheduler implementation
echo -e "\nSearching for Python files with scheduler implementation..."
fd -e py | xargs rg -l "scheduler|ReduceLROnPlateau|lr_scheduler"

Length of output: 4879


Script:

#!/bin/bash
# Let's examine the scheduler implementation in the model trainer
echo "Checking ModelTrainer implementation..."
rg -B 2 -A 10 "lr_scheduler|ReduceLROnPlateau" sleap_nn/training/model_trainer.py

echo -e "\nChecking test implementation..."
rg -B 2 -A 10 "lr_scheduler|ReduceLROnPlateau" tests/training/test_model_trainer.py

Length of output: 4612

tests/assets/minimal_instance/training_config.yaml (2)

5-6: Verify the necessity of new data configuration parameters.

These new parameters (user_instances_only, chunk_size, min_crop_size) appear unrelated to the PR's objective of adding StepLR scheduler support. Please clarify their purpose and whether they should be part of this PR.

Also applies to: 15-15


82-82: Verify the necessity of bin_files_path parameter.

The addition of bin_files_path parameter appears unrelated to the StepLR scheduler implementation. Please clarify its purpose and whether it should be part of this PR.

docs/config_topdown_centered_instance.yaml (1)

107-114: ⚠️ Potential issue

Restructure lr_scheduler config to support both ReduceLROnPlateau and StepLR.

The current configuration structure doesn't align with the PR objective of adding StepLR support. Consider restructuring to allow selecting between different scheduler types:

Here's a suggested structure:

lr_scheduler:
  type: "ReduceLROnPlateau"  # or "StepLR"
  params:
    # Common parameters
    factor: 0.5
    min_lr: 1.0e-08
    # ReduceLROnPlateau specific
    threshold: 1.0e-07
    threshold_mode: abs
    cooldown: 3
    patience: 5
    # StepLR specific
    step_size: 30  # number of epochs between learning rate decays

This structure would:

  1. Make it clear which scheduler type is being used
  2. Support both ReduceLROnPlateau and StepLR configurations
  3. Maintain backward compatibility while adding new functionality

Let's verify if any other configuration files need similar updates:

initial_config.yaml (1)

26-52: LGTM: Model configuration is well-structured.

The backbone and head configurations are properly defined with appropriate default values.

docs/config_centroid.yaml (1)

6-6: LGTM: Consistent chunk_size configuration

The addition of chunk_size parameter is well-placed and maintains consistency with other configuration files.

tests/fixtures/datasets.py (1)

51-55: Verify unrelated changes to data provider configuration.

These changes appear unrelated to the StepLR scheduler implementation:

  1. Changed provider from "LabelsReaderDP" to "LabelsReader"
  2. Added new configurations for user_instances_only and chunk_size
sleap_nn/data/streaming_datasets.py (1)

154-155: Performance improvement: Moved crop size scaling to initialization.

Good optimization! Moving the crop size scaling calculation from __getitem__ to __init__ avoids redundant calculations on every item fetch while maintaining the same functionality. This is especially beneficial when processing large datasets.
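
As a generic sketch of the pattern being praised here (not the PR's exact computation; the scaling formula below is purely illustrative):

# Generic sketch: hoist a scale-dependent computation into __init__ so it runs
# once per dataset rather than once per fetched item.
class ScaledCropDataset:
    def __init__(self, crop_hw, input_scale=1.0):
        # computed once at construction time
        self.crop_hw = (int(crop_hw[0] * input_scale), int(crop_hw[1] * input_scale))

    def __getitem__(self, idx):
        # reuses the precomputed value; no per-item rescaling
        return self.crop_hw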

docs/config.md (1)

179-182: LGTM! Clear and well-documented StepLR configuration.

The new StepLR scheduler configuration is well-structured and includes essential parameters with clear descriptions. The implementation aligns with the PR objectives to add StepLR support.

sleap_nn/training/model_trainer.py (1)

651-674: LGTM: Scheduler selection logic is well-structured.

The implementation correctly:

  • Handles both StepLR and ReduceLROnPlateau schedulers
  • Provides clear error messages for invalid scheduler types
  • Maintains consistent configuration access patterns

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (1)
sleap_nn/training/model_trainer.py (1)

660-662: Standardize quotation marks in error message

The error message uses inconsistent quotation marks around the scheduler names. For clarity and consistency, consider standardizing the quotation marks.

Apply this diff to improve the error message formatting:

     raise ValueError(
-        f"{self.trainer_config.lr_scheduler.scheduler} is not a valid scheduler. Valid schedulers: `'StepLR'`, `'ReduceLROnPlateau'`"
+        f"{self.trainer_config.lr_scheduler.scheduler} is not a valid scheduler. Valid schedulers: 'StepLR', 'ReduceLROnPlateau'"
     )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between fb59c03 and fc4d20a.

📒 Files selected for processing (1)
  • sleap_nn/training/model_trainer.py (1 hunks)
