
Add StepLR scheduler #109

Open

gitttt-1234 wants to merge 9 commits into main from divya/add-steplr
Conversation

gitttt-1234 (Contributor) commented Oct 28, 2024

Currently, we support ReduceLROnPlateau, which reduces the learning rate by monitoring val_loss. This PR adds support for StepLR, which follows a fixed learning-rate schedule (e.g., reducing the LR by a factor of 0.5 every 30 epochs).
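
To illustrate the difference (a generic PyTorch sketch, not code from this PR): StepLR decays the learning rate on a fixed epoch schedule, whereas ReduceLROnPlateau only reacts to the monitored metric.

# Generic PyTorch illustration (not from this PR): halve the LR every 30 epochs.
import torch

model = torch.nn.Linear(2, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.5)

for epoch in range(90):
    # ... run one epoch of training/validation here ...
    optimizer.step()   # stand-in for the actual optimization steps
    scheduler.step()   # decay depends only on the epoch count, not on val_loss
    if epoch % 30 == 0:
        print(epoch, scheduler.get_last_lr())  # [0.001], [0.0005], [0.00025]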

Summary by CodeRabbit

  • New Features

    • Enhanced configuration options for learning rate scheduling and early stopping mechanisms.
    • Introduced new parameters for data processing, including chunk_size and user_instances_only.
    • Added structured configuration for learning rate adjustments using ReduceLROnPlateau.
  • Bug Fixes

    • Improved error handling for unsupported learning rate schedulers in the training process.
  • Documentation

    • Updated configuration documentation to reflect new parameters and improved clarity.
  • Tests

    • Expanded test suite for ModelTrainer to validate new learning rate scheduler configurations and error handling.

coderabbitai bot (Contributor) commented Oct 28, 2024

Walkthrough

The pull request introduces significant enhancements to the configuration documentation and files related to the sleap_nn.ModelTrainer class. It includes new parameters for learning rate scheduling and early stopping in the documentation, and modifies several YAML configuration files to incorporate structured settings for these features. Additionally, it updates the CenteredInstanceStreamingDataset class to streamline the cropping size calculation and refines the configure_optimizers method in the model_trainer.py file to support various learning rate schedulers.
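
As a rough sketch of how such a scheduler switch can be wired up in configure_optimizers (illustrative only; it assumes the config layout described in this walkthrough, not the PR's exact code):

# Hedged sketch of a LightningModule's configure_optimizers supporting both schedulers.
# Assumes config keys trainer_config.lr_scheduler.{scheduler, step_lr.*, reduce_lr_on_plateau.*}.
import torch

def configure_optimizers(self):
    optimizer = torch.optim.Adam(self.parameters(), lr=self.trainer_config.optimizer.lr)

    cfg = self.trainer_config.lr_scheduler
    if cfg.scheduler is None:
        return {"optimizer": optimizer}

    if cfg.scheduler == "StepLR":
        scheduler = torch.optim.lr_scheduler.StepLR(
            optimizer, step_size=cfg.step_lr.step_size, gamma=cfg.step_lr.gamma
        )
    elif cfg.scheduler == "ReduceLROnPlateau":
        p = cfg.reduce_lr_on_plateau
        scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
            optimizer,
            mode="min",
            threshold=p.threshold,
            threshold_mode=p.threshold_mode,
            cooldown=p.cooldown,
            patience=p.patience,
            factor=p.factor,
            min_lr=p.min_lr,
        )
    else:
        raise ValueError(
            f"{cfg.scheduler} is not a valid scheduler. "
            "Valid schedulers: 'StepLR', 'ReduceLROnPlateau'"
        )

    return {
        "optimizer": optimizer,
        "lr_scheduler": {"scheduler": scheduler, "monitor": "val_loss"},
    }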

Changes

| File Path | Change Summary |
| --- | --- |
| docs/config.md | Added trainer_config.lr_scheduler section with new scheduler parameter. Expanded lr_scheduler with step_lr and reduce_lr_on_plateau subsections, including multiple parameters. Introduced trainer_config.early_stopping section. |
| docs/config_bottomup.yaml | Added chunk_size: 100 in data_config. Updated lr_scheduler to a nested structure under reduce_lr_on_plateau with several parameters. |
| docs/config_centroid.yaml | Added chunk_size: 100 in data_config. Updated lr_scheduler to scheduler: ReduceLROnPlateau with parameters nested under reduce_lr_on_plateau. |
| docs/config_topdown_centered_instance.yaml | Added chunk_size: 100 and min_crop_size in data_config. Updated lr_scheduler to scheduler: ReduceLROnPlateau with parameters nested under reduce_lr_on_plateau. |
| initial_config.yaml | Introduced comprehensive configuration structure with data_config, model_config, and trainer_config sections. |
| sleap_nn/data/streaming_datasets.py | Updated CenteredInstanceStreamingDataset to recalculate crop_hw during initialization based on input_scale. |
| sleap_nn/training/model_trainer.py | Refactored configure_optimizers to support multiple learning rate schedulers, including error handling for invalid configurations. |
| tests/assets/minimal_instance/initial_config.yaml | Added user_instances_only, chunk_size, and min_crop_size in data_config. Introduced bin_files_path in trainer_config. Restructured lr_scheduler configuration. |
| tests/assets/minimal_instance/training_config.yaml | Added user_instances_only, chunk_size, and min_crop_size in data_config. Introduced bin_files_path in trainer_config. Restructured lr_scheduler configuration. |
| tests/assets/minimal_instance_bottomup/initial_config.yaml | Added user_instances_only, chunk_size, and bin_files_path in data_config and trainer_config. Restructured lr_scheduler configuration. |
| tests/assets/minimal_instance_bottomup/training_config.yaml | Added user_instances_only, chunk_size, and bin_files_path in data_config and trainer_config. Restructured lr_scheduler configuration. |
| tests/assets/minimal_instance_centroid/initial_config.yaml | Added user_instances_only, chunk_size, and bin_files_path in data_config and trainer_config. Restructured lr_scheduler configuration. |
| tests/assets/minimal_instance_centroid/training_config.yaml | Added user_instances_only, chunk_size, and bin_files_path in data_config and trainer_config. Restructured lr_scheduler configuration. |
| tests/fixtures/datasets.py | Updated provider to "LabelsReader" and added user_instances_only and chunk_size in data_config. Restructured lr_scheduler configuration. |
| tests/training/test_model_trainer.py | Updated tests to validate new learning rate scheduling configurations and exception handling for invalid configurations. |

Suggested reviewers

  • talmo

@gitttt-1234 gitttt-1234 changed the base branch from main to divya/add-path-bin-files October 28, 2024 22:11
codecov bot commented Oct 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.37%. Comparing base (f093ce2) to head (fc4d20a).
Report is 20 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #109      +/-   ##
==========================================
+ Coverage   96.64%   97.37%   +0.73%     
==========================================
  Files          23       38      +15     
  Lines        1818     3701    +1883     
==========================================
+ Hits         1757     3604    +1847     
- Misses         61       97      +36     

@gitttt-1234 gitttt-1234 marked this pull request as ready for review October 29, 2024 00:39
@gitttt-1234 gitttt-1234 changed the base branch from divya/add-path-bin-files to main October 30, 2024 22:02
@gitttt-1234 gitttt-1234 force-pushed the divya/add-steplr branch 2 times, most recently from d365129 to 2e768aa on October 30, 2024 22:09
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 11

🧹 Outside diff range and nitpick comments (13)
tests/assets/minimal_instance/initial_config.yaml (1)

13-14: Consider parameterizing the crop dimensions.

The crop dimensions are hardcoded to 160x160. Consider making these values configurable through environment variables or command-line arguments for better flexibility.

tests/assets/minimal_instance_bottomup/initial_config.yaml (2)

5-6: Document the new data configuration parameters.

Please add comments explaining:

  • The impact of user_instances_only: True on data processing
  • How chunk_size: 100 affects performance and memory usage

70-70: Document the purpose of bin_files_path.

Please add a comment explaining the purpose of this configuration field and whether an empty value is expected in this test configuration.

docs/config_bottomup.yaml (1)

6-6: Document the purpose of chunk_size parameter.

The newly added chunk_size parameter lacks documentation explaining its purpose and impact on data processing. Consider adding a comment to clarify its usage and recommended values.

docs/config_topdown_centered_instance.yaml (1)

6-6: Document new parameters and provide default value for min_crop_size.

The newly added parameters need documentation:

  1. chunk_size: 100 - What does this value represent and how was it determined?
  2. min_crop_size - This parameter is empty. Please provide a default value or document if it's optional.

Also applies to: 15-15

initial_config.yaml (1)

1-104: Fix file format consistency.

The file uses incorrect newline characters.

Ensure the file uses Unix-style line endings (\n) instead of Windows-style (\r\n).

🧰 Tools
🪛 yamllint

[error] 1-1: wrong new line character: expected \n

(new-lines)

sleap_nn/data/streaming_datasets.py (1)

154-155: Consider adding input validation for input_scale.

To prevent potential issues, consider adding validation for the input_scale parameter in __init__ to ensure it's positive.

 def __init__(
     self,
     confmap_head: DictConfig,
     crop_hw: Tuple[int],
     max_stride: int,
     apply_aug: bool = False,
     augmentation_config: DictConfig = None,
     input_scale: float = 1.0,
     *args,
     **kwargs,
 ):
     """Construct a CenteredInstanceStreamingDataset."""
     super().__init__(*args, **kwargs)
+    if input_scale <= 0:
+        raise ValueError("input_scale must be positive")
 
     self.confmap_head = confmap_head
     self.crop_hw = crop_hw
     self.max_stride = max_stride
     self.apply_aug = apply_aug
     self.aug_config = augmentation_config
     self.input_scale = input_scale
tests/training/test_model_trainer.py (2)

103-110: Enhance test coverage for StepLR scheduler.

While the test configures the StepLR scheduler, it doesn't verify that the learning rate actually decreases according to the schedule. Consider adding assertions to check the learning rate values at different epochs.

Add assertions like this after training:

     trainer = ModelTrainer(config)
     trainer.train()
+    
+    # Verify learning rate schedule
+    df = pd.read_csv(Path(config.trainer_config.save_ckpt_path).joinpath("lightning_logs/version_0/metrics.csv"))
+    initial_lr = config.trainer_config.optimizer.lr
+    # Check that LR is halved every 10 epochs
+    assert abs(df.loc[0, "learning_rate"] - initial_lr) <= 1e-4
+    assert abs(df.loc[10, "learning_rate"] - initial_lr * 0.5) <= 1e-4

337-342: Enhance exception testing for invalid schedulers.

The test only verifies one invalid scheduler case. Consider testing multiple invalid cases and verifying the error message content.

Add more test cases:

     # check exception for lr scheduler
-    OmegaConf.update(config, "trainer_config.lr_scheduler.scheduler", "ReduceLR")
-    with pytest.raises(ValueError):
-        trainer = ModelTrainer(config)
-        trainer.train()
+    invalid_schedulers = ["ReduceLR", "CustomLR", "InvalidScheduler"]
+    for invalid_scheduler in invalid_schedulers:
+        OmegaConf.update(config, "trainer_config.lr_scheduler.scheduler", invalid_scheduler)
+        with pytest.raises(ValueError, match=f"Unsupported scheduler: {invalid_scheduler}"):
+            trainer = ModelTrainer(config)
+            trainer.train()
docs/config.md (3)

179-189: Fix markdown formatting inconsistencies.

The indentation of list items is inconsistent with the rest of the document. Please adjust the indentation to match the document's style:

  • Reduce indentation of scheduler and main sections to 4 spaces
  • Reduce indentation of subsections to 8 spaces

Apply this formatting:

    - `lr_scheduler`
        - `scheduler`: (str) Name of the scheduler to use. Valid schedulers: `"StepLR"`, `"ReduceLROnPlateau"`.
        - `step_lr`:
            - `step_size`: (int) Period of learning rate decay. If `step_size`=10, then every 10 epochs, learning rate will be reduced by a factor of `gamma`.
            - `gamma`: (float) Multiplicative factor of learning rate decay. *Default*: 0.1.
        - `reduce_lr_on_plateau`:
            - `threshold`: (float) Threshold for measuring the new optimum, to only focus on significant changes. *Default*: 1e-4.
            - `threshold_mode`: (str) One of "rel", "abs". In rel mode, dynamic_threshold = best * ( 1 + threshold ) in max mode or best * ( 1 - threshold ) in min mode. In abs mode, dynamic_threshold = best + threshold in max mode or best - threshold in min mode. *Default*: "rel".
            - `cooldown`: (int) Number of epochs to wait before resuming normal operation after lr has been reduced. *Default*: 0
            - `patience`: (int) Number of epochs with no improvement after which learning rate will be reduced. For example, if patience = 2, then we will ignore the first 2 epochs with no improvement, and will only decrease the LR after the third epoch if the loss still hasn't improved then. *Default*: 10.
            - `factor`: (float) Factor by which the learning rate will be reduced. new_lr = lr * factor. *Default*: 0.1.
            - `min_lr`: (float or List[float]) A scalar or a list of scalars. A lower bound on the learning rate of all param groups or each group respectively. *Default*: 0.

185-185: Fix emphasis markers in threshold_mode description.

Remove spaces inside emphasis markers in the description of threshold_mode.

Apply this change:

- In * rel * mode, dynamic_threshold = best * ( 1 + threshold ) in max mode or best * ( 1 - threshold ) in min mode.
+ In *rel* mode, dynamic_threshold = best * ( 1 + threshold ) in max mode or best * ( 1 - threshold ) in min mode.

179-189: Consider adding examples for each scheduler configuration.

To improve documentation clarity, consider adding example configurations for both StepLR and ReduceLROnPlateau schedulers. This would help users understand how to properly configure these schedulers for their use cases.

Would you like me to generate example configurations for both schedulers?
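
For illustration, configurations for the two schedulers could look like the following, built with OmegaConf as used elsewhere in the repo's tests; the values are placeholders mirroring the test configs, not prescriptive defaults.

# Placeholder example configs for both schedulers (illustrative values only).
from omegaconf import OmegaConf

step_lr_cfg = OmegaConf.create(
    """
    lr_scheduler:
      scheduler: StepLR
      step_lr:
        step_size: 30   # decay every 30 epochs
        gamma: 0.5      # multiply the LR by 0.5 at each decay
    """
)

plateau_cfg = OmegaConf.create(
    """
    lr_scheduler:
      scheduler: ReduceLROnPlateau
      reduce_lr_on_plateau:
        threshold: 1.0e-07
        threshold_mode: abs
        cooldown: 3
        patience: 5
        factor: 0.5
        min_lr: 1.0e-08
    """
)

print(OmegaConf.to_yaml(step_lr_cfg))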

sleap_nn/training/model_trainer.py (1)

Line range hint 675-684: Consider simplifying the return statement structure.

The current implementation has two separate return statements with similar structures. This could be simplified to improve readability and maintainability.

-        if self.trainer_config.lr_scheduler.scheduler is None:
-            return {
-                "optimizer": optimizer,
-            }
-
-        return {
-            "optimizer": optimizer,
-            "lr_scheduler": {
-                "scheduler": scheduler,
-                "monitor": "val_loss",
-            },
-        }
+        config = {"optimizer": optimizer}
+        if self.trainer_config.lr_scheduler.scheduler is not None:
+            config["lr_scheduler"] = {
+                "scheduler": scheduler,
+                "monitor": "val_loss",
+            }
+        return config
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 0cf48a8 and fb59c03.

📒 Files selected for processing (15)
  • docs/config.md (1 hunks)
  • docs/config_bottomup.yaml (2 hunks)
  • docs/config_centroid.yaml (2 hunks)
  • docs/config_topdown_centered_instance.yaml (3 hunks)
  • initial_config.yaml (1 hunks)
  • sleap_nn/data/streaming_datasets.py (1 hunks)
  • sleap_nn/training/model_trainer.py (1 hunks)
  • tests/assets/minimal_instance/initial_config.yaml (3 hunks)
  • tests/assets/minimal_instance/training_config.yaml (4 hunks)
  • tests/assets/minimal_instance_bottomup/initial_config.yaml (3 hunks)
  • tests/assets/minimal_instance_bottomup/training_config.yaml (3 hunks)
  • tests/assets/minimal_instance_centroid/initial_config.yaml (3 hunks)
  • tests/assets/minimal_instance_centroid/training_config.yaml (3 hunks)
  • tests/fixtures/datasets.py (2 hunks)
  • tests/training/test_model_trainer.py (5 hunks)

🔇 Additional comments (21)
tests/assets/minimal_instance_centroid/initial_config.yaml (3)

5-6: Verify if data config changes are intended for this PR.

The addition of user_instances_only and chunk_size parameters appears unrelated to the PR's objective of adding StepLR scheduler support. Please confirm if these changes should be part of this PR.

#!/bin/bash
# Check if these parameters are referenced in related changes
echo "Searching for related changes to user_instances_only and chunk_size..."
rg -l "user_instances_only|chunk_size" --type yaml

84-91: ⚠️ Potential issue

Restructure scheduler config to support both ReduceLROnPlateau and StepLR.

The current changes only implement ReduceLROnPlateau configuration, but the PR's objective is to add StepLR support. Consider restructuring to support both scheduler types:

  lr_scheduler:
-    scheduler: ReduceLROnPlateau
-    reduce_lr_on_plateau:
-      threshold: 1.0e-07
-      threshold_mode: abs
-      cooldown: 3
-      patience: 5
-      factor: 0.5
-      min_lr: 1.0e-08
+    type: "reduce_lr_on_plateau"  # or "step_lr"
+    params:
+      # Common parameters
+      factor: 0.5
+      # ReduceLROnPlateau specific
+      threshold: 1.0e-07
+      threshold_mode: abs
+      cooldown: 3
+      patience: 5
+      min_lr: 1.0e-08
+      # StepLR specific (commented until needed)
+      # step_size: 30
+      # gamma: 0.5

This structure:

  1. Makes scheduler type explicit via the type field
  2. Supports both scheduler types in a clean way
  3. Keeps common parameters at the top level
  4. Maintains backward compatibility
#!/bin/bash
# Check for any existing StepLR references or tests
echo "Searching for StepLR implementation..."
rg "StepLR|step_lr" --type python

64-64: Clarify the purpose of bin_files_path.

This new parameter is added without a value and seems unrelated to the StepLR scheduler implementation. Please provide documentation about its purpose or remove it if not needed for this PR.

tests/assets/minimal_instance/initial_config.yaml (3)

89-96: ⚠️ Potential issue

Add support for StepLR configuration.

The PR's objective is to add StepLR support, but the configuration only includes ReduceLROnPlateau. Consider:

  1. Making the scheduler type configurable rather than hardcoded
  2. Adding a parallel configuration section for StepLR parameters

Here's a suggested structure:

 lr_scheduler:
-    scheduler: ReduceLROnPlateau
+    scheduler: ${scheduler_type}  # Can be "ReduceLROnPlateau" or "StepLR"
     reduce_lr_on_plateau:
       threshold: 1.0e-07
       threshold_mode: abs
       cooldown: 3
       patience: 5
       factor: 0.5
       min_lr: 1.0e-08
+    step_lr:
+      step_size: 30
+      gamma: 0.5
+      last_epoch: -1

Let's check for any existing StepLR configurations in the codebase:

#!/bin/bash
# Search for StepLR-related configurations
rg -i "step_lr|steplr|step_size|gamma"

69-69: Specify a value for bin_files_path or document if it's optional.

The bin_files_path parameter is added but left empty. Please document whether this is an optional parameter.

Let's check how this parameter is used:

#!/bin/bash
# Search for bin_files_path usage to determine if it's required
rg -i "bin_files_path"

15-15: Specify a value for min_crop_size or document if it's optional.

The min_crop_size parameter is added but left empty. This could lead to validation issues if the parameter is required.

Let's check if this parameter is used elsewhere in the codebase:

✅ Verification successful

The empty min_crop_size parameter is valid as it's an optional configuration.

Based on the codebase analysis:

  • min_crop_size is an optional parameter that defaults to None or null in YAML configs
  • It's used in find_instance_crop_size() function to set a minimum boundary for automatic crop size calculation
  • The documentation in config.md confirms it's only used when crop_hw is None
  • Other test configurations also set this to null, indicating this is a common pattern
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for min_crop_size usage to determine if it's required
rg -i "min_crop_size"

Length of output: 1636


Script:

#!/bin/bash
# Search for min_crop_size usage with case-insensitive pattern and include context
rg -i "min.?crop.?size" -A 3 -B 3

Length of output: 8970


Script:

#!/bin/bash
# Search for crop-related configurations in YAML files
fd -e yaml -e yml -x grep -i "min.?crop.?size\|crop" {}

Length of output: 269


Script:

#!/bin/bash
# Look for data configuration or preprocessing files that might use crop size
fd -t f -e py | xargs grep -l -i "crop"

Length of output: 883

tests/assets/minimal_instance_bottomup/initial_config.yaml (1)

92-97: Verify the ReduceLROnPlateau parameters.

Please confirm:

  1. Is threshold: 1.0e-07 sensitive enough for your loss function?
  2. Is cooldown: 3 sufficient to observe stable improvements?
  3. Is patience: 5 appropriate given your typical training dynamics?
  4. Will min_lr: 1.0e-08 allow sufficient model refinement?

Let's check for consistency across other configurations:

✅ Verification successful

Based on the code analysis, I can now provide a final response:

The ReduceLROnPlateau parameters appear appropriate for MSE loss training

The codebase reveals:

  1. The model uses MSE loss (nn.MSELoss()) as the primary loss function
  2. The threshold of 1.0e-07 is suitable for MSE loss which typically produces small values
  3. Cooldown of 3 epochs provides sufficient stabilization time between LR adjustments
  4. Patience of 5 epochs allows reasonable time to observe genuine improvements vs fluctuations
  5. The min_lr of 1.0e-08 is appropriate as it's an order of magnitude below the threshold

The parameters are also consistent across all configurations in the codebase, suggesting they've been validated through usage.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Compare lr_scheduler settings across configs
fd -e yaml -e yml . -x grep -A 6 "lr_scheduler:"

Length of output: 1682


Script:

#!/bin/bash
# Let's look for loss function implementations and typical values
rg -A 5 "loss|criterion" --type py

# Also check for any training logs or documentation that might show typical loss values
fd "train|log|readme" -e md -e txt -e log

Length of output: 42950
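
For reference, the reviewed values map onto PyTorch's scheduler constructor roughly as follows (illustrative only; the optimizer here is a stand-in, not the trainer's actual setup):

# Mapping the reviewed YAML values onto torch's ReduceLROnPlateau (illustrative).
import torch

optimizer = torch.optim.Adam(torch.nn.Linear(2, 1).parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer,
    mode="min",          # val_loss should decrease
    threshold=1.0e-07,   # improvements smaller than this are ignored
    threshold_mode="abs",
    cooldown=3,          # wait 3 epochs after each reduction
    patience=5,          # reduce after 5 epochs without improvement
    factor=0.5,          # halve the LR
    min_lr=1.0e-08,
)
# In a training loop, call scheduler.step(val_loss) once per epoch.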

tests/assets/minimal_instance_centroid/training_config.yaml (2)

75-75: LGTM!

The empty bin_files_path is appropriate for a test configuration.


5-6: Verify the data loading parameters.

The new data loading parameters might affect training behavior:

  • user_instances_only: True restricts to user-created instances
  • chunk_size: 100 defines the size of data chunks

Please verify that these values are appropriate for the test configuration.

✅ Verification successful

Data loading parameters are consistent with project standards

The parameters user_instances_only: True and chunk_size: 100 are:

  • Used consistently across all test configurations and documentation
  • Properly documented in docs/config.md with clear default values
  • Correctly implemented in the codebase with proper handling in sleap_nn/training/model_trainer.py and data providers
  • Part of the standard data loading configuration across different model types (centroid, bottomup, etc.)
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for other test configurations to ensure consistency
rg -l "user_instances_only|chunk_size" "tests/assets/**/training_config.yaml"

Length of output: 156


Script:

#!/bin/bash
# Let's first find the correct path to test configurations
fd "training_config.yaml" tests/

# Then search for usage of these parameters in the codebase
rg "user_instances_only|chunk_size" -A 2 -B 2

Length of output: 23238

tests/assets/minimal_instance_bottomup/training_config.yaml (3)

5-6: LGTM: Data configuration parameters are well-structured.

The addition of user_instances_only and chunk_size parameters is clear and follows the existing configuration pattern.


85-85: LGTM: Optional path configuration.

The bin_files_path parameter is appropriately added as an optional configuration field.


105-112: Verify scheduler implementation consistency across the codebase.

Let's ensure that both schedulers are properly supported across all configuration files and the corresponding implementation.

✅ Verification successful

Scheduler implementation is consistent across the codebase

The ModelTrainer implementation properly supports both ReduceLROnPlateau and StepLR schedulers, and all configuration files consistently use the same scheduler configuration structure. The changes in the test configuration align with the implementation.

  • All YAML configs use the same structure for scheduler configuration
  • ModelTrainer explicitly handles both schedulers and validates scheduler names
  • Test suite includes coverage for scheduler configuration validation
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for scheduler-related configurations and implementations

# Check all YAML configs for scheduler configurations
echo "Checking YAML configurations..."
fd -e yaml -e yml | xargs rg -l "scheduler|lr_scheduler"

# Check ModelTrainer implementation for scheduler support
echo -e "\nChecking ModelTrainer implementation..."
rg -A 10 "configure_optimizers|lr_scheduler" src/

Length of output: 777


Script:

#!/bin/bash
# Let's check the content of these config files to understand scheduler configurations
echo "Checking scheduler configurations in YAML files..."
for file in $(fd -e yaml -e yml); do
    echo -e "\n=== $file ==="
    rg -A 5 "scheduler:" "$file" || true
done

# Let's also search for Python files that might contain scheduler implementation
echo -e "\nSearching for Python files with scheduler implementation..."
fd -e py | xargs rg -l "scheduler|ReduceLROnPlateau|lr_scheduler"

Length of output: 4879


Script:

#!/bin/bash
# Let's examine the scheduler implementation in the model trainer
echo "Checking ModelTrainer implementation..."
rg -B 2 -A 10 "lr_scheduler|ReduceLROnPlateau" sleap_nn/training/model_trainer.py

echo -e "\nChecking test implementation..."
rg -B 2 -A 10 "lr_scheduler|ReduceLROnPlateau" tests/training/test_model_trainer.py

Length of output: 4612

tests/assets/minimal_instance/training_config.yaml (2)

5-6: Verify the necessity of new data configuration parameters.

These new parameters (user_instances_only, chunk_size, min_crop_size) appear unrelated to the PR's objective of adding StepLR scheduler support. Please clarify their purpose and whether they should be part of this PR.

Also applies to: 15-15


82-82: Verify the necessity of bin_files_path parameter.

The addition of bin_files_path parameter appears unrelated to the StepLR scheduler implementation. Please clarify its purpose and whether it should be part of this PR.

docs/config_topdown_centered_instance.yaml (1)

107-114: ⚠️ Potential issue

Restructure lr_scheduler config to support both ReduceLROnPlateau and StepLR.

The current configuration structure doesn't align with the PR objective of adding StepLR support. Consider restructuring to allow selecting between different scheduler types:

Here's a suggested structure:

lr_scheduler:
  type: "ReduceLROnPlateau"  # or "StepLR"
  params:
    # Common parameters
    factor: 0.5
    min_lr: 1.0e-08
    # ReduceLROnPlateau specific
    threshold: 1.0e-07
    threshold_mode: abs
    cooldown: 3
    patience: 5
    # StepLR specific
    step_size: 30  # number of epochs between learning rate decays

This structure would:

  1. Make it clear which scheduler type is being used
  2. Support both ReduceLROnPlateau and StepLR configurations
  3. Maintain backward compatibility while adding new functionality

Let's verify if any other configuration files need similar updates:

initial_config.yaml (1)

26-52: LGTM: Model configuration is well-structured.

The backbone and head configurations are properly defined with appropriate default values.

docs/config_centroid.yaml (1)

6-6: LGTM: Consistent chunk_size configuration

The addition of chunk_size parameter is well-placed and maintains consistency with other configuration files.

tests/fixtures/datasets.py (1)

51-55: Verify unrelated changes to data provider configuration.

These changes appear unrelated to the StepLR scheduler implementation:

  1. Changed provider from "LabelsReaderDP" to "LabelsReader"
  2. Added new configurations for user_instances_only and chunk_size
sleap_nn/data/streaming_datasets.py (1)

154-155: Performance improvement: Moved crop size scaling to initialization.

Good optimization! Moving the crop size scaling calculation from __getitem__ to __init__ avoids redundant calculations on every item fetch while maintaining the same functionality. This is especially beneficial when processing large datasets.
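
As a generic sketch of the pattern being praised here (not the PR's exact computation; the scaling formula below is purely illustrative):

# Generic sketch: hoist a scale-dependent computation into __init__ so it runs
# once per dataset rather than once per fetched item.
class ScaledCropDataset:
    def __init__(self, crop_hw, input_scale=1.0):
        # computed once at construction time
        self.crop_hw = (int(crop_hw[0] * input_scale), int(crop_hw[1] * input_scale))

    def __getitem__(self, idx):
        # reuses the precomputed value; no per-item rescaling
        return self.crop_hw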

docs/config.md (1)

179-182: LGTM! Clear and well-documented StepLR configuration.

The new StepLR scheduler configuration is well-structured and includes essential parameters with clear descriptions. The implementation aligns with the PR objectives to add StepLR support.

sleap_nn/training/model_trainer.py (1)

651-674: LGTM: Scheduler selection logic is well-structured.

The implementation correctly:

  • Handles both StepLR and ReduceLROnPlateau schedulers
  • Provides clear error messages for invalid scheduler types
  • Maintains consistent configuration access patterns

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (1)
sleap_nn/training/model_trainer.py (1)

660-662: Standardize quotation marks in error message

The error message uses inconsistent quotation marks around the scheduler names. For clarity and consistency, consider standardizing the quotation marks.

Apply this diff to improve the error message formatting:

     raise ValueError(
-        f"{self.trainer_config.lr_scheduler.scheduler} is not a valid scheduler. Valid schedulers: `'StepLR'`, `'ReduceLROnPlateau'`"
+        f"{self.trainer_config.lr_scheduler.scheduler} is not a valid scheduler. Valid schedulers: 'StepLR', 'ReduceLROnPlateau'"
     )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between fb59c03 and fc4d20a.

📒 Files selected for processing (1)
  • sleap_nn/training/model_trainer.py (1 hunks)
