
Use sleap-io video backend #79

Open
wants to merge 2 commits into base: aadi/batch-inference-eval
Conversation

Contributor

@aaprasad aaprasad commented Aug 30, 2024

Here we switch to using sleap_io as the video backend for animal data.

Summary by CodeRabbit

  • New Features

    • Introduced batch processing for model evaluation and tracking, enhancing efficiency.
    • Improved error handling and user prompts for configuration inputs.
  • Bug Fixes

    • Enhanced dataset and dataloader methods to handle empty datasets, improving robustness.
  • Refactor

    • Streamlined video handling and configuration management for better performance and clarity.
  • Style

    • Improved logging for better user insights during data processing and evaluation.

Contributor

coderabbitai bot commented Aug 30, 2024

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.


Walkthrough

The changes involve refactoring how video files and configurations are managed across various modules. Key modifications include transitioning from list-based to dictionary-based video reader management, enhancing error handling for dataset retrieval, and shifting to batch processing for model evaluations. Additionally, input data normalization and improved logging practices were introduced to enhance clarity and robustness in handling datasets and model outputs.

Changes

| File | Change summary |
| --- | --- |
| dreem/datasets/sleap_dataset.py | Refactored video handling from a list to a dictionary of video readers, improved error handling in get_instances, and updated the destructor for proper resource management. |
| dreem/inference/eval.py | Modified configuration handling in the run function to support batch processing with CSV input, improved the user prompt for the pod index, and retained conditional checkpoint loading. |
| dreem/inference/track.py | Updated the run function to replace checkpoint loading with batch configuration handling, enhanced error prompts, and refined hyperparameter logging. |
| dreem/io/config.py | Enhanced the get_dataset and get_dataloader methods to return None for empty datasets, added warning logs, and implemented consistent file sorting for labels and videos. |
| dreem/models/embedding.py | Added normalization of the times tensor in the _learned_temp_embedding method to ensure consistent input scaling. |
| dreem/models/gtr_runner.py | Simplified video name extraction and updated ground-truth track ID handling to use a list comprehension instead of tensor methods, improving clarity. |
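The move from list-based to dictionary-based video reader management described above can be sketched as follows. This is a minimal illustration with a stand-in reader class and assumed names (`vid_readers`, `get_reader`); the actual code manages sleap_io video objects.

```python
class DummyReader:
    """Stand-in for a video reader object (hypothetical)."""

    def __init__(self, path):
        self.path = path
        self.closed = False

    def close(self):
        self.closed = True


class VideoCache:
    """Open each video at most once, keyed by file path."""

    def __init__(self, opener=DummyReader):
        self.vid_readers = {}
        self._opener = opener

    def get_reader(self, video_name):
        # Reuse an existing reader instead of reopening the file,
        # which is the advantage of keying readers by path.
        if video_name not in self.vid_readers:
            self.vid_readers[video_name] = self._opener(video_name)
        return self.vid_readers[video_name]


cache = VideoCache()
r1 = cache.get_reader("vid_a.mp4")
r2 = cache.get_reader("vid_a.mp4")
assert r1 is r2  # the same reader is reused for repeated lookups
```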

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Config
    participant Model
    participant Dataset

    User->>Config: Request batch configuration
    Config->>Model: Load hyperparameters from CSV
    Model-->>User: Confirm hyperparameters loaded
    User->>Dataset: Request dataset
    Dataset-->>User: Return dataset or None
    User->>Model: Start evaluation with dataset
    Model-->>User: Return evaluation results
```

🐰 In the meadow where I hop and play,
New changes come to brighten the day!
With videos neat and datasets bright,
Batch processing makes everything right!
So let’s dance and twirl in delight,
For clearer paths and data in sight! 🌼✨


@aaprasad aaprasad changed the base branch from main to aadi/batch-inference-eval August 30, 2024 14:44
@coderabbitai coderabbitai bot left a comment

Caution

Inline review comments failed to post

Actionable comments posted: 4

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 2af0dd5 and 41454f7.

Files selected for processing (6)
  • dreem/datasets/sleap_dataset.py (4 hunks)
  • dreem/inference/eval.py (1 hunks)
  • dreem/inference/track.py (2 hunks)
  • dreem/io/config.py (6 hunks)
  • dreem/models/embedding.py (1 hunks)
  • dreem/models/gtr_runner.py (2 hunks)
Additional context used
Ruff
dreem/inference/eval.py

30-30: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


56-56: f-string without any placeholders

Remove extraneous f prefix

(F541)

dreem/inference/track.py

100-100: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

dreem/datasets/sleap_dataset.py

163-163: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)

dreem/io/config.py

200-200: Undefined name SleapDataset

(F821)


200-200: Undefined name MicroscopyDataset

(F821)


200-200: Undefined name CellTrackingDataset

(F821)


261-261: Undefined name SleapDataset

(F821)


261-261: Undefined name MicroscopyDataset

(F821)


261-261: Undefined name CellTrackingDataset

(F821)

Additional comments not posted (7)
dreem/inference/track.py (1)

137-144: Enhance logging for tracking and results saving.

The logging statements provide valuable insights into the tracking process and results saving. Ensuring that these logs are informative and clear is crucial for debugging and user understanding.

The logging statements are well-placed and provide necessary information about the process flow.

dreem/models/gtr_runner.py (1)

Line range hint 304-325: Simplify video name extraction and improve data storage logic.

The changes to simplify the extraction of video names and the handling of ground truth track IDs are well-thought-out. These modifications enhance clarity and maintainability of the code. However, ensure that the new method of extracting gt_track_id is thoroughly tested to prevent any data integrity issues.

The changes are appropriate and improve the code quality. Ensure thorough testing to confirm the functionality.
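The list-comprehension change can be illustrated with a toy example. The structure below is hypothetical; the real code reads per-instance track-ID tensors rather than dictionaries.

```python
# Each instance carries a single ground-truth track ID; a list
# comprehension collects them as plain ints in one pass, replacing
# the previous chain of tensor methods.
instances = [{"gt_track_id": [7]}, {"gt_track_id": [3]}, {"gt_track_id": [7]}]
gt_track_ids = [int(inst["gt_track_id"][0]) for inst in instances]
```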

dreem/datasets/sleap_dataset.py (2)

109-109: Refactor: Initialize vid_readers dictionary.

The initialization of self.vid_readers as an empty dictionary is a good practice as it prepares the class to manage video readers efficiently. This change aligns with the shift from a list-based to a dictionary-based management system, enhancing performance and memory management.


370-370: Proper Resource Management in Destructor.

The update to the destructor method to iterate over self.vid_readers and close each reader is crucial for preventing resource leaks. This ensures that all video readers are properly closed before the object is garbage collected, aligning with best practices for resource management.

dreem/models/embedding.py (1)

326-326: Normalization Added to _learned_temp_embedding.

The addition of normalization to the times tensor by dividing it by its maximum value is a crucial enhancement. This ensures that the tensor is scaled between 0 and 1, which can prevent issues related to varying scales in the input data. This change is likely to improve the robustness and stability of the model's learning process.
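The normalization described here amounts to dividing by the maximum value. A plain-Python sketch (the actual code divides a PyTorch tensor by its max):

```python
def normalize_times(times):
    # Scale frame indices into [0, 1] so the learned temporal embedding
    # sees a consistent input range regardless of clip length.
    m = max(times)
    return [t / m for t in times]
```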

dreem/io/config.py (2)

Line range hint 200-257: Enhanced Robustness in get_dataset.

The update to allow the get_dataset method to return None if the dataset is empty is a significant improvement in error handling. This change ensures that the application can gracefully handle cases where no valid data is available, preventing potential errors in downstream processing. The logging of a warning when the dataset is empty is also a good practice, as it informs the user or developer about the issue.
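The pattern described, including the warning and the consistent file sorting, can be sketched as below. This is an illustrative stand-in: the real method constructs SleapDataset / MicroscopyDataset / CellTrackingDataset objects rather than a list of pairs.

```python
import logging

logger = logging.getLogger(__name__)


def get_dataset(label_files, video_files):
    """Return a dataset, or None with a warning when nothing can be loaded."""
    if not label_files or not video_files:
        logger.warning("Empty dataset: no label or video files found; returning None.")
        return None
    # Sort both lists so labels and videos pair up deterministically.
    return list(zip(sorted(label_files), sorted(video_files)))
```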



Line range hint 261-313: Improved Error Handling in get_dataloader.

The modifications to the get_dataloader method to handle cases where the dataset might be None or empty enhance the robustness of data handling. The addition of warnings when the dataset is None or empty is crucial for informing the user about potential issues, preventing errors during data loading. This change aligns with best practices in error handling and user feedback.
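The guard described can be sketched as follows, under assumed names; the real method wraps a usable dataset in a torch DataLoader.

```python
import logging

logger = logging.getLogger(__name__)


def get_dataloader(dataset):
    """Build a loader only when the dataset is usable; otherwise warn and return None."""
    if dataset is None or len(dataset) == 0:
        logger.warning("Dataset is None or empty; skipping dataloader creation.")
        return None
    return list(dataset)  # stand-in for DataLoader(dataset, ...)
```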

Comments failed to post (4)
dreem/inference/eval.py (2)

46-57: Ensure consistent logging and information display.

The logging statements are helpful for debugging and understanding the flow of batch processing. However, the use of an f-string without placeholders in line 56 is unnecessary and should be corrected.

Remove the extraneous f prefix from the logging statement to clean up the code:

```diff
- logger.info(f"Using the following tracker:")
+ logger.info("Using the following tracker:")
```
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
        hparams = {}

    checkpoint = eval_cfg.cfg.ckpt_path

    logger.info(f"Testing model saved at {checkpoint}")
    model = GTRRunner.load_from_checkpoint(checkpoint)

    model.tracker_cfg = eval_cfg.cfg.tracker
    model.tracker = Tracker(**model.tracker_cfg)

    logger.info("Using the following tracker:")
```


29-44: Refactor batch configuration handling and improve error messages.

The changes to handle batch configurations and improve error handling are significant. The membership test `"batch_config" in cfg.keys()` should be simplified to `"batch_config" in cfg`, as flagged by static analysis. Additionally, the error message can be made more user-friendly by giving a clearer prompt when POD_INDEX is not found.

Apply these changes to improve code readability and error handling:

```diff
- if "batch_config" in cfg.keys():
+ if "batch_config" in cfg:

- input(f"{e}. Assuming single run!\nPlease input task index to run:")
+ input("Environment variable POD_INDEX not found. Assuming single run!\nPlease input task index to run:")
```
Committable suggestion

```python
    # update with parameters for batch train job
    if "batch_config" in cfg:
        try:
            index = int(os.environ["POD_INDEX"])
        except KeyError:
            index = int(
                input("Environment variable POD_INDEX not found. Assuming single run!\nPlease input task index to run:")
            )

        hparams_df = pd.read_csv(cfg.batch_config)
        hparams = hparams_df.iloc[index].to_dict()
        _ = hparams.pop("Unnamed: 0", None)

        if eval_cfg.set_hparams(hparams):
            logger.info("Updated the following hparams to the following values")
            logger.info(hparams)
```

dreem/inference/track.py (1)

99-114: Refactor batch configuration handling and improve error messages.

The changes to handle batch configurations and improve error handling are significant. The membership test `"batch_config" in cfg.keys()` should be simplified to `"batch_config" in cfg`, as flagged by static analysis. Additionally, the error message can be made more user-friendly by giving a clearer prompt when POD_INDEX is not found.

Apply these changes to improve code readability and error handling:

```diff
- if "batch_config" in cfg.keys():
+ if "batch_config" in cfg:

- input(f"{e}. Assuming single run!\nPlease input task index to run:")
+ input("Environment variable POD_INDEX not found. Assuming single run!\nPlease input task index to run:")
```
Committable suggestion

```python
    # update with parameters for batch train job
    if "batch_config" in cfg:
        try:
            index = int(os.environ["POD_INDEX"])
        except KeyError:
            index = int(
                input("Environment variable POD_INDEX not found. Assuming single run!\nPlease input task index to run:")
            )

        hparams_df = pd.read_csv(cfg.batch_config)
        hparams = hparams_df.iloc[index].to_dict()
        _ = hparams.pop("Unnamed: 0", None)

        if pred_cfg.set_hparams(hparams):
            logger.info("Updated the following hparams to the following values")
            logger.info(hparams)
```

dreem/datasets/sleap_dataset.py (1)

162-167: Error Handling Improvement in get_instances.

The addition of a try-except block to handle FileNotFoundError when accessing video frames is a significant improvement. This change ensures that the application can gracefully handle missing files and continue operation by attempting to load the video again. However, the local variable e is assigned but never used, which is flagged by static analysis tools.

Consider removing the assignment to e since it's not used, or log the exception for debugging purposes:

```diff
- except FileNotFoundError as e:
+ except FileNotFoundError:
+     logger.error(f"File not found: {video_name}")
```

Committable suggestion was skipped due to low confidence.


Development

Successfully merging this pull request may close these issues.

Change video backend from imageio PIL.Image and skimage.io to sleap_io.Video?