Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for selecting/excluding group slices #5198

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

brimoor
Copy link
Contributor

@brimoor brimoor commented Dec 1, 2024

Change log

  • Adds a select_group_slices(..., flat=False) syntax for selecting group slices w/o flattening
  • Adds a exclude_group_slices() method for excluding group slices (also w/o flattening, as this is the only behavior that makes sense when excluding group slices in general)

TODO

  • Update App to use view.group_slices/view.group_media_types rather than dataset.group_slices/dataset.group_media_types, as the former may no longer always match the latter

Example usage

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart-groups")
session = fo.launch_app(dataset)
view1 = dataset.select_group_slices(["left", "right"], flat=False)

assert view1.media_type == "group"
assert view1.group_slices == ["left", "right"]

# TODO: respect `view1.group_slices` in App
session.view = view1
view2 = dataset.exclude_group_slices("pcd")

assert view2.media_type == "group"
assert view2.group_slices == ["left", "right"]

# TODO: respect `view2.group_slices` in App
session.view = view2

Summary by CodeRabbit

  • New Features

    • Enhanced documentation for grouped datasets, including new sections on creation, manipulation, and usage.
    • Introduced the ExcludeGroupSlices functionality for filtering group slices.
    • Added exclude_group_slices method to the SampleCollection class for excluding specified group slices.
    • New properties and methods in the ViewStage class to control group flattening and media type retrieval.
  • Bug Fixes

    • Improved error handling for cloning dynamic grouped views in the Dataset class.
  • Tests

    • Added new unit tests to verify the functionality of selecting and excluding group slices.

@brimoor brimoor added the feature Work on a feature request label Dec 1, 2024
@brimoor brimoor requested a review from benjaminpkane December 1, 2024 04:58
Copy link
Contributor

coderabbitai bot commented Dec 1, 2024

Walkthrough

This pull request introduces enhancements to the FiftyOne framework's grouped datasets functionality. The changes span multiple files, focusing on improving group slice management, documentation, and dataset view handling. Key modifications include adding methods to exclude group slices, updating the selection of group slices, and enhancing the documentation to provide clearer guidance on working with grouped datasets. The updates aim to provide more flexible and intuitive ways of manipulating and interacting with grouped data collections.

Changes

File Change Summary
docs/source/user_guide/groups.rst Comprehensive documentation update for grouped datasets, including new sections on creation, manipulation, and usage
fiftyone/__public__.py Added ExcludeGroupSlices to public imports
fiftyone/core/collections.py Added exclude_group_slices() method and updated select_group_slices() with flat parameter
fiftyone/core/dataset.py Enhanced dataset cloning logic for grouped views, added error handling
fiftyone/core/stages.py Introduced ExcludeGroupSlices class, added flattens_groups property, updated SelectGroupSlices
fiftyone/core/view.py Refined group slice and media type handling in DatasetView
tests/unittests/group_tests.py Added test_select_exclude_slices() to validate new group slice functionality

Possibly related PRs

Suggested Labels

enhancement, documentation

Suggested Reviewers

  • sashankaryal
  • imanjra

Poem

🐰 Grouped datasets, oh what a delight!
Slices dancing left and right
With methods new and views so clear
FiftyOne's magic now draws near
Code hopping, data taking flight! 🚀

Tip

CodeRabbit's docstrings feature is now available as part of our Early Access Program! Simply use the command @coderabbitai generate docstrings to have CodeRabbit automatically generate docstrings for your pull request. We would love to hear your feedback on Discord.


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Base automatically changed from release/v1.1.0 to main December 6, 2024 15:31
@brimoor brimoor force-pushed the select-group-slices branch from 24359cd to 73a775b Compare December 7, 2024 02:30
@brimoor brimoor changed the title [HOLD] Add support for selecting/excluding group slices Add support for selecting/excluding group slices Dec 9, 2024
@brimoor brimoor changed the base branch from main to develop December 9, 2024 05:50
@brimoor brimoor marked this pull request as ready for review December 9, 2024 05:50
@brimoor brimoor force-pushed the select-group-slices branch from 73a775b to 9b8c693 Compare December 21, 2024 01:10
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
fiftyone/core/stages.py (1)

5037-5040: Consider simplifying the if/else
Use a ternary operator for concise readability.

Proposed change:

-if etau.is_str(self._slices):
-    slices = {self._slices}
-else:
-    slices = set(self._slices)
+slices = {self._slices} if etau.is_str(self._slices) else set(self._slices)
🧰 Tools
🪛 Ruff (0.8.2)

5037-5040: Use ternary operator slices = {self._slices} if etau.is_str(self._slices) else set(self._slices) instead of if-else-block

Replace if-else-block with slices = {self._slices} if etau.is_str(self._slices) else set(self._slices)

(SIM108)

fiftyone/__public__.py (1)

201-201: Remove or utilize the unused import "ExcludeGroupSlices"
Static analysis indicates that "ExcludeGroupSlices" is imported on line 201 but never used. If this import is genuinely unnecessary, you can safely remove it to keep the codebase clean. Otherwise, consider referencing the symbol to avoid the unused-import warning.

🧰 Tools
🪛 Ruff (0.8.2)

201-201: .core.stages.ExcludeGroupSlices imported but unused

Remove unused import

(F401)

tests/unittests/group_tests.py (1)

645-764: Consider extracting common assertion patterns into helper methods.

While the test is comprehensive and well-structured, there are repeated assertion patterns that could be extracted into helper methods to improve maintainability and reduce code duplication. For example:

+ def _assert_group_view_state(self, view, expected_len, expected_slices, expected_media_types):
+     self.assertEqual(len(view), expected_len)
+     self.assertEqual(view.media_type, "group")
+     self.assertSetEqual(set(view.group_slices), set(expected_slices))
+     self.assertDictEqual(view.group_media_types, expected_media_types)
+     self.assertIn(view.group_slice, expected_slices)
+     self.assertIn(view.default_group_slice, expected_slices)

  def test_select_exclude_slices(self):
      dataset = _make_group_dataset()

      # Select slices by name
      view = dataset.select_group_slices(["left", "right"], flat=False)
-     self.assertEqual(len(view), 2)
-     self.assertEqual(view.media_type, "group")
-     self.assertSetEqual(set(view.group_slices), {"left", "right"})
-     self.assertDictEqual(
-         view.group_media_types, {"left": "image", "right": "image"}
-     )
-     self.assertIn(view.group_slice, ["left", "right"])
-     self.assertIn(view.default_group_slice, ["left", "right"])
+     self._assert_group_view_state(
+         view, 
+         expected_len=2,
+         expected_slices=["left", "right"],
+         expected_media_types={"left": "image", "right": "image"}
+     )

This would make the test more maintainable and easier to update if the assertions need to change.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b5a1aab and 9b8c693.

📒 Files selected for processing (7)
  • docs/source/user_guide/groups.rst (3 hunks)
  • fiftyone/__public__.py (1 hunks)
  • fiftyone/core/collections.py (6 hunks)
  • fiftyone/core/dataset.py (2 hunks)
  • fiftyone/core/stages.py (16 hunks)
  • fiftyone/core/view.py (13 hunks)
  • tests/unittests/group_tests.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
fiftyone/__public__.py

201-201: .core.stages.ExcludeGroupSlices imported but unused

Remove unused import

(F401)

fiftyone/core/stages.py

5037-5040: Use ternary operator slices = {self._slices} if etau.is_str(self._slices) else set(self._slices) instead of if-else-block

Replace if-else-block with slices = {self._slices} if etau.is_str(self._slices) else set(self._slices)

(SIM108)

🔇 Additional comments (41)
fiftyone/core/view.py (15)

165-165: Looks good!
This logic correctly returns False when the stage flattens groups.


247-247: No issues found.
This fallback expression is appropriate.


274-274: Straightforward return statement
This line upholds clarity by returning the group media types’ keys.


284-284: Minor note
The code cleanly returns the group media types without issues.


294-296: No issues found
Returning the slice from the grouped dataset if the default slice doesn’t exist is logical.


1563-1563: No concerns
Context variable "_found_flattened_videos" is introduced without any apparent problems.


1579-1584: Looks correct
The code checks whether groups are flattened, ensuring correct frame reattachment logic.


1612-1612: Comment block only
No review comment needed.


1617-1617: Comment only
No direct code to review.


1651-1652: Logic is appropriate
Manually attaching frames after group lookup is valid.


1704-1704: No issues
Correctly setting the group slice.


1817-1825: Code reads well
This handles updated group media types and adjusts the slice if needed.


1833-1835: Simple setter
No problems with the group slice assignment.


1939-1949: All good
The function iterates over stages, updating group media types. No issues.


1976-1976: Trivial single-line snippet
No further remarks.

fiftyone/core/stages.py (8)

101-112: Clear docstring
Property explains when the stage does or does not flatten groups.


225-240: Method is well-defined
Provides group media types or returns None. Looks consistent.


Line range hint 4581-4654: Documentation only
Explains SelectGroupSlices with examples. No code issues found.


4666-4668: Docstring lines
No issues with these doc comments.


4932-5082: Implementation looks coherent
This new class ExcludeGroupSlices is consistent with the codebase.

🧰 Tools
🪛 Ruff (0.8.2)

5037-5040: Use ternary operator slices = {self._slices} if etau.is_str(self._slices) else set(self._slices) instead of if-else-block

Replace if-else-block with slices = {self._slices} if etau.is_str(self._slices) else set(self._slices)

(SIM108)


8847-8847: No actionable code
Just a list containing ExcludeGroupSlices.


8895-8895: No actionable code
Simple registry addition for ExcludeGroupSlices.


8912-8912: No actionable code
Again, references the new class in the registry.

fiftyone/core/dataset.py (1)

8494-8496: Enforce restriction on dynamic grouped views
This check effectively prevents cloning datasets that rely on dynamic grouping logic. By raising an exception here, the code avoids deeper inconsistencies in the pipeline. The implementation is straightforward and properly guards against a feature that isn't supported.

docs/source/user_guide/groups.rst (14)

818-819: Clarify the usage of flat vs. grouped views
These lines efficiently explain how users can select individual slices from their grouped datasets and flatten them into ungrouped collections. The documentation is concise and appears accurate.


846-846: Highlighting multi-slice selection
This line clarifies that you can select multiple slices (e.g., "left" and "right"). It is clear and helpful.


885-891: Emphasize that prior filters don't affect unseen slices
These lines accurately warn that filtering a single slice doesn't automatically carry over to other slices. This is a crucial note for new users.


892-901: Encourage filtering after flattening
This explanation neatly instructs users to apply filters post-flattening if their goal is to filter slices collectively. It's well-structured.


902-909: Good demonstration of layering multiple filters
The code snippet clarifies how to refine the flattened collection further with additional filters. No issues noted.


910-911: Reiterating usage of select_group_slices with flat=False
This reference is consistent with earlier docs. Clear and consistent.


912-912: Small segue to next section
This line is functioning as a transition. No issues.


917-917: Loading a grouped view while excluding slices
Providing an example that merges well with the new features. Good clarity.


919-920: Maintaining grouped media type
Line references demonstrate the dataset remains “group” after excluding a slice. Looks accurate.


922-923: New subheading for excluding slices
The heading is appropriately placed and clearly states the feature’s purpose.


924-928: Intro to exclude_group_slices
The explanation is direct and succinct. Perfect for users who need to exclude certain slices.


929-929: Transition line
No issues found with this bridging text.


934-935: Excluding a single slice
Well-structured code snippet showcasing removing "center." The example is consistent with the new method.


936-937: Validation of new slices post exclusion
The pair of assertions is straightforward, verifying the reduced set of slices.

tests/unittests/group_tests.py (1)

591-602: LGTM! Well-structured schema validation test.

The test effectively validates that schema changes are maintained when selecting active slices, with thorough checks for field removal from both sample and frame schemas.

fiftyone/core/collections.py (2)

4621-4694: LGTM! Well-implemented group slice exclusion functionality

The new exclude_group_slices() method provides a clean and intuitive way to exclude specific group slices or media types from a collection. The implementation is robust and the documentation includes clear examples.


Line range hint 6671-6773: LGTM! Good enhancement to select_group_slices with flat parameter

The addition of the flat parameter to select_group_slices() provides more flexibility by allowing users to choose between flattened and grouped views. The implementation maintains backward compatibility by defaulting to flat=True.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Work on a feature request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant