Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prune tracking graph API #111

Merged
merged 30 commits into from
Jan 10, 2024
Merged

Prune tracking graph API #111

merged 30 commits into from
Jan 10, 2024

Conversation

cmalinmayor
Copy link
Collaborator

This is a first attempt at streamlining the TrackingGraph API as mentioned in #89. See commit messages for details of my thought process.

Questions I want feedback on include:

  1. Now that I have removed the limit_to args in TrackingGraph.nodes and TrackingGraph.edges, these functions literally just wrap networkx.nodes and networkx.edges. Additionally, their usage is very limited in the whole library. E.g. getting the number of nodes for which we could implement __len__ and setting a NodeAttr to True for all nodes which needs to be changed anyways as described in API Update: TrackingGraph.set_node_attribute and set_edge_attribute are overloaded #110. Shall we remove them?
  2. The use of get_nodes_with_attribute in the IOU matcher here is essentially to find the node with the given label_id in the given time point so we can construct the matching tuples of node_ids. Thoughts on constructing a dictionary {time: {segmentation_id : node_id}} (technically two dictionaries, one for gt and one for pred) before we enter the for loop to improve efficiency and remove this function?
  3. TrackingGraph.get_node_attribute and TrackingGraph.get_edge_attribute making the assumption that not having a flag means the attribute is negative. Since most of our attributes are rare (exception being True Positive, which we technically don't need to annotate), this saves time setting lots of attributes to False. However, it does lose out on explicitness of the annotations. Would we prefer explicitly annotating every attribute on every node/edge?
  4. Shortening attribute to attr everywhere? Also, flag and attribute are now basically synonymous, since all our supported attributes are True/False. Should we pick one? Use Flag for our reserved ones and Attribute for custom annotations that can be added directly through networkx? General nomenclature thoughts welcome.

I used VSCode "Find all references" to inspect all the calls to all the
TrackingGraph functions and removed the following unused elements.
- `limit_to` args in nodes and edges. These now literally call
  networkx.nodes and networkx.edges and could be removed.
- `get_nodes_by_roi` function. This is probably unnecessary for metrics
  computation.
- `get_edges_with_attribute` function. This is never used, although
  `get_nodes_with_attribute` is used in the iou matcher. I suggest
refactoring the iou matcher to do the actual task more efficiently and
getting rid of both of these functions. This same part of the iou
matcher is the only place we use TrackingGraph.get_nodes_in_frame as
well.

Exceptions include:
- `get_locations` function. It was never called and is probably not
  necessary for computing metrics after matching is over. However, it
will be used in the point based matcher. We can revisit after that
matcher is implemented.
- `get_connected_components` and `get_tracklets` functions. They were
 never called but will likely be needed for metrics such as Cell Cycle
Accuracy. We can revisit after that metric is implemented.
- `get_node_attribute` and `get_edge_attribute`. These were just
  implemented and I will refactor the metrics to use them in the next
commit.
The advantage of using this over the prior approach is that these
functions assume that if the attribute is not present it is False.
Therefore, I also removed all instances where we set an attribute to
False for all nodes/edges before flipping some of them to True.
This accompanies the prior commit updating the metrics computations,
and additionally updates the tests, since they can no longer assume
that False attributes are explicitly annotated.
@cmalinmayor cmalinmayor added the enhancement New feature or request label Nov 4, 2023
@msschwartz21
Copy link
Collaborator

  1. Now that I have removed the limit_to args in TrackingGraph.nodes and TrackingGraph.edges, these functions literally just wrap networkx.nodes and networkx.edges. Additionally, their usage is very limited in the whole library. E.g. getting the number of nodes for which we could implement len and setting a NodeAttr to True for all nodes which needs to be changed anyways as described in API Update: TrackingGraph.set_node_attribute and set_edge_attribute are overloaded #110. Shall we remove them?

I vote for changing nodes and edges to properties so that they mirror the networkx api but allow us to avoid the extra .graph calls everywhere.

  1. The use of get_nodes_with_attribute in the IOU matcher here is essentially to find the node with the given label_id in the given time point so we can construct the matching tuples of node_ids. Thoughts on constructing a dictionary {time: {segmentation_id : node_id}} (technically two dictionaries, one for gt and one for pred) before we enter the for loop to improve efficiency and remove this function?

I'm in favor of removing the function. Agnostic to how the change is implemented in the iou matcher.

  1. TrackingGraph.get_node_attribute and TrackingGraph.get_edge_attribute making the assumption that not having a flag means the attribute is negative. Since most of our attributes are rare (exception being True Positive, which we technically don't need to annotate), this saves time setting lots of attributes to False. However, it does lose out on explicitness of the annotations. Would we prefer explicitly annotating every attribute on every node/edge?

I don't love the pattern of setting attributes to False by default and then flipping them to True later. It seems like it adds a lot of clutter to the graph. Assuming we stop annotating False attributes, having a function that explicitly encodes the assumption that absent annotations are False seems like a good idea to prevent mistakes down the road.

  1. Shortening attribute to attr everywhere? Also, flag and attribute are now basically synonymous, since all our supported attributes are True/False. Should we pick one? Use Flag for our reserved ones and Attribute for custom annotations that can be added directly through networkx? General nomenclature thoughts welcome.

I like flag for our special attributes and attribute for other annotations. Flag is a bit nicer to look at than attr

@msschwartz21
Copy link
Collaborator

Consider changing get_nodes_with_flags to get_nodes_by_flag

Copy link
Collaborator

@DragaDoncila DragaDoncila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love the pattern of setting attributes to False by default and then flipping them to True later. It seems like it adds a lot of clutter to the graph. Assuming we stop annotating False attributes, having a function that explicitly encodes the assumption that absent annotations are False seems like a good idea to prevent mistakes down the road.

I actually think it's dangerous to maintain ragged attributes on our graphs. Not only is it likely to lead to confusing KeyErrors down the track for users who grab the networkx graph and use it for downstream processing, but it might also lead someone to believe that a graph has had its errors computed when it hasn't (they query is_fp for a node and get a False back so they think errors are annotated). It's also inexplicit which usually leads to confusion down the line. Setting an attribute to a default value and then overriding it as required is a pretty common pattern, and I think it's what we need here.

I also don't think the get_node_attribute and get_edge_attribute functions should return False when the attribute doesn't exist, they should raise an error instead. Also nit, but if these two functions only work for the boolean error attributes, I think that should be reflected in the name of the function. Something like get_node_error_flag or something?

Thoughts on constructing a dictionary {time: {segmentation_id : node_id}} (technically two dictionaries, one for gt and one for pred) before we enter the for loop to improve efficiency and remove this function?

Big +1. We should definitely avoid trawling through the list of nodes wherever possible. Especially for matching, we should ideally be finding the correct node in O(1) since we're going to eventually look for all nodes, giving us quadratic growth otherwise.

src/traccuracy/_tracking_graph.py Outdated Show resolved Hide resolved
cmalinmayor and others added 6 commits November 28, 2023 15:02
Had to manually resolve merge conflicts with new
Matched object, mostly in tests.
Improve efficiency by creating a pre-computed dictionary
of time to segmentation_id to node_id.
This avoids calling TrackingGraph functions to find
each node with the given label_key in the given frame
when constructing the matching tuples, which under the
hood loops over all nodes.
This function was overly general and thus inefficient.
After refactoring the IOU matcher is was no longer used
and could be removed safely.
@codecov-commenter
Copy link

codecov-commenter commented Nov 28, 2023

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (6f6406e) 89.61% compared to head (f50eca2) 91.66%.

Files Patch % Lines
src/traccuracy/track_errors/_ctc.py 89.28% 3 Missing ⚠️
src/traccuracy/_tracking_graph.py 98.21% 1 Missing ⚠️
src/traccuracy/matchers/_ctc.py 50.00% 1 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #111      +/-   ##
==========================================
+ Coverage   89.61%   91.66%   +2.04%     
==========================================
  Files          19       19              
  Lines         915      828      -87     
==========================================
- Hits          820      759      -61     
+ Misses         95       69      -26     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@cmalinmayor
Copy link
Collaborator Author

I actually think it's dangerous to maintain ragged attributes on our graphs. Not only is it likely to lead to confusing KeyErrors down the track for users who grab the networkx graph and use it for downstream processing, but it might also lead someone to believe that a graph has had its errors computed when it hasn't (they query is_fp for a node and get a False back so they think errors are annotated). It's also inexplicit which usually leads to confusion down the line. Setting an attribute to a default value and then overriding it as required is a pretty common pattern, and I think it's what we need here.

Thanks for the feedback! I'll revert the change and add some documentation regarding this convention.

cmalinmayor and others added 10 commits November 29, 2023 14:10
Having one function was clunky (had to cast ids specifically to
a list to detect one vs many) and inefficient (couldn't leverage
networkx functions for setting all attributes).

This commit also renames the functions to the format
`set_flag_on_node` or `set_flag_on_all_nodes` for maximum clarity.
This naming is clear that we are setting one flag on one or many
nodes or edges, and gets a head start on using `flag` instead of
`attribute` for our custom flags.
We decided that we did not want to assume that missing flags
are false, and instead explicitly annotate all flags on all
nodes/edges. The only additional functionality the getters
provided was to check for missing values and return False.
Since we do not want that functionality, we can revert to networkx
style access of attributes.
@cmalinmayor cmalinmayor marked this pull request as ready for review November 30, 2023 16:25
@cmalinmayor
Copy link
Collaborator Author

@DragaDoncila @msschwartz21 I think this is ready for review now. Most of the non-TrackingGraph code changes come from the following relatively minor changes:

  • Renaming NodeAttr/EdgeAttr to NodeFlag/EdgeFlag
  • Refactoring set_node/edge_attribute to set_flag_on_node/edge and set_flag_on_all_nodes/edges

There is also the IOU matcher refactoring which is a more substantive change but necessary to remove TrackingGraph.get_nodes_with_attribute.

I was going to separate out the graph validation in the __init__ and add a flag as discussed in #91 but it wasn't a straightforward change (the validation is intertwined with the construction of the nodes_by_flag dict and we don't check edge validity at all, which we might want to add) so I will leave it for a future PR.

Copy link
Collaborator

@msschwartz21 msschwartz21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm liking how these changes came together. I had a couple questions and potential places where tests could be renamed but they are all minor.

On a separate note, I want to put down a comment in writing in this thread so we have it if we ever look back. We are keeping the set_flag_on_node/edge and set_flag_on_all_nodes/edges functions (even though they are easily accessible through the networkx api) because it enables us to maintain dictionaries of nodes/edges by flag for faster lookup.

src/traccuracy/_tracking_graph.py Outdated Show resolved Hide resolved
src/traccuracy/track_errors/divisions.py Show resolved Hide resolved
tests/test_tracking_graph.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@DragaDoncila DragaDoncila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cmalinmayor looking good! Left some comments through the code.

src/traccuracy/_tracking_graph.py Outdated Show resolved Hide resolved
self.nodes_by_flag[attr].add(_id)
else:
self.nodes_by_flag[attr].discard(_id)
self.graph.nodes[_id][flag] = value
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand, this function is setting flags on nodes and in the process is updating the self.nodes_by_flag attribute to ensure that the flag dictionaries are also updated correctly? I'm a little concerned about the fact that we can't enforce (through code), the strict usage of this function for annotating node and edge errors internally. We will have to be very careful in review and documentation - if any metric modifies these attributes, they need to do so exclusively through these functions (set_flag_on_node, set_flag_on_all_nodes, etc.) on the TrackingGraph, or they will invalidate the dictionaries.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed - luckily we separated error annotation and metric computation, so new metrics that only use the existing annotations will be totally fine. New metrics probably shouldn't modify the values of existing NodeFlags or EdgeFlags. But any metric that adds a new Flag to the NodeFlag or EdgeFlag will have to be carefully checked.

src/traccuracy/matchers/_iou.py Outdated Show resolved Hide resolved
src/traccuracy/track_errors/divisions.py Show resolved Hide resolved
tests/test_tracking_graph.py Show resolved Hide resolved
@cmalinmayor
Copy link
Collaborator Author

Closes #110 and #89

Copy link
Collaborator

@msschwartz21 msschwartz21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs look good! Code looks good! I'm happy 😄

@cmalinmayor cmalinmayor merged commit 3d85480 into main Jan 10, 2024
16 checks passed
@cmalinmayor cmalinmayor deleted the prune-tracking-graph branch January 10, 2024 19:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
4 participants