-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prune tracking graph API #111
Conversation
I used VSCode "Find all references" to inspect all the calls to all the TrackingGraph functions and removed the following unused elements. - `limit_to` args in nodes and edges. These now literally call networkx.nodes and networkx.edges and could be removed. - `get_nodes_by_roi` function. This is probably unnecessary for metrics computation. - `get_edges_with_attribute` function. This is never used, although `get_nodes_with_attribute` is used in the iou matcher. I suggest refactoring the iou matcher to do the actual task more efficiently and getting rid of both of these functions. This same part of the iou matcher is the only place we use TrackingGraph.get_nodes_in_frame as well. Exceptions include: - `get_locations` function. It was never called and is probably not necessary for computing metrics after matching is over. However, it will be used in the point based matcher. We can revisit after that matcher is implemented. - `get_connected_components` and `get_tracklets` functions. They were never called but will likely be needed for metrics such as Cell Cycle Accuracy. We can revisit after that metric is implemented. - `get_node_attribute` and `get_edge_attribute`. These were just implemented and I will refactor the metrics to use them in the next commit.
The advantage of using this over the prior approach is that these functions assume that if the attribute is not present it is False. Therefore, I also removed all instances where we set an attribute to False for all nodes/edges before flipping some of them to True.
This accompanies the prior commit updating the metrics computations, and additionally updates the tests, since they can no longer assume that False attributes are explicitly annotated.
I vote for changing nodes and edges to properties so that they mirror the networkx api but allow us to avoid the extra
I'm in favor of removing the function. Agnostic to how the change is implemented in the iou matcher.
I don't love the pattern of setting attributes to False by default and then flipping them to True later. It seems like it adds a lot of clutter to the graph. Assuming we stop annotating False attributes, having a function that explicitly encodes the assumption that absent annotations are False seems like a good idea to prevent mistakes down the road.
I like flag for our special attributes and attribute for other annotations. Flag is a bit nicer to look at than attr |
Consider changing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't love the pattern of setting attributes to False by default and then flipping them to True later. It seems like it adds a lot of clutter to the graph. Assuming we stop annotating False attributes, having a function that explicitly encodes the assumption that absent annotations are False seems like a good idea to prevent mistakes down the road.
I actually think it's dangerous to maintain ragged attributes on our graphs. Not only is it likely to lead to confusing KeyErrors
down the track for users who grab the networkx graph and use it for downstream processing, but it might also lead someone to believe that a graph has had its errors computed when it hasn't (they query is_fp
for a node and get a False
back so they think errors are annotated). It's also inexplicit which usually leads to confusion down the line. Setting an attribute to a default value and then overriding it as required is a pretty common pattern, and I think it's what we need here.
I also don't think the get_node_attribute
and get_edge_attribute
functions should return False
when the attribute doesn't exist, they should raise an error instead. Also nit, but if these two functions only work for the boolean error attributes, I think that should be reflected in the name of the function. Something like get_node_error_flag
or something?
Thoughts on constructing a dictionary {time: {segmentation_id : node_id}} (technically two dictionaries, one for gt and one for pred) before we enter the for loop to improve efficiency and remove this function?
Big +1. We should definitely avoid trawling through the list of nodes wherever possible. Especially for matching, we should ideally be finding the correct node in O(1)
since we're going to eventually look for all nodes, giving us quadratic growth otherwise.
Had to manually resolve merge conflicts with new Matched object, mostly in tests.
Improve efficiency by creating a pre-computed dictionary of time to segmentation_id to node_id. This avoids calling TrackingGraph functions to find each node with the given label_key in the given frame when constructing the matching tuples, which under the hood loops over all nodes.
This function was overly general and thus inefficient. After refactoring the IOU matcher is was no longer used and could be removed safely.
Codecov ReportAttention:
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## main #111 +/- ##
==========================================
+ Coverage 89.61% 91.66% +2.04%
==========================================
Files 19 19
Lines 915 828 -87
==========================================
- Hits 820 759 -61
+ Misses 95 69 -26 ☔ View full report in Codecov by Sentry. |
Thanks for the feedback! I'll revert the change and add some documentation regarding this convention. |
Having one function was clunky (had to cast ids specifically to a list to detect one vs many) and inefficient (couldn't leverage networkx functions for setting all attributes). This commit also renames the functions to the format `set_flag_on_node` or `set_flag_on_all_nodes` for maximum clarity. This naming is clear that we are setting one flag on one or many nodes or edges, and gets a head start on using `flag` instead of `attribute` for our custom flags.
We decided that we did not want to assume that missing flags are false, and instead explicitly annotate all flags on all nodes/edges. The only additional functionality the getters provided was to check for missing values and return False. Since we do not want that functionality, we can revert to networkx style access of attributes.
@DragaDoncila @msschwartz21 I think this is ready for review now. Most of the non-TrackingGraph code changes come from the following relatively minor changes:
There is also the IOU matcher refactoring which is a more substantive change but necessary to remove I was going to separate out the graph validation in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm liking how these changes came together. I had a couple questions and potential places where tests could be renamed but they are all minor.
On a separate note, I want to put down a comment in writing in this thread so we have it if we ever look back. We are keeping the set_flag_on_node/edge
and set_flag_on_all_nodes/edges
functions (even though they are easily accessible through the networkx api) because it enables us to maintain dictionaries of nodes/edges by flag for faster lookup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cmalinmayor looking good! Left some comments through the code.
self.nodes_by_flag[attr].add(_id) | ||
else: | ||
self.nodes_by_flag[attr].discard(_id) | ||
self.graph.nodes[_id][flag] = value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I understand, this function is setting flags on nodes and in the process is updating the self.nodes_by_flag
attribute to ensure that the flag dictionaries are also updated correctly? I'm a little concerned about the fact that we can't enforce (through code), the strict usage of this function for annotating node and edge errors internally. We will have to be very careful in review and documentation - if any metric modifies these attributes, they need to do so exclusively through these functions (set_flag_on_node
, set_flag_on_all_nodes
, etc.) on the TrackingGraph
, or they will invalidate the dictionaries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed - luckily we separated error annotation and metric computation, so new metrics that only use the existing annotations will be totally fine. New metrics probably shouldn't modify the values of existing NodeFlags or EdgeFlags. But any metric that adds a new Flag to the NodeFlag or EdgeFlag will have to be carefully checked.
I tried to annotate the return type as a generic Iterable, to match the networkx conventions, but we do call `len` on it, so I stuck to the specific set type annotation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docs look good! Code looks good! I'm happy 😄
This is a first attempt at streamlining the TrackingGraph API as mentioned in #89. See commit messages for details of my thought process.
Questions I want feedback on include:
limit_to
args inTrackingGraph.nodes
andTrackingGraph.edges
, these functions literally just wrapnetworkx.nodes
andnetworkx.edges
. Additionally, their usage is very limited in the whole library. E.g. getting the number of nodes for which we could implement__len__
and setting a NodeAttr to True for all nodes which needs to be changed anyways as described in API Update: TrackingGraph.set_node_attribute and set_edge_attribute are overloaded #110. Shall we remove them?get_nodes_with_attribute
in the IOU matcher here is essentially to find the node with the given label_id in the given time point so we can construct the matching tuples of node_ids. Thoughts on constructing a dictionary {time: {segmentation_id : node_id}} (technically two dictionaries, one for gt and one for pred) before we enter the for loop to improve efficiency and remove this function?TrackingGraph.get_node_attribute
andTrackingGraph.get_edge_attribute
making the assumption that not having a flag means the attribute is negative. Since most of our attributes are rare (exception being True Positive, which we technically don't need to annotate), this saves time setting lots of attributes to False. However, it does lose out on explicitness of the annotations. Would we prefer explicitly annotating every attribute on every node/edge?attribute
toattr
everywhere? Also, flag and attribute are now basically synonymous, since all our supported attributes are True/False. Should we pick one? Use Flag for our reserved ones and Attribute for custom annotations that can be added directly through networkx? General nomenclature thoughts welcome.