Prune tracking graph API #111

cmalinmayor · 2023-11-04T02:43:44Z

This is a first attempt at streamlining the TrackingGraph API as mentioned in #89. See commit messages for details of my thought process.

Questions I want feedback on include:

- Now that I have removed the limit_to args in TrackingGraph.nodes and TrackingGraph.edges, these functions literally just wrap networkx.nodes and networkx.edges. Additionally, they are used in very few places in the library: to get the number of nodes (for which we could implement __len__), and to set a NodeAttr to True for all nodes (which needs to change anyway, as described in "API Update: TrackingGraph.set_node_attribute and set_edge_attribute are overloaded" #110). Shall we remove them?
- The use of get_nodes_with_attribute in the IOU matcher here is essentially to find the node with the given label_id in the given time point so we can construct the matching tuples of node_ids. Thoughts on constructing a dictionary {time: {segmentation_id : node_id}} (technically two dictionaries, one for gt and one for pred) before we enter the for loop to improve efficiency and remove this function?
- TrackingGraph.get_node_attribute and TrackingGraph.get_edge_attribute make the assumption that not having a flag means the attribute is negative. Since most of our attributes are rare (the exception being True Positive, which we technically don't need to annotate), this saves time setting lots of attributes to False. However, it does lose out on explicitness of the annotations. Would we prefer explicitly annotating every attribute on every node/edge?
- Shortening attribute to attr everywhere? Also, flag and attribute are now basically synonymous, since all our supported attributes are True/False. Should we pick one? Use Flag for our reserved ones and Attribute for custom annotations that can be added directly through networkx? General nomenclature thoughts welcome.

I used VSCode "Find all references" to inspect all the calls to all the TrackingGraph functions and removed the following unused elements.
- `limit_to` args in nodes and edges. These now literally call networkx.nodes and networkx.edges and could be removed.
- `get_nodes_by_roi` function. This is probably unnecessary for metrics computation.
- `get_edges_with_attribute` function. This is never used, although `get_nodes_with_attribute` is used in the iou matcher. I suggest refactoring the iou matcher to do the actual task more efficiently and getting rid of both of these functions. This same part of the iou matcher is the only place we use TrackingGraph.get_nodes_in_frame as well.

Exceptions include:
- `get_locations` function. It was never called and is probably not necessary for computing metrics after matching is over. However, it will be used in the point based matcher. We can revisit after that matcher is implemented.
- `get_connected_components` and `get_tracklets` functions. They were never called but will likely be needed for metrics such as Cell Cycle Accuracy. We can revisit after that metric is implemented.
- `get_node_attribute` and `get_edge_attribute`. These were just implemented and I will refactor the metrics to use them in the next commit.

The advantage of using this over the prior approach is that these functions assume that if the attribute is not present it is False. Therefore, I also removed all instances where we set an attribute to False for all nodes/edges before flipping some of them to True.

This accompanies the prior commit updating the metrics computations, and additionally updates the tests, since they can no longer assume that False attributes are explicitly annotated.

msschwartz21 · 2023-11-06T18:53:34Z

> Now that I have removed the limit_to args in TrackingGraph.nodes and TrackingGraph.edges, these functions literally just wrap networkx.nodes and networkx.edges. Additionally, they are used in very few places in the library: to get the number of nodes (for which we could implement __len__), and to set a NodeAttr to True for all nodes (which needs to change anyway, as described in "API Update: TrackingGraph.set_node_attribute and set_edge_attribute are overloaded" #110). Shall we remove them?

I vote for changing nodes and edges to properties so that they mirror the networkx api but allow us to avoid the extra .graph calls everywhere.
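
A minimal sketch of the property idea, assuming the wrapper stores the underlying graph on a `graph` attribute. `TrackingGraphSketch` is a hypothetical name, and a `SimpleNamespace` stands in for the networkx graph so the snippet runs without dependencies; this is not the traccuracy implementation.

```python
from types import SimpleNamespace


class TrackingGraphSketch:
    """Hypothetical wrapper; not the real traccuracy TrackingGraph."""

    def __init__(self, graph):
        # `graph` is any object exposing `.nodes` and `.edges`
        # (a networkx graph in the real library).
        self.graph = graph

    @property
    def nodes(self):
        # Delegate so callers write `tg.nodes` instead of `tg.graph.nodes`.
        return self.graph.nodes

    @property
    def edges(self):
        # Same delegation for edges.
        return self.graph.edges


# Stand-in for a graph with two nodes and one edge (assumed node ids).
fake_graph = SimpleNamespace(
    nodes={"1_0": {"t": 0}, "1_1": {"t": 1}},
    edges={("1_0", "1_1"): {}},
)
tg = TrackingGraphSketch(fake_graph)
n_nodes = len(tg.nodes)            # no explicit .graph access needed
first_time = tg.nodes["1_0"]["t"]  # attribute access mirrors graph.nodes[...]
```

Because the properties return the underlying views directly, anything the wrapped graph supports on `nodes` and `edges` keeps working on the wrapper.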

> The use of get_nodes_with_attribute in the IOU matcher here is essentially to find the node with the given label_id in the given time point so we can construct the matching tuples of node_ids. Thoughts on constructing a dictionary {time: {segmentation_id : node_id}} (technically two dictionaries, one for gt and one for pred) before we enter the for loop to improve efficiency and remove this function?

I'm in favor of removing the function. Agnostic to how the change is implemented in the iou matcher.

> TrackingGraph.get_node_attribute and TrackingGraph.get_edge_attribute make the assumption that not having a flag means the attribute is negative. Since most of our attributes are rare (the exception being True Positive, which we technically don't need to annotate), this saves time setting lots of attributes to False. However, it does lose out on explicitness of the annotations. Would we prefer explicitly annotating every attribute on every node/edge?

I don't love the pattern of setting attributes to False by default and then flipping them to True later. It seems like it adds a lot of clutter to the graph. Assuming we stop annotating False attributes, having a function that explicitly encodes the assumption that absent annotations are False seems like a good idea to prevent mistakes down the road.

> Shortening attribute to attr everywhere? Also, flag and attribute are now basically synonymous, since all our supported attributes are True/False. Should we pick one? Use Flag for our reserved ones and Attribute for custom annotations that can be added directly through networkx? General nomenclature thoughts welcome.

I like flag for our special attributes and attribute for other annotations. Flag is a bit nicer to look at than attr

src/traccuracy/_tracking_graph.py

msschwartz21 · 2023-11-06T19:00:08Z

Consider changing get_nodes_with_flags to get_nodes_by_flag

DragaDoncila

> I don't love the pattern of setting attributes to False by default and then flipping them to True later. It seems like it adds a lot of clutter to the graph. Assuming we stop annotating False attributes, having a function that explicitly encodes the assumption that absent annotations are False seems like a good idea to prevent mistakes down the road.

I actually think it's dangerous to maintain ragged attributes on our graphs. Not only is it likely to lead to confusing KeyErrors down the track for users who grab the networkx graph and use it for downstream processing, but it might also lead someone to believe that a graph has had its errors computed when it hasn't (they query is_fp for a node and get a False back so they think errors are annotated). It's also inexplicit which usually leads to confusion down the line. Setting an attribute to a default value and then overriding it as required is a pretty common pattern, and I think it's what we need here.

I also don't think the get_node_attribute and get_edge_attribute functions should return False when the attribute doesn't exist, they should raise an error instead. Also nit, but if these two functions only work for the boolean error attributes, I think that should be reflected in the name of the function. Something like get_node_error_flag or something?
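
To make the concern concrete, here is a small sketch contrasting the two getter behaviours. Plain dicts stand in for per-node attribute dicts, and `get_flag_lenient` and `get_flag_strict` are hypothetical names used only for illustration; neither is a traccuracy function.

```python
def get_flag_lenient(node_attrs: dict, flag: str) -> bool:
    # Treats a missing flag as False, so "never annotated" and
    # "annotated as False" become indistinguishable.
    return node_attrs.get(flag, False)


def get_flag_strict(node_attrs: dict, flag: str) -> bool:
    # Refuses to guess: a missing flag means annotation has not run.
    if flag not in node_attrs:
        raise KeyError(f"Flag {flag!r} has not been annotated on this node")
    return node_attrs[flag]


unannotated = {"t": 0}                  # errors were never computed
annotated = {"t": 0, "is_fp": False}    # errors computed, node is not a false positive

# The lenient getter answers False even though nothing was computed.
lenient_result = get_flag_lenient(unannotated, "is_fp")

# The strict getter surfaces the missing annotation instead.
try:
    get_flag_strict(unannotated, "is_fp")
    strict_raised = False
except KeyError:
    strict_raised = True
```

With explicit defaults on every node, the strict variant never raises after annotation has run, which is the behaviour argued for above.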

> Thoughts on constructing a dictionary {time: {segmentation_id : node_id}} (technically two dictionaries, one for gt and one for pred) before we enter the for loop to improve efficiency and remove this function?

Big +1. We should definitely avoid trawling through the list of nodes wherever possible. Especially for matching, we should ideally be finding the correct node in O(1) since we're going to eventually look for all nodes, giving us quadratic growth otherwise.

Had to manually resolve merge conflicts with new Matched object, mostly in tests.

Improve efficiency by creating a pre-computed dictionary of time to segmentation_id to node_id. This avoids calling TrackingGraph functions to find each node with the given label_key in the given frame when constructing the matching tuples, which under the hood loops over all nodes.
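
A rough sketch of such a pre-computed lookup, assuming every node carries a time attribute and a segmentation id attribute. The helper name `build_lookup`, the attribute keys `t` and `segmentation_id`, and the sample node ids are assumptions for illustration; the node lists mimic `(node_id, attrs)` pairs and are not taken from the actual matcher.

```python
from collections import defaultdict


def build_lookup(nodes, frame_key="t", label_key="segmentation_id"):
    """Map time -> segmentation id -> node id in one pass over the nodes."""
    lookup = defaultdict(dict)
    for node_id, attrs in nodes:
        lookup[attrs[frame_key]][attrs[label_key]] = node_id
    return dict(lookup)


# Hypothetical ground-truth and predicted nodes.
gt_nodes = [
    ("gt_a", {"t": 0, "segmentation_id": 1}),
    ("gt_b", {"t": 0, "segmentation_id": 2}),
    ("gt_c", {"t": 1, "segmentation_id": 1}),
]
pred_nodes = [
    ("pred_x", {"t": 0, "segmentation_id": 5}),
    ("pred_y", {"t": 1, "segmentation_id": 7}),
]

# Built once, before the matching loop.
gt_lookup = build_lookup(gt_nodes)
pred_lookup = build_lookup(pred_nodes)

# Suppose the IOU step reported these (time, gt_label, pred_label) matches.
label_matches = [(0, 1, 5), (1, 1, 7)]

# Each match is resolved with two dictionary lookups instead of a node scan.
node_matches = [
    (gt_lookup[t][gt_label], pred_lookup[t][pred_label])
    for t, gt_label, pred_label in label_matches
]
```

Building each dictionary is a single pass over the nodes, so resolving all matches no longer requires scanning the full node list once per match.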

This function was overly general and thus inefficient. After refactoring the IOU matcher it was no longer used and could be removed safely.

codecov-commenter · 2023-11-28T21:01:36Z

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (6f6406e) 89.61% compared to head (f50eca2) 91.66%.

Files	Patch %	Lines
src/traccuracy/track_errors/_ctc.py	89.28%	3 Missing ⚠️
src/traccuracy/_tracking_graph.py	98.21%	1 Missing ⚠️
src/traccuracy/matchers/_ctc.py	50.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #111      +/-   ##
==========================================
+ Coverage   89.61%   91.66%   +2.04%     
==========================================
  Files          19       19              
  Lines         915      828      -87     
==========================================
- Hits          820      759      -61     
+ Misses         95       69      -26

cmalinmayor · 2023-11-29T15:14:18Z

> I actually think it's dangerous to maintain ragged attributes on our graphs. Not only is it likely to lead to confusing KeyErrors down the track for users who grab the networkx graph and use it for downstream processing, but it might also lead someone to believe that a graph has had its errors computed when it hasn't (they query is_fp for a node and get a False back so they think errors are annotated). It's also inexplicit which usually leads to confusion down the line. Setting an attribute to a default value and then overriding it as required is a pretty common pattern, and I think it's what we need here.

Thanks for the feedback! I'll revert the change and add some documentation regarding this convention.

Having one function was clunky (had to cast ids specifically to a list to detect one vs many) and inefficient (couldn't leverage networkx functions for setting all attributes). This commit also renames the functions to the format `set_flag_on_node` or `set_flag_on_all_nodes` for maximum clarity. This naming is clear that we are setting one flag on one or many nodes or edges, and gets a head start on using `flag` instead of `attribute` for our custom flags.

We decided that we did not want to assume that missing flags are false, and instead explicitly annotate all flags on all nodes/edges. The only additional functionality the getters provided was to check for missing values and return False. Since we do not want that functionality, we can revert to networkx style access of attributes.

cmalinmayor · 2023-11-30T16:33:19Z

@DragaDoncila @msschwartz21 I think this is ready for review now. Most of the non-TrackingGraph code changes come from the following relatively minor changes:

Renaming NodeAttr/EdgeAttr to NodeFlag/EdgeFlag
Refactoring set_node/edge_attribute to set_flag_on_node/edge and set_flag_on_all_nodes/edges

There is also the IOU matcher refactoring which is a more substantive change but necessary to remove TrackingGraph.get_nodes_with_attribute.

I was going to separate out the graph validation in the __init__ and add a flag as discussed in #91 but it wasn't a straightforward change (the validation is intertwined with the construction of the nodes_by_flag dict and we don't check edge validity at all, which we might want to add) so I will leave it for a future PR.

msschwartz21

I'm liking how these changes came together. I had a couple questions and potential places where tests could be renamed but they are all minor.

On a separate note, I want to put down a comment in writing in this thread so we have it if we ever look back. We are keeping the set_flag_on_node/edge and set_flag_on_all_nodes/edges functions (even though they are easily accessible through the networkx api) because it enables us to maintain dictionaries of nodes/edges by flag for faster lookup.

DragaDoncila

@cmalinmayor looking good! Left some comments through the code.

DragaDoncila · 2023-12-04T04:54:58Z

src/traccuracy/_tracking_graph.py

-                self.nodes_by_flag[attr].add(_id)
-            else:
-                self.nodes_by_flag[attr].discard(_id)
+        self.graph.nodes[_id][flag] = value


As far as I understand, this function is setting flags on nodes and in the process is updating the self.nodes_by_flag attribute to ensure that the flag dictionaries are also updated correctly? I'm a little concerned about the fact that we can't enforce, through code, the strict usage of this function for annotating node and edge errors internally. We will have to be very careful in review and documentation: if any metric modifies these attributes, it needs to do so exclusively through these functions (set_flag_on_node, set_flag_on_all_nodes, etc.) on the TrackingGraph, or it will invalidate the dictionaries.

Agreed - luckily we separated error annotation and metric computation, so new metrics that only use the existing annotations will be totally fine. New metrics probably shouldn't modify the values of existing NodeFlags or EdgeFlags. But any metric that adds a new Flag to the NodeFlag or EdgeFlag will have to be carefully checked.
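
The invalidation risk can be illustrated with a small stand-in. `FlaggedNodes` is a hypothetical class that uses plain dicts in place of the networkx graph; only the method names `set_flag_on_node` and `set_flag_on_all_nodes`, the `nodes_by_flag` attribute, and the `is_fp` flag come from this thread, and the bodies are a sketch rather than the actual TrackingGraph code.

```python
class FlaggedNodes:
    """Hypothetical stand-in for a graph that indexes nodes by flag."""

    def __init__(self, node_ids, flags):
        # Every flag is explicitly annotated as False on every node.
        self.node_attrs = {n: {f: False for f in flags} for n in node_ids}
        # Index for fast lookup of the nodes on which a flag is True.
        self.nodes_by_flag = {f: set() for f in flags}

    def set_flag_on_node(self, node_id, flag, value=True):
        # Update the attribute and the index together.
        self.node_attrs[node_id][flag] = value
        if value:
            self.nodes_by_flag[flag].add(node_id)
        else:
            self.nodes_by_flag[flag].discard(node_id)

    def set_flag_on_all_nodes(self, flag, value=True):
        # Bulk update, then rebuild the index entry in one step.
        for attrs in self.node_attrs.values():
            attrs[flag] = value
        self.nodes_by_flag[flag] = set(self.node_attrs) if value else set()


g = FlaggedNodes(["a", "b", "c"], ["is_fp"])

# Going through the setter keeps attribute and index in sync.
g.set_flag_on_node("a", "is_fp")
in_sync = g.nodes_by_flag["is_fp"] == {"a"}

# Writing the attribute directly bypasses the index, which is now stale.
g.node_attrs["b"]["is_fp"] = True
stale = "b" not in g.nodes_by_flag["is_fp"]
```

Nothing in this sketch stops the direct write, which is exactly the concern raised above: the guarantee has to come from review and documentation.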

I tried to annotate the return type as a generic Iterable, to match the networkx conventions, but we do call `len` on it, so I stuck to the specific set type annotation.
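
A short illustration of why the generic annotation does not fit here: `Iterable` does not promise `__len__`, so type checkers such as mypy reject `len()` on it, and some iterables fail at runtime as well. The function names below are hypothetical.

```python
from typing import Iterable, Set


def nodes_as_iterable(flagged: Set[str]) -> Iterable[str]:
    # An iterator is a valid Iterable, but it has no length.
    return iter(flagged)


def nodes_as_set(flagged: Set[str]) -> Set[str]:
    # The specific annotation tells callers that len() is supported.
    return flagged


flagged = {"a", "b"}
count = len(nodes_as_set(flagged))

try:
    len(nodes_as_iterable(flagged))
    iterable_has_len = True
except TypeError:
    iterable_has_len = False
```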

cmalinmayor · 2024-01-10T19:23:01Z

Closes #110 and #89

msschwartz21

Docs look good! Code looks good! I'm happy 😄

cmalinmayor added 3 commits November 3, 2023 21:59

Use get_node/edge_attributes in metrics tests

1a79957

This accompanies the prior commit updating the metrics computations, and additionally updates the tests, since they can no longer assume that False attributes are explicitly annotated.

cmalinmayor requested review from msschwartz21, DragaDoncila and bentaculum November 4, 2023 02:43

style(pre-commit.ci): auto fixes [...]

d3d0251

cmalinmayor added the enhancement label Nov 4, 2023

msschwartz21 reviewed Nov 6, 2023

src/traccuracy/_tracking_graph.py

DragaDoncila reviewed Nov 28, 2023

src/traccuracy/_tracking_graph.py

cmalinmayor and others added 6 commits November 28, 2023 15:02

Change nodes and edges to properties

17e1190

Merge branch 'main' into prune-tracking-graph

cdd8009

Had to manually resolve merge conflicts with new Matched object, mostly in tests.

Use get_nodes_with_flag in division metrics

baf79d4

Remove 'get_nodes_with_attribute' from TrackingGraph

3b7c3bc

This function was overly general and thus inefficient. After refactoring the IOU matcher it was no longer used and could be removed safely.

style(pre-commit.ci): auto fixes [...]

c0d5d04

cmalinmayor and others added 10 commits November 29, 2023 14:10

Merge branch 'main' into prune-tracking-graph

2b1b963

style(pre-commit.ci): auto fixes [...]

0886696

Simplify IOU dictionary naming

1e9aa7e

Merge remote-tracking branch 'origin/prune-tracking-graph' into prune…

df06ed6

…-tracking-graph

style(pre-commit.ci): auto fixes [...]

7426439

Fix ruff and mypy complaints

72d143e

style(pre-commit.ci): auto fixes [...]

e88f0ed

Actually fix mypy typing issue

8e2c0e3

cmalinmayor added 2 commits November 29, 2023 15:51

Actually actually fix mypy typing errors

a5e8d62

Change from Node/EdgeAttr to Node/EdgeFlag

6fc9015

cmalinmayor marked this pull request as ready for review November 30, 2023 16:25

cmalinmayor requested a review from DragaDoncila November 30, 2023 16:25

cmalinmayor requested a review from msschwartz21 November 30, 2023 16:33

Add typing annotations to TrackingGraph

99933ed

msschwartz21 requested changes Dec 3, 2023

src/traccuracy/_tracking_graph.py

src/traccuracy/track_errors/divisions.py

tests/test_tracking_graph.py

DragaDoncila reviewed Dec 4, 2023

cmalinmayor commented Dec 5, 2023

src/traccuracy/_tracking_graph.py

cmalinmayor and others added 7 commits December 5, 2023 16:07

Return set from TrackingGraph node/edge_by_flag

b68e3a8

I tried to annotate the return type as a generic Iterable, to match the networkx conventions, but we do call `len` on it, so I stuck to the specific set type annotation.

Separate out and test helper function in iou matcher

f8e4973

Test nodes/edges_by_flag dict when updating flags

87dda14

Merge branch 'main' into prune-tracking-graph

a49865a

Remove get from test names for flag setting

528323b

Remove get_preds and get_succs

f036f28

Merge branch 'main' into prune-tracking-graph

f50eca2

cmalinmayor requested a review from msschwartz21 January 10, 2024 19:19

This was linked to issues Jan 10, 2024

API Update: Identify core TrackingGraph functions, eliminate unnecessary code #89

Closed

API Update: TrackingGraph.set_node_attribute and set_edge_attribute are overloaded #110

Closed

msschwartz21 approved these changes Jan 10, 2024

cmalinmayor merged commit 3d85480 into main Jan 10, 2024
16 checks passed

cmalinmayor deleted the prune-tracking-graph branch January 10, 2024 19:47
