Allow segmentations to be passed in vector format #114

Open
lupinthief opened this issue Nov 7, 2023 · 5 comments

@lupinthief

  • traccuracy version: 0.0.2
  • Python version: 3.8
  • Operating System: Windows 11

Description

For geospatial applications, for cases where the GT and prediction domains don't necessarily match, and to minimise memory use and data storage volumes, it would be useful if segmentations could be passed in vector format rather than as raster/np.array masks.

Looking through the codebase, I think this could be achieved by effectively bypassing the regionprops stage for label extraction and instead passing a pd.DataFrame with label, t, and geometry columns, where the geometry column is a coordinate sequence representing a polygon.
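
For illustration, the input might look something like this (a rough sketch; the exact column names and coordinate convention would be up for discussion):

```python
import pandas as pd

# Hypothetical vector-format segmentation: one row per object per frame,
# with the polygon stored as a coordinate sequence instead of a raster mask.
segmentation = pd.DataFrame(
    {
        "label": [1, 2],
        "t": [0, 0],
        "geometry": [
            [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)],
            [(5.0, 5.0), (15.0, 5.0), (15.0, 15.0), (5.0, 15.0)],
        ],
    }
)
```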

Matching would then require calculating intersections of these polygons. Shapely handles this nicely for geodata and seems the obvious way to go, though it could probably be implemented without adding the (presumably optional) dependency.
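
In sketch form, the overlap computation with Shapely is only a few lines (illustrative; `polygon_iou` is not an existing traccuracy function):

```python
from shapely.geometry import Polygon

def polygon_iou(coords_a, coords_b):
    """Intersection-over-union of two polygons given as coordinate sequences."""
    a, b = Polygon(coords_a), Polygon(coords_b)
    if not a.intersects(b):
        return 0.0
    inter = a.intersection(b).area
    return inter / (a.area + b.area - inter)

# Two unit squares offset by 0.5 overlap with IoU = 1/3
print(polygon_iou([(0, 0), (1, 0), (1, 1), (0, 1)],
                  [(0.5, 0), (1.5, 0), (1.5, 1), (0.5, 1)]))
```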

@cmalinmayor
Collaborator

Thanks for the interesting idea; we haven't yet considered segmentations that aren't masks. Can you give an example dataset with segmentations in a vector format that we could use for dev/testing?

I don't think this is at the top of our list to support right now, but we can see what the other devs think. I'd be happy to revisit this feature after Version 1 release once we have all the basics of the library implemented. I agree that this change does require being more flexible with the type of the segmentation that we store in the TrackingGraph, so I will keep that in mind as we solidify the API.

@lupinthief
Author

Thanks for the response. Absolutely understand that this is probably a bit left-field at the moment, and it may always be. I'm hoping I might be able to do a bit of work on it in the next few weeks.

Here is a sample of the GT dataset I'm working with. It represents iceberg movement around Greenland and is stored as a JSON string. If you read it into a DataFrame, you'll see a 'str_geom' column that defines the geometry, plus columns for ID, parent, t, x, y, and z. I'm trying to use btrack to track the icebergs automatically.

CI2D3_subset_subset_for_btrack.txt
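
In case it helps, reading it in looks roughly like this (a sketch: it assumes pandas can parse the JSON directly and that 'str_geom' holds WKT strings; both may need adjusting for the actual file):

```python
import pandas as pd
from shapely import wkt  # assumes the geometry strings are WKT

# Sketch: load the attached JSON and parse geometries into shapely polygons.
df = pd.read_json("CI2D3_subset_subset_for_btrack.txt")
df["geometry"] = df["str_geom"].apply(wkt.loads)
```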

It looks like the TrackingGraph doesn't care what the segmentation is at the moment, just the loaders and matchers, or am I wrong?

@cmalinmayor
Collaborator

> It looks like the TrackingGraph doesn't care what the segmentation is at the moment, just the loaders and matchers, or am I wrong?

The segmentation is just stored in the TrackingGraph, with a docstring that says it is an np.array. You are correct that it can truly be any type, as long as the loader and the matcher agree on what it is. Nothing would need to change in the TrackingGraph except that we should update the docstring. I more meant that since we are still finalizing the API, it is good to know that we should keep them decoupled.

> Here is a sample of the GT dataset I'm working with. It represents iceberg movement around Greenland and is stored as a JSON string. If you read it into a DataFrame, you'll see a 'str_geom' column that defines the geometry, plus columns for ID, parent, t, x, y, and z. I'm trying to use btrack to track the icebergs automatically.

Thank you!! Glad you found this library to help evaluate the results 🙂. If you can write a loader/matcher that works for your data, even if we don't merge it, you should still be able to run the metrics! We are actively working on documenting the meaning of all the metrics more thoroughly, so hopefully soon it will be even more obvious which ones are most relevant for icebergs vs cells.

@DragaDoncila
Collaborator

Thanks for this issue @lupinthief! I think exploring other domains for traccuracy could be a cool idea!

It sounds like right now the main difference is the data representation, so I tend to agree with @cmalinmayor that a loader is what would unlock your workflow. If it is possible (and not very inefficient or otherwise impractical) to load your polygons into a dense numpy array, then I think even the matching would work fine, since we just need pixel-wise classification. But I realize that may be a non-starter for your data, in which case a matcher would also be required.
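
For what it's worth, the rasterisation itself could be quite small, e.g. (a minimal sketch, assuming 2D polygons with (row, col) vertices that fit inside a known image shape):

```python
import numpy as np
from skimage.draw import polygon as draw_polygon

def rasterize(polygons, labels, shape):
    """Burn labelled polygons into a dense label image (sketch)."""
    mask = np.zeros(shape, dtype=np.int32)
    for coords, label in zip(polygons, labels):
        rows, cols = zip(*coords)
        rr, cc = draw_polygon(rows, cols, shape=shape)
        mask[rr, cc] = label
    return mask
```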

@lupinthief
Author

Just a quick update on this: I've got a somewhat hacky vector-based loader and matcher working. Converting to numpy arrays would be feasible but, in the long run, seems like an inefficient way to handle and transport these data, so I've stuck with vector.

A similar approach may be helpful for issue #134, allowing point-to-polygon comparisons as well as point-to-point comparisons within a given search radius. It could also easily be adapted to compute actual IoUs rather than bounding-box IoUs.
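
For the point-based case in #134, the core check might look like this (hypothetical helper, not part of traccuracy):

```python
from shapely.geometry import Point, Polygon

def point_match(gt_geom, pred_xy, radius):
    """True if a predicted point falls inside a GT polygon, or lies within
    `radius` of a GT point (sketch for vector-format matching)."""
    p = Point(pred_xy)
    if isinstance(gt_geom, Polygon):
        return gt_geom.contains(p)
    return Point(gt_geom).distance(p) <= radius
```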
