VectorShapesDataset for loading geometries from vector files #458

weiji14 · 2022-03-11T03:19:31Z

The current VectorDataset in torchgeo v0.2.0 returns an image mask, but people might want the actual geometries instead (e.g. for object detection tasks, which use bounding boxes).

This PR moves the geometry loading logic in VectorDataset (handled by fiona) into a _load_shapes method. A new VectorShapesDataset class is then created (subclassed from this modified VectorDataset), which returns a sample like so:

sample = {
    "shapes": shapes,  # the polygon geometries
    "crs": self.crs,  # Coordinate reference system
    "bbox": query,  # Original bounding box query
}

Note that the geometries returned are raw geometries like [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0), (0.0, 0.0)] (in the case of a polygon), so users would need to write their own code to convert them into a bounding box tuple like (minx, miny, maxx, maxy) or (x, y, width, height). I've got some code to do this, but want to know whether this VectorShapesDataset should stay generic or actually output those bounding boxes.
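For illustration, here is a minimal sketch of that kind of conversion. It is not the code referred to above and not part of torchgeo. The function names `ring_to_xyxy` and `xyxy_to_xywh` are hypothetical, and (x, y) in the second format is assumed to be the minimum corner of the box.

```python
from typing import Sequence, Tuple

Coord = Tuple[float, float]
Box = Tuple[float, float, float, float]


def ring_to_xyxy(ring: Sequence[Coord]) -> Box:
    """Return (minx, miny, maxx, maxy) for a sequence of (x, y) vertices."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return (min(xs), min(ys), max(xs), max(ys))


def xyxy_to_xywh(box: Box) -> Box:
    """Convert (minx, miny, maxx, maxy) to (x, y, width, height).

    Assumption: (x, y) is the minimum corner, not the box center.
    """
    minx, miny, maxx, maxy = box
    return (minx, miny, maxx - minx, maxy - miny)


# The unit-square polygon from the description above
ring = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0), (0.0, 0.0)]
box = ring_to_xyxy(ring)
print(box)                # (0.0, 0.0, 1.0, 1.0)
print(xyxy_to_xywh(box))  # (0.0, 0.0, 1.0, 1.0)
```

The two formats happen to coincide for the unit square at the origin. A box such as (2.0, 3.0, 5.0, 7.0) would become (2.0, 3.0, 3.0, 4.0) in the (x, y, width, height) form.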

Happy to add more tests and/or change this draft implementation. I've just been working on an object detection project that has the bounding box labels in a shapefile/geopackage, and thought it might be useful to have this in torchgeo 😃

May help with the object detection related feature requests at #442 and #454.

adamjstewart

Alternative proposal: would it be possible to keep everything in VectorDataset and instead return both mask AND shapes for all vector datasets? Then every vector dataset could be used for either object detection OR semantic segmentation OR instance segmentation.

P.S. We have a bunch of other object detection and instance segmentation datasets. We should make sure all of these use the same key name and object structure so they are compatible with a single ObjectDetectionTask or InstanceSegmentationTask trainer.

weiji14 · 2022-09-09T14:57:55Z

Gonna close this as I've re-implemented a vectorized (i.e. no for-loop) vector geometry loader/reader in a torch DataPipe class at https://zen3geo.readthedocs.io/en/v0.4.0/api.html#zen3geo.datapipes.pyogrio.PyogrioReaderIterDataPipe. Example tutorials:

Semantic segmentation - https://zen3geo.readthedocs.io/en/v0.4.0/vector-segmentation-masks.html, done in 🚸 Walkthrough on rasterizing vector polygons into label masks weiji14/zen3geo#31
Object detection - https://zen3geo.readthedocs.io/en/v0.4.0/object-detection-boxes.html, done in 🚸 Walkthrough on object detection with bounding boxes weiji14/zen3geo#49

> Alternative proposal: would it be possible to keep everything in VectorDataset and instead return both mask AND shapes for all vector datasets? Then every vector dataset could be used for either object detection OR semantic segmentation OR instance segmentation.

Not sure if it is ideal to have both mask and shapes returned. Certainly the shapes part will always be needed (which is what this PR was attempting to do). Returning a full mask (technically a raster) for, say, a bounding box object detection task (e.g. IDTReeS #201) seems like a waste of GPU memory. The VectorDataset implementation will also need a rethink due to #576 anyway. Two common paths that I see would be:

vector file -> shapely geometry -> box (for object detection)
vector file -> shapely geometry -> mask (for segmentation)
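As a rough sketch of how both paths can branch off the same geometry, the example below derives a box and a mask from one polygon ring. It uses plain Python in place of shapely and a real rasterizer, so it only illustrates the idea. The helper names `to_box`, `point_in_ring` and `to_mask` are hypothetical, the ring is assumed to be closed (first vertex repeated at the end), and pixels are assumed to be unit squares whose centers are tested with the even-odd rule.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[float, float]


def to_box(ring: Sequence[Coord]) -> Tuple[float, float, float, float]:
    """Object detection path: geometry -> (minx, miny, maxx, maxy)."""
    xs, ys = zip(*ring)
    return (min(xs), min(ys), max(xs), max(ys))


def point_in_ring(x: float, y: float, ring: Sequence[Coord]) -> bool:
    """Even-odd ray casting test against a closed ring."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def to_mask(ring: Sequence[Coord], width: int, height: int) -> List[List[int]]:
    """Segmentation path: geometry -> height x width grid of 0/1 values.

    Row 0 corresponds to y in [0, 1); each pixel is tested at its center.
    """
    return [
        [1 if point_in_ring(col + 0.5, row + 0.5, ring) else 0 for col in range(width)]
        for row in range(height)
    ]


ring = [(1.0, 1.0), (1.0, 3.0), (3.0, 3.0), (3.0, 1.0), (1.0, 1.0)]
print(to_box(ring))  # (1.0, 1.0, 3.0, 3.0)
for row in to_mask(ring, width=4, height=4):
    print(row)
```

The box costs four floats per object, while the mask grows with the raster size, which is the memory concern raised above. In practice one would more likely use shapely's `bounds` attribute and a library rasterizer such as `rasterio.features.rasterize` rather than hand-written helpers like these.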

And there's a bunch of other niche tasks we haven't even considered, such as oriented bounding box object detection, keypoint detection, graph neural networks, etc., that might build on the core shapely geometry class.

> P.S. We have a bunch of other object detection and instance segmentation datasets. We should make sure all of these use the same key name and object structure so they are compatible with a single ObjectDetectionTask or InstanceSegmentationTask trainer.

Agree on having a standardized name, and I think this should be reflected in #758 😉

weiji14 added 2 commits March 10, 2022 21:18

Put fiona geometry loader into _load_shape

988f8fc

First step in making VectorDataset more easily extensible.

Implement VectorShapesDataset for object detection tasks

5359b7f

Getting the geometry shapes instead of an image mask as with VectorDataset.

github-actions bot added the datasets, documentation, and testing labels Mar 11, 2022

adamjstewart added this to the 0.3.0 milestone Mar 11, 2022

adamjstewart requested changes Jun 27, 2022

adamjstewart modified the milestones: 0.3.0, 0.4.0 Jul 9, 2022

weiji14 closed this Sep 9, 2022

weiji14 deleted the datasets/vector_shapes branch September 9, 2022 14:58

adamjstewart removed this from the 0.4.0 milestone Sep 9, 2022
