Semantic Segmentation messages #71

pepisg · 2022-06-28T19:59:12Z

In this PR I aim to add a semantic segmentation message. I'm aware of the discussion going on on #63, however I think using images as masks may leave behind important metadata like the class_map (the name of the class corresponding to each ID). Also, using several channels in the image for sending the mask and the confidence of each pixel could lead to varying implementations that may not be compatibles with standard functionalities relying on semantic segmentation like the in progress semantic segmentation costmap plugin PR in nav2.

…/semantic-segmentation

mintar · 2022-06-28T23:03:19Z

I agree that having additional metadata would be useful, but is that worth the disadvantages I've outlined in #63 (comment)?

SteveMacenski · 2022-06-29T00:52:53Z

@mintar totally agree. Looking at the proposal, I don't know how to massage all that content into a sensor_msgs/Image. Do you have a suggestion for that? For instance the confidence of the predictions and the threshold the algorithm is told to use about whether or not something is sufficiently confident for use. class_map also seems important to know what means what -- though I think that could be a potentially massive thing that might not be ideal to embed into the messages (maybe a camera_info -> semantic_info eq. is needed in addition the Image to contain that info)?

I suppose the confidence_threshold could be a ROS parameter for the end user without too much of an issue.

Between parameters and a semantic_info topic with the class maps, I think that handles quite a bit of the issue. The only thing that I'm not sure how to model with Image is the confidences.

Maybe we use a 2 channel image format, 1 for the class and a second for the confidence? Does the CV tools work for 2 channel images or would we need to add in a 3rd for visualization purposes. I suppose the 3rd channel could be threshold in case there are different thresholds for different classes / sections of the images, but I'm not sure if that's a 'normal' thing to do or I'm just coming up with ideas to try to fill space.

mintar · 2022-06-29T10:49:19Z

msg/SemanticSegmentation.msg

+
+# the confidence of the inference of each pixel
+# between 0-100%
+uint8[] confidence 


Why not use the full range 0-255 here?

That's not standard for the output of AI libraries, they usually give out a probability value. It would be a bit unnatural to convert it to 0-255 for use (e.g. I want to use predictions that are at least 80% confident). I'd argue this should actually be a float (?) from 0-1.

It could be a float, but does it really make a difference to have a confidence of say 60% (uint8) or 60.5% (float) knowing that this would increase the space used by this array by 4?

Specially on an array field that is expected to contain a lot of elements. On a 640x360 mask changing this for a float32 would mean an increase of 640x360x3 bytes, almost 700kB per message with respect to uint8

You should make sure to explain these are uints not floats, otherwise people will just put their 0-1 percentages from TF or something into it and it'll always render as 0 since it needs to be multiplied by 100 and then cast to an int (in the comment above)

mintar · 2022-06-29T10:53:02Z

msg/SemanticClass.msg

+
+# Integer value corresponding to the value of pixels belonging
+# to a given class in a segmentation mask
+uint16 class_id


Why is this a uint16 when the data field in SemanticSegmentation.msg is only uint8?

Limiting the number of classes to 255 is probably bad, but doubling the message size for the segmentation image by making it a uint16 is equally bad. Anyhow, the two should be consistent.

On second thought, it's probably better to leave it as uint16. If we use a sensor_msgs/Image as the segmentation image format, we can choose between mono8 and mono16 encodings as appropriate there.

Can't speak to that exactly without his comment, but there can easily be more than 255 class values in segmentation algorithms.

My bad, they must be consistent. I will use uint16 for the final version of the PR

mintar · 2022-06-29T10:58:53Z

msg/SemanticClass.msg

@@ -0,0 +1,10 @@
+# A key value pair that maps an integer class_id to a string class label.
+# The class_id should be interpreted as the pixel value corresponding to a 


There's trailing whitespace in this line which makes the tests fail. Also both files are missing a newline at the end of the file.

mintar · 2022-06-29T11:26:01Z

@SteveMacenski wrote:

@mintar totally agree. Looking at the proposal, I don't know how to massage all that content into a sensor_msgs/Image. Do you have a suggestion for that? For instance the confidence of the predictions and the threshold the algorithm is told to use about whether or not something is sufficiently confident for use. class_map also seems important to know what means what -- though I think that could be a potentially massive thing that might not be ideal to embed into the messages (maybe a camera_info -> semantic_info eq. is needed in addition the Image to contain that info)?

I don't think the class_map would be that massive. Let's be generous and assume 20 characters per class name, and let's assume we have 255 classes. Then the whole array would be 255 * (20 + 1) = 5355 bytes. A 640x480 uint8 image has 307200 bytes, so the class_map would be around 1.7% of that. On the other hand, 5kb overhead is not nothing, so I'd like to hear better ideas.

BTW, how to handle the mapping from int to class string is also an open issue with the existing messages (Detection3DArray etc.). I currently handle that by having a yaml file with the mapping that I load into the namespace of every single node that requires it. I would prefer it though if it was available as a message - that way if you record a rosbag, it would reflect the mapping at the time the rosbag was recorded.

I like the idea of publishing all the metadata on a "semantic_info" topic, similar to the camera_info topic. If we're really worried about the overhead of the class_map, perhaps we don't publish the semantic_info in sync with each frame, but instead only whenever it changes on a latched topic?

Maybe we use a 2 channel image format, 1 for the class and a second for the confidence? Does the CV tools work for 2 channel images or would we need to add in a 3rd for visualization purposes. I suppose the 3rd channel could be threshold in case there are different thresholds for different classes / sections of the images, but I'm not sure if that's a 'normal' thing to do or I'm just coming up with ideas to try to fill space.

Here's a list of all OpenCV image encodings: sensor_msgs/image_encodings.h

You can have 2-channel images, like 8UC2 (8 bit, unsigned, 2 channels) or 16UC2, and OpenCV handles them just fine. I'm not sure how well image_transport (for on-the-fly PNG or JPG compression) or visualization tools like RViz handle these more exotic formats. Or all the other stuff like image_proc, image_geometry and so on.

Alternatively, the 2 images (semantic segmentation and confidence) could be published on 2 different topics. That would have the following advantages:

You can use different encodings for the images. For example, mono8 or mono16 for the semantic segmentation image (depending whether you have more than 255 classes or not), and mono8 or 32FC1 or whatever for the confidence image.
You could also have one confidence image per class if that's important for your application
You'd have better support for image_transport and visualization tools (see above).

SteveMacenski · 2022-06-29T17:59:45Z

5kb overhead is not nothing, so I'd like to hear better ideas.

Have a semantic_info topic / message like we have a camera_info topic / message for metadata regarding the camera. Now its metadata regarding the AI system. That way we can only grab this once (or pair up with the sensor_msgs/Image) and let us use the Image message without class definitions in it. That's not uncommon in my experience with camera_info to just grab 1 message and then stop subscribing to it.

would prefer it though if it was available as a message

Maybe then this is more generic than just a semantic_info topic but a vision_info topic? Latched seems fine to me!

the 2 images (semantic segmentation and confidence) could be published on 2 different topics.

That is interesting. I wonder about the run-time impacts of that since anything about 1-2MB in size has a hard time publishing without zero-copy or composition. I'm not sure if its better to have one 2MB message or two 1MB messages going into the same node. I suspect 1 message is better from some internal Samsung projects I'm helping with, but I don't have the data specifically to support that at the moment.

pepisg · 2022-06-29T20:20:52Z

I agree with the idea of using sensor_msgs/msg/Image instead. From what you have been discussing, this is an sketch of how a semantic_info message would look like

vision_msgs/LabelMap[] label_map # uint16 - string key value pairs
float32 confidence_threshold

This message seems to also work for object detection models using the other messages on vision_msgs. int64 is used for ids there, but can there really be more than 65536 classes?
If using a single multichannel image for packing the mask and the confidence, I think it may be worth adding a field to somehow tell how the channels in the image are organized, this way it would not be necessary to define a fixed convention to use segmentation data with other ROS packages like nav2. This however would break the idea of having a message to fit AI applications

With that in mind, when things are settled I can reformat this PR to include these messags

SteveMacenski · 2022-06-30T18:11:49Z

VisionInfo maybe a good use for the message name if we plan to use this for detections and segmentations?

I think uint16 is a good bet for the moment and the current state of AI. If in another decade our magic AIs can know millions of things, we can always increase it.

pepisg · 2022-07-01T16:25:32Z

Should we settle for two distinct images for the mask and confidence? Or should we use a single dual-channel image and define a convention to specify which channel should contain what? This could be overcame by adding a metadata field to the VisionInfo message to make explicit the channel order, or applications relying on semantic segmentation data (like the segmentation plugin for nav2) could take in some parameter in their configuration to let them know where to look.

The only drawback of using two images seems to be the potential impact of publishing 2 data intensive messages instead of one. That would also force the message consumer to synchronize them (which can be done with a message filter, but drifts from ros standard implementation). This approach seems to be more flexible and does not rely on assumed conventions nor requires extra configuration parameters on the consumers.

Let me know what you think to fix the PR and adjust the work on the segmentation plugin

mintar · 2022-07-01T16:40:13Z

I'd prefer two separate image topics instead of a two-channel image.

SteveMacenski · 2022-07-01T18:24:47Z

I prefer a single 2-channel image over 2 separate images 😆

Maybe we need to ask @tfoote what his thoughts are. He usually brings up very good points when I propose interface definitions.

mintar · 2022-07-01T18:57:38Z

One thing that speaks for two separate images is that you can have separate data types. You wrote yourself that it's possible to have more than 255 classes. In that case, the segmentation image would need to be a uint16. If you have a single 2 channel image, that would mean the confidence image must also be uint16. With separate images, you could always use uint8 for the confidence image and uint8 or uint16 for the segmentation image, depending on the number of classes that you have.

SteveMacenski · 2022-07-01T23:35:13Z

The option behind door number three is that we actually don't include confidence information at all, and make as part of the Interface API that any pixel below a certain threshold should be set to -1, meaning invalid, as part of populating the image. Therefore, when any consumer of this work is parsing the image, any value of -1 means it was too low of confidence for use, provided by the publisher node. Just push that logic down.

mintar · 2022-07-02T06:06:26Z

We can't do -1 with an unsigned int, but in semantic segmentation images "0" usually means "unknown class". The confidence image is still nice extra info to have.

mintar · 2022-07-02T07:11:27Z

Another advantage of two separate topics: Not all subscribers may be interested in the confidence values, so they don't even need to subscribe to that topic.

SteveMacenski · 2022-07-05T20:21:32Z

We can't do -1 with an unsigned int, but in semantic segmentation images "0" usually means "unknown class". The confidence image is still nice extra info to have.

Well -1 is wrapped around in unsigned ints, so it would be max. I wouldn't want to override 0 since that could be a meaningful value for a detection/segmentation algorithm and it would be annoying if we had to have folks increment all their outputs by 1 to meet the standard the interface defines. Its far less disruptive / likely that a collision would occur at 65,536

I suppose we could do both; publish the confidence image and also set the precedence that 65,536 is a reserved value for not confident enough, if the publisher can know that value. That would get both worlds, but also require algorithms implementing this to act about both, if available.

tfoote · 2022-07-05T20:35:36Z

I prefer a single 2-channel image over 2 separate images laughing

Maybe we need to ask @tfoote what his thoughts are. He usually brings up very good points when I propose interface definitions.

The primary reason for choosing to send one multi channel image versus 2 separate images would be if they generally would be used usefully separately. The simplest examples are stereo images. If you want just a monocular camera stream, you don't want to incur the overhead of subscribing to the full stereo image just to ignore the second half of the stream. On the flip side there is a small amount of overhead related to synchronizing incoming images, and they have to have enough embedded content to be useful on their own (aka they both have full headers). We have tools to make it easier, but the extra overhead is why we don't send everything decoupled.

The only drawback of using two images seems to be the potential impact of publishing 2 data intensive messages instead of one. That would also force the message consumer to synchronize them (which can be done with a message filter, but drifts from ros standard implementation). This approach seems to be more flexible and does not rely on assumed conventions nor requires extra configuration parameters on the consumers.

I would disagree and say that publishing the synchronized image messages is the ROS standard. We do it for stereo images, compressed images, and camera_info.

And if you're publishing two images instead of one they should be half the size each as they will have half the data each (or some ratio depending on the datatypes)

The other major concern in this sort of discussion is encapsulation. You want to make sure that messages generally can stand on their own without knowing the context that they were broadcast in. Such that if you get a rosbag of some random data it will have everything that you need to process it later.

mintar · 2022-07-06T08:45:37Z

@SteveMacenski wrote:

Well -1 is wrapped around in unsigned ints, so it would be max. I wouldn't want to override 0 since that could be a meaningful value for a detection/segmentation algorithm and it would be annoying if we had to have folks increment all their outputs by 1 to meet the standard the interface defines. Its far less disruptive / likely that a collision would occur at 65,536

I've checked what other semantic segmentation / panoptic segmentation datasets are using, and most of them seem to use 255 for the "UNLABELED" class (bdd100k semantic, COCO-Stuff), although some use 0 (bdd100k panoptic, CityScapes). So I agree we should probably use 255 here. The only trouble is that "-1" means 255 if the semantic segmentation image encoding is mono8, and "-1" means 65,536 if the encoding is mono16. I would like to allow both encodings for the segmentation image.

@tfoote wrote:

The primary reason for choosing to send one multi channel image versus 2 separate images would be if they generally would be used usefully separately. [...] And if you're publishing two images instead of one they should be half the size each as they will have half the data each (or some ratio depending on the datatypes).

If we include the UNLABELED class (which we absolutely should), then the segmentation image is very useful on its own. I think this probably covers at least 80% of use cases; most clients probably won't care about the confidence image. So that would speak for having two separate image topics.

SteveMacenski · 2022-07-06T18:23:21Z

I'd like to bring up though that we're now talking about having to synchronize 3 topics potentially (vision_info, segmentation_mask, segmentation_mask_confidences) for a "typical and full" implementation. That seems to be approaching excessiveness, even though I can live with the individual technical points that leads to that outcome.

I'm not sure I agree that they would both be useful separately. The segmentation mask without confidences could be interesting, but the confidences without the segmentation mask is not - unlike the stereo example where either image feed could be individually interesting without the other.

mintar · 2022-07-06T18:42:46Z

If we go with the latched topic for the vision messages, we only need to synchronize 2 topics, and only those clients that are interested in the confidences. You're right that the confidence image is not interesting on its own, but Tully's argument still holds: you save 50% data in a lot of use cases by having 2 topics.

SteveMacenski · 2022-07-06T19:51:02Z

There would need to be a level of documentation required for that workflow -- e.g. it would not be obvious to anyone just looking at the vision_info message / topic that it would be a transient local QoS published 1 time at initialization and cannot be synchronized with the mask/detections. While camera_info you can do a same thing with, that's being constantly published with the images too. So this would be a slightly different design pattern.

I'd think to be analog to existing similar things, we'd want to publish with the masks/detections as well, regardless of QoS. Those that want to grab it 1 time to use forever can do so, but then others that want it synchronized for whatever reason can, just like camera_info. I think we have to assume that some people will want to do that, which would involve 3 synchronized topics.

mintar · 2022-07-06T21:48:09Z

Publishing vision_info in sync is fine with me!

pepisg · 2022-07-08T20:18:53Z

To sum up, the consensus seems to be using 2 sensor_msgs/msg/Image along with a new vision_info message with the following structure:

vision_msgs/LabelMap[] label_map # uint16 - string key value pairs
float32 confidence_threshold

I will refactor this PR to only include the new vision_info and label_map messages.

Additionally according to @SteveMacenski suggestion some documentation should be added describing how:

The -1 value in the segmentation mask, which would end up being the greatest unsigned value of the chosen encoding, means the pixel belongs to the UNLABELED class
The workflow for getting vision_info messages when published using a transient local qos
The workflow for synchronizing three topics containing a class mask, a confidence mask and the vision_info

Does that sound good @SteveMacenski @mintar ?

Also, please let me know what would be the best place to add the documentation

mintar · 2022-07-09T11:58:39Z

Sounds good! Comments on how to interpret the data fields (for example, the UNLABELED class) should go into the message comments of that field. More extensive documentation about the workflows should go into the README, IMO.

SteveMacenski · 2022-07-11T19:04:36Z

Agree, or maybe add in a vision_msgs_example package that has some trivial example node subscribing & synchronizing the 3 just so there's some boilerplate folks can work with (or just a snippet in the readme is fine)

pepisg · 2022-07-13T20:57:27Z

Hi @mintar, I just realized there is already a VisionInfo message defined here. Should I update it with what was discussed here or should I create another one called say InferenceInfo? I will do the latter since I imagine changing an interface can break people's code easily, but I will change back in case you prefer to do so

mintar · 2022-07-14T09:03:18Z

Huh, I forgot about the existing VisionInfo. I agree that we can't change it (adding fields is usually ok, but not completely remodeling the message), so we should have a new message. I'm not fond of the name InferenceInfo, but I can't think of a better name. @SteveMacenski , @Kukanani: Any ideas?

mintar · 2022-07-14T09:06:42Z

README.md

@@ -47,6 +47,14 @@ http://wiki.ros.org/message_filters#Policy-Based_Synchronizer_.5BROS_1.1.2B-.5D)
 in your code, as the message's header should match the header of the source
 data.

+Semantic segmentation pipelines should use `sensor_msgs/Image` messages for publishing segmentation and confidence masks. This allows to use all the ROS tools for image processing easily and to choose the most lightweight encoding for each type of message. To transmit the metadata associated with the vision pipeline the [`/vision_msgs/InferenceInfo`](msg/InferenceInfo.msg) message can be used analogously to how `/sensor_msgs/CameraInfo` message and `/sensor_msgs/Image` are:
+
+1. The `InferenceInfo` topic can be latched so that new nodes joining the ROS system can get the messages that were published since the beginning. In ROS2 this can be achieved using a `transient local` QoS profile.


It should be inference_info when talking about the topic (as opposed to the message type InferenceInfo).

mintar · 2022-07-14T09:09:42Z

README.md

+
+2. The subscribing node can get and store one `InferenceInfo` message and cancel its subscription after that. This assumes the provider of the message publishes it periodically.
+
+3. The `InferenceInfo` message can be synchronized with the `Image` messages that containing the segmentation mask and the confidence values of the inference of each pixel. This can be achieved using ROS's `message_filters`. The same [snippet](http://wiki.ros.org/message_filters#Policy-Based_Synchronizer_.5BROS_1.1.2B-.5D) mentioned above can work as an example.


Isn't this incompatible with mentioning the latched topic above? If we use the time synchronizers from message_filters, we need to publish a InferenceInfo message for every image with the exact same time stamp (just like the camera_info topic). The section about latched publishers above sounds like we only publish a message once (or whenever the contents change), analogous to static TFs or maps.

I'm fine with both, BTW.

Agreed. InferenceInfo should be one of these two:

A message truly like CameraInfo - it is not expected to change. Caching it locally on other nodes should be fine. Latched publishers should be fine.

It can't be assumed to be the same for each published Image. If that's the case, we need to remove all mention of latching and caching, and require that people subscribe to both the Image and the InferenceInfo. But if they're always being published in pairs, you'll need to remind me why we aren't combining them into one message.

I think it can be both ways, that's why I want to place both options in the readme to describe what would be the common ways of using the message. The third option I propose also assumes the message does not change but does not require the use of a latched topic. I think that it is worth referencing at least the first two but let me know if you think we should settle for only one of them

mintar · 2022-07-14T09:10:56Z

msg/InferenceInfo.msg

+#   constant) over time, and so it is wasteful to send it with each individual
+#   result. By listening to these messages, subscribers will receive
+#   the context in which published vision messages are to be interpreted.
+# Each vision pipeline should publish its VisionInfo messages to its own topic,


VisionInfo -> InferenceInfo

Kukanani · 2022-07-15T00:19:43Z

I like the interface that we've settled on here. In terms of naming, if InferenceInfo is specifically for use with semantic segmentation, should we call it something like SegmentationInfo?

Either way, I'm going to suggest some documentation improvements for clarity.

Kukanani · 2022-07-15T00:22:49Z

README.md

@@ -47,6 +47,14 @@ http://wiki.ros.org/message_filters#Policy-Based_Synchronizer_.5BROS_1.1.2B-.5D)
 in your code, as the message's header should match the header of the source
 data.

+Semantic segmentation pipelines should use `sensor_msgs/Image` messages for publishing segmentation and confidence masks. This allows to use all the ROS tools for image processing easily and to choose the most lightweight encoding for each type of message. To transmit the metadata associated with the vision pipeline the [`/vision_msgs/InferenceInfo`](msg/InferenceInfo.msg) message can be used analogously to how `/sensor_msgs/CameraInfo` message and `/sensor_msgs/Image` are:


Suggested change

Semantic segmentation pipelines should use `sensor_msgs/Image` messages for publishing segmentation and confidence masks. This allows to use all the ROS tools for image processing easily and to choose the most lightweight encoding for each type of message. To transmit the metadata associated with the vision pipeline the [`/vision_msgs/InferenceInfo`](msg/InferenceInfo.msg) message can be used analogously to how `/sensor_msgs/CameraInfo` message and `/sensor_msgs/Image` are:

Semantic segmentation pipelines should use `sensor_msgs/Image` messages for publishing segmentation and confidence masks. This allows systems to use standard ROS tools for image processing, and allows choosing the most compact image encoding appropriate for the task. To transmit the metadata associated with the vision pipeline, you should use the [`/vision_msgs/InferenceInfo`](msg/InferenceInfo.msg) message. This message works the same as `/sensor_msgs/CameraInfo or [`/vision_msgs/VisionInfo`](msg/VisionInfo.msg):

Kukanani · 2022-07-15T00:25:24Z

README.md

@@ -47,6 +47,14 @@ http://wiki.ros.org/message_filters#Policy-Based_Synchronizer_.5BROS_1.1.2B-.5D)
 in your code, as the message's header should match the header of the source
 data.

+Semantic segmentation pipelines should use `sensor_msgs/Image` messages for publishing segmentation and confidence masks. This allows to use all the ROS tools for image processing easily and to choose the most lightweight encoding for each type of message. To transmit the metadata associated with the vision pipeline the [`/vision_msgs/InferenceInfo`](msg/InferenceInfo.msg) message can be used analogously to how `/sensor_msgs/CameraInfo` message and `/sensor_msgs/Image` are:
+
+1. The `InferenceInfo` topic can be latched so that new nodes joining the ROS system can get the messages that were published since the beginning. In ROS2 this can be achieved using a `transient local` QoS profile.


Suggested change

1. The `InferenceInfo` topic can be latched so that new nodes joining the ROS system can get the messages that were published since the beginning. In ROS2 this can be achieved using a `transient local` QoS profile.

1. Publish `InferenceInfo` to a topic. The topic should be at same namespace level as the associated image. That is, if your image is published at `/my_segmentation_node/image`, the `InferenceInfo` should be published at `/my_segmentation_node/inference_info`. Use a latched publisher for `InferenceInfo`, so that new nodes joining the ROS system can get the messages that were published since the beginning. In ROS2, this can be achieved using a `transient local` QoS profile.

Kukanani · 2022-07-15T00:26:17Z

README.md

+
+1. The `InferenceInfo` topic can be latched so that new nodes joining the ROS system can get the messages that were published since the beginning. In ROS2 this can be achieved using a `transient local` QoS profile.
+
+2. The subscribing node can get and store one `InferenceInfo` message and cancel its subscription after that. This assumes the provider of the message publishes it periodically.


Suggested change

2. The subscribing node can get and store one `InferenceInfo` message and cancel its subscription after that. This assumes the provider of the message publishes it periodically.

2. A subscribing node may receive an `InferenceInfo` message, store it locally, then cancel its subscription.

I don't understand why we need to assume that the publisher publishes periodically - that's the point of making it latched

Its my understanding that a latched topic uses the transient local QoS. In contrast, this second method does not require the usage of any specific QoS, that's why messages should be published periodically to allow new nodes to get the messages.

Makes sense

Kukanani · 2022-07-15T00:31:17Z

README.md

+
+2. The subscribing node can get and store one `InferenceInfo` message and cancel its subscription after that. This assumes the provider of the message publishes it periodically.
+
+3. The `InferenceInfo` message can be synchronized with the `Image` messages that containing the segmentation mask and the confidence values of the inference of each pixel. This can be achieved using ROS's `message_filters`. The same [snippet](http://wiki.ros.org/message_filters#Policy-Based_Synchronizer_.5BROS_1.1.2B-.5D) mentioned above can work as an example.


Agreed. InferenceInfo should be one of these two:

A message truly like CameraInfo - it is not expected to change. Caching it locally on other nodes should be fine. Latched publishers should be fine.

It can't be assumed to be the same for each published Image. If that's the case, we need to remove all mention of latching and caching, and require that people subscribe to both the Image and the InferenceInfo. But if they're always being published in pairs, you'll need to remind me why we aren't combining them into one message.

msg/InferenceInfo.msg

pepisg · 2022-07-15T18:23:44Z

I like the interface that we've settled on here. In terms of naming, if InferenceInfo is specifically for use with semantic segmentation, should we call it something like SegmentationInfo?

This message is not only meant for segmentation but for object detection as well, or in general to any vision pipeline that returns id based classifications that you later may want to convert back to human readable string class names.

SteveMacenski · 2022-07-15T18:33:01Z

LabelInfo? ClassInfo? Then the individual instances would be Label or Class, plus or minus a Vision.

I agree, I'm not a fan of InferenceInfo

jmm-slamcore · 2022-11-15T08:59:57Z

@pepisg is this still relevant? Think it would be great to get this merged to bring more obj. det./semantics capabilities to ROS2/Nav2

SteveMacenski · 2022-11-15T20:34:11Z

agreed!

Kukanani · 2022-11-16T02:34:49Z

List of outstanding items I see before this can merge:

Alternate name for InferenceInfo. I propose SegmentationInfo but am also happy with ClassInfo or LabelInfo or perhaps some other proposal. Author's choice :)
Clarify all comments and docs to either be latched/transient local Qos (and no mention of "being like CameraInfo" or time synchronizers), or continually published. I don't have a preference, but we should be consistent in our intended usage, or else it will cause confusion
Other misc comment typos/clarifications as proposed in PR comments

Anything else?

pepisg · 2022-11-16T16:24:07Z

Alternate name for InferenceInfo. I propose SegmentationInfo but am also happy with ClassInfo or LabelInfo or perhaps some other proposal. Author's choice :)

Done. left it LabelInfo following @SteveMacenski 's suggestion and my earlier thought about this messages enabling users to convert id encoded data from vision pipelines to human readable class names. which may apply for object detection as well

Clarify all comments and docs to either be latched/transient local Qos (and no mention of "being like CameraInfo" or time synchronizers), or continually published. I don't have a preference, but we should be consistent in our intended usage, or else it will cause confusion

We seemed to agree on being two possible setups: a) latched topic; b) node subscribes to topic, gets one message and then unsubscribes. There are several comments on the PR which I tried to address on the last commit regarding this. Please let me know if you have any further comments on this section of the documentation.

Other misc comment typos/clarifications as proposed in PR comments

I guess I integrated all your change requests, let me know if there is anything else I should change / include

SteveMacenski · 2022-11-16T20:21:45Z

@Kukanani are you happy with these?

Kukanani

Great, ship it!

pepisg added 4 commits February 8, 2022 20:28

[FEAT]: Add semantic segmentation messages

7edefb5

[Fix]: Revert clang stylin

bb9dadc

Merge branch 'ros2' of github.com:kiwicampus/vision_msgs into feature…

87f2eae

…/semantic-segmentation

Fetch upstream

f26fb65

pepisg mentioned this pull request Jun 28, 2022

Draft: Semantic segmentation plugin for nav2_costmap_2d ros-navigation/navigation2#3042

Open

7 tasks

mintar reviewed Jun 29, 2022

View reviewed changes

pepisg added 3 commits July 13, 2022 23:20

Refactor PR to include new messages

ccddf62

update new messages in cmake

a5d7352

fetch upstream

ad6e80e

mintar reviewed Jul 14, 2022

View reviewed changes

Kukanani requested changes Jul 15, 2022

View reviewed changes

Address PR comments

4ecebca

Update message name on readme

46f6a78

Kukanani approved these changes Nov 17, 2022

View reviewed changes

Kukanani merged commit bc2240c into ros-perception:ros2 Nov 17, 2022

jaagut mentioned this pull request Nov 18, 2022

Use LabelInfo msg for semantic segmentation bit-bots/bitbots_main#263

Open

		@@ -0,0 +1,10 @@
		# A key value pair that maps an integer class_id to a string class label.
		# The class_id should be interpreted as the pixel value corresponding to a


		2. The subscribing node can get and store one `InferenceInfo` message and cancel its subscription after that. This assumes the provider of the message publishes it periodically.

		3. The `InferenceInfo` message can be synchronized with the `Image` messages that containing the segmentation mask and the confidence values of the inference of each pixel. This can be achieved using ROS's `message_filters`. The same [snippet](http://wiki.ros.org/message_filters#Policy-Based_Synchronizer_.5BROS_1.1.2B-.5D) mentioned above can work as an example.


		1. The `InferenceInfo` topic can be latched so that new nodes joining the ROS system can get the messages that were published since the beginning. In ROS2 this can be achieved using a `transient local` QoS profile.

		2. The subscribing node can get and store one `InferenceInfo` message and cancel its subscription after that. This assumes the provider of the message publishes it periodically.

	2. The subscribing node can get and store one `InferenceInfo` message and cancel its subscription after that. This assumes the provider of the message publishes it periodically.
	2. A subscribing node may receive an `InferenceInfo` message, store it locally, then cancel its subscription.

Semantic Segmentation messages #71

Semantic Segmentation messages #71

Conversation

pepisg commented Jun 28, 2022

mintar commented Jun 28, 2022

SteveMacenski commented Jun 29, 2022 • edited Loading

Choose a reason for hiding this comment

SteveMacenski Jun 29, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

SteveMacenski Jun 30, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mintar commented Jun 29, 2022 • edited Loading

SteveMacenski commented Jun 29, 2022

pepisg commented Jun 29, 2022

SteveMacenski commented Jun 30, 2022 • edited Loading

pepisg commented Jul 1, 2022 • edited Loading

mintar commented Jul 1, 2022

SteveMacenski commented Jul 1, 2022

mintar commented Jul 1, 2022

SteveMacenski commented Jul 1, 2022 • edited Loading

mintar commented Jul 2, 2022

mintar commented Jul 2, 2022

SteveMacenski commented Jul 5, 2022

tfoote commented Jul 5, 2022

mintar commented Jul 6, 2022

SteveMacenski commented Jul 6, 2022 • edited Loading

mintar commented Jul 6, 2022

SteveMacenski commented Jul 6, 2022 • edited Loading

mintar commented Jul 6, 2022

pepisg commented Jul 8, 2022

mintar commented Jul 9, 2022

SteveMacenski commented Jul 11, 2022

pepisg commented Jul 13, 2022

mintar commented Jul 14, 2022

Choose a reason for hiding this comment

mintar Jul 14, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pepisg Jul 15, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Kukanani commented Jul 15, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pepisg commented Jul 15, 2022 • edited Loading

SteveMacenski commented Jul 15, 2022 • edited Loading

jmm-slamcore commented Nov 15, 2022

SteveMacenski commented Nov 15, 2022

Kukanani commented Nov 16, 2022 • edited Loading

pepisg commented Nov 16, 2022

SteveMacenski commented Nov 16, 2022

Kukanani left a comment

Choose a reason for hiding this comment

SteveMacenski commented Jun 29, 2022 •

edited

Loading

SteveMacenski Jun 29, 2022 •

edited

Loading

SteveMacenski Jun 30, 2022 •

edited

Loading

mintar commented Jun 29, 2022 •

edited

Loading

SteveMacenski commented Jun 30, 2022 •

edited

Loading

pepisg commented Jul 1, 2022 •

edited

Loading

SteveMacenski commented Jul 1, 2022 •

edited

Loading

SteveMacenski commented Jul 6, 2022 •

edited

Loading

SteveMacenski commented Jul 6, 2022 •

edited

Loading

mintar Jul 14, 2022 •

edited

Loading

pepisg Jul 15, 2022 •

edited

Loading

pepisg commented Jul 15, 2022 •

edited

Loading

SteveMacenski commented Jul 15, 2022 •

edited

Loading

Kukanani commented Nov 16, 2022 •

edited

Loading