Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update CompressedImage #237

Open
wants to merge 1 commit into
base: rolling
Choose a base branch
from
Open

Conversation

ahcorde
Copy link
Contributor

@ahcorde ahcorde commented Feb 7, 2024

The compressedImage message hasn’t been updated since 2009 . This message was probably designed for mainly for jpeg and png according to the history and comments in the message definition.

During this time some other compressedImage types have appeared, for example:

The idea of this PR is to try to refresh this message type allowing new compression algorithms to set all the required information.

As a reference, I was working on a SVTAV1 image transport plugin and I had to encode the metadata inside the payload which is ugly.

This new message type will allow to use the same message type in all the image_transport plugins (at least released in rolling)

  • ffmpeg
  • Jpeg and png
  • Theora
  • SVTAV1
  • AVIF
  • others ?

FYI @calderpg-tri @IanTheEngineer

Signed-off-by: Alejandro Hernández Cordero <[email protected]>
@ahcorde ahcorde self-assigned this Feb 7, 2024
@ahcorde ahcorde requested a review from tfoote as a code owner February 7, 2024 14:08
uint32 height # image height, that is, number of rows
uint32 width # image width, that is, number of columns

string pixel_format # Specifies the format of the data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a needless difference. I understand that this is slightly more descriptive, but I also don't think it is worth the headache for downstream users to change this. So I'll suggest putting this back to format.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that encoding is the current spelling for this in sensor_msgs/Image, which is just wrong in almost every case. Everything else (OpenCV, commercial machine vision libraries, etc) refers to this as "pixel format". Unfortunately, "encoding" is even more confusing when dealing with compressed data, so if there is a chance to fix this properly we should.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For compatibility we shouldn't rename this field but just update it'd documentation to state that it's expecting the pixel_format string as per OpenCV). It's still clear what the content is and this will avoid breaking any existing code.

Comment on lines +10 to +11
uint32 height # image height, that is, number of rows
uint32 width # image width, that is, number of columns
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prior to this PR, how are users of this message determining this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this was encoded into the payload before, we need some more semantics about when it should be used and when not. It's not great to have data that can potentially disagree.


uint64 sequence_number # sequence number
uint64 flags # flags (for example: KEYFRAME)
uint8 is_bigendian # is this data bigendian?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it would be a bool.

@@ -38,4 +41,10 @@ string format # Specifies the format of the data
# need for successful decoding of the image. Refer to
# documentation of the other transports for details.

string compression_type # Compression type used (jpeg, png, theora, etc)

uint64 sequence_number # sequence number
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I agree that this should have a sequence number. A compressed image doesn't logically have one of these, and we don't have this in Image.msg, for instance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does seem to be breaking the encapsulation of this message standing on it's own.

string compression_type # Compression type used (jpeg, png, theora, etc)

uint64 sequence_number # sequence number
uint64 flags # flags (for example: KEYFRAME)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I don't think I agree that this should be here. A compressed image doesn't represent anything other than data, and I don't think we should be adding "external" metadata like this.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, this argument is about whether there should be any metadata fields in the message, or if the compression plugin should just shove its metadata into the binary blob in some bespoke fashion. To the extent that a common set of metadata exists, I think it is preferable to put that metadata in the message so that message introspection tooling can see it, rather than putting this all in binary blob form and practically inaccessible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, this argument is about whether there should be any metadata fields in the message

I disagree. I absolutely think we should have metadata in this message that is intrinsic to the CompressedImage. Things like width, height, format, compression_type, and is_bigendian all belong in here.

But metadata about the larger context in which this data is delivered doesn't seem to belong, in my opinion. The exception to this is of course std_msgs/Header, which is already here and is heavily used with the rest of our tooling.

I definitely understand the need for things like a sequence_number, a keyframe flag, and the like. But those seem like things that should be in a custom message, composed of a CompressedImage plus the metadata, like:

# CompressedKeyFrame
sensor_msgs/CompressedImage compressed_image
int sequence_number

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But those seem like things that should be in a custom message, composed of a CompressedImage plus the metadata...

Frankly, I really think that if the outcome isn't an improved-for-everyone message, the right answer is for image_transport to just define its own internal message type that it can change on its own. I don't think that's better for the community, but if we have to introduce a new type for image_transport uses, having it inherit all the inertia that accompanies being a part of core message types isn't an improvement either.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that Chris is right that the higher level information such as keyframes and sequence numbers deserve to be in a higher level concept than just the compressed image. Things like Key Frames and sequences are only valid if they're associated with a specific stream. Where a standalone compressed image may or may not be used in those applications.

In particular if someone muxes two compressed Image topics together, the sequence numbers become meaningless unless you're associating them with a name stream or the like.

There's the approach above of using a hierarchy of messages/wrapper messages. The other alternative would be to send a parallel stream of messages in the same way that we do with camera_info. There's a parallel stream of camera info which can be associated with the image by timestamp + frame_id and with that this extra metadata can be applied in the image_transport process, but can be ignored by those who just want to access the raw images. And those images should standalone, in log files etc. Thus there would be CompressedImages and then there would be a CompressedImageStream protocol/standard for how to send along a series of compressed images with keyframes and potentially other metadata in a parallel CompressedImageStreamInfo or the like.

uint32 height # image height, that is, number of rows
uint32 width # image width, that is, number of columns

string pixel_format # Specifies the format of the data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For compatibility we shouldn't rename this field but just update it'd documentation to state that it's expecting the pixel_format string as per OpenCV). It's still clear what the content is and this will avoid breaking any existing code.

@@ -38,4 +41,10 @@ string format # Specifies the format of the data
# need for successful decoding of the image. Refer to
# documentation of the other transports for details.

string compression_type # Compression type used (jpeg, png, theora, etc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a backwards compatible default expectation that should be documented and implemented as a fallback if unset?

@@ -38,4 +41,10 @@ string format # Specifies the format of the data
# need for successful decoding of the image. Refer to
# documentation of the other transports for details.

string compression_type # Compression type used (jpeg, png, theora, etc)

uint64 sequence_number # sequence number
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does seem to be breaking the encapsulation of this message standing on it's own.

string compression_type # Compression type used (jpeg, png, theora, etc)

uint64 sequence_number # sequence number
uint64 flags # flags (for example: KEYFRAME)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that Chris is right that the higher level information such as keyframes and sequence numbers deserve to be in a higher level concept than just the compressed image. Things like Key Frames and sequences are only valid if they're associated with a specific stream. Where a standalone compressed image may or may not be used in those applications.

In particular if someone muxes two compressed Image topics together, the sequence numbers become meaningless unless you're associating them with a name stream or the like.

There's the approach above of using a hierarchy of messages/wrapper messages. The other alternative would be to send a parallel stream of messages in the same way that we do with camera_info. There's a parallel stream of camera info which can be associated with the image by timestamp + frame_id and with that this extra metadata can be applied in the image_transport process, but can be ignored by those who just want to access the raw images. And those images should standalone, in log files etc. Thus there would be CompressedImages and then there would be a CompressedImageStream protocol/standard for how to send along a series of compressed images with keyframes and potentially other metadata in a parallel CompressedImageStreamInfo or the like.

Comment on lines +10 to +11
uint32 height # image height, that is, number of rows
uint32 width # image width, that is, number of columns
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this was encoded into the payload before, we need some more semantics about when it should be used and when not. It's not great to have data that can potentially disagree.

@christianrauch
Copy link
Contributor

Can I suggest that the CompressedImage definition is not updated? This will change the hash and make recorded data incompatible with new subscribers. I would rather create a new CompressedImage2 if the definition has to be updated.

I found the use of "formats" and "encodings" in the images also confusing. Eventually, all image data, raw or compressed, is just an unstructured binary blob that has to be interpreted according to the encoding or format. Can't we merge this and simply use the binary blob together with the FourCC to describe raw (e.g. RGB3 for a "standard" colour image with 8bit per pixel and channel) and compressed (e.g. JPEG for jpeg and AV1F for AV1) data? There is much more documentation on how to interpret memory via the FourCC than on custom encoding / format.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants