mutate: add uncompressed blob size annotation #510

mikemccracken
Contributor

Sometimes it is useful to have a reliable estimate of the space a layer will take up once it is unpacked. gzip records the uncompressed length only modulo 2^32 (the ISIZE field in the gzip trailer), so that value is only accurate for layers under 4GiB, and larger layers cannot be measured without decompressing them.

This commit adds a field to the gzip and zstd compressors that records the number of bytes read from the input stream during compression, and uses that count to set an annotation on newly generated layers.

If the noop compressor is used, the annotation is not added, since the layer's existing "compressed" size is already its uncompressed size.

👀 Note that the annotation name is just a shot in the dark; I am open to suggestions for something that would make more sense.

For validation, I unpacked the opensuse/leap:15.4 image with:

  sudo ~/umoci/umoci unpack --image opensuse:latest opensuse-latest

I then added a 5GiB file with just:

  truncate -s 5G opensuse-latest/rootfs/test

Then I repacked it and inspected the resulting blob:

  sudo ~/umoci/umoci --log=debug repack --image ./opensuse:latest-repacked-w-bigchange opensuse-latest/

Here are the layers from the resulting image's manifest:

  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:97fb1fedf685704ab0caadb3872223dfa7a49e26e65ea3a57aa34508d93ea5b8",
      "size": 44987689
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:cb6f4fa41f115bc6070e91509cc23a5df6a98da11c971060eb33a74a94861362",
      "size": 27975432,
      "annotations": {
        "ci.umo.uncompressed_blob_size": "5368711168"
      }
    }
  ]
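A consumer could prefer the annotation when it is present. This sketch decodes the layers above (digests omitted) into a minimal, hypothetical descriptor type rather than the image-spec Go types. Following the noop-compressor rule in the description, it treats a plain application/vnd.oci.image.layer.v1.tar layer as its own uncompressed size; gzip layers without the annotation, like the first one here, are reported as unknown.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// The annotation key as it appears in the manifest above.
const uncompressedSizeAnnotation = "ci.umo.uncompressed_blob_size"

// descriptor mirrors only the OCI descriptor fields this sketch needs.
type descriptor struct {
	MediaType   string            `json:"mediaType"`
	Size        int64             `json:"size"`
	Annotations map[string]string `json:"annotations,omitempty"`
}

// uncompressedSize returns the unpacked size of a layer and whether it is known.
func uncompressedSize(d descriptor) (int64, bool) {
	// Lookups on a nil map are safe and simply report ok == false.
	if v, ok := d.Annotations[uncompressedSizeAnnotation]; ok {
		n, err := strconv.ParseInt(v, 10, 64)
		return n, err == nil
	}
	// An uncompressed tar layer's blob size is already its unpacked size.
	if d.MediaType == "application/vnd.oci.image.layer.v1.tar" {
		return d.Size, true
	}
	return 0, false
}

func main() {
	manifest := `{"layers": [
		{"mediaType": "application/vnd.oci.image.layer.v1.tar+gzip", "size": 44987689},
		{"mediaType": "application/vnd.oci.image.layer.v1.tar+gzip", "size": 27975432,
		 "annotations": {"ci.umo.uncompressed_blob_size": "5368711168"}}
	]}`

	var m struct {
		Layers []descriptor `json:"layers"`
	}
	if err := json.Unmarshal([]byte(manifest), &m); err != nil {
		panic(err)
	}
	for i, l := range m.Layers {
		if n, ok := uncompressedSize(l); ok {
			fmt.Printf("layer %d: %d bytes uncompressed\n", i, n)
		} else {
			fmt.Printf("layer %d: uncompressed size unknown (compressed size %d)\n", i, l.Size)
		}
	}
}
```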

Below, file reports the incorrect size stored by gzip, followed by the actual size after decompressing.
(Note: I copied the blobs from the layout into this tmp dir, so the transcript is slightly edited.)

embrane@atom-lab-8:/tmp/umoci$ file ../opensuse/blobs/sha256/cb6f4fa41f115bc6070e91509cc23a5df6a98da11c971060eb33a74a94861362
../opensuse/blobs/sha256/cb6f4fa41f115bc6070e91509cc23a5df6a98da11c971060eb33a74a94861362: gzip compressed data, original size modulo 2^32 1073743872
# elide copy and renaming blobs here
embrane@atom-lab-8:/tmp/umoci$ gunzip cb6f4fa41f115bc6070e91509cc23a5df6a98da11c971060eb33a74a94861362.tar.gz
embrane@atom-lab-8:/tmp/umoci$ ll
total 5242888
drwxrwxr-x   3 embrane embrane        100 Oct 16 16:52 ./
drwxrwxrwt 328 root    root         11560 Oct 16 16:50 ../
-rw-------   1 embrane embrane 5368711168 Oct 16 16:51 cb6f4fa41f115bc6070e91509cc23a5df6a98da11c971060eb33a74a94861362.tar
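The numbers in this transcript are consistent with the size field wrapping exactly once: 5368711168 - 2^32 = 1073743872. The 2048 bytes beyond the 5GiB test file are presumably tar headers and end-of-archive blocks. A quick check, where isize is a hypothetical helper that mimics the 32-bit truncation:

```go
package main

import "fmt"

// isize (hypothetical) mimics what gzip stores in ISIZE: the uncompressed
// length truncated to its low 32 bits.
func isize(n uint64) uint32 {
	return uint32(n) // converting a variable keeps only the low 32 bits
}

func main() {
	var actual uint64 = 5368711168   // .tar size after gunzip, from the ll output
	var reported uint32 = 1073743872 // "original size modulo 2^32" from file

	fmt.Println(isize(actual) == reported) // true
	fmt.Println(actual - uint64(reported)) // 4294967296, one full 2^32 wrap
	fmt.Println(actual - 5*(1<<30))        // 2048 bytes beyond the 5GiB file
}
```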

@codecov-commenter

codecov-commenter commented Oct 17, 2023

Codecov Report

Merging #510 (8f65e8f) into main (312b2db) will decrease coverage by 0.03%.
Report is 2 commits behind head on main.
The diff coverage is 75.00%.

@@            Coverage Diff             @@
##             main     #510      +/-   ##
==========================================
- Coverage   73.48%   73.45%   -0.03%     
==========================================
  Files          60       60              
  Lines        4884     4902      +18     
==========================================
+ Hits         3589     3601      +12     
- Misses        935      941       +6     
  Partials      360      360              
Files               Coverage Δ
mutate/mutate.go    70.98% <100.00%> (+0.79%) ⬆️
mutate/compress.go  45.71% <64.28%> (+2.38%) ⬆️

... and 1 file with indirect coverage changes

Signed-off-by: Michael McCracken <[email protected]>
@mikemccracken mikemccracken force-pushed the 2023.10.16/main/add-uncompressed-size-annotation branch from 96974db to 8f65e8f on October 23, 2023 19:44
@mikemccracken
Contributor Author

Now that CI is passing, ping @cyphar or @tych0: please take a look, and especially let me know whether ci.umo.uncompressed_blob_size is the right key to use for this. Thanks!

Member

@tych0 tych0 left a comment

lgtm... also seems very useful. why did the original spec not have this?

@rchincha
Contributor

> lgtm... also seems very useful. why did the original spec not have this?

A very, very good question, which @cyphar also pointed out and which we are painfully discovering when interoperating with other tools :(

@hallyn
Contributor

hallyn commented Jan 23, 2024

Uh, hm. Ping @tych0? Was this meant to be merged?

@tych0
Member

tych0 commented Jan 23, 2024

I don't see why not, thanks for the ping.

@tych0 tych0 merged commit 501944d into opencontainers:main Jan 23, 2024
14 checks passed