reproducible tarballs have unexpected checksums on macOS #4657

Open
lexming opened this issue Sep 26, 2024 · 9 comments
Contributor

lexming commented Sep 26, 2024

This is related to PR easybuilders/easybuild-easyconfigs#21392

The checksum of the generated lsp-types-0.95.1.tar.gz with our reproducible tarball code is 455f2c12f64c2e293c72efdf2b90b96f5903d6590bd2e496bab5f12750402631 on several Linux systems (RHEL8, Rocky 8, Fedora 40 and Debian Bookworm).

However, on macOS the same procedure yields an unexpected checksum: 59869db34853933b239f1e2219cf7d431da006aa919635478511fabbfc8849d2.

Steps to generate tarball from git repo:

  1. git clone https://github.com/astral-sh/lsp-types.git
  2. cd lsp-types && git checkout 3512a9f33eadc5402cfab1b8f7340824c8ca1439 && cd ..
  3. find lsp-types -name ".git" -prune -o -print0 -exec touch --date=@0 {} \; -exec chmod "go+u,go-w" {} \; |
    LC_ALL=C sort --zero-terminated | tar --create --no-recursion --owner=0 --group=0 --numeric-owner \
    --format=gnu --null --files-from - | gzip --no-name > lsp-types-0.95.1.tar.gz
    

We need to troubleshoot this on a mac:

  • compare the checksums of the uncompressed (gunzipped) tarballs on both systems
  • compare the raw contents of the tarball headers; the metadata there can point to differences. Basically, check the output of head -n 100 on the .tar files
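
The two checks above could also be scripted so that the same comparison runs identically on Linux and macOS. What follows is a minimal sketch, not part of EasyBuild: the helper names `inner_tar_sha256` and `member_metadata` are hypothetical, and it only uses the standard library's `gzip`, `hashlib` and `tarfile` modules.

```python
import gzip
import hashlib
import tarfile


def inner_tar_sha256(tar_gz_path):
    """SHA-256 of the decompressed tar stream, so gzip header differences are ignored."""
    digest = hashlib.sha256()
    with gzip.open(tar_gz_path, "rb") as stream:
        # read in 1 MiB chunks to avoid loading the whole archive in memory
        for chunk in iter(lambda: stream.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def member_metadata(tar_gz_path):
    """Header fields that commonly vary between systems, one tuple per member."""
    with tarfile.open(tar_gz_path, "r:gz") as archive:
        return [
            (m.name, oct(m.mode), m.uid, m.gid, m.uname, m.gname, m.mtime)
            for m in archive.getmembers()
        ]
```

If the inner checksums already differ, diffing the output of `member_metadata` from both systems should narrow down which header field is responsible.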
Contributor Author

lexming commented Sep 26, 2024

@boegel this one is for you 😃

Member

boegel commented Sep 27, 2024

@lexming This explains it:

$ find lsp-types -name ".git" -prune -o -print0 -exec touch --date='@0' {} \; -exec chmod 'go+u,go-w' {} \; | LC_ALL=C sort --zero-terminated | tar --create --no-recursion --owner=0 --group=0 --numeric-owner  --format=gnu --null --files-from - | gzip --no-name > lsp-types-0.95.1.tar.gz
tar: Option --owner=0 is not supported
Usage:
  List:    tar -tf <archive-filename>
  Extract: tar -xf <archive-filename>
  Create:  tar -cf <archive-filename> [filenames...]
  Help:    tar --help
touch: illegal option -- -
usage: touch [-A [-][[hh]mm]SS] [-achm] [-r file] [-t [[CC]YY]MMDDhhmm[.SS]]
       [-d YYYY-MM-DDThh:mm:SS[.frac][tz]] file ...
...
$ echo $?
0
$ ls -l lsp-types-0.95.1.tar.gz
-rw-r--r--  1 kehoste  staff  20 Sep 27 09:06 lsp-types-0.95.1.tar.gz

So, the resulting tarball is basically an empty file.

Two (well, three) problems here:

  • incorrect usage of tar (--owner) and touch (--date);
  • we're not correctly checking the exit code of the pipeline of commands we're running, we should also run "set -o pipefail"
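
To illustrate the second point: without pipefail, the exit code of a pipeline is that of its last command, so a failing tar is masked by a successful gzip. A minimal sketch, assuming bash is available on the system; `run_pipeline` is a hypothetical helper, not EasyBuild's actual command runner.

```python
import subprocess


def run_pipeline(command, pipefail=True):
    """Run a shell pipeline with bash and return its exit code."""
    # "-o pipefail" makes the pipeline fail if any command in it fails,
    # not only the last one
    options = ["-o", "pipefail"] if pipefail else []
    result = subprocess.run(["bash", *options, "-c", command])
    return result.returncode


# "false | true" stands in for "tar ... | gzip ..." with a failing tar
masked = run_pipeline("false | true", pipefail=False)
detected = run_pipeline("false | true", pipefail=True)
print(masked, detected)
```

With pipefail disabled the failure goes unnoticed, which matches the `echo $?` output of 0 shown above.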

For tar, I don't see an equivalent to --owner on macOS; we may need to add chown to the mix?

For touch, there's -d as option, but that only takes a proper timestamp, can't use @0:

     -d      Change the access and modification times to the specified date time instead of the current time of day.  The argument is of the form “YYYY-MM-DDThh:mm:SS[.frac][tz]” where the letters represent the following:
                   YYYY    At least four decimal digits representing the year.
                   MM, DD, hh, mm, SS
                           As with -t time.
                   T       The letter T or a space is the time designator.
                   .frac   An optional fraction, consisting of a period or a comma followed by one or more digits.  The number of significant digits depends on the kernel configuration and the filesystem, and may be zero.
                   tz      An optional letter Z indicating the time is in UTC.  Otherwise, the time is assumed to be in local time.  Local time is affected by the value of the TZ environment variable.

@boegel boegel added this to the 5.0 milestone Sep 27, 2024
Member

boegel commented Sep 27, 2024

TZ=UTC touch -t 197001010000.00 may work as an equivalent to touch --date='@0'?

Contributor Author

lexming commented Sep 27, 2024

It doesn't. Using -t 197001010000.00 is timezone dependent, depending on your locale you might get a timestamp zero or not.

Contributor Author

lexming commented Sep 27, 2024

@boegel using touch --date=1970-01-01T00:00:00.00Z does seem to be equivalent to touch --date='@0', can you try this one?
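
As an aside, the differences between GNU and BSD touch could be sidestepped by setting the timestamps from Python, where `os.utime` takes seconds since the epoch directly and involves no timezone handling. This is only a sketch of that idea, not necessarily what any PR in this thread does; `reset_timestamps` is a hypothetical helper name.

```python
import os


def reset_timestamps(path):
    """Set access and modification times of path to the Unix epoch (timestamp 0)."""
    # the tuple is (atime, mtime), both in seconds since 1970-01-01T00:00:00Z
    os.utime(path, (0, 0))
```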

Contributor Author

lexming commented Sep 27, 2024

@boegel the ownership issue is a big problem. Only root can change the owner of a file, so there is no workaround possible here.

Good news is that BSD tar does actually support --owner, see https://man.freebsd.org/cgi/man.cgi?tar(1) . So maybe this is limited to some obscure version of macOS.

Contributor Author

lexming commented Sep 27, 2024

PR #4660 adds a more portable setting for timestamps and enables pipeline failures.

The issue about --owner stays unchanged.

Member

boegel commented Oct 2, 2024

For tar, one option we could explore could be to use the tarfile module in the Python standard library.

Quick & dirty try:

import tarfile
import os

def reset(tarinfo):
    tarinfo.uid = tarinfo.gid = 0
    tarinfo.uname = tarinfo.gname = "root"
    tarinfo.mtime = 0
    return tarinfo

# Create a tar archive
with tarfile.open("example.tar", "w") as tar:
    tar.add("file1.txt", filter=reset)
    tar.add("file2.txt", filter=reset)

print("Tar archive created.")

(This doesn't reproduce the exact same tarball on macOS and Linux yet, though; tested using empty file1.txt and file2.txt.)
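
A fuller sketch of the same idea, mirroring the steps of the shell pipeline from the issue description (skip .git, sort the file list, normalize owner, timestamps and permissions, GNU format, gzip without name or timestamp), could look as follows. The names `_normalize` and `make_tarball` are hypothetical, blanking the user and group names is a choice made for this sketch, and this is not necessarily how any EasyBuild PR implements it. It should give identical output for repeated runs on one machine; identical output across platforms or zlib versions is not guaranteed.

```python
import gzip
import os
import tarfile


def _normalize(tarinfo):
    """Strip metadata that varies between machines and users."""
    tarinfo.uid = tarinfo.gid = 0
    tarinfo.uname = tarinfo.gname = ""
    tarinfo.mtime = 0
    # same effect as chmod "go+u,go-w": add the user's bits to group/other, drop write
    user = (tarinfo.mode >> 6) & 0o7
    group = (((tarinfo.mode >> 3) & 0o7) | user) & 0o5
    other = ((tarinfo.mode & 0o7) | user) & 0o5
    tarinfo.mode = (tarinfo.mode & ~0o77) | (group << 3) | other
    return tarinfo


def make_tarball(source_dir, output_path):
    """Write source_dir to output_path as a gzipped GNU tar with normalized metadata."""
    base = os.path.dirname(os.path.abspath(source_dir))
    entries = []
    for root, dirs, files in os.walk(source_dir):
        dirs[:] = [d for d in dirs if d != ".git"]  # skip VCS metadata
        for path in [root] + [os.path.join(root, name) for name in files]:
            entries.append((os.path.relpath(path, base), path))
    entries.sort()  # fixed member order, independent of the filesystem

    # filename="" and mtime=0 keep the gzip header free of name and timestamp
    with open(output_path, "wb") as raw:
        with gzip.GzipFile(filename="", mode="wb", fileobj=raw, mtime=0) as gz:
            with tarfile.open(fileobj=gz, mode="w", format=tarfile.GNU_FORMAT) as tar:
                for arcname, path in entries:
                    tar.add(path, arcname=arcname, recursive=False, filter=_normalize)
```

Calling `make_tarball("lsp-types", "lsp-types-0.95.1.tar.gz")` from the parent directory of the checkout would correspond to step 3 of the original instructions, without depending on the local tar, touch or gzip implementations.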

Contributor Author

lexming commented Oct 9, 2024

@boegel as discussed, updated #4660 with an implementation using tarfile that works across systems.

However, there is a different problem now: tarfile changed the way it writes tar headers in python/cpython#90021. Although that PR states that the change only applies to tarfiles using PAX_FORMAT, my testing shows that it also applies to GNU_FORMAT. This means that tarfiles generated by EasyBuild are not reproducible across all supported Python versions.

Python versions 3.6 through 3.8 use the old header format, and versions 3.9 onwards use the new one.

AFAIK there is no workaround we can use. So the only solution seems to be to support reproducible tarballs in Python 3.9+ only and leave older versions of Python with the current status in 4.9. In time those will become unsupported anyhow.
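
One way to check the header difference on a given interpreter is to hash the header that tarfile writes for a fixed, empty member and compare the digest between Python versions. This is only a sketch: `gnu_header_digest` and the member name `probe.txt` are hypothetical, and the code makes no claim about which versions agree, it only prints a value to compare.

```python
import hashlib
import io
import sys
import tarfile


def gnu_header_digest():
    """SHA-256 of the 512-byte header tarfile writes for a fixed, empty regular file."""
    info = tarfile.TarInfo(name="probe.txt")
    info.mtime = 0
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    info.mode = 0o644
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.GNU_FORMAT) as tar:
        tar.addfile(info)  # size is 0, so no file object is needed
    # the first 512-byte block of the archive is the member's header
    return hashlib.sha256(buffer.getvalue()[:512]).hexdigest()


print(sys.version_info[:3], gnu_header_digest())
```

Running this under each supported Python version and comparing the printed digests would show where the header format changes.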

@boegel boegel changed the title reproducible tarballs have unexpected checksums on mac reproducible tarballs have unexpected checksums on macOS Oct 9, 2024