Change tar command used in get_source_tarball_from_git to get reproducible tarballs #4248
Hi,
Hopefully I can address your concerns.
1.a. This may not be enough to get 100% reproducible tarballs across a
variety of systems.
This patch could be insufficient, which is always a possibility, but I have
done my best to be compatible with old versions of GNU Tar. Given that the
patch is based on documentation from reproducible-builds.org, is it not
probable that it will improve the situation compared to the code in place
today?
The code can always be improved upon if a user of easybuild reports that
the code does not produce reproducible tarballs.
1.b. I'm fairly sure that using a different version of gzip may already be
enough to produce a different tarball, since it may compress better, etc.
I do not think this is the case. The sorted file order and the zeroed-out
time and user metadata are carried through by default by all versions of
gzip that I am aware of, unless we tell gzip otherwise. The efficiency of
the compression algorithm itself is irrelevant for the reproducibility of
the tarball.
1.c. Another factor could be locale, which may affect the order of paths
produced with the find command.
That is why the output of find is piped through sort with LC_ALL set to C.
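For illustration, the same locale-independent ordering can be sketched in Python, the language EasyBuild is written in. This is not EasyBuild's code: list_repo_paths is a hypothetical helper, and it assumes that sorting paths by their raw bytes matches `LC_ALL=C sort`, which holds because the C locale compares lines byte by byte.

```python
import os


def list_repo_paths(repo_dir):
    """Hypothetical helper: list repo_dir and everything below it, skipping .git,
    in the byte order that `LC_ALL=C sort` would produce."""
    paths = [repo_dir]
    for root, dirs, files in os.walk(repo_dir):
        # Drop .git in place so os.walk never descends into it (like `find -name .git -prune`).
        dirs[:] = [d for d in dirs if d != '.git']
        for name in dirs + files:
            if name != '.git':  # a .git *file* (e.g. in a worktree) is skipped too
                paths.append(os.path.join(root, name))
    # Compare raw bytes instead of using locale-aware collation.
    return sorted(paths, key=os.fsencode)
```

With byte ordering, an upper-case `B.txt` always sorts before `a.txt`, whatever locale the user happens to have set.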
1.d. If the tarball creation is not 100% reproducible across a wide variety
of systems, it's basically useless, since people who get a slightly
different tarball will be running into incorrect checksums all the time,
which would not be good.
I am not sure I understand. It is already possible to create tarballs from
git archives using easybuild today, right?
This patch, at a minimum, makes the creation of tarballs from git
reproducible on some systems whereas before they were never (?)
reproducible. I say never since git does not track file creation or
modification time, so they would always be set to the time at which the
repository was cloned. [1] But I could be totally wrong here since I'm
relatively new to easybuild.
2. The side effect of using --mtime to reset time stamps and --owner=0 is
that information gets lost. Especially the timestamp info may be
interesting, so it should be retained.
This is the very core of the patch. These variables, data that differ
between downloads of the same git repository, must be zeroed out or set to
known values so that each tarball of the same git repository and
commit/tag/hash is the same. You cannot create a reproducible tarball of a
git repository without erasing the mtime of the files on the local
filesystem in some manner.
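A minimal Python sketch of what zeroing out this metadata amounts to, assuming Python's tarfile module instead of GNU tar (the patch itself shells out to tar). The hypothetical normalize filter resets mtime, owner and group, roughly what --mtime, --owner=0, --group=0 and --numeric-owner do, so two clones of the same commit yield identical tar headers.

```python
import tarfile


def normalize(tarinfo):
    """Hypothetical tarfile filter: reset metadata that differs between clones."""
    tarinfo.mtime = 0                    # git does not record mtimes, so use a fixed value
    tarinfo.uid = tarinfo.gid = 0        # like --owner=0 --group=0
    tarinfo.uname = tarinfo.gname = ''   # like --numeric-owner: no user/group names
    return tarinfo


def header_bytes(name, mtime, uid, uname):
    """Serialize the tar header of a member carrying clone-specific metadata."""
    info = tarfile.TarInfo(name)
    info.mtime, info.uid, info.uname = mtime, uid, uname
    return normalize(info).tobuf(format=tarfile.GNU_FORMAT)
```

Two headers built from different clone times and different users, e.g. `header_bytes('repo/a.txt', 1700000000, 1000, 'alice')` and `header_bytes('repo/a.txt', 1710000000, 1001, 'bob')`, come out byte-for-byte identical.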
3. This likely breaks the creation of tarballs for easyconfigs using
git_config on macOS, which is a regression. Although you currently won't
get very far with installing software on macOS with EasyBuild, features
like downloading sources with eb --fetch should work.
Do you want me to run some test protocol on a mac? I could probably get
hold of such a machine.
[1]: https://stackoverflow.com/a/62039285/501017
Hi Åke!
Changing mtime is an absolute do-not-do-that.
Could you clarify why that is? If I understand correctly, Git does not
record the mtime of the original source files. This means that the mtime of
the local files on disk at the point of archive creation only reflects
when the git repository was cloned, no more, no less.
It also has the problem of not generating a tar file that has the same
content as what tar zcf does.
A small test I did shows that the file/sort/tar pipe above actually adds
each file twice in the tar file.
Example:
$ mkdir q
$ echo hej > q/t
$ echo hopp > q/w
$ find q | sort | tar --create --file q.tar --files-from -
$ tar cf q2.tar q
$ tar tf q2.tar
q/
q/t
q/w
$ tar tf q.tar
q/
q/t
q/w
q/t
q/w
This is caused by the directory coming first in the list: tar adds the
whole directory recursively, then processes the remaining entries in the
list, adding those files a second time.
Thank you for finding the bug and for the clear demonstration! I had missed
adding the flag --no-recursion to the tar invocation. I have pushed a
modified version of the patch that rectifies this shortcoming.
Thank you also for prodding me to look at the contents of the archive that
my version of easybuild produced. It turns out .git wasn't properly pruned
either, but its contents are excluded in the latest version of the patch.
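The duplicate entries can be reproduced with Python's tarfile module, whose add() method takes a recursive argument that plays the role of GNU tar's --no-recursion here. A sketch; archive_names is a hypothetical helper, and the q, q/t, q/w layout mirrors the example above.

```python
import io
import os
import tarfile


def archive_names(base, rel_paths, recursive):
    """Add each path in rel_paths (relative to base) to an in-memory tar and list its members."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode='w') as tar:
        for rel in rel_paths:
            # recursive=False corresponds to GNU tar's --no-recursion
            tar.add(os.path.join(base, rel), arcname=rel, recursive=recursive)
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode='r') as tar:
        return tar.getnames()
```

Feeding it the output of `find q | sort` (`['q', 'q/t', 'q/w']`) with `recursive=True` lists q/t and q/w twice; with `recursive=False` each path appears exactly once.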
We've discussed this PR a bit in a recent EasyBuild conf call, and the general consensus was that we can indeed proceed with this, but do so in the scope of the EasyBuild 5.0 effort (since it's a significant change). @Rovanion Are you up for fixing the style issues that the tests complain about here?
Yeah, I can do that. I'll see if I can get it in before vacation.
All linter complaints should be fixed now.
The only workflow errors seem to occur when running on Python 2.7 and 3.5, which seem to have been (partly?) dropped from the list of Python versions supported for running EasyBuild?
The check "Tests for Apptainer container support / build (3.7, 1.0.0) (pull_request)" failed because a file could not be written; it looks like the machine used for testing failed in this case.
Is there anything else that needs to be done before this can be merged?
@Rovanion We need to take another detailed look at this, but what definitely needs to happen is re-targeting this to the
Hi,
On the BSD/Mac side: does EasyBuild have a bootstrapping process in place so that it can use a self-built version of GNU Tar? If not: I read up on the flags of FreeBSD's tar one day during the holidays. I was not able to find a definitive answer, only hints that it should be possible to coax it into submission.
On the subject of including
I brought this PR to the last EB5 conf call and we might have a solution for it:
@Rovanion is this a workable solution for you?
A little proof of concept with
$ git clone https://github.com/easybuilders/easybuild
Cloning into 'easybuild'...
remote: Enumerating objects: 11089, done.
remote: Counting objects: 100% (909/909), done.
remote: Compressing objects: 100% (457/457), done.
remote: Total 11089 (delta 482), reused 851 (delta 442), pack-reused 10180
Receiving objects: 100% (11089/11089), 595.66 MiB | 10.09 MiB/s, done.
Resolving deltas: 100% (7324/7324), done.
# Delete .git folder
$ rm -rf easybuild/.git
# Checksum of downloaded files
$ tar --create --file easybuild.tar easybuild/
$ sha256sum easybuild.tar
98457ba141e829641878550bcf82681c87380acd106d695235385043dc7327ad easybuild.tar
# Checksum of downloaded files with timestamp reset
$ find easybuild/ -exec touch -t 197001010100 {} \;
$ tar --create --owner=0 --group=0 --numeric-owner --format=gnu --null --file easybuild.0.tar easybuild/
$ sha256sum easybuild.0.tar
e1c54e194764bcd89165da527c25b4f5d82fcbd132be9d5f1f75ee40d9eb6ddc easybuild.0.tar
# Wait some time and repeat the clone + touch + tar
$ sha256sum easybuild.1.tar
e1c54e194764bcd89165da527c25b4f5d82fcbd132be9d5f1f75ee40d9eb6ddc easybuild.1.tar
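The proof of concept above can be mirrored in Python as a sketch, assuming tarfile instead of GNU tar and the Unix epoch instead of the local-time 197001010100 passed to touch -t. touch_all, owner_zero and tar_sha256 are hypothetical helper names. tarfile's add() visits directory entries in sorted order (Python 3.7 and later), so this sketch needs no separate sort step.

```python
import hashlib
import io
import os
import tarfile

FIXED_TIME = 0  # assumption: the Unix epoch; the PoC uses `touch -t 197001010100` (local time)


def touch_all(top, timestamp=FIXED_TIME):
    """Like `find top -exec touch -t ... {} \\;`: set atime and mtime on every path."""
    for root, dirs, files in os.walk(top):
        for name in dirs + files:
            os.utime(os.path.join(root, name), (timestamp, timestamp))
    os.utime(top, (timestamp, timestamp))


def owner_zero(info):
    """Like --owner=0 --group=0 --numeric-owner."""
    info.uid = info.gid = 0
    info.uname = info.gname = ''
    return info


def tar_sha256(base, repo_name):
    """Tar base/repo_name in GNU format in memory and return its SHA-256 hex digest."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode='w', format=tarfile.GNU_FORMAT) as tar:
        # TarFile.add() recurses through directories in sorted order (Python >= 3.7).
        tar.add(os.path.join(base, repo_name), arcname=repo_name, filter=owner_zero)
    return hashlib.sha256(buf.getvalue()).hexdigest()
```

As in the transcript, running touch_all before tar_sha256 gives the same digest no matter when the files were last written; skipping the touch step lets the clone time leak into the checksum.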
Yes, that would work. I'm not available to work on EasyBuild at the moment; I have been directed to prioritize other projects. Feel free to implement this until I am able to work on EasyBuild again.
…et timestamps with touch
…th reproducible tar commands
Thanks for the feedback. I made a PR to this one with the discussed changes. Please check Rovanion#1
The PR looks messy because I'm also syncing it to the current state of the 5.0.x branch.
The meaningful commit is Rovanion@8d9e14e:
- use reproducible archive for git repos without .git dir
- set timestamps with touch
- avoid deprecated GZIP environment variable by piping into gzip
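Piping into gzip keeps compression separate from tar. A Python sketch of the equivalent step, assuming the gzip module rather than the gzip binary: the header's timestamp is fixed to 0 and no file name is stored, two header fields that would otherwise vary between runs. gzip_reproducibly is a hypothetical name, and this does not address the concern that different compressor versions or levels may emit different compressed bytes for the same input.

```python
import gzip
import io


def gzip_reproducibly(data: bytes) -> bytes:
    """Compress data with a fixed gzip header: MTIME = 0 and no original file name."""
    buf = io.BytesIO()
    # filename='' together with a nameless BytesIO means the FNAME field is omitted;
    # mtime=0 keeps the current time out of the header.
    with gzip.GzipFile(filename='', mode='wb', fileobj=buf, mtime=0) as gz:
        gz.write(data)
    return buf.getvalue()
```

Compressing the same tar stream twice, even seconds apart, yields identical bytes with this helper.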
Update for PR 4248
easybuild/tools/filetools.py
# print names of all files and folders excluding .git directory
'find', repo_name, '-name ".git"', '-prune', '-o', '-print0',
# reset access and modification timestamps
'-exec', 'touch', '-t 197001010100', '{}', '\;', '|',
@Rovanion I missed using a raw string for \; in get_source_tarball_from_git. Suggested change:
-    '-exec', 'touch', '-t 197001010100', '{}', '\;', '|',
+    '-exec', 'touch', '-t 197001010100', '{}', r'\;', '|',
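Why r'\;' matters: since the list contains a '|' element, it is presumably joined into one string and run through a shell, so the backslash must reach the shell intact to protect the ; that terminates find -exec. Without the r prefix, '\;' is an invalid escape sequence; Python keeps the backslash but warns about it (DeprecationWarning since 3.6, SyntaxWarning since 3.12). The sketch uses a hypothetical repository name and shlex.split to approximate how a POSIX shell tokenizes the joined command.

```python
import shlex

# Hypothetical fragment modelled on the reviewed code; 'myrepo' is a placeholder.
find_cmd = [
    'find', 'myrepo', '-name ".git"', '-prune', '-o', '-print0',
    '-exec', 'touch', '-t 197001010100', '{}', r'\;',
]

# The raw string keeps the backslash as a literal character: two characters, '\' and ';'.
escaped_semicolon = r'\;'

# Joined into a single string, as it would be before being handed to a shell.
cmd_string = ' '.join(find_cmd)

# shlex.split follows POSIX shell quoting rules, approximating the argv that find receives:
# the backslash is consumed and find sees a bare ';' terminating -exec.
argv = shlex.split(cmd_string)
```

The shell, not Python, is the one meant to consume the backslash; the raw string makes that intent explicit and silences the warning.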
@Rovanion I opened another PR to this one that just syncs with recent changes in the 5.0.x branch. Unfortunately, at the time of the previous PR, unit tests were broken. Can you please merge Rovanion#2?
@Rovanion ping?
Sync PR#4248 with 5.0.x
I think all tests have passed now. All that remains are the three reviews.
req. changes made
@Rovanion Thanks a lot for keeping up with the updates!
LGTM
Cheers!
On Mon 18 March 2024 at 22:57, Alex Domingo <***@***.***> wrote:
Merged #4248 into 5.0.x.
Hi,
This pull request contains two commits performing two changes to easybuild that were needed to build a piece of Python software that was not located on the Python Package Index but only existed in a git repository.
The first change allows sources in exts_list to specify a git_config. The second change modifies the command used to produce tarballs from git archives so that they are made in a reproducible manner; otherwise each rerun of eb would result in a checksum mismatch. These two patches have been tested on CentOS 7 and no other operating system. The command used to produce the tarball from git is complicated by the old versions of software found on CentOS/RHEL 7, but the comments also include instructions on how to drop support for such outdated operating systems.
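A hypothetical easyconfig fragment illustrating the first change, assuming the git_config keys EasyBuild documents for regular sources (url, repo_name, tag) are accepted unchanged inside an exts_list entry. The package name, URL and tag are placeholders.

```python
# Hypothetical easyconfig fragment; package name, URL and tag are placeholders.
exts_list = [
    ('example-pkg', '0.1.0', {
        'sources': [{
            'filename': 'example-pkg-0.1.0.tar.gz',
            'git_config': {
                'url': 'https://github.com/example-org',  # assumed: base URL, repo_name appended
                'repo_name': 'example-pkg',
                'tag': 'v0.1.0',
            },
        }],
    }),
]
```

Because the tarball built from the clone is now reproducible, a checksum for it can be pinned in the easyconfig without reruns of eb failing.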
If there are any questions, feel free to ask.
Cheerio!