Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add checksum check to TarEntry #210

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

basil
Copy link
Contributor

@basil basil commented May 25, 2024

I am migrating from Commons Compress to Ant for tar/untar functionality, since new releases of Commons Compress pull in unwanted dependencies, while new releases of Ant are self-contained. But while doing so, I noticed this missing feature in Ant, so I backported it from Commons Compress. As far as I understand it, these two projects have the same license, so this should be legal. This PR backports apache/commons-compress@811fb4e and its associated test to this repository.

@basil
Copy link
Contributor Author

basil commented Aug 29, 2024

Hey @jaikiran, great to see the 1.10.15 release of Ant is out. Can this enhancement be considered for the next Ant release?

@jaikiran
Copy link
Member

Hello Basil, I'll need inputs from others on this one. Specifically because this isn't just a technical change. I don't have enough historical knowledge about the commons compress project, but as far as I know we haven't brought in changes/code from that project into the Ant project. Instead, if I'm not wrong, the recommendation has been to provide enhancements in the commons compress project.

I am migrating from Commons Compress to Ant for tar/untar functionality, since new releases of Commons Compress pull in unwanted dependencies, while new releases of Ant are self-contained.

I understand the motivation. However, I don't follow the Commons Compress project, so just for additional context, which specific dependenices get pulled in when using the classes in that project?

@basil
Copy link
Contributor Author

basil commented Aug 30, 2024

Thanks for your reply @jaikiran. A search for "compress" in the Git history shows a long history of @bodewig porting enhancements and fixes from Commons Compress to Ant. Here are a few:

The particular dependency in commons-compress that we do not want in Jenkins core is commons-lang3. Due to the historical architecture of Jenkins core, third-party dependencies of Jenkins core are exposed to all plugins. We would rather deliver third-party libraries as Jenkins library plugins, which allows Jenkins plugins to consume them as plugin-to-plugin dependencies but prevents Jenkins core from consuming them. Here we just have a variation of this old problem. Ant's version of tar/untar functionality is ideal for us to embed in Jenkins core because it has few third-party dependencies.

I do not see any reason not to port fixes from Commons Compress to Ant as @bodewig has done for many years. This can only benefit both projects. I view it as collaboration rather than competition (a rising tide lifts all boats).

@bodewig
Copy link
Member

bodewig commented Aug 30, 2024

The Commons Compress code base has been seeded (among other things) by code originating from the Ant project. I used to be a committer (and release manager) for Commons Compress and served the Ant project in a similar role. When I ported changes from Commons Compress to Ant I mostly did so in order to port bug fixes or add new features also useful for Ant - like support for ZIP64.

I have left the Commons project several years ago, I don't follow the Compress project anymore and so I stopped porting any changes.

In the past I have strongly recommended people use Commons Compress rather than Ant's code for compression/archives as Ant's code really does not want to be a general purpose library for this. We deliberately wanted to support what is needed by Ant - but not more. Even though I haven't followed Compress in a long time I'd still hope the library is more useful than Ant's tar/zip/bzip2 packages. I fully understand your third-party dependencies problem and am sad the project has chosen this route, but that's not my call anymore.

So much for historical background. :-)

One reason to not port any changes that are not used by an Ant task directly is that we'd have more code that we need to maintain. Nobody in the Ant project tracks Commons Compress for changes anymore, so if we port a bug it is likely to remain undetected and unfixed over here. This is even more true as Ant wouldn't use the code and wouldn't be exposed to such bugs itself at all.

As far as I remember (and COMPRESS-191 supports that) this check is used to detect files as tar archives. It does not actually verify the checksum but only tries to remove a bunch of false positives. I don't really see where we'd use that in Ant.

@basil
Copy link
Contributor Author

basil commented Aug 30, 2024

Thanks for the context, @bodewig. I would be happy to use Commons Compress if it satisfied our dependency requirements, but I fully understand that this decision is up to the current maintainers and that they have chosen to keep the Commons Lang 3 dependency.

One reason to not port any changes that are not used by an Ant task directly is that we'd have more code that we need to maintain.

This is a good point, but allow me to offer another perspective. Ideally there would be only one Apache implementation of tar/untar functionality in Java. But if these two implementations are to continue to exist for the foreseeable future, perhaps they will be easier to maintain if they are more similar to each other than if they grow further apart. From this perspective, any change that brings the two copies closer together could be considered an improvement in maintainability, and perhaps even a stepping-stone toward an eventual unification of the two codebases.

@bodewig
Copy link
Member

bodewig commented Aug 31, 2024

but allow me to offer another perspective. Ideally there would be only one Apache implementation of tar/untar functionality in Java.

Commons as a whole was born because of this idea. The many projects that made up the Jakarta project twenty years ago realized they had competing implementations of the same functionality.

Ant as a project decided to restrict dependencies to the JDK and not more for the core parts - it has always been clear that creating jars was a core functionality and we also felt creating tar.gz/bz2 archives for distribution was part of the core. So we decided to keep our tar/zip/bzip2 packages even after they had been forked to Commons (not directly from Ant, BTW, it took an extra route via a fork inside the Apache Avalon project back then and we Ant folks didn't even knew it existed).

So Ant was and is in a similar position like Jenkins, we didn't want to add extra dependencies. And therefore we decided to keep a fork tailored to our needs. And we do not want to have our fork to be the general purpose single source that others would depend on. Depending on Ant as a whole would be strange for people who only want to create tars.

But if these two implementations are to continue to exist for the foreseeable future, perhaps they will be easier to maintain if they are more similar to each other than if they grow further apart.

No, we don't really change the code much. It's been serving Ant well for a long time and we only change things if bugs are reported. One could say it is in maintenance mode. New feature would only be added if/when they are needed by Ant.

The Apache Software License certainly allows forking the code. So one way for Jenkins could be to fork the tar package from Commons Compress and strip it off its dependencies. Of course you'd have to carry the burden of maintaining it, then. And might come to similar decisions as the Ant community has arrived at.

@basil
Copy link
Contributor Author

basil commented Aug 31, 2024

And we do not want to have our fork to be the general purpose single source that others would depend on. Depending on Ant as a whole would be strange for people who only want to create tars.

I think there is demand for a dependency-free general-purpose tar/untar implementation (as evidenced by the fact that I was not the only Commons Compress user to complain about the new dependency on Commons Lang 3), should the Ant maintainers choose to go in this direction (e.g. by accepting PRs like this). Depending on Ant as a whole was not strange for us, since we already use Ant for other functionality inside Jenkins.

No, we don't really change the code much. It's been serving Ant well for a long time and we only change things if bugs are reported.

If you do need to change the code, though, it would be easier to port future bug fixes from Commons Compress if the two codebases were more similar. The more the two codebases diverge, the more challenging it becomes to port fixes from Commons Compress to Ant for the portions that are actually used by Ant.

The Apache Software License certainly allows forking the code. So one way for Jenkins could be to fork the tar package from Commons Compress and strip it off its dependencies. Of course you'd have to carry the burden of maintaining it, then.

Indeed. I think we can all agree that maintaining forks is a maintenance burden that should be avoided if possible. This PR is an attempt to optimize globally rather than locally, trying to find an equilibrium between all three projects—should the Ant project judge the maintenance cost associated with this (unit-tested) PR to be acceptable.

@bodewig
Copy link
Member

bodewig commented Sep 1, 2024

I think there is demand for a dependency-free general-purpose tar/untar implementation (as evidenced by the fact that I was not the only Commons Compress user to complain about the new dependency on Commons Lang 3),

In that case I believe the best way for people having this demand is to maintain such an implementation. Personally I don't think Ant should be the place who maintains it - as the group of people maintaining Ant is pretty small by now - and some of us are already struggling to find time to do so.

This PR is an attempt to optimize globally rather than locally, trying to find an equilibrium between all three projects

This "global optimum" probably depends on the function you want to optimize ;-)

@basil
Copy link
Contributor Author

basil commented Sep 1, 2024

It is great to have the historical context and clearly articulated arguments for and against this PR. This should help current maintainers to make an informed decision. While I believe this unit-tested PR should have a negligible maintenance cost (which might even be lower than the cost of this discussion), this decision ultimately belongs to the current Ant project maintainers. Whether this PR is accepted or rejected, I wanted to thank everyone for having this discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants