zipfile.write should check that it isn't appending an archive to itself #104527

gabevenberg · 2023-05-16T04:13:33Z

A common use pattern for zipfile is to recursively compress an entire directory into a single .zip file. A common implementation of this pattern looks like this:

#!/usr/bin/env python3

import zipfile, pathlib

rootpath=pathlib.Path('targetdir')

with zipfile.ZipFile('outputfile', 'w') as archive:
    for file_path in sorted(rootpath.rglob('*')):
        arcname=file_path.relative_to(rootpath)
        archive.write(file_path, arcname.as_posix())

However, if outputfile is located inside targetdir, the operation hangs: rglob eventually yields the archive itself, so the archive tries to write outputfile into outputfile. The write then continues indefinitely until the filesystem runs out of space or the archive hits its maximum file size, as can be observed in this example:

#!/usr/bin/env python3

import zipfile, pathlib

rootpath=pathlib.Path('./')

with zipfile.ZipFile('./foo.zip', 'w') as archive:
    for file_path in sorted(rootpath.rglob('*')):
        arcname=file_path.relative_to(rootpath)
        archive.write(file_path, arcname.as_posix())

Needless to say, this is hardly an intuitive error path, and can cause difficulties with debugging.

Note that it is not only third-party code that allows this error to happen. Neither zipapp nor shutil.make_archive includes a check that the output file is not a child of the target directory.

There are two ways I think this could be fixed:

1. Make zipfile.write simply check that self.file and filename are not equal, raising a ValueError if they are.
2. Patch all users of zipfile to silently skip over outputfile when they are compressing.

I think the first, at least, should be implemented, in order to provide an actual error message in this situation, instead of hanging for however long it takes for someone to notice a multi-GB zip file growing by the second. I'll be submitting a PR doing so shortly.
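To make the first option concrete, here is a minimal sketch of what such a check could look like from the outside. SelfCheckingZipFile is a hypothetical subclass written for illustration only; it is not the code from any linked PR, and it assumes the archive was opened by path so that ZipFile.filename is set.

```python
import os
import zipfile


class SelfCheckingZipFile(zipfile.ZipFile):
    """Hypothetical subclass: refuses to add the archive to itself."""

    def write(self, filename, arcname=None, *args, **kwargs):
        # self.filename is None when ZipFile was given a nameless file
        # object; this sketch cannot detect self-inclusion in that case.
        if self.filename is not None and os.path.exists(filename):
            # samefile() compares device and inode numbers, so it also
            # catches symlinks and differently spelled paths.
            if os.path.samefile(self.filename, filename):
                raise ValueError(
                    f"refusing to add archive {self.filename!r} to itself"
                )
        return super().write(filename, arcname, *args, **kwargs)
```

With this in place, the hanging example above would fail fast with a ValueError when rglob reaches foo.zip. Note that the sketch compares against the path as it was given to the constructor, so it would be misled if the working directory changed after the archive was opened with a relative path.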

serhiy-storchaka · 2024-02-01T10:35:11Z

It is inefficient. realpath() is expensive, and it will be called twice for every added file and directory. samefile() or samestat() could be more efficient, but even they add an overhead. It should be measured.
It is not comprehensive. It does not always work when an open file was passed to ZipFile constructor.
It needs tests. Several tests for different corner cases.

It would be nice to have such a feature, but we should estimate its cost and make it as small as possible. If it is still significant, we can only add a warning in the documentation or make this feature optional and off by default.
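As a rough illustration of the samestat() direction, the sketch below stats the archive once, caches the result, and then compares each candidate file against it. It also covers the case where an open file was passed to the constructor, as long as that file has a real descriptor. The helper names (archive_stat, is_archive_itself, add_tree) are hypothetical, and reading archive.fp relies on an attribute that is not part of the documented ZipFile API, so treat that part as an assumption.

```python
import os
import pathlib
import zipfile


def archive_stat(archive):
    """Return an os.stat_result for the archive's underlying file, or None."""
    try:
        # Assumption: archive.fp is the file object ZipFile writes to.
        # It is not a documented attribute.
        return os.fstat(archive.fp.fileno())
    except (AttributeError, OSError, ValueError):
        # For example io.BytesIO, which has no real file descriptor.
        return None


def is_archive_itself(cached_stat, path):
    """True if path refers to the same file as the cached archive stat."""
    if cached_stat is None:
        return False
    try:
        return os.path.samestat(cached_stat, os.stat(path))
    except OSError:
        return False


def add_tree(archive, root):
    """Add everything under root, silently skipping the archive itself."""
    root = pathlib.Path(root)
    cached = archive_stat(archive)  # one fstat() for the whole walk
    for path in sorted(root.rglob("*")):
        if is_archive_itself(cached, path):
            continue
        archive.write(path, path.relative_to(root).as_posix())
```

The per-file overhead here is one extra os.stat() call. Whether that is small enough would still have to be measured, and in-memory archives are simply not checked.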

gabevenberg · 2024-03-26T21:32:39Z

I'm open to making new PRs; I was a student when I discovered the issue and made the PR. Are there any corner cases you're thinking of specifically? And how would you recommend timing the impact? What time impact would you consider acceptable? At what level should the check be made: in zipfile.write itself (given it is public) or just in the consumers of it in the standard library?
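On the timing question, one possible starting point is a timeit micro-benchmark of the three candidate checks mentioned in this thread. This is only a sketch: the measure function and its labels are made up for illustration, the files are empty temporary files, and the numbers will vary a lot between filesystems and platforms, so it says nothing about what overhead would be acceptable.

```python
import os
import tempfile
import timeit


def measure(number=2000):
    """Return the average seconds per check for each candidate strategy."""
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "out.zip")
        member = os.path.join(tmp, "member.txt")
        for name in (archive, member):
            with open(name, "wb"):
                pass  # create empty placeholder files

        cached = os.stat(archive)  # stat the archive once, reuse it below
        candidates = {
            "realpath x2": lambda: (
                os.path.realpath(archive) == os.path.realpath(member)
            ),
            "samefile": lambda: os.path.samefile(archive, member),
            "samestat (cached)": lambda: os.path.samestat(
                cached, os.stat(member)
            ),
        }
        return {
            label: timeit.timeit(check, number=number) / number
            for label, check in candidates.items()
        }


if __name__ == "__main__":
    for label, seconds in measure().items():
        print(f"{label:>18}: {seconds * 1e6:.2f} microseconds per check")
```

A more realistic measurement would compare the total time of archiving a large directory tree with and without the check, since the cost of compressing and writing each file may dwarf the cost of the check itself.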

gabevenberg added the type-bug An unexpected behavior, bug, or error label May 16, 2023

hugovk changed the title ~~zipfile.write should check that it isnt apending an archive to itself.~~ zipfile.write should check that it isn't appending an archive to itself May 17, 2023

gpshead added the type-feature A feature request or enhancement label May 20, 2023

PurityLake mentioned this issue May 24, 2023

gh-104527: Add check to not recursively write zipfile #104857

Closed

gabevenberg mentioned this issue Jun 25, 2023

gh-104527: zippapp will now avoid appending an archive to itself. #106076

Merged

arhadthedev added the stdlib Python modules in the Lib dir label Jun 26, 2023

pfmoore pushed a commit that referenced this issue Jun 26, 2023

gh-104527: zippapp will now avoid appending an archive to itself. (gh…

dac3d38

…-106076) zippapp will now avoid appending an archive to itself.
