Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

inconsistent handling of duplicate ZipFile entries #117779

Open
obfusk opened this issue Apr 11, 2024 · 2 comments
Open

inconsistent handling of duplicate ZipFile entries #117779

obfusk opened this issue Apr 11, 2024 · 2 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@obfusk
Copy link
Contributor

obfusk commented Apr 11, 2024

Bug report

Bug description:

Create a ZIP file with duplicate central directory entries pointing to the same local file header (these can be found in the wild, see e.g. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1068705, this is just an easy way to create one for testing).

>>> import zipfile
>>> with zipfile.ZipFile("foo.zip", "w") as zf:
...     info = zipfile.ZipInfo(filename="foo")
...     zf.writestr(info, "FOO")
...     zf.filelist.append(info)

Opening the duplicate entry fails if using the name or the later entry in infolist(), but works using the earlier entry (since the later one is considered to overlap with the earlier one, but the earlier one isn't considered to overlap with another entry or the central directory).

>>> import zipfile
>>> zf = zipfile.ZipFile("foo.zip")
>>> zf.infolist()[0]
<ZipInfo filename='foo' filemode='?rw-------' file_size=3>
>>> zf.infolist()[1]
<ZipInfo filename='foo' filemode='?rw-------' file_size=3>
>>> zf.open("foo") # fails
zipfile.BadZipFile: Overlapped entries: 'foo' (possible zip bomb)
>>> zf.open(zf.infolist()[1]) # fails
zipfile.BadZipFile: Overlapped entries: 'foo' (possible zip bomb)
>>> zf.open(zf.infolist()[0]) # works fine
<zipfile.ZipExtFile name='foo' mode='r'>

If I modify NameToInfo to contain the earlier entry instead, f.open("foo") works fine. On the one hand these ZIP files are broken. On the other hand, it would be easy to simply not overwrite existing entries in NameToInfo, allowing these files to be opened. And this affects real-world programs trying to open real-world files. So it could be considered a regression caused by #110016). Perhaps a warning would be in order when duplicates are detected; e.g. unzip shows an error but does extract the files.

CPython versions tested on:

3.11, 3.12

Operating systems tested on:

Linux

@obfusk obfusk added the type-bug An unexpected behavior, bug, or error label Apr 11, 2024
@h01ger
Copy link

h01ger commented Apr 12, 2024

obfusk: thank you for also filing a bug here!

@raininja
Copy link

raininja commented Aug 25, 2024

This is still evident, and rather annoying!

[ 18/291] Writing tensor blk.1.attn_norm.weight                 | size   4096           | type F32  | T+   3
[ 19/291] Writing tensor blk.1.ffn_norm.weight                  | size   4096           | type F32  | T+   3
[ 20/291] Writing tensor blk.2.attn_q.weight                    | size   4096 x   4096  | type F32  | T+   3
[ 21/291] Writing tensor blk.2.attn_k.weight                    | size   4096 x   4096  | type F32  | T+   3
[ 22/291] Writing tensor blk.2.attn_v.weight                    | size   4096 x   4096  | type F32  | T+   3
[ 23/291] Writing tensor blk.2.attn_output.weight               | size   4096 x   4096  | type F32  | T+   3
Traceback (most recent call last):
  File "/home/raijin/aur/powerinfer/PowerInfer/convert-dense.py", line 1219, in <module>
    main()
  File "/home/raijin/aur/powerinfer/PowerInfer/convert-dense.py", line 1214, in main
    OutputFile.write_all(outfile, ftype, params, model, vocab, special_vocab, concurrency = args.concurrency, endianess=endianess)
  File "/home/raijin/aur/powerinfer/PowerInfer/convert-dense.py", line 941, in write_all
    for i, ((name, lazy_tensor), ndarray) in enumerate(zip(model.items(), ndarrays)):
                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/raijin/aur/powerinfer/PowerInfer/convert-dense.py", line 781, in bounded_parallel_map
    result = futures.pop(0).result()
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/raijin/aur/powerinfer/PowerInfer/convert-dense.py", line 905, in do_item
    tensor = lazy_tensor.load().to_ggml()
             ^^^^^^^^^^^^^^^^^^
  File "/home/raijin/aur/powerinfer/PowerInfer/convert-dense.py", line 510, in load
    ret = self._load()
          ^^^^^^^^^^^^
  File "/home/raijin/aur/powerinfer/PowerInfer/convert-dense.py", line 520, in load
    return self.load().astype(data_type)
           ^^^^^^^^^^^
  File "/home/raijin/aur/powerinfer/PowerInfer/convert-dense.py", line 510, in load
    ret = self._load()
          ^^^^^^^^^^^^
  File "/home/raijin/aur/powerinfer/PowerInfer/convert-dense.py", line 668, in load
    return UnquantizedTensor(storage.load(storage_offset, elm_count).reshape(size))
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/raijin/aur/powerinfer/PowerInfer/convert-dense.py", line 652, in load
    fp = self.zip_file.open(info)
         ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/zipfile/__init__.py", line 1652, in open
    raise BadZipFile(f"Overlapped entries: {zinfo.orig_filename!r} (possible zip bomb)")
zipfile.BadZipFile: Overlapped entries: 'pytorch_model-00001-of-00003/data/23' (possible zip bomb)

this conversion works with python 3.10.4

[290/291] Writing tensor output_norm.weight                     | size   4096           | type F32  | T+ 110
[291/291] Writing tensor output.weight                          | size  32000 x   4096  | type F32  | T+ 110
Wrote /home/raijin/workspace/model_repo/ReluLLaMA-7B-PowerInfer-GGUF/ReluLLaMA-7B.powerinfer.gguf

@picnixz picnixz added the stdlib Python modules in the Lib dir label Aug 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
Projects
Status: No status
Development

No branches or pull requests

4 participants