GH-128131: Completely support random read access of uncompressed unencrypted files in ZipFile #128143
base: main
5ec1cff commented on Dec 21, 2024 (edited by bedevere-app bot)
- Issue: random access uncompressed unencrypted ZipExtFile #128131
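For context, the access pattern the issue is about can be exercised with the public API alone: open an uncompressed (ZIP_STORED), unencrypted member and seek forward and then backward inside it. This is an illustrative sketch only: the member name data.bin and the payload are arbitrary, the offsets are borrowed from the test discussed in this PR, and it checks only the bytes returned, not the internal bookkeeping the PR changes.

```python
import io
import os
import zipfile


def build_stored_zip(payload: bytes) -> io.BytesIO:
    """Return an in-memory archive holding `payload` uncompressed (ZIP_STORED)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr("data.bin", payload)
    buf.seek(0)
    return buf


payload = bytes(range(256)) * 100  # 25,600 bytes of known content

with zipfile.ZipFile(build_stored_zip(payload)) as zf:
    with zf.open("data.bin") as fp:
        fp.seek(10002, os.SEEK_SET)   # forward seek
        forward = fp.read(100)        # position is now 10102
        fp.seek(-5003, os.SEEK_CUR)   # backward seek to 5099
        backward = fp.read(100)       # position is now 5199
        end_pos = fp.tell()

print(forward == payload[10002:10102], backward == payload[5099:5199], end_pos)
```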
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the
Closing as a duplicate of #128132. Please do not copy PRs made by others. Thanks. EDIT: It appears that this is either the same person or two people collaborating with each other. Anyway, please do not update two identical PRs. Thanks.
We discussed this issue and decided that I would submit the PR. vvb2060 has already closed his PR. Please reopen my PR; sorry for the inconvenience.
Can we have tests for that please?
Misc/NEWS.d/next/Library/2024-12-21-03-20-12.gh-issue-128131.QpPmNt.rst (outdated, resolved)
Added.
I haven't looked at the actual implementation yet; I'll wait until the tests are (hopefully) simpler.
…pPmNt.rst Co-authored-by: Bénédikt Tran <[email protected]>
add unittest
(branch updated from 2c34891 to 57cb51c)
Misc/NEWS.d/next/Library/2024-12-21-03-20-12.gh-issue-128131.QpPmNt.rst (outdated, resolved)
Please address this comment: https://github.com/python/cpython/pull/128143/files#r1894920324.
The question is: should we treat this as a bug or a feature? It's half and half, but I'm not a maintainer of this module, so I don't really know. @jaraco, WDYT?
The PR addressed the requested changes.
Co-authored-by: Bénédikt Tran <[email protected]>
Some nits; otherwise, I'd like to wait for @jaraco's review (but since we're on holidays, this may take some time).
Thanks for adding the comments. I'm happy with the PR once you've added @picnixz's suggestions.
Looks fine to me but I'll wait for Jason's review.
I still don't have a good understanding of what the problem is, who cares, and thus what solution or solutions should be considered. Please start by addressing my comment in the main issue.
Lib/test/test_zipfile/test_core.py (outdated)
class StoredZipExtFileRandomReadTest(unittest.TestCase):
    def test_random_read(self):
Please add a docstring describing what this function is testing. Give an overview of what the problem is and what is being guaranteed by the test. Also, link to the reported issue, where the problem should be described in exquisite detail.
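One possible shape for the requested docstring, sketched against the public API only. The method name follows the rename proposed later in this review (test_stored_seek); the docstring wording, payload, and offsets are illustrative, and the sketch does not reproduce the PR's checks on private attributes.

```python
import io
import os
import unittest
import zipfile


class StoredZipExtFileSeekTest(unittest.TestCase):
    def test_stored_seek(self):
        """Seeking in a ZIP_STORED, unencrypted member returns the right bytes.

        ZipExtFile.seek() has a dedicated fast path for uncompressed,
        unencrypted members. Forward and backward seeks that move outside
        the read buffer must still yield the same bytes as slicing the
        original payload, and tell() must report the expected position.

        See gh-128131 for the full problem description.
        """
        payload = bytes(range(256)) * 100
        forward_seek, backward_seek, read_length = 10002, 5003, 100
        # Both jumps must be larger than one read block to leave the buffer.
        self.assertGreater(forward_seek, zipfile.ZipExtFile.MIN_READ_SIZE)
        self.assertGreater(backward_seek, zipfile.ZipExtFile.MIN_READ_SIZE)

        buf = io.BytesIO()
        with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
            zf.writestr("data.bin", payload)

        with zipfile.ZipFile(buf) as zf, zf.open("data.bin") as fp:
            fp.seek(forward_seek, os.SEEK_SET)
            self.assertEqual(fp.read(read_length),
                             payload[forward_seek:forward_seek + read_length])

            fp.seek(-backward_seek, os.SEEK_CUR)
            backward_pos = forward_seek + read_length - backward_seek
            self.assertEqual(fp.tell(), backward_pos)
            self.assertEqual(fp.read(read_length),
                             payload[backward_pos:backward_pos + read_length])
```

Run it with python -m unittest on the module that contains it.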
Lib/test/test_zipfile/test_core.py (outdated)
self.assertEqual(fp.tell(), 10102)
self.assertEqual(arr, txt[10002:10102])
self.assertEqual(fp._left, fp._compress_left)
d = sio.bytes_read - old_count
Please use a better name for d that indicates its meaning, or just inline it if it's only needed once and its meaning is inconsequential.
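The test reads a bytes_read counter from the underlying stream (sio). Assuming sio is a BytesIO subclass along these lines (CountingBytesIO is a hypothetical name; the PR's actual helper is not shown in this thread), the delta can carry a descriptive name instead of d. How many bytes the seek itself pulls depends on which seek path the running Python version takes, so the sketch prints the number rather than asserting a specific value.

```python
import io
import os
import zipfile


class CountingBytesIO(io.BytesIO):
    """Hypothetical helper: a BytesIO that tallies the bytes read from it."""

    def __init__(self, initial_bytes=b""):
        super().__init__(initial_bytes)
        self.bytes_read = 0

    def read(self, size=-1):
        data = super().read(size)
        self.bytes_read += len(data)
        return data


payload = bytes(range(256)) * 100
sio = CountingBytesIO()
with zipfile.ZipFile(sio, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("data.bin", payload)

with zipfile.ZipFile(sio) as zf, zf.open("data.bin") as fp:
    bytes_read_before_seek = sio.bytes_read
    fp.seek(10002, os.SEEK_SET)
    # Descriptive name instead of `d`: bytes the seek itself pulled
    # from the underlying file object.
    bytes_read_by_seek = sio.bytes_read - bytes_read_before_seek
    data = fp.read(100)

print(bytes_read_by_seek, data == payload[10002:10102])
```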
Lib/test/test_zipfile/test_core.py (outdated)
# The seek length must be greater than ZipExtFile.MIN_READ_SIZE
# as `ZipExtFile._read2()` reads in blocks of this size and we
# need to seek out of the buffered data
min_size = zipfile.ZipExtFile.MIN_READ_SIZE
Maybe call this variable "seek_length", since that's what's described in the preceding comment.
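A small model of why the seek length matters: seek() only reaches the file-level paths when the target falls outside the data already buffered. The function below is a hypothetical stand-in for that check, and it assumes that after fp.read(100) the buffer holds one MIN_READ_SIZE block with 100 bytes consumed; both are assumptions about the implementation, not quotes from it.

```python
import zipfile

# Read-block size used by ZipExtFile (the same constant the test reads).
MIN_READ_SIZE = zipfile.ZipExtFile.MIN_READ_SIZE


def seek_stays_in_buffer(buffer_len, buffer_offset, read_offset):
    """Hypothetical model of the "move within the read buffer" check.

    buffer_len:    bytes currently held in the read buffer
    buffer_offset: current position inside that buffer
    read_offset:   requested move relative to the current position
    """
    target = buffer_offset + read_offset
    return 0 <= target < buffer_len


# Assumption: after fp.read(100) following a seek, the buffer holds one
# MIN_READ_SIZE block and 100 bytes of it have been consumed.
buffer_len, buffer_offset = MIN_READ_SIZE, 100
seek_length = MIN_READ_SIZE + 1

for read_offset in (1000, seek_length, -5003):
    print(read_offset, seek_stays_in_buffer(buffer_len, buffer_offset, read_offset))
```

A seek_length just above MIN_READ_SIZE is enough to leave the buffer from any offset inside it, which is the guarantee the test comment relies on.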
Lib/test/test_zipfile/test_core.py (outdated)
# eof flags test
fp.seek(0, os.SEEK_END)
self.assertTrue(fp._eof)
fp.seek(12345, os.SEEK_SET)
self.assertFalse(fp._eof)
This feels like it should be a separate test (or two). In fact, I'm not even sure I understand why this private flag is even relevant to the issue at hand. If it's not a separate test with a separate justification, can you explain why it's related to the issue at hand? If these are private attributes, what is the public effect that's being validated?
This test is to ensure that the eof flag is correctly updated after seeking to the end of the file and then seeking back.
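If the goal is to validate the public effect rather than the private _eof flag, one option is to check that read() returns nothing at the end of the member and returns data again after seeking back. This is a sketch reusing the offset 12345 from the test; whether it can stand in for the private assertion is a call for the PR author and reviewers.

```python
import io
import os
import zipfile

payload = bytes(range(256)) * 100  # 25,600 bytes
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("data.bin", payload)

with zipfile.ZipFile(buf) as zf, zf.open("data.bin") as fp:
    fp.seek(0, os.SEEK_END)
    end_pos = fp.tell()          # should equal len(payload)
    at_end = fp.read()           # nothing left: expected b""
    fp.seek(12345, os.SEEK_SET)  # move back inside the member
    after_rewind = fp.read(100)  # reads must work again
```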
Lib/test/test_zipfile/test_core.py (outdated)
class StoredZipExtFileRandomReadTest(unittest.TestCase):
    def test_random_read(self):
The name of this function seems too broad. It's not simply testing a random read; it's also testing the case where it follows an optimized path for an uncompressed file. Consider test_stored_seek.
Lib/test/test_zipfile/test_core.py (outdated)
self.assertGreaterEqual(10002, min_size)  # for forward seek test
self.assertGreaterEqual(5003, min_size)  # for backward seek test
Let's set variables for these constants.
Suggested change:
- self.assertGreaterEqual(10002, min_size)  # for forward seek test
- self.assertGreaterEqual(5003, min_size)  # for backward seek test
+ forward_seek = 10002
+ backward_seek = 5003
+ self.assertGreaterEqual(forward_seek, min_size)
+ self.assertGreaterEqual(backward_seek, min_size)
Lib/test/test_zipfile/test_core.py (outdated)
arr = fp.read(100)
self.assertEqual(fp.tell(), 10102)
self.assertEqual(arr, txt[10002:10102])
Re-use the variables defined above.
Suggested change:
- arr = fp.read(100)
- self.assertEqual(fp.tell(), 10102)
- self.assertEqual(arr, txt[10002:10102])
+ arr = fp.read(read_length)
+ self.assertEqual(fp.tell(), forward_seek + read_length)
+ self.assertEqual(arr, txt[forward_seek:forward_seek + read_length])
Lib/test/test_zipfile/test_core.py (outdated)
# backward seek
old_count = sio.bytes_read
fp.seek(-5003, os.SEEK_CUR)
Suggested change:
- fp.seek(-5003, os.SEEK_CUR)
+ fp.seek(-backward_seek, os.SEEK_CUR)
Lib/test/test_zipfile/test_core.py (outdated)
self.assertEqual(fp.tell(), 5099)  # 5099 = 10102 - 5003
self.assertEqual(fp._left, fp._compress_left)
arr = fp.read(100)
self.assertEqual(fp.tell(), 5199)
self.assertEqual(arr, txt[5099:5199])
Suggested change:
- self.assertEqual(fp.tell(), 5099)  # 5099 = 10102 - 5003
- self.assertEqual(fp._left, fp._compress_left)
- arr = fp.read(100)
- self.assertEqual(fp.tell(), 5199)
- self.assertEqual(arr, txt[5099:5199])
+ self.assertEqual(fp.tell(), forward_seek - backward_seek + read_length)
+ self.assertEqual(fp._left, fp._compress_left)
+ arr = fp.read(read_length)
+ backward_pos = forward_seek - backward_seek + read_length
+ self.assertEqual(fp.tell(), backward_pos + read_length)
+ self.assertEqual(arr, txt[backward_pos:backward_pos + read_length])
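The suggested expressions should reproduce the literals they replace. A quick arithmetic check, assuming read_length = 100 (the value the original test reads):

```python
forward_seek = 10002
backward_seek = 5003
read_length = 100

# Forward seek, then one read.
after_forward_read = forward_seek + read_length           # 10102
# The backward seek is relative (SEEK_CUR), so it starts from there.
backward_pos = after_forward_read - backward_seek         # 5099
# One more read after the backward seek.
after_backward_read = backward_pos + read_length          # 5199
```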
Lib/zipfile/__init__.py (outdated)
@@ -1162,13 +1162,15 @@ def seek(self, offset, whence=os.SEEK_SET):
             self._offset = buff_offset
             read_offset = 0
         # Fast seek uncompressed unencrypted file
-        elif self._compress_type == ZIP_STORED and self._decrypter is None and read_offset > 0:
+        elif self._compress_type == ZIP_STORED and self._decrypter is None:
I'm uneasy about this change, especially because the next block is elif read_offset < 0. This change affects both read_offset == 0 and read_offset < 0. Is that what you intended? I get the feeling what you're really aiming to address is the condition where:
Suggested change:
- elif self._compress_type == ZIP_STORED and self._decrypter is None:
+ elif self._compress_type == ZIP_STORED and self._decrypter is None and read_offset >= 0:
This is why it's so important to provide a detailed description of the problem, how you discovered it, and what cases it affects.
No, the change is meant to cover read_offset < 0; the case read_offset == 0 should be excluded.
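To make the disagreement concrete, the sketch below lists which read_offset values each variant of the condition would send to the fast-seek branch. The predicates are hypothetical simplifications: they drop the ZIP_STORED and decrypter checks (identical in every variant) and ignore that, in seek(), this elif is only reached when the target lies outside the read buffer.

```python
from typing import Callable, Dict, List

# Hypothetical predicates modelling only the read_offset part of the
# fast-seek condition in each variant discussed in this thread.
VARIANTS: Dict[str, Callable[[int], bool]] = {
    "before PR (read_offset > 0)": lambda off: off > 0,
    "PR as submitted (no offset check)": lambda off: True,
    "review suggestion (read_offset >= 0)": lambda off: off >= 0,
    "author's stated intent (read_offset != 0)": lambda off: off != 0,
}


def admitted_offsets(offsets: List[int]) -> Dict[str, List[int]]:
    """Map each variant to the offsets that would take the fast-seek branch."""
    return {name: [off for off in offsets if pred(off)]
            for name, pred in VARIANTS.items()}


for name, admitted in admitted_offsets([-5003, 0, 10002]).items():
    print(f"{name}: {admitted}")
```

Under this simplification, read_offset != 0 differs from both the submitted condition and the review suggestion.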
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase
(branch updated from e004281 to 806b1a5)
(branch updated from 806b1a5 to 1239005)
I have made the requested changes; please review again
Thanks for making the requested changes! @jaraco: please review the changes made to this pull request.