GH-128131: Completely support random read access of uncompressed unencrypted files in ZipFile #128143

Open: wants to merge 13 commits into base: main

5ec1cff commented Dec 21, 2024

bedevere-app bot commented Dec 21, 2024

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

picnixz (Contributor) commented Dec 21, 2024

Closing as a duplicate of #128132. Please do not copy PRs made by others. Thanks.

EDIT: It appears that this is either the same person or two people collaborating with each other. Anyway, please do not update two identical PRs. Thanks

@picnixz picnixz closed this Dec 21, 2024
@5ec1cff 5ec1cff deleted the 5ec1cff-patch-1 branch December 21, 2024 01:59
@5ec1cff 5ec1cff restored the 5ec1cff-patch-1 branch December 21, 2024 03:19
5ec1cff (Author) commented Dec 21, 2024

> Closing as a duplicate of #128132. Please do not copy PRs made by others. Thanks.
>
> EDIT: It appears that this is either the same person or two people collaborating with each other. Anyway, please do not update two identical PRs. Thanks

We discussed this issue and decided that I would submit the PR. vvb2060 has already closed his PR. Please reopen my PR; sorry for the inconvenience.

@picnixz picnixz reopened this Dec 21, 2024
picnixz (Contributor) left a comment

Can we have tests for that please?

@5ec1cff 5ec1cff changed the title GH-128131: Support forward seek in uncompressed unencrypted ZipExtFile GH-128131: Support fast forward seek in uncompressed unencrypted ZipExtFile Dec 21, 2024
5ec1cff (Author) commented Dec 21, 2024

> Can we have tests for that please?

Added.

picnixz (Contributor) left a comment

I won't look at the actual implementation until the tests are (hopefully) simpler.

@5ec1cff 5ec1cff changed the title GH-128131: Support fast forward seek in uncompressed unencrypted ZipExtFile GH-128131: Completely support random access of uncompressed unencrypted ZipExtFile Dec 22, 2024
@5ec1cff 5ec1cff changed the title GH-128131: Completely support random access of uncompressed unencrypted ZipExtFile GH-128131: Completely support random access of uncompressed unencrypted files inside ZipFile Dec 22, 2024
@5ec1cff 5ec1cff changed the title GH-128131: Completely support random access of uncompressed unencrypted files inside ZipFile GH-128131: Completely support random read access of uncompressed unencrypted files in ZipFile Dec 22, 2024
@5ec1cff 5ec1cff requested a review from picnixz December 22, 2024 14:59
picnixz (Contributor) left a comment
@5ec1cff 5ec1cff requested a review from danifus December 24, 2024 06:17
picnixz (Contributor) left a comment

The question is: should we treat this as a bug or a feature? It's half and half, but I'm not a maintainer of this module, so I don't really know. @jaraco WDYT?

@picnixz picnixz dismissed their stale review December 24, 2024 15:55

The PR addressed the requested changes.

@5ec1cff 5ec1cff requested a review from picnixz December 25, 2024 04:08
picnixz (Contributor) left a comment

Some nits and otherwise I'd like to wait for @jaraco's review (but since we're on holidays, this can take some time)

danifus (Contributor) commented Dec 26, 2024

Thanks for adding the comments. I'm happy with the PR once you've added @picnixz's suggestions.

@5ec1cff 5ec1cff requested a review from picnixz December 26, 2024 14:52
picnixz (Contributor) left a comment

Looks fine to me but I'll wait for Jason's review.

@picnixz picnixz requested review from jaraco and removed request for danifus December 26, 2024 17:09
jaraco (Member) left a comment

I still don't have a good understanding of what the problem is, who cares, and thus what solution or solutions should be considered. Please start by addressing my comment in the main issue.



class StoredZipExtFileRandomReadTest(unittest.TestCase):
    def test_random_read(self):

Member

Please add a docstring describing what this function is testing. Give an overview of what the problem is and what is being guaranteed by the test. Also, link to the reported issue, where the problem should be described in exquisite detail.

self.assertEqual(fp.tell(), 10102)
self.assertEqual(arr, txt[10002:10102])
self.assertEqual(fp._left, fp._compress_left)
d = sio.bytes_read - old_count

Member

Please use a better name for d that indicates its meaning, or just inline it if it's only needed once and its meaning is inconsequential.

# The seek length must be greater than ZipExtFile.MIN_READ_SIZE
# as `ZipExtFile._read2()` reads in blocks of this size and we
# need to seek out of the buffered data
min_size = zipfile.ZipExtFile.MIN_READ_SIZE

Member

Maybe call this variable "seek_length", since that's what's described in the preceding comment.
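
For readers following this thread, here is a minimal standalone sketch of what the quoted code comment describes: the seek distance is chosen to be at least `ZipExtFile.MIN_READ_SIZE`, so the target position cannot already be inside the read buffer. `CountingBytesIO`, `payload`, and `data.bin` are hypothetical names used only for illustration and are not taken from the PR's test. How many bytes are read from the underlying file during the seek depends on the Python version, so the sketch records that number without asserting it.

```python
import io
import os
import zipfile


class CountingBytesIO(io.BytesIO):
    """Hypothetical helper: a BytesIO that counts bytes returned by read()."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.bytes_read = 0

    def read(self, size=-1):
        data = super().read(size)
        self.bytes_read += len(data)
        return data


payload = bytes(range(256)) * 200  # 51,200 bytes of predictable content

# Build an archive holding one stored (uncompressed), unencrypted member.
raw = CountingBytesIO()
with zipfile.ZipFile(raw, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("data.bin", payload)

seek_length = 10002  # same distance as the forward seek quoted in this review
assert seek_length >= zipfile.ZipExtFile.MIN_READ_SIZE

with zipfile.ZipFile(raw) as zf, zf.open("data.bin") as fp:
    before = raw.bytes_read
    fp.seek(seek_length, os.SEEK_SET)
    # Bytes pulled from the underlying file while seeking (version dependent).
    bytes_read_during_seek = raw.bytes_read - before
    position = fp.tell()
    chunk = fp.read(100)

print(position, bytes_read_during_seek)
```

Naming the distance `seek_length`, as suggested above, keeps the guard assertion and the seek call visibly tied to the same value.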

Comment on lines 3515 to 3519
# eof flags test
fp.seek(0, os.SEEK_END)
self.assertTrue(fp._eof)
fp.seek(12345, os.SEEK_SET)
self.assertFalse(fp._eof)

Member

This feels like it should be a separate test (or two). In fact, I'm not even sure I understand why this private flag is even relevant to the issue at hand. If it's not a separate test with a separate justification, can you explain why it's related to the issue at hand? If these are private attributes, what is the public effect that's being validated?

5ec1cff (Author) commented Dec 30, 2024

This test is to ensure that the eof flag is correctly updated after seeking to the end of the file and then seeking back.
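
One way to express the same guarantee through public behaviour only, as a sketch rather than the PR's actual test: after seeking to the end, `read()` returns no data, and after seeking back, `read()` returns data from the new position. The names `payload`, `archive`, and `data.bin` are assumptions made for this example; the offset 12345 is the one used in the quoted test lines.

```python
import io
import os
import zipfile

payload = bytes(range(256)) * 200  # 51,200 bytes, large enough for offset 12345

# Build an archive holding one stored (uncompressed), unencrypted member.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("data.bin", payload)

with zipfile.ZipFile(archive) as zf, zf.open("data.bin") as fp:
    # At the end of the member there is nothing left to read.
    fp.seek(0, os.SEEK_END)
    at_end = fp.read(10)

    # Seeking back must make data readable again from the new position.
    fp.seek(12345, os.SEEK_SET)
    after_seek_back = fp.read(10)

print(at_end, after_seek_back)
```

Checking the bytes returned by `read()` validates the effect of the private `_eof` flag without touching the attribute itself.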



class StoredZipExtFileRandomReadTest(unittest.TestCase):
    def test_random_read(self):

Member

The name of this function seems too broad. It's not simply testing a random read, but it's also testing the case where it's following an optimized path for an uncompressed file. Consider test_stored_seek.

Comment on lines 3476 to 3477
self.assertGreaterEqual(10002, min_size) # for forward seek test
self.assertGreaterEqual(5003, min_size) # for backward seek test

Member

Let's set variables for these constants.

Suggested change
- self.assertGreaterEqual(10002, min_size) # for forward seek test
- self.assertGreaterEqual(5003, min_size) # for backward seek test
+ forward_seek = 10002
+ backward_seek = 5003
+ self.assertGreaterEqual(forward_seek, min_size)
+ self.assertGreaterEqual(backward_seek, min_size)

Comment on lines 3496 to 3498
arr = fp.read(100)
self.assertEqual(fp.tell(), 10102)
self.assertEqual(arr, txt[10002:10102])

Member

Re-use the variables defined above.

Suggested change
- arr = fp.read(100)
- self.assertEqual(fp.tell(), 10102)
- self.assertEqual(arr, txt[10002:10102])
+ arr = fp.read(read_length)
+ self.assertEqual(fp.tell(), forward_seek + read_length)
+ self.assertEqual(arr, txt[forward_seek:forward_seek + read_length])


# backward seek
old_count = sio.bytes_read
fp.seek(-5003, os.SEEK_CUR)

Member

Suggested change
- fp.seek(-5003, os.SEEK_CUR)
+ fp.seek(-backward_seek, os.SEEK_CUR)

Comment on lines 3506 to 3510
self.assertEqual(fp.tell(), 5099) # 5099 = 10102 - 5003
self.assertEqual(fp._left, fp._compress_left)
arr = fp.read(100)
self.assertEqual(fp.tell(), 5199)
self.assertEqual(arr, txt[5099:5199])

Member

Suggested change
- self.assertEqual(fp.tell(), 5099) # 5099 = 10102 - 5003
- self.assertEqual(fp._left, fp._compress_left)
- arr = fp.read(100)
- self.assertEqual(fp.tell(), 5199)
- self.assertEqual(arr, txt[5099:5199])
+ self.assertEqual(fp.tell(), forward_seek - backward_seek + read_length)
+ self.assertEqual(fp._left, fp._compress_left)
+ arr = fp.read(read_length)
+ backward_pos = forward_seek - backward_seek + read_length
+ self.assertEqual(fp.tell(), backward_pos + read_length)
+ self.assertEqual(arr, txt[backward_pos:backward_pos + read_length])

@@ -1162,13 +1162,15 @@ def seek(self, offset, whence=os.SEEK_SET):
             self._offset = buff_offset
             read_offset = 0
         # Fast seek uncompressed unencrypted file
-        elif self._compress_type == ZIP_STORED and self._decrypter is None and read_offset > 0:
+        elif self._compress_type == ZIP_STORED and self._decrypter is None:

Member

I'm uneasy about this change, especially because the next block is elif read_offset < 0. This change affects both read_offset == 0 and read_offset < 0. Is that what you intended? I get the feeling what you're really aiming to address is the condition where:

Suggested change
- elif self._compress_type == ZIP_STORED and self._decrypter is None:
+ elif self._compress_type == ZIP_STORED and self._decrypter is None and read_offset >= 0:

This is why it's so important to provide a detailed description of the problem, how you discovered it, and what cases it affects.

Author

No. This branch is meant to handle read_offset < 0 as well; only the read_offset == 0 case should be excluded.
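
To make the disagreement concrete, the sketch below models only the `read_offset` part of each guard discussed in this thread; the `ZIP_STORED` and decrypter checks are left out. The function names are hypothetical, and `author_reply` is one literal reading of the reply above, not code taken from the PR.

```python
def before_pr(read_offset):
    # Guard on the removed line of the hunk: forward seeks only.
    return read_offset > 0


def pr_as_submitted(read_offset):
    # Guard on the added line of the hunk: no read_offset condition at all.
    return True


def reviewer_suggestion(read_offset):
    # Guard proposed in the review suggestion.
    return read_offset >= 0


def author_reply(read_offset):
    # Assumption: the reply read literally (negative allowed, zero excluded).
    return read_offset != 0


offsets = (-1, 0, 1)
guards = (before_pr, pr_as_submitted, reviewer_suggestion, author_reply)

# Map each guard name to whether it admits a backward, zero, and forward seek.
table = {guard.__name__: [guard(o) for o in offsets] for guard in guards}

for name, row in table.items():
    print(f"{name:20} {row}")
```

Each row lists the result for a backward, zero-length, and forward seek, which shows that the reviewer's suggestion and the author's reply differ on both the negative and the zero case.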

bedevere-app bot commented Dec 27, 2024

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

5ec1cff (Author) commented Jan 2, 2025

I have made the requested changes; please review again

bedevere-app bot commented Jan 2, 2025

Thanks for making the requested changes!

@jaraco: please review the changes made to this pull request.

@bedevere-app bedevere-app bot requested a review from jaraco January 2, 2025 03:37