xopen: open gzipped files transparently #76
Looks nice, but have you tested this on python3? Try writing an emoji or
other Unicode into your file :-/ might blow up unfortunately
…On Thu, Apr 19, 2018 at 6:21 PM Kristjan Eerik Kaseniit < ***@***.***> wrote:
Here's an implementation of stor.xopen which works just like stor.open
for regular files, but uses a layer of gzip.GzipFile for files ending in
.gz such that you don't have to worry about compression, it happens
behind the scenes.
Not sure if I did it correctly with respect to forcing the mode etc..
The portion of the test that runs with stor.xopen(stor.join(self.drive,
'A/C/utf8_file_with_unicode.txt'), 'rb') as xfp: doesn't work if the mode
is r. I suspect it's something to do with the mocking rather than the
implementation, since the code above it with mode r works. ¯\_(ツ)_/¯
Suggestions for better approaches to testing appreciated!
Inspiration from https://github.com/marcelm/xopen. Would be great if we
could just use that, but I'm not sure we can.
Commit Summary
- Merge pull request #1 from counsyl/master
- Sem-Ver: feature - xopen: transparently handle gzipped-files
- add file with some unicode
File Changes
- *M* stor/__init__.py
<https://github.com/counsyl/stor/pull/76/files#diff-0> (1)
- *M* stor/base.py
<https://github.com/counsyl/stor/pull/76/files#diff-1> (17)
- *A* stor/tests/file_data/utf8_file_with_unicode.txt
<https://github.com/counsyl/stor/pull/76/files#diff-2> (2)
- *M* stor/tests/shared_obs.py
<https://github.com/counsyl/stor/pull/76/files#diff-3> (33)
Patch Links:
- https://github.com/counsyl/stor/pull/76.patch
- https://github.com/counsyl/stor/pull/76.diff
|
(If it works on python 3 then I’m super down and will review more in-depth!)
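The Python 3 concern can be illustrated with the standard library alone. This is a minimal sketch, not the code from this pull request: `write_text_gz` and `read_text_gz` are hypothetical helper names, and an in-memory `io.BytesIO` stands in for the file object that `stor.open` would return. `gzip.GzipFile` is a binary stream, so on Python 3 text has to be encoded, for example by layering `io.TextIOWrapper` on top of it.

```python
import gzip
import io

# Sample text with a non-ASCII letter and an emoji.
TEXT = u'caf\u00e9 \U0001F600'


def write_text_gz(raw, text, encoding='utf-8'):
    """Compress `text` into the binary file object `raw` (hypothetical helper)."""
    with gzip.GzipFile(fileobj=raw, mode='wb') as gz:
        # TextIOWrapper encodes str to bytes before handing it to gzip.
        with io.TextIOWrapper(gz, encoding=encoding) as wrapper:
            wrapper.write(text)


def read_text_gz(data, encoding='utf-8'):
    """Decompress gzipped bytes and decode them to text (hypothetical helper)."""
    with gzip.GzipFile(fileobj=io.BytesIO(data), mode='rb') as gz:
        with io.TextIOWrapper(gz, encoding=encoding) as wrapper:
            return wrapper.read()
```

Writing a `str` directly to a `GzipFile` opened in `'wb'` mode raises `TypeError` on Python 3, which is the failure the comment above is worried about.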
On Thu, Apr 19, 2018 at 7:15 PM Jeffrey Tratner <[email protected]>
wrote:
… Looks nice, but have you tested this on python3? Try writing an emoji or
other Unicode into your file :-/ might blow up unfortunately
|
I added an emoji to the test file.
All of these give PASSED as the result for the single test being run:
tox -e py27 -- stor/tests/test_swift.py::TestSwiftShared::test_xopen_gzip
tox -e py27 -- stor/tests/test_swift.py::TestSwiftShared::test_xopen_regular
tox -e py36 -- stor/tests/test_swift.py::TestSwiftShared::test_xopen_gzip
tox -e py36 -- stor/tests/test_swift.py::TestSwiftShared::test_xopen_regular
|
Awesome!
…On Fri, Apr 20, 2018 at 2:07 PM Kristjan Eerik Kaseniit < ***@***.***> wrote:
I added an emoji to the test file.
All of these give PASSED as the result for the single test being run:
tox -e py27 -- stor/tests/test_swift.py::TestSwiftShared::test_xopen_gzip
tox -e py27 -- stor/tests/test_swift.py::TestSwiftShared::test_xopen_regular
tox -e py36 -- stor/tests/test_swift.py::TestSwiftShared::test_xopen_gzip
tox -e py36 -- stor/tests/test_swift.py::TestSwiftShared::test_xopen_regular
|
    if self.endswith('.gz'):
        if mode == 'r':
            mode = 'rb'
        if mode == 'w':
I think these codepaths aren't actually tested (that's what is causing CI to fail).
Additionally, you need to pass the original mode to the gzip.GzipFile object, so perhaps you want something like:
    if mode in ('r', 'rb'):
        fp_mode = 'rb'
    if mode in ('w', 'wb'):
        fp_mode = 'wb'
    fp = self.open(fp_mode, *args, **kwargs)
    gzfp = gzip.GzipFile(mode=mode, fileobj=fp)
Additionally, GzipFile's docs say that it doesn't automatically close the underlying file object - which means that if you do something like this:
    with stor.xopen('s3://unauthed-bucket/myfile.csv.gz') as fp:
        fp.write('somedata')
the underlying file object is never closed, so the exception from the failed write will not bubble up to the user (I believe)
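One way to address both review points might look like the sketch below. It is written for Python 3 and makes several assumptions: `ClosingGzipFile` is a hypothetical helper, `xopen` is written as a free function rather than the method in this pull request, and the `opener` parameter (defaulting to the builtin `open`) stands in for `stor.open`. The wrapper closes the wrapped file object after the gzip stream is finished, so an error raised by the underlying `close()` reaches the caller.

```python
import gzip


class ClosingGzipFile(gzip.GzipFile):
    """GzipFile that also closes the file object it wraps (hypothetical helper)."""

    def __init__(self, fileobj, mode):
        # Keep our own reference: GzipFile.close() leaves a caller-supplied
        # fileobj open.
        self._wrapped = fileobj
        super().__init__(fileobj=fileobj, mode=mode)

    def close(self):
        try:
            super().close()  # finishes the gzip stream (writes the trailer)
        finally:
            wrapped, self._wrapped = self._wrapped, None
            if wrapped is not None:
                wrapped.close()  # any error raised here propagates to the caller


def xopen(path, mode='r', opener=open):
    """Open `path`, adding a gzip layer when it ends in '.gz' (sketch only)."""
    if not path.endswith('.gz'):
        return opener(path, mode)
    # The underlying file must be binary whatever mode the caller asked for.
    if mode in ('r', 'rb'):
        fp_mode = 'rb'
    elif mode in ('w', 'wb'):
        fp_mode = 'wb'
    else:
        raise ValueError('unsupported mode for gzipped files: %r' % (mode,))
    return ClosingGzipFile(opener(path, fp_mode), fp_mode)
```

In this sketch the gzip layer always reads and writes bytes; text handling would need an additional `io.TextIOWrapper` layer.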