Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xopen: open gzipped files transparently #76

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

kristjaneerik
Copy link
Collaborator

Here's an implementation of stor.xopen which works just like stor.open for regular files, but uses a layer of gzip.GzipFile for files ending in .gz such that you don't have to worry about compression, it happens behind the scenes.

Not sure if I did it correctly with respect to forcing the mode etc..

The portion of the test that runs with stor.xopen(stor.join(self.drive, 'A/C/utf8_file_with_unicode.txt'), 'rb') as xfp: doesn't work if the mode is r. I suspect it's something to do with the mocking rather than the implementation, since the code above it with mode r works. ¯\_(ツ)_/¯

Suggestions for better approaches to testing appreciated!

Inspiration from https://github.com/marcelm/xopen. Would be great if we could just use that, but I'm not sure we can.

@jtratner
Copy link
Contributor

jtratner commented Apr 20, 2018 via email

@jtratner
Copy link
Contributor

jtratner commented Apr 20, 2018 via email

@kristjaneerik
Copy link
Collaborator Author

I added an emoji to the test file.

All of these give PASSED as the result for the single test being run:

tox -e py27 -- stor/tests/test_swift.py::TestSwiftShared::test_xopen_gzip
tox -e py27 -- stor/tests/test_swift.py::TestSwiftShared::test_xopen_regular
tox -e py36 -- stor/tests/test_swift.py::TestSwiftShared::test_xopen_gzip
tox -e py36 -- stor/tests/test_swift.py::TestSwiftShared::test_xopen_regular

@jtratner
Copy link
Contributor

jtratner commented Apr 20, 2018 via email

if self.endswith('.gz'):
if mode == 'r':
mode = 'rb'
if mode == 'w':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these codepaths aren't actually tested (that's what is causing CI to fail).

Additionally, you need to pass the original mode to the gzip.GzipFile object, so perhaps you want something like:

if mode in ('r', 'rb'): fp_mode='rb'
if mode in ('w', 'wb'): fp_mode = 'wb'
fp = self.open(fp_mode, *args, **kwargs)
gzfp = gzip.GzipFile(mode=mode, fileobj=fp)

Additionally, GzipFile's docs say that it doesn't automatically close the underlying file object - which means that if you do something like this:

with stor.xopen('s3://unauthed-bucket/myfile.csv.gz') as fp:
    fp.write('somedata')

the exception will not bubble up to the user (I believe)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants