Non-UTF-8 characters cause traceback. #11

Open
zetasyanthis opened this issue Jul 17, 2016 · 1 comment

zetasyanthis commented Jul 17, 2016

Looks like Python 2.7.6 (Ubuntu 14.04) blows up with a traceback if the entry has non-UTF-8 characters in it. I'm investigating... This is again with the 0.2 release.

Traceback (most recent call last):
  File "./src/myarchive/main.py", line 154, in <module>
    main()
  File "./src/myarchive/main.py", line 147, in main
    ljapi.download_journals_and_comments(db_session=tag_db.session)
  File "/mnt/bulk/repos/projects/myarchive/src/myarchive/ljlib.py", line 59, in download_journals_and_comments
    nc = update_journal_comments(server=self._server, journal=self.journal)
  File "/usr/local/lib/python2.7/dist-packages/lj/backup.py", line 143, in update_journal_comments
    initial_meta = get_meta_since(journal['last_comment'], server, session)
  File "/usr/local/lib/python2.7/dist-packages/lj/backup.py", line 167, in get_meta_since
    meta = server.fetch_comment_meta(highest, session)
  File "/usr/local/lib/python2.7/dist-packages/lj/lj.py", line 547, in fetch_comment_meta
    self.host + "export_comments.bml?get=comment_meta&startid=%d" % int(startid), session)
  File "/usr/local/lib/python2.7/dist-packages/lj/lj.py", line 517, in __request_with_cookie
    data = io.StringIO(response.read().decode('utf8'))
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position 1: invalid start byte

zetasyanthis commented Jul 19, 2016

This resolved the issue for me on Python 3.5. Not sure if it's a good solution for the library generally, however.

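import chardet  # third-party package, needed for the detection below
response_data = response.read()
# Guess the encoding from the raw bytes instead of assuming UTF-8.
data = io.StringIO(response_data.decode(chardet.detect(response_data)["encoding"]))
#data = io.StringIO(response.read().decode('utf8'))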
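
A possibly more defensive variant, as a rough sketch only. Two things motivate it. First, `chardet.detect()` can report an encoding of `None`, which would make the `decode()` call above fail. Second, byte 0x8b in position 1 happens to be the second byte of the gzip magic number (0x1f 0x8b), so the response might be gzip-compressed rather than mis-encoded text; that is a guess from the traceback, not something confirmed here. The helper name `decode_response_body` is hypothetical and not part of the `lj` library, and the sketch assumes Python 3.

```python
import gzip
import zlib

try:
    import chardet  # third-party; treated as optional in this sketch
except ImportError:
    chardet = None

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def decode_response_body(raw):
    """Hypothetical helper: turn raw response bytes into text."""
    # Assumption: a body starting with the gzip magic is compressed.
    if raw[:2] == GZIP_MAGIC:
        try:
            raw = gzip.decompress(raw)
        except (OSError, EOFError, zlib.error):
            pass  # not valid gzip after all; fall through with the raw bytes

    # Prefer UTF-8, which is what the library currently assumes.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass

    # Fall back to chardet's guess, which may be None.
    guess = chardet.detect(raw)["encoding"] if chardet else None
    if guess:
        try:
            return raw.decode(guess)
        except (UnicodeDecodeError, LookupError):
            pass

    # Last resort: keep going, marking undecodable bytes.
    return raw.decode("utf-8", errors="replace")
```

With something like this, the line in `__request_with_cookie` could become `data = io.StringIO(decode_response_body(response.read()))`. Whether silently replacing undecodable bytes is acceptable for a backup tool is a separate question.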
