Non-UTF-8 characters cause traceback. #11

zetasyanthis · 2016-07-17T00:06:05Z

Looks like there's a bit of an explosion on Python 2.7.6 (Ubuntu 14.04) if the entry has non-UTF-8 characters in it. I'm investigating... This is again with the 0.2 release.

Traceback (most recent call last):
  File "./src/myarchive/main.py", line 154, in <module>
    main()
  File "./src/myarchive/main.py", line 147, in main
    ljapi.download_journals_and_comments(db_session=tag_db.session)
  File "/mnt/bulk/repos/projects/myarchive/src/myarchive/ljlib.py", line 59, in download_journals_and_comments
    nc = update_journal_comments(server=self._server, journal=self.journal)
  File "/usr/local/lib/python2.7/dist-packages/lj/backup.py", line 143, in update_journal_comments
    initial_meta = get_meta_since(journal['last_comment'], server, session)
  File "/usr/local/lib/python2.7/dist-packages/lj/backup.py", line 167, in get_meta_since
    meta = server.fetch_comment_meta(highest, session)
  File "/usr/local/lib/python2.7/dist-packages/lj/lj.py", line 547, in fetch_comment_meta
    self.host + "export_comments.bml?get=comment_meta&startid=%d" % int(startid), session)
  File "/usr/local/lib/python2.7/dist-packages/lj/lj.py", line 517, in __request_with_cookie
    data = io.StringIO(response.read().decode('utf8'))
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position 1: invalid start byte


zetasyanthis · 2016-07-19T06:25:31Z

This resolved the issue for me on Python 3.5. Not sure if it's a good solution for the library generally, however.

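response_data = response.read()
# chardet.detect() can report None as the encoding; fall back to UTF-8 then.
encoding = chardet.detect(response_data)["encoding"] or "utf8"
data = io.StringIO(response_data.decode(encoding))
#data = io.StringIO(response.read().decode('utf8'))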
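
One thing that may be worth checking: the traceback complains about byte 0x8b at position 1, and 0x1f 0x8b are the two magic bytes that open a gzip stream. So the response might be gzip-compressed rather than text in some other encoding, in which case guessing a charset would only hide the problem. Below is a minimal Python 3 sketch of that idea, not the library's actual code. `decode_response_body` and its `fallback` parameter are hypothetical names, and whether the server really sends gzip here is an assumption I have not verified.

```python
import gzip
import io

try:
    import chardet  # third-party and optional in this sketch
except ImportError:
    chardet = None

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def decode_response_body(raw, fallback="utf-8"):
    """Hypothetical helper: turn raw response bytes into a text buffer."""
    # Assumption: a body starting with the gzip magic bytes is compressed.
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Only guess the charset once strict UTF-8 decoding has failed.
        guessed = None
        if chardet is not None:
            guessed = chardet.detect(raw)["encoding"]  # may be None
        text = raw.decode(guessed or fallback, errors="replace")
    return io.StringIO(text)
```

With something like this, the line in `__request_with_cookie` would become `data = decode_response_body(response.read())`. Trying strict UTF-8 first keeps the behaviour unchanged for well-formed responses, and the charset guess is only a last resort.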
