UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 12192: invalid start byte #31

hereischen · 2018-05-15T10:08:50Z

When I was calling self.serial_console.get_output(), I got UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 12192: invalid start byte.

One possible quick solution would be:
aexpect/client.py +319

```patch
@@ -314,8 +315,8 @@  def get_output(self):
         Return the STDOUT and STDERR output of the process so far.
         """
         try:
-            with open(self.output_filename, 'r') as output_file:
-                return output_file.read()
+            with open(self.output_filename, 'rb') as output_file:
+                return output_file.read().decode(errors="ignore")
         except IOError:
             return None
```

What is your opinion?
@clebergnu @ldoktor @chunfuwen
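For reference, a minimal standalone sketch of what the proposed patch does, assuming the output is a plain file on disk. The function name read_output_lenient is hypothetical and not part of aexpect: it reads in binary mode, then decodes as UTF-8 with errors="ignore", so bytes like 0xff are dropped instead of raising.

```python
import os
import tempfile


def read_output_lenient(path):
    """Return the file's contents as text, silently dropping undecodable bytes.

    Hypothetical helper mirroring the proposed patch; returns None on IOError.
    """
    try:
        with open(path, "rb") as output_file:
            # bytes.decode() defaults to UTF-8; errors="ignore" drops 0xff etc.
            return output_file.read().decode(errors="ignore")
    except IOError:
        return None


# Usage: a file containing an invalid UTF-8 byte (0xff) between two words.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"login:\xff prompt")
print(read_output_lenient(tmp.name))  # login: prompt
os.unlink(tmp.name)
```

The trade-off is silent data loss; errors="replace" would instead keep a U+FFFD marker where the bad byte was.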

chunfuwen · 2018-05-15T12:25:29Z

You may refer to avocado-framework/avocado-vt#1518, which uses locale.getpreferredencoding() to resolve the codec decode issue. Chunfu Wen
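A rough sketch of the locale-based approach, assuming output is decoded with whatever locale.getpreferredencoding() reports. decode_with_locale is a hypothetical helper, and this does not reproduce the actual avocado-vt#1518 change. On a UTF-8 locale this alone would still fail on 0xff, so the sketch also passes an error handler.

```python
import locale


def decode_with_locale(data, errors="replace"):
    """Decode bytes using the locale's preferred encoding.

    Hypothetical helper; errors="replace" turns undecodable bytes into U+FFFD.
    """
    # do_setlocale=False avoids changing the process-wide locale as a side effect.
    encoding = locale.getpreferredencoding(False)
    return data.decode(encoding, errors=errors)


print(locale.getpreferredencoding(False))
print(ascii(decode_with_locale(b"boot ok\xff")))
```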

ldoktor · 2018-05-17T07:48:56Z

Well, this looks like a different issue. According to the traceback, it already uses "utf-8". Even "utf-8" doesn't accept every byte sequence, though: \xff can never appear in valid UTF-8, so it can't be decoded with the default (strict) error handling.
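To illustrate the point about 0xff with the standard codecs: the default strict handler raises exactly the error from this issue, while the other built-in error handlers trade fidelity for robustness.

```python
data = b"abc\xffdef"

# Default "strict" handling: 0xff is never a valid UTF-8 byte, so this raises.
try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc)  # 'utf-8' codec can't decode byte 0xff in position 3: invalid start byte

# Lenient handlers (ascii() keeps the printed output pure ASCII):
print(ascii(data.decode("utf-8", errors="ignore")))            # 'abcdef'
print(ascii(data.decode("utf-8", errors="replace")))           # 'abc\ufffddef'
print(ascii(data.decode("utf-8", errors="backslashreplace")))  # 'abc\\xffdef'

# "surrogateescape" is lossless: the original bytes can be recovered exactly.
text = data.decode("utf-8", errors="surrogateescape")
roundtrip = text.encode("utf-8", errors="surrogateescape")
print(roundtrip == data)  # True
```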

To be compatible with py3, get_output should return bytes, therefore the rb mode is correct. We might also consider adding get_output_text, which is becoming a standard. Anyway, this change will affect every place where we use it (on py3; py2 doesn't care much about bytes vs. strings).
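A minimal sketch of the bytes-first split, assuming output is captured into a file as in the patch above. The class name OutputReader and the encoding/errors parameters are hypothetical, not aexpect's actual API: get_output returns raw bytes and get_output_text decodes them on request.

```python
import os
import tempfile


class OutputReader:
    """Hypothetical reader illustrating a bytes-first API with a _text variant."""

    def __init__(self, output_filename):
        self.output_filename = output_filename

    def get_output(self):
        """Return the raw output as bytes, or None if the file can't be read."""
        try:
            with open(self.output_filename, "rb") as output_file:
                return output_file.read()
        except IOError:
            return None

    def get_output_text(self, encoding="utf-8", errors="strict"):
        """Return the output decoded to str; the caller picks encoding/errors."""
        raw = self.get_output()
        if raw is None:
            return None
        return raw.decode(encoding, errors=errors)


fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"U-Boot\xff ready\n")
reader = OutputReader(path)
print(reader.get_output())                               # b'U-Boot\xff ready\n'
print(ascii(reader.get_output_text(errors="replace")))   # 'U-Boot\ufffd ready\n'
os.unlink(path)
```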

Anyway, I'll take another look at the code, as more changes like this are needed.

ldoktor · 2018-05-17T11:07:17Z

OK, I looked at it, and as it requires changes to the "astring" Avocado library (due to "get_stripped_output"), let's prioritize it in the next sprint. https://trello.com/c/MRIDgGi7/1322-aexpect-return-bytes-in-getoutput-and-support-for-text-version-as-well

pevogam · 2019-01-15T13:55:28Z

What is the progress on this issue? Can this issue be considered a general "make aexpect sessions and similar return raw byte output, like avocado's process module does", or do you think I should open a new issue for this problem? As far as I can tell, not forcing a decode of the output of the various command execution functions would solve this entirely, and would also make it possible to return sensible output for commands that produce pure binary output (not necessarily text).
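For comparison, a sketch of the "return raw bytes, let the caller decode" pattern using the standard library's subprocess module as a stand-in (avocado's process module is not reproduced here). The child command is an assumption: it uses the current Python interpreter to emit a 0xff byte.

```python
import subprocess
import sys

# Child process writes bytes that are not valid UTF-8 to its stdout.
cmd = [sys.executable, "-c", "import sys; sys.stdout.buffer.write(b'data\\xff\\n')"]

# Without text=True/encoding=..., stdout is captured as raw bytes: no decoding
# happens, so nothing here can raise UnicodeDecodeError.
result = subprocess.run(cmd, stdout=subprocess.PIPE, check=True)
raw = result.stdout
print(raw)  # b'data\xff\n'

# The caller, who knows what the command emits, decides how to decode.
print(raw.decode("utf-8", errors="backslashreplace").rstrip())  # data\xff
```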

ldoktor · 2019-01-22T13:55:08Z

It looks like this card got buried under other higher-priority tasks. Anyway, the basic idea of the feature is there, so volunteers are welcome. @clebergnu can we move this task a bit higher?

pevogam · 2019-01-23T18:04:34Z

Still, if I misunderstand and this task is not relevant, I could open a new issue specifically about extending aexpect to return bytes from cmd_output, which users can then decode according to their own use cases.

ldoktor · 2019-02-01T06:56:52Z

Yes, definitely. If cmd_output_raw (or similar) is what you're looking for, then that can be easily added (a PR shouldn't take longer than writing an issue... 😉 )

pevogam · 2019-02-07T14:32:15Z

@ldoktor This would no doubt be true if adding a single method with 'rb' flags, like cmd_output_raw, were sufficient. However, the entire class hierarchy performs encoding/decoding according to an encoding attribute that is set in the constructors. What we would like, for instance, is something like

```python
session.cmd_output("cmd-with-output-encoding-that-only-the-caller-knows-of")
```

and this requires fixing the entire chain of calls from the ShellSession object up to the base object. Could we maybe avoid encoding/decoding and simply deal with bytes if encoding is set to None? I guess this could break older callers, though.
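A sketch of the "encoding=None means bytes" idea, using a hypothetical helper (maybe_decode is not part of aexpect). In a session class, each layer would call it with the encoding set in the constructor, so data passes through untouched when no encoding is configured.

```python
def maybe_decode(data, encoding=None, errors="strict"):
    """Return data unchanged if encoding is None, else decode it to str.

    Hypothetical helper; in a session class, `encoding` would come from the
    constructor so every method in the call chain behaves consistently.
    """
    if encoding is None:
        return data
    return data.decode(encoding, errors=errors)


print(maybe_decode(b"\x00\x01binary\xff"))        # b'\x00\x01binary\xff'
print(maybe_decode(b"hello", encoding="utf-8"))   # hello
```

The compatibility concern is visible here: if None became the default, callers that currently expect str would start receiving bytes.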

Also, making bytes the default output and adding new _text methods, rather than keeping the current behavior and adding _raw methods, is more consistent with how it is done in other Avocado repos. What do you think?

ldoktor · 2019-02-09T07:41:32Z

aexpect issue

ldoktor · 2019-02-09T07:43:56Z

@pevogam the problem is compatibility and time. I added it to our Trello, so we'll discuss it next week. It'd be nice to have you there (but we can follow up on email/here).

pevogam · 2019-02-11T08:49:12Z

@ldoktor No problem, I will be there just in case.

ldoktor · 2019-02-14T07:36:55Z

I didn't see you there (at the Monday meeting). Anyway, we updated the card description https://trello.com/c/fqmMWhXs/1491-aexpect-enconding-issue-sync-with-avocado-usage-of-bytes and intend to sync the implementation with Avocado. That means we'll break the current behavior: Aexpect will return bytes by default, with new _text methods returning strings. On py2 it shouldn't matter much, but py3 applications will have to adjust. Still, the benefit of consistency across all utils seems worth the breakage.

ldoktor · 2019-05-15T15:05:59Z

Hello guys, as there are not many resources, I created #57, which should at least ease the situation while (hopefully) not breaking anyone's workflow.

pevogam · 2019-05-25T09:11:38Z

Hi @ldoktor, the change is a good short-term solution, but I hope it does not lower the priority of the actual request here, which also includes API compatibility with avocado(-vt). It also doesn't cover our case: we have a wrapper utility that would ideally work with the actual bytes and is not expected to modify the aexpect output in any way (read: simply pass the bytes around without touching or converting them).

ldoktor · 2019-05-29T06:49:55Z

Hello @pevogam, this hot-fix exists because ARM firmware puts non-UTF-8 characters in the output, which breaks the migration.with_reboot tests. We do want to fix aexpect; at this point I just don't know when, as we are currently two people short.

ldoktor · 2019-05-29T06:50:27Z

btw reviews are always welcome 😉

pevogam · 2019-05-29T07:09:13Z

> btw reviews are always welcome 😉

Hehe, I just don't think I'm qualified to do reviews, which is why I avoided commenting there. Then I thought it might be brought up here, and it seems I was right 😃 As far as I know, the most I can do is leave general comments on the PR, but I can do that as well.

kiranbeethoju · 2019-12-30T13:42:24Z

One way to read such data is with the snippet below:
```python
import pandas as pd

df = pd.read_csv('data.csv', encoding='latin1')
```
In your case, try adding encoding='latin1'; that's it.
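Why encoding='latin1' makes the error go away: Latin-1 maps each of the 256 byte values to exactly one code point, so decoding can never fail. The catch is that genuine UTF-8 text is silently mis-decoded, as this sketch shows.

```python
# Every possible byte value decodes under Latin-1, with a 1:1 round trip.
all_bytes = bytes(range(256))
text = all_bytes.decode("latin-1")
print(len(text))                              # 256
print(text.encode("latin-1") == all_bytes)    # True
print(ascii(b"\xff".decode("latin-1")))       # '\xff'  (U+00FF, LATIN SMALL LETTER Y WITH DIAERESIS)

# But real UTF-8 multi-byte text is mis-decoded (mojibake):
print(ascii("\u00e9".encode("utf-8").decode("latin-1")))  # '\xc3\xa9'
```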

mclovin-felipe · 2020-10-01T23:22:19Z

ok...

exeptionerror · 2021-06-22T10:47:19Z

All possible solutions added: [Solved] error UnicodeDecodeError: ‘utf-8’ codec can’t decode byte 0xff in position 0: invalid start byte
