Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 12192: invalid start byte #31

Open
hereischen opened this issue May 15, 2018 · 20 comments

Comments

@hereischen
Copy link

hereischen commented May 15, 2018

When I was calling self.serial_console.get_output(), I met UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 12192: invalid start byte.

One possible quick solution will be:
aexpect/client.py +319

@@ -314,8 +315,8 @@  def get_output(self):
         Return the STDOUT and STDERR output of the process so far.
         """
         try:
-            with open(self.output_filename, 'r') as output_file:
-                return output_file.read()
+            with open(self.output_filename, 'rb') as output_file:
+                return output_file.read().decode(errors="ignore")
         except IOError:
             return None

What is your opinion?
@clebergnu @ldoktor @chunfuwen

@chunfuwen
Copy link

chunfuwen commented May 15, 2018 via email

@ldoktor
Copy link
Contributor

ldoktor commented May 17, 2018

Well this looks like a different issue. According to traceback it already uses "utf-8". Anyway even "utf-8" doesn't support all kinds of combinations and \xff is reserved and can't be used by default.

To compile with py3 the get_output should return bytes, therefor the rb mode is correct. We might also consider adding get_output_text which is becoming a standard. Anyway this change will affect all places where we use it (on py3, py2 doesn't care much about bytes vs. strings).

Anyway I'll take another look at the code as more changes like this should be performed.

@ldoktor
Copy link
Contributor

ldoktor commented May 17, 2018

OK I looked at it and as it requires changes to "astring" Avocado library (due to "get_stripped_output") let's prioritize it in the next sprint. https://trello.com/c/MRIDgGi7/1322-aexpect-return-bytes-in-getoutput-and-support-for-text-version-as-well

@pevogam
Copy link
Contributor

pevogam commented Jan 15, 2019

What is the progress on this issue? Can this issue be considered as a general "make aexpect sessions and similar return raw byte output similarly to avocado's process module" or do you think I should open a new issue regarding this problem? As far as I can tell not enforcing decoding of the output of various command execution functionality would solve this entirely and also be more compatible with providing reasonable outputs for commands returning pure byte output (not necessarily strings).

@ldoktor
Copy link
Contributor

ldoktor commented Jan 22, 2019

It looks like this card got buried under other higher-priority tasks. Anyway the basic idea of the feature is there so volunteers are welcome. @clebergnu can we move this task a bit higher?

@pevogam
Copy link
Contributor

pevogam commented Jan 23, 2019

Still, if I misunderstand and this task is not relevant, I could open a new issue specifically about extending aexpect to returning bytes as cmd_output that users can decode according to their own use cases.

@ldoktor
Copy link
Contributor

ldoktor commented Feb 1, 2019

Yes, definitely. If cmd_output_raw (or such) is what you're looking for than that can be easily added (PR shouldn't take longer than writing an issue... 😉 )

@pevogam
Copy link
Contributor

pevogam commented Feb 7, 2019

@ldoktor This will be no doubt true if adding a single method with 'rb' flags like cmd_output_raw would be sufficient. However, the entire class hierarchy is performing encoding/decoding according to an encoding attribute which is set in their constructors. What we would like for instance is to have something like

session.cmd_output("cmd-with-output-encoding-that-only-the-caller-knows-of")

and this requires fixing the entire chain of calls from the ShellSession object up to the base object. Could we maybe avoid encoding/decoding and simply deal with bytes if encoding is set to None? I guess this could break things for older code calls though.

Also, making the default output bytes and adding newer _text methods rather than keeping the current behavior and adding _raw methods is more compatible with the way it is done in other avocado repos. What do you think?

@ldoktor
Copy link
Contributor

ldoktor commented Feb 9, 2019

aexpect issue

@ldoktor
Copy link
Contributor

ldoktor commented Feb 9, 2019

@pevogam the problem is compatibility and time. I added it to our trello so we'll discuss it next week. It'd be nice to have you there (but we can follow-up on email/here)

@pevogam
Copy link
Contributor

pevogam commented Feb 11, 2019

@ldoktor No problem, I will be there just in case.

@ldoktor
Copy link
Contributor

ldoktor commented Feb 14, 2019

Haven't seen you there (on Monday meeting), anyway we updated the card description https://trello.com/c/fqmMWhXs/1491-aexpect-enconding-issue-sync-with-avocado-usage-of-bytes and intend to sync the implementation to Avocado. That means we'll break the current behavior and Aexpect will return bytes by default while adding _text methods to return string. On py2 it shouldn't matter much, but py3 applications will have to adjust. Still the benefit of consistency across all utils seems worth the breakage.

@ldoktor
Copy link
Contributor

ldoktor commented May 15, 2019

Hello guys, as there are not many resources I created #57 that should at least easy the situation while (hopefully) not breaking anyone's workflow.

@pevogam
Copy link
Contributor

pevogam commented May 25, 2019

Hi @ldoktor, the change is good short term solution but I hope it does not decrease the priority of the actual request here which also includes API compatibility to avocado(-vt). It also doesn't handle our case since we have a wrapper utility that would best make use of actual bytes and is not expected to modify the aexpect output in any way (read: simply pass the bytes data around without touching/converting it).

@ldoktor
Copy link
Contributor

ldoktor commented May 29, 2019

Hello @pevogam, this hot-fix is because arm's firmware puts non-utf8 chars in the output which breaks migration.with_reboot tests. We do want to fix aexpect, at this point I just don't know when as we are currently short of 2 people.

@ldoktor
Copy link
Contributor

ldoktor commented May 29, 2019

btw reviews are always welcome 😉

@pevogam
Copy link
Contributor

pevogam commented May 29, 2019

btw reviews are always welcome wink

Hehe, I just don't think I am worthy to make reviews which is why I avoided commenting there. Then I thought it might be brought up here and it seems I was right 😃 As far as I know the most I can do is leave general comments on the PR but I can do that as well.

@kiranbeethoju
Copy link

The best method of reading/accessing that data is by below snippet
pd.read_csv('data.csv', encoding='latin1')
in your case try to add encoding='latin1' that's it

@mclovin-felipe
Copy link

ok...

@exeptionerror
Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants