Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid byte sequence crashes ksdump 0.7 #61

Open
ross-spencer opened this issue Nov 4, 2023 · 3 comments
Open

Invalid byte sequence crashes ksdump 0.7 #61

ross-spencer opened this issue Nov 4, 2023 · 3 comments

Comments

@ross-spencer
Copy link

I an seeing ksdump crash with the following:

ksdump prescription-sample-1.0-le.eygl kaitai/eyeglass_little_endian.ksy 
Compilation OK
... processing kaitai/eyeglass_little_endian.ksy 0
...... loading eyeglass_format.rb
Classes loaded OK, main class = EyeglassFormat
/usr/lib/ruby/3.0.0/psych/visitors/yaml_tree.rb:268:in `visit_String': invalid byte sequence in UTF-8 (ArgumentError)
	from /usr/lib/ruby/3.0.0/psych/visitors/yaml_tree.rb:136:in `accept'
	from /usr/lib/ruby/3.0.0/psych/visitors/yaml_tree.rb:330:in `block in visit_Hash'
	from /usr/lib/ruby/3.0.0/psych/visitors/yaml_tree.rb:328:in `each'
	from /usr/lib/ruby/3.0.0/psych/visitors/yaml_tree.rb:328:in `visit_Hash'
	from /usr/lib/ruby/3.0.0/psych/visitors/yaml_tree.rb:136:in `accept'
	from /usr/lib/ruby/3.0.0/psych/visitors/yaml_tree.rb:118:in `push'
	from /usr/lib/ruby/3.0.0/psych.rb:513:in `dump'
	from /usr/lib/ruby/3.0.0/psych/core_ext.rb:13:in `to_yaml'
	from /var/lib/gems/3.0.0/gems/kaitai-struct-visualizer-0.7/bin/ksdump:122:in `<top (required)>'
	from /usr/local/bin/ksdump:25:in `load'
	from /usr/local/bin/ksdump:25:in `<main>'

Example file.

Example ksy.

Via: https://github.com/ross-spencer/eyeglass/tree/main

I'm not clear on if this is a user error, or if there's a newer version of ksdump I should be looking at?

@GreyCat
Copy link
Member

GreyCat commented Nov 4, 2023

@ross-spencer, thanks for reporting this!

It looks like in this specific example, the caveat is with the fact that Ruby UTF-8 parsing is quite "forgiving" in terms of getting byte sequences which are not valid UTF-8 characters as strings, e.g.

[.] magic = "\xBB\r\neyeglass\u001A\n\xAB"
[.] eof = "\xBBeof"

This backfires when trying to use standard YAML or JSON libraries trying to dump these, though, as they give up.

To be honest, I'm not sure what behavior we'd want in this case. Probably reporting a more precise error should be a good idea (e.g. down to a specific attribute path). Showing that string is not a valid UTF-8 in ksv might be a good idea too, but then ksv is supposed to be "all forgiving", and, it actually is right now.

The simplest fix on .ksy side would be removing type: str and encoding: utf-8. This yields raw bytes rendition:

axis_right_left:
- 130
- 80
base_right_left:
- 0.0
- 0.0
cylinder_right_left:
- -0.25
- -1.0
datetime: '2012-11-08T12:37:50'
distance_acuity_right_left:
- 0.6600000262260437
- 0.5
endianness: 1
eof: BB 65 6F 66
format_expansion_room: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  00 00 00 00 00 00 00 00 00 00 00 00 00 00
magic: BB 0D 0A 65 79 65 67 6C 61 73 73 1A 0A AB
near_acuity_right_left:
- 12
- 12
next_checkup_years: 1.0
observation9: "Patient's eyesight needs correction. History of diabetes in family
  but indicators found. Standard checkup interval recommended.\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
prism_right_left:
- 0.0
- 0.0
purpose: "Distance and Close Work.\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
sphere_right_left:
- -3.3499999046325684
- 0.5
version: '01'

@GreyCat
Copy link
Member

GreyCat commented Nov 4, 2023

From ksdump's side, I can propose several solutions. While doing recursive traversal, check for string validity (e.g. value.valid_encoding?). Then, if we detect the problem:

  • Note where this is in the tree and report the error, pointing to that location
  • Potentially, we can substitute offending characters with something (e.g. nothing or replacement character U+FFFD)
  • Potentially, we can dump such strings as raw bytes (e.g. forcing behavior shown above)

Thoughts?

@ross-spencer
Copy link
Author

@GreyCat thanks for that. I see where my assumptions were wrong now. I took your approach to remove the offending str and encoding, but then decided I could get a lot of value out of turning the magic into a type which gives me some more granular output. I'll need to see what that does for me if using kaitai to auto-generate parsing code.

ksdump has been useful for debugging here, I'll reflect on that in my writeup. From my perspective, out of the potential changes, then:

Note where this is in the tree and report the error, pointing to that location

Would be a useful starting place, as for string substitution, or dumping raw bytes, then they seem nice options, but providing they didn't obscure the user error in any way, i.e. it was still clear that the ksy file contains a potentially invalid instruction/the input file contains some unexpected data.

ross-spencer added a commit to ross-spencer/eyeglass that referenced this issue Nov 5, 2023
The added granularity allows us to correctly decode the BOF and EOF
sequences in ksdump, annotating each part of the objects that form
those. NB. the description could be more detailed than initially
committed here.

Related to: kaitai-io/kaitai_struct_visualizer#61
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants