Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trying to understand how nested types are resolved #1019

Closed
wader opened this issue Apr 11, 2023 · 16 comments
Closed

Trying to understand how nested types are resolved #1019

wader opened this issue Apr 11, 2023 · 16 comments
Labels

Comments

@wader
Copy link

wader commented Apr 11, 2023

Hello

I'm trying to understand how nested types and :: it used to resolves types. Here is an example that confuses me:

meta:
  id: nested_types
  endian: le
seq:
  - id: s0
    type: t0
instances:
  e0_hello_0:
    value: t0::e0::hello
  e0_hello_1:
    value: ::t0::e0::hello
  e0_hello_2:
    value: t0::t0::e0::hello
  e0_hello_3:
    value: t0::t0::t0::e0::hello
types:
  t0:
    seq:
      - id: t0s0
        type: u1
    enums:
      e0:
        1: hello
    types:
      t0:
        seq:
          - id: t0t0s0
            type: u2
        enums:
          e0:
            2: hello

Using https://ide.kaitai.io/ results in: (using the default zip input)

{
  "s0": {
    "t0t0s0": 19280
  },
  "e0Hello0": { "name": "HELLO", "value": 2 },
  "e0Hello1": { "name": "HELLO", "value": 2 },
  "e0Hello2": { "name": "HELLO", "value": 2 },
  "e0Hello3": { "name": "HELLO", "value": 2 }
}

type: t0 seems to resolve to the nested "t0::t0", if i rename it to "t0::t1" the result is:

{
  "s0": {
    "t0s0": 80
  },
  "e0Hello0": { "name": "HELLO", "value": 1 },
  "e0Hello1": { "name": "HELLO", "value": 1 },
  "e0Hello2": { "name": "HELLO", "value": 1 },
  "e0Hello3": { "name": "HELLO", "value": 1 }
}

now type: t0 resolves to the "top" t0.

I get a feeling type name are global somehow even when using ::?

@wader
Copy link
Author

wader commented Apr 11, 2023

Should mention that i'm experiment with adding basic runtime support for kaitai to https://github.com/wader/fq so i'm trying to get a deeper understanding how types, enums etc are namespaces and resolved.

@Mingun
Copy link

Mingun commented Apr 11, 2023

There is a bug in WebIDE about equally named types: kaitai-io/kaitai_struct_webide#105

@wader
Copy link
Author

wader commented Apr 11, 2023

Ah yeap with ksdump i get the same result for both, thanks.

Is the type/enum path resolution described more in detailed somewhere? looking at the ksy schema it says

one can reference a nested user-defined type by specifying a relative path to it from the current type, with a double colon as a path delimiter (e.g. foo::bar::my_type)

I'm trying to understand what "relative path to it from the current type" means. After playing around a bit i understand it as this: First try to be resolved "downwards" from the current type, if no type is found, try resolve from each parent type down to the root. Would that also mean that if there are duplicate type names you might no be able to reach it from some places? example:

meta:
  id: nested_types
  endian: le
seq:
  - id: s0
    type: t0::t0
enums:
  e:
    1: hello_e
types:
  t0:
    enums:
      e:
        1: hello_t0e
    types:
      t0:
        enums:
          e:
            1: hello_t0t0e
        instances:
          # is it possible to reach <root>::e or <root>::t0::e" here?
          t0t0e_0:
            value: t0::e::hello_t0t0e
          t0t0e_1:
            value: e::hello_t0t0e

@generalmimon
Copy link
Member

generalmimon commented Apr 11, 2023

@wader The method to resolve a "normal" type reference (no nested types yet) used to be described in the (now removed) old KSY reference documentation page, so this can maybe help you too: https://github.com/kaitai-io/kaitai_struct_doc/blob/88c32183b26e4d2265bfecdce0a3dfeaea6975ff/ksy_reference.adoc#type

The type path to refer to a "nested type" should probably follow intuitively - the important thing is to know in which type the path starts (which should be the same method as for normal type refs), and then you look directly into the types section and expect a type named as the next segment of the path there.

PS: I don't recommend relying on the linked KSY reference document too much, because it was removed exactly because it was largely out of date. But this section is probably solid and I don't think the behavior has changed since then.

@wader
Copy link
Author

wader commented Apr 11, 2023

@generalmimon Thanks! that was exactly what i was looking for, and will keep in mind to not trust it too much :) to be clear, by "probably follow intuitively" you mean it should be quite straight forward to extend the described algorithm to support paths?

In the 3rd step when it says

  1. If that fails too, it goes one level up in the hierarchy of nested types and tries to resolve it there.

Should one only look one level up or continue for each parent down to the root? when i tested both type and enums seems to go all the way to the root:

meta:
  id: enum_to_i
  endian: le
seq:
  - id: s
    type: a
enums:
  e:
    0x50: name
types:
  b:
    seq:
      - id: s
        type: u1
  a:
    types:
      a:
        types:
          a:
            seq:
              - id: s0
                type: b
              - id: s1
                type: u1
                enum: e

webide result:

{
  "s": {
    "s0": {
      "s": 80
    },
    "s1": { "name": "NAME", "value": 75 }
  }
}

@generalmimon
Copy link
Member

generalmimon commented Apr 11, 2023

@wader:

In the 3rd step when it says

  1. If that fails too, it goes one level up in the hierarchy of nested types and tries to resolve it there.

Should one only look one level up or continue for each parent down to the root? when i tested both type and enums seems to go all the way to the root:

Yes, step 3 is definitely meant to be recursive (i.e. step 3 is actually "go back to step 1 and follow the process again, but now the current type is the type one level up in the hierarchy).

@generalmimon
Copy link
Member

generalmimon commented Apr 11, 2023

@wader:

to be clear, by "probably follow intuitively" you mean it should be quite straight forward to extend the described algorithm to support paths?

Yes, only the first component of the type path has the special handling with fallbacks (it's the same handling as described in the KSY reference, i.e. the actual type resolved as foo in type: foo and type: foo::{...} is always the same). All subsequent components of the path should have no fallbacks.

I'm not sure whether examples in this case are not more confusing than helpful, but I'll try anyway.

meta:
  id: nested_types_paths
seq:
  - id: ab
    type: a::b
types:
  a:
    seq: []
    types:
      b:
        seq:
          - id: obj_cd
            type: c::d
        types: {}

  c:
    seq: []
    types:
      d:
        seq:
          - id: d_attr
            type: s1

Notice the obj_cd attribute in the /types/a/types/b type. It references a type c::d. Finding what specific type c refers to is non-trivial (and it's using the same algorithm as for plain type: c). First, you need to

  1. check /types/a/types/b/types, only to find that no c is defined there (in this case);
  2. check if the current type (still /types/a/types/b) has the name c. It doesn't, so we're now moving to the type one level up in the nested types hierarchy (/types/a).

No luck finding the c type under /types/a/types/b/ => the new "current type" is /types/a.

  1. check /types/a/types - nope, no c there;
  2. check if the current type (/types/a) is called c: it's not. We're moving to the top-level type, i.e. the YAML path / (in KS language, it can be referred to using the name in /meta/id, i.e. nested_types_paths).

Now searching in /:

  1. check /types - yes, we've finally found the definition of c in /types/c.

Now when you know what type: c corresponds to, type: c::d is easy. Given that type: c has been resolved as /types/c, the only valid location for type: c::d is /types/c/types/d. If d is not there, game over - the spec is invalid.

@generalmimon
Copy link
Member

generalmimon commented Apr 11, 2023

@wader Also I'd note two more things you might not know:

  1. As I already mentioned, the name of the top-level type is in /meta/id and this name can be used to explicitly refer to it if needed (this is especially useful for recursive formats, see for example renderware_binary_stream.ksy). If I borrow a bit from the previous example, I can also do this:

    meta:
      id: nested_types_paths
    seq:
      - id: obj_a
        type: a
    types:
      a:
        seq:
          - id: obj_cd
            type: nested_types_paths::c::d
    
      c:
        seq: []
        types:
          d:
            seq:
              - id: d_attr
                type: s1

    The algorithm works the same, just now the rule of checking the name of the current type kicks in in the top-level type and passes.

  2. I noticed you tried using absolute paths:

      e0_hello_1:
        value: ::t0::e0::hello

    I'd rather consider these not supported. We have zero tests for this and I'm not even sure how they're supposed to work (see also User type resolution broken when using paths #786 (comment)).

@wader
Copy link
Author

wader commented Apr 11, 2023

@generalmimon Thanks! much appreciated with nice example. Don't think i would gotten all those details right by testing things.

Another thing: Is the types and enums namespace shared for a type? I tried having a type and enum with the same name and the type seems to shadow/override the enum in webide. When i tried with ksdump the generated ruby fails with:

meta:
  id: type_enum_collision
seq:
  - id: s0
    type: e

types:
  e:
    seq:
      - id: a
        type: u1
        enum: e

enums:
  e:
    0: name
type_enum_collision.rb:29:in `<class:TypeEnumCollision>': E is not a class (TypeError)
ype_enum_collision.rb:12: previous definition of E was here

Hope it's ok with all questions. Maybe some of it could be used to clarify documentation?

If you want to follow the progress in fq the issue is wader/fq#627

No code has been pushed yet but current progress is that i have a quite complete expression language parser/evaluator that passes most of the tests i've found, somewhat working ksy schema parser, kaitai style "bit-endian" integer kind-of works. The biggest mess currently it the code to glue it all together with the fq decode "DSL", and that also depend quite a lot on how types are resolved etc. But it's looking promising i think, some days ago mp4.ksy and png.ksy produced a somewhat good looking decode tree :)

@wader
Copy link
Author

wader commented Apr 11, 2023

@wader Also I'd note two more things you might not know:

  1. As I already mentioned, the name of the top-level type is in /meta/id and this name can be used to explicitly refer to it if needed (this is especially useful for recursive formats, see for example

Ah good point, thanks.

  1. I noticed you tried using absolute paths:

Ok! I just happen to notice it parsed ok so i tried :) In my expression parser it's parser error at the moment i think 👍

@generalmimon
Copy link
Member

generalmimon commented Apr 11, 2023

@wader:

Is the types and enums namespace shared for a type?

Good question :) I don't think we have ever put much thought in this. I think from the perspective of the KS compiler, they don't have to share namespace, because it's always clear from context whether you're referring to an enum or a type.

But in reality (given that KS works by generating source code), I guess the answer for most targets is that they do share namespace, and if you define an enum and a type with the same name on the same level, you'll either get an error or the second element shadows the first. Enums and types are usually implemented using same concepts (often a "class" in languages with OOP support), and you obviously can't have two classes named the same next to each other.

Off the top of my head, I think "they share namespace" is the real answer for all targets at the moment. So the compiler should probably cross-check the names in types and enums too to catch this collision already at compile time - that would be a good thing to add.


The situation is not always that clear - C# is notorious for sharing even the namespace of attributes with subtypes, see #86. This is very annoying and many .ksy specs have problems with this when compiled to C#, because quite often you want to have an attribute id: header and type: header (because what special name would you invent for the header type, right?) - this works in all languages except C#.

Last but not least, we also unwillingly share namespace with all reserved words in each target language (#90), and the KS compiler currently ignores this too. But the main issue is to collect the list of problematic identifiers for each target (which will be not only reserved words but also probably library classes that the generated code uses) to be able to detect the collisions and then figure out a rule (ideally recommended by the language "code style" guidelines) what to do about them. And all potentially problematic words need to be tested.

@wader
Copy link
Author

wader commented Apr 14, 2023

Thanks again, will help a lot! Should we keep this issue open or close it? guess i will probably run into more questions around types but could maybe open new issues if this was specific to resolving.

@generalmimon
Copy link
Member

Thanks again, will help a lot! Should we keep this issue open or close it?

I think this can be closed - you can open new issues for further questions.

@generalmimon
Copy link
Member

@wader For the sake of completeness of this issue, I've nearly forgotten that there are in fact various verbose modes of KSC available via the --verbose option which allow you to see how some parts of KSC operate, see Log.scala:43-57. For detailed info from the component responsible for resolving types, there is --verbose type_resolve (or you can use --verbose all which will log everything):

$ kaitai-struct-compiler-0.10 -- --verbose type_resolve -t java nested_types_paths.ksy
resolveUserType: at List(nested_types_paths) doing a|b
resolveUserType: at List(nested_types_paths, a) doing b
    => nested_types_paths::a::b
    => nested_types_paths::a::b
resolveUserType: at List(nested_types_paths, a, b) doing c|d
resolveUserType: at List(nested_types_paths, a) doing c|d
resolveUserType: at List(nested_types_paths) doing c|d
resolveUserType: at List(nested_types_paths, c) doing d
    => nested_types_paths::c::d
    => nested_types_paths::c::d
    => nested_types_paths::c::d
    => nested_types_paths::c::d

@wader
Copy link
Author

wader commented Apr 23, 2023

@generalmimon Thanks that is good know. Been away traveling this weekend but back now and hope i get some time this week to work on it.

@generalmimon
Copy link
Member

generalmimon commented Apr 23, 2023

Hmm, seeing the verbose output made me worried that when resolving the type path c::d, the compiler uses the same complex algorithm to resolve d starting from c (and not just a simple lookup whether there is a direct subtype called d in /types/c/types/d):

resolveUserType: at List(nested_types_paths, a, b) doing c|d
resolveUserType: at List(nested_types_paths, a) doing c|d
resolveUserType: at List(nested_types_paths) doing c|d
resolveUserType: at List(nested_types_paths, c) doing d

And my worries were unfortunately confirmed - this is "happily" compiled by KSC, even though the c::d type reference should be flagged as invalid (since c has no subtype d):

meta:
  id: nested_types_paths
seq:
  - id: obj_a
    type: a
types:
  a:
    seq:
      - id: obj_cd
        type: nested_types_paths::c::d

  c:
    seq: []
    # types:
    #   d:
    #     seq:
    #       - id: d_attr
    #         type: s1
  d:
    seq:
      - id: d_attr
        type: s1

Parsers generated from this spec don't seem to work in any language (which is not surprising - the spec above shouldn't be allowed to compile in the first place). In JavaScript (via the Web IDE) an error is thrown:

Parse error (TypeError): NestedTypesPaths.C.D is not a constructor
Call stack: TypeError: NestedTypesPaths.C.D is not a constructor
    at NestedTypesPaths.A.A._read (eval at initCode (https://ide.kaitai.io/js/v1/kaitaiWorker.js:89:9), <anonymous>:37:20)
    at NestedTypesPaths._read (eval at initCode (https://ide.kaitai.io/js/v1/kaitaiWorker.js:89:9), <anonymous>:23:15)
    at reparse (https://ide.kaitai.io/js/v1/kaitaiWorker.js:96:17)
    at myself.onmessage (https://ide.kaitai.io/js/v1/kaitaiWorker.js:117:47)

and the generated Java code doesn't compile:

$ shopt -s globstar

$ javac -encoding UTF-8 /c/temp/kaitai_struct/runtime/java/**/*.java NestedTypesPaths.java
NestedTypesPaths.java:52: error: cannot find symbol
        private NestedTypesPaths.C.D objCd;
                                  ^
  symbol:   class D
  location: class C
NestedTypesPaths.java:55: error: cannot find symbol
        public NestedTypesPaths.C.D objCd() { return objCd; }
                                 ^
  symbol:   class D
  location: class C
NestedTypesPaths.java:50: error: cannot find symbol
            this.objCd = new NestedTypesPaths.C.D(this._io);
                                               ^
  symbol:   class D
  location: class C
3 errors

Which all confirms that it shouldn't be accepted, and the fact that KSC does is a bug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants