Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarification on canonical representation for Tag_RISCV_arch #203

Open
cmuellner opened this issue Jul 22, 2021 · 6 comments
Open

Clarification on canonical representation for Tag_RISCV_arch #203

cmuellner opened this issue Jul 22, 2021 · 6 comments

Comments

@cmuellner
Copy link
Collaborator

The attribute Tag_RISCV_arch is defined as a valid argument for -march to describe the ISA extension as defined in the ISA extension naming convention, which is part of the RISC-V unpriv specification.

I propose to either require a canonical representation to be stored there (I am aware that there is no defined canonical form atm),
or to add a comment to not expect a canonical representation in that string (i.e. a user of the field needs to parse the string before further processing because two different strings might be semantically identical).

Related issues (with some examples):
riscv-non-isa/riscv-toolchain-conventions#11
riscv/riscv-isa-manual#670

@jrtc27
Copy link
Collaborator

jrtc27 commented Jul 22, 2021

We already require some amount of canonicalisation:

Note that the version information for target architecture must be presented explicitly in the attribute and abbreviations must be expanded. The version information, if not given by -march, must agree with the default specified by the tool. For example, the architecture RV32I has to be recorded in the attribute as RV32I2P0 in which 2P0 stands for the default version of its based ISA. On the other hand, the architecture RV32G has to be presented as RV32I2P0_M2P0_A2P0_F2P0_D2P0 in which the abbreviation G is expanded to the IMAFD combination with default versions of the standard extensions.

I guess your issue is with a literal strcmp vs having to split on _ and do set comparisons? Is there a situation where a strcmp is actually useful though? I'd expect some amount of parsing to be required to work out if it's a subset of the current environment (in the loader case) or if the union of two strings is a valid arch string (in the static linker case to detect conflicting extensions).

@jrtc27
Copy link
Collaborator

jrtc27 commented Jul 22, 2021

i.e. the only thing we don't currently impose is canonicalisation on the order of the non-single-letter extensions (i.e. nothing beyond what the ISA unprivileged spec says)

@kito-cheng
Copy link
Collaborator

kito-cheng commented Jul 22, 2021

ISA spec isn't clear enough for following case:

rv32iv vs rv32iv_zvlsseg
rv32if_zfh vs rv32if_zfh_zfhmin
rv32i_zkn vs rv32i_zbkb_zbkc_zbkx_zknd_zknh

In theory they are equal on left and right side, but ... what is one canonical order for them? if both are canonical order, we must write down more detail rule there.

I prefer expanded and most verbose form (right side), it's less ambiguous during look-up some extensions are enabled or not, for example, the ISA string split into a list by _, and we want to check vector segment load store, if using rv32iv, we must check v or zvlsseg is there or not, but is we always expand to right side form, just check zvlsseg is enough, and that prevent the extension implication knowledge (e.g. v implied/included zvlsseg) need to appear everywhere.

Let see what happen on ISA spec side :p

@cmuellner
Copy link
Collaborator Author

Set comparisons only work, if extensions are composed within a trivial hierarchy.
That is not guaranteed by the ISA spec.
E.g. the proposal for Zce includes the following logic:

  • Zce implies Zceb only if D is not available.
  • C.MUL of Zcea is only available if M or Zmmul is available

I don't mind requiring tools to parse the string, but I would like to see a note so that people are aware of that.

And if a canonical form is not required, then we could reconsider the following requirement:
version information for target architecture must be presented explicitly in the attribute and abbreviations must be expanded.
Versions can default to the ratified version. Expansions are not required (as strcmp would not work anyway).

@jrtc27
Copy link
Collaborator

jrtc27 commented Jul 22, 2021

That requirement exists so that kernels and loaders don't all need to embed a full parser of the arch string, which was deemed a non-starter (and I would entirely agree with that, the rules are really complicated).

The Zce proposal sounds horrendous to me, that is a violation of sensible principles for extensions. Having rv32idzce not be a superset of rv32izce is all kinds of "god no don't do that". Extensions are extensions, period. Zceb must therefore be an explicit extension that's incompatible with D from my perspective.

The Zcea situation is completely fine. Any binary using C.MUL will have Zcea and either M or Zmmul in its attributes, so the requirements are entirely captured with no implication logic needed. If the hart has Zmmul (or full-blown M) then the binary will work. If the hart doesn't then the loader will see that and the binary won't be loaded.

@asb
Copy link
Collaborator

asb commented Aug 1, 2022

I feel the Tag_RISCV_arch is underspecified without additional guidance on what caonicalisation of the string is required when one extension implies another, and it would be worth seeking to address this in someway ahead of 1.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants