-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
style_guide ideas by KOLANICH #7
base: master
Are you sure you want to change the base?
Conversation
Uhh, there's a lot of stuff at once, and it's actually hard to cherry-pick them, as they're a single monolithic commit. I'll try to comment in-line. |
ksy_style_guide.adoc
Outdated
@@ -249,28 +248,36 @@ guesswork. | |||
|
|||
* For simple non-repeated fields, use a simple singular form — | |||
i.e. `width`, `header`, `transaction_id`, `file`. | |||
* For repeated files (i.e. with `repeat: something`), use plular form | |||
* For a sequence (i.e. with `repeat: something`), use plular form |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
* Repeated fileds which cannot be packed into a sequence should | ||
have `id`s containing a number in the end. Numbers may have a | ||
visually-obvious structure, like "the first digit is major, the second | ||
one is minor". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pretty vague and can be interpreted in multiple ways. Probably an example would help. Right now, it's not clear (1) whether foo_1
or foo1
fit this, (2) how one starts the numbering and how it should progress, (3) what is "visually-obvious structure".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 I use foo0
2,3 at KSY author's disposal. There are differrent situations and we want to be flexible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Um, yet again, the whole point of having a standard is not to be flexible, but to be rigid and provide no-brains-needed solutions, which are also easy to code into machine checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the whole point of having a standard is not to be flexible, but to be rigid and provide no-brains-needed solutions, which are also easy to code into machine checks.
If we wanna some checks, IMHO it's better to extend KS language instead of introducing a convention making the names ugly.
These numbers are not for checks, but for developer's convenience.
"magic values"), use `magic` name, or, if there are several of them, | ||
`magic1`, `magic2`, etc. | ||
"magic values"), use `signature` or `magic` name, or, if there are | ||
several of them, like `signature` or `magic0`, `magic1`, etc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so, basically:
- You specify "
signature
ormagic
", and don't specify any way to choose between these two. Given that its essentially the same stuff, I see no point in separation this into two names, and adding headache for ksy developers to choose between two proposed alternatives. - You want every numbered thing to start counting from 0. Any reasons to do so, i.e. any major examples of someone doing that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 at KSY author's disposal. I personally prefer "signature" because it clearly states that this field is used for identification.
2 Just taste. If we start arrays from index 0 in the most of languages why not to follow the same convention here?
ksy_style_guide.adoc
Outdated
particular, number of repetitions of some other structure), use either | ||
`_count` suffix or `count_of_` (what sounds better in your opinion) | ||
prefix and a plular form — i.e. `count_of_questions`, `blocks_count`, | ||
`nodes_count` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I don't think it's a good idea. The standard must be clear and avoid ambiguilities whenever it's possible. There is no point to make ksy developer to choose between several possible alternatives, especially using very subjective means (i.e. "sounds better"). The whole point of having a hard standard is to make it possible to use automated machinery to process stuff, and it's very hard to program a converter from num_*
to *_count
or count_of_*
.
Also, both sound kinda quirky. Stuff like "blocks count", "nodes count" is plain wrong from English point of view. It should be "block count", "node count", etc, see this SE entry for examples. "Count of X" sounds like a title of some person for me (i.e. count of Champagne, count of Monte Cristo, count of Barcelona).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no point to make ksy developer to choose between several possible alternatives, especially using very subjective means (i.e. "sounds better").
No point, but I guess we want to have readable names.
The whole point of having a hard standard is to make it possible to use automated machinery to process stuff, and it's very hard to program a converter from num_* to count or count_of.
2 regexes with 1 capturing group each + one logical operation to recognize and parse such kind of names.
Also, both sound kinda quirky. Stuff like "blocks count", "nodes count" is plain wrong from English point of view. It should be "block count", "node count", etc, see this SE entry for examples.
You are right. But you can also think about this as about count
property of blocks
with a _
instead of .
, since .
is not allowed.
"Count of X" sounds like a title of some person for me (i.e. count of Champagne, count of Monte Cristo, count of Barcelona).
IMHO it's completely OK since there isn't many counts, dukes and baronets in variable names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're kind of contradicting yourself:
No point, but I guess we want to have readable names.
But you can also think about this as ...
Either you want "normal English" names, then you can't have "blocks count" and "count of blocks". If you allow some special naming, then "num_blocks" is about as readable as your versions. Hungarian apps notation recommends "nBlocks" for these purposes, and even that is understandable by majority of people.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I especially dislike num
prefix because of its redundancy and ambiguity. It says that the field is number, but we see that it is a number from the type. It says nothing about the purpose of that number. cnt
should be better here. And again, I'm against prefixes and unreadable tokens.
Hungarian notation
is evil.
entry), `len_blocks` (total length of whole `blocks` array, made of | ||
(in bytes or some other fixed units), use `_size` suffix and name of | ||
that data structure — i.e. `block_size` (length of a single `block` | ||
entry), `blocks_size` (total length of whole `blocks` array, made of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, it boils down to:
- Prefix vs suffix
- Long of short spelling
In my opinion, "prefix" is better and definitely more widespread, i.e. as in majority of languages that use prefixes to differentiate parameters vs instance members, or Apps variety of Hungarian notation. Prefix does not try to sound like "proper English", and it avoids lots of ambiguilities and spelling problems (like that "count" stuff I've demonstrated above). For example, there are tricky nouns in English (like "water" or "money" or "news" or "advice" or "hair"), which make proper English phrasing difficult.
Last, but not least, I definitely prefer shorter form of spelling. Given that we have to make "one size fits all or most" identifiers, we're not going to have full-length Java-style identifiers like "NormalBucketDistributionProgressionOfLongIntegerConstantsForHadoopNodeFactory" — (1) there are languages that have certain ID length limits, (2) a vast majority people I've interviewed do not like such long identifiers very much, (3) it's actually hard to work with them without a IDE with expression autocompletion and stuff — and we don't have one (and I'm not very keen about having IDE as the only way to work with the language).
So, all and all, clear 3 character prefix looks like a good trade-off for me — it's non-ambiguous, it makes it very easy to detect whatever this attribute is about by always looking at the beginning, and it fits concise C-like style well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion, "prefix" is better and definitely more widespread, i.e. as in majority of languages that use prefixes to differentiate parameters vs instance members, or Apps variety of Hungarian notation.
It doesn't mean that we should do this too.
The good part in postfix is that since we read left-to-right all the properties of the same member look similar to each other. Especially when aligned. But with prefix they look too differrent. Also take in mind the argument with the point.
(1) there are languages that have certain ID length limits,
Valid.
(2) a vast majority people I've interviewed do not like such long identifiers very much
Neither do I. But In the case of short enough identifiers, which is the majority of them, I prefer to have them readable. It is only a guide, not a strict requirement.
(3) it's actually hard to work with them without a IDE with expression autocompletion and stuff
Valid. BTW some text editors, like Notepad++, have autocompletion which autocompletes any tokens.
`block` entries). | ||
* Fields of unknown/undetermined purpose, i.e. unfinished reverse | ||
engineering work SHOULD either NOT TO HAVE `id` or HAVE an | ||
`id` matching the `/unkn(?:own)?(_\w+)?\d*/` regular expression. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is, again, not how style guide should be written. It's pointless to leave so much choices to do for ksy developer. It's better to decide on this once and for all, and it will be easy for everyone — to remember, to follow, to understand, to write automated tools for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 Having an id
for a property of unknown meaning is useful when doing reverse-engineering, especially if you have more than one such a property in the same struct.
2 But if a developer doesn't need that id, why to force him to have it?
* In the case of multiple enums with the same name it's usually | ||
better to append the integer value rather than a sequence | ||
number to its `id`. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had some problems understanding what you've meant here. Probably an example should help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's a good idea. But we should describe it in more detail, with an example, and think of some other choices to be made here. For example, should it be unknown31
or unknown_31
? unknown_31
or unknown_1f
or unknown0x1f
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, should it be unknown31 or unknown_31? unknown_31 or unknown_1f or unknown0x1f?
No 0x
, 0o
and 0b
prefixes, no underscore, the number part is the same as in the left part.
I will carefully review this PR and attempt to solve any existing issues. |
Co-Authored-By: Arkadiusz Bulski <[email protected]>
1834ea9
to
00ed64a
Compare
In the wake of kaitai-io/kaitai_struct_formats#68