-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generalizing Condition
/ ConditionSet
#147
Comments
This was my email on the subject
This will clearly work -- lots of systems are implemented similar to this. I'm still not sure if the positives outweigh the negatives. Behdad believes it will reduce sharing; I think that while that is true in an abstract sense that representing things this way will still be more compact, even if there sharing opportunities. |
Deprecated or not, the behavior when combined with |
The behavior is well-defined. The |
By deprecate I just mean that in new tables, instead of using a |
If we go the direction of this condition tree, the proposal on the mailing list was to:
So if someone wants to do a hybrid thing, such as AND two tree-conditions in the place where a condition set is still used, they're free to do so and the meaning is well-understood. |
One reason I don't like this design is that it needs a custom compiler, whereas in the other design the existing table packers do the job. Also we would have to define a stack limit. |
Is there really a substantial difference between the depth of a postfix operator expression and the depth of an offset-based tree structure? We don't seem to care about limits on the latter ... |
Fair. In actual implementation we'll probably do a work-counter in either case. |
I’m probably missing something, then: how is |
|
Functionally, a |
I guess the mention of deprecation threw me off, then. So this is proposing additional formats for use within a |
Each of these would be a new condition format, so instead of the current working draft proposal of format 1 (the current), format 2 (its negation), format 3 (condition value), and format 4 (negated condition value), we would have
|
It will be obviated by the
Right. And in new places we need conditions, like the |
My 2c:
If it's a fine distinction between the two ideas, as I think it is, I would prefer the one which feels cleaner - explicit subtables. |
Another issue with using magical offset values is that old code will misparse the data. |
Offset 0 is at least somewhat magical in some contexts. I thought I recalled one or two other cases with non-zero offsets not treated as offsets but I'm not recalling where at present. I should note that there are two ideas being conflated here. One is using a list to represent the expression rather than a tree structure for the operators. The other is re-using the existing condition-set, which has no versioning, for that purpose. I'm only suggesting the former, largely on the grounds of compactness and, in some senses but not others, simplicity. Along the lines of what Behdad has suggested for the separate operators, we could just have a new versioned condition type that contains the list and treats those offsets as special, and then there are no versioning issues with the existing condition set. |
I suppose the argument here is going to boil down to "This is like COLR v1 so it's better from a simplicity standpoint". That will probably win the day, but I don't personally think COLR v1 is a good example we should be emulating. "Let's have dozens of tiny subtables pointing at each other" is a huge pain in the ass when it comes to some things like dynamic subsetting, and means that everyone's code has to protect against cycles (another difference between these two proposals). It's more general but less compact than some alternative, which is strange coming from Google-aligned contexts that otherwise advocate for space savings above all other considerations as if that was just an obvious conclusion. It's there, we'll live with it, but I'm not eager to reproduce that approach in other contexts. |
Cycles are not possible because offsets are always positive. Note how negating an entire condition expression can be done with just a few bytes with table-sharing, which is not possible with the stack-machine. |
If we were to go the stack-machine route, I suggest using:
instead of magic offset numbers. |
Also, it can become prefix, to allow short-circuiting as Simon suggested. And the 24bits for the AND/OR formats can be used to encode number of operands. |
I just chose postfix because it tends to be more familiar. If there are engineering reasons to do prefix it should be prefix. |
This RPN idea is interesting, but I agree with @simoncozens about the additional cognitive load. The small offset coding, the stack itself, and the fact that AND and OR require strictly 2 operands, all feel novel and not in a good way. Smaller size may be true for random condition formulae, on average, but it isn’t always true: for example an OR of 10 conditions, which I think is a reasonable use case (e.g. simpler forms being used in multiple corners of the design space). |
Not quite. One still has to parse and skip over short-circuited items, unless we're at the top-level. |
Based on the original desire (negating an entire ConditionSet) I suggest we stick to my design, which allows negation with 5 bytes. In Skef's proposal it would be 4 bytes plus a copy of the ConditionSet to be negated. So I don't understand why it's desirable. |
Alright, consider my concerns addressed. |
With more Condition formats. harfbuzz/boring-expansion-spec@df7dcc4 harfbuzz/boring-expansion-spec#147
With more Condition formats. harfbuzz/boring-expansion-spec@df7dcc4 harfbuzz/boring-expansion-spec#147
Okay, thanks. I've gone ahead and updated |
@behdad could we just do away with empty ConditionAnd and ConditionOr tables? I can't think of any function they would serve and what I'm seeing in the proposal about them is counter-intuitive. |
It's just a clarification. From the definitions it follows what the empty ones should do. I just don't want anyone to implement them otherwise. But I'm fine removing those. |
I'm confused about empty ||s being false. If anything I would have expected empty &&s to be false. Maybe this is some degenerative logic thing I've forgotten about. Oh, I guess I see it based on the particular language you've chosen. Still, I'd rather just prohibit it. It doesn't help anything. |
I'm just not a fan of prohibiting. But am fine if others want it. In our implementation we'll happily accept them and there's no special-casing needed. That said, I would hate it if ots started rejecting them. A subsetter should be smart enough to not generate those empty rules. I just don't see what the prohibition offers. |
I don't like the old-style-Google-interview question aspect of making an empty OR false. Very elegant, also a subtle detail inviting problems. "If they had read the spec carefully they would have properly understood this elegant and completely useless aspect of it, so it's their fault." |
I would object less if all empty lists were treated as true, but I presume the objection is that it goes against the language. |
An empty OR cannot be true, because there does NOT exist any true value in its list. |
Did you read my messages? |
Anyway, an empty OR can absolutely be treated as true by stipulation. "it offends my sensibilities" != "cannot". |
Anyway, if we're doing this formal correctness in the face of questionable data thing you're going to need a note in the condition section about offset 0 being true. Can't just note that at the "edges". |
(OK, backing up once again, I would have to think though the situations likely to lead to empty tables to determine what the right guidance for empty tables should be, or if there even is a place for consistent guidance. For example, one might be starting from some existing boolean expression, stripping terms from it, and either not noticing that one wound up with an empty term or not knowing what to do about that. That could happen when subsetting, for example. The desirable outcome would be having the degenerative result retain consistency among potentially distributed boolean expressions, so e.g. you don't wind up with two or zero of some element type when there should be one. Maybe that winds up being equivalent to following the formal definitions. Maybe it's the opposite. Maybe there is no consistent way of achieving that result. I can't think these things through that fast.) |
That is my understanding. Here's some Python code to back my opinion: from operator import and_, or_
from functools import reduce If you do: conditions = []
print(reduce(and_, conditions))
print(reduce(or_, conditions)) you get an exception:
We can add an initial value to reduce, to make the empty container work. The correct initial value, for normal cases to work, is conditions = []
print(reduce(and_, conditions, True))
print(reduce(or_, conditions, False)) If you do this, you will see that it prints True, then False. This is not arbitrary IMO. |
I have to write updated language for Liam right now so I can't think about this tonight. |
in formal logic (first order predicate calculus) which is the basis for and/or/not, A disjunction (or) over C is true if there is some c in C that is true. If there is no c in C (because C is empty), then the condition is not satisfied and the or is false. I can write text like that in the draft if it helps. |
I've studied formal logic. It's also true in formal logic that a contradiction can be used to prove anything. And "but" means "and". These don't seem relevant to the present domain, the question is whether what we're discussing is. We should be basing our spec on what is likely to yield the best results in practical situations. I'm 60% convinced that the formal definitions will yield that result having now thought about it for maybe an hour, but I'll note that we seem to be exploring that because I pointed it out. Before that we were arguing about language. |
I didn’t mean to imply that i thought you were unfamiliar with logic (although i didn't write my comment very well, sorry), but rather to explore more formal wording. |
Another wrinkle: If I were facing the subsetting problem I posed earlier I would probably avoid reducing any elements of an expression and simply replace elements with "true" for "false" depending on what I was eliminating. That arguably calls for clean expressions of those, which perhaps should be documented. I have suggested moving the idea that a 0 offset is equivalent to truth into the condition section. So a negation with a zero offset could be "false". Maybe that's sufficient, especially given that conditions can be shared. Since 0 is a magic value we also have the option of making offset 1 be false. But I understand that many people consider 0 to be more validly magical than other values, hence "NULL" in C and such. |
Regardless, I think there should be some documented way of expressing true and false for these reasons that doesn't involve cobbling up a particular format 1 or 2 condition that happens to have the right property. |
I think we can and should stipulate what the default fallback for combinatorial booleans is, and I'm going to argue that it should be False. For consistency, I agree with Skef that "Empty-and is false and empty-or is true" is just too cute. The empty list is inherently falsey; a NULL offset signifies a lack of something. If you want an explicit "always true" value, I don't think there's a need for a magical I think also for semantic reasons, having false as the fallback for these things makes sense. Think of Feature Variation tables - they are exceptions to the rule, the conditions when something changes from the norm. When does the dollar glyph switch to a simpler form? You have to tell me when. If you don't tell me when, it doesn't. "It changes when |
Another note: I mentioned the problem of expression cycles in this model, especially given the expressed (although not documented) goal of sharing. Behdad argued that that's dealt with by offset direction restrictions. I saw nothing about that in the draft. |
Do we need to put anything in the spec about cycles? Offsets are already typed as unsigned. I don't see a need to explicitly point out all the things that you just logically can't do - in a commentary perhaps, but not in a spec. |
Well, that's a good point. Sorry, I just hadn't absorbed the implications of that earlier comment. |
With more Condition formats. harfbuzz/boring-expansion-spec@df7dcc4 harfbuzz/boring-expansion-spec#147
Now that
VARC
table has conditional components, it's desirable to be able to negate an entireConditionSet
. My proposal is to makeConditionSet
a legacy thing, and addCondition
s that perform AND, OR, and NEGATE bool operators. This is a natural design in style of GSUB/GPOS, and allows for automatic byte-sharing. To quote from my email to the MPEG list a few days ago:@skef has other ideas. I let him explain himself.
The text was updated successfully, but these errors were encountered: