Generalizing `Condition` / `ConditionSet` #147

behdad · 2024-04-18T21:12:14Z

Now that VARC table has conditional components, it's desirable to be able to negate an entire ConditionSet. My proposal is to make ConditionSet a legacy thing, and add Conditions that perform AND, OR, and NEGATE bool operators. This is a natural design in style of GSUB/GPOS, and allows for automatic byte-sharing. To quote from my email to the MPEG list a few days ago:

struct ConditionAnd
{
  uint16 format; // 2
  uint8 conditionCount; // Number of conditions for this conjunction expression.
  Offset24To<Condition> conditionOffsets[conditionCount];
}

struct ConditionOr
{
  uint16 format; // 3
  uint8 conditionCount; // Number of conditions for this disjunction expression.
  Offset24To<Condition> conditionOffsets[conditionCount];
}

struct ConditionNegate
{
  uint16 format; // 4
  Offset24To<Condition> condition;
}

@skef has other ideas. I let him explain himself.

The text was updated successfully, but these errors were encountered:

behdad · 2024-04-18T21:13:03Z

cc @nedley @PeterCon

skef · 2024-04-18T22:29:13Z

This was my email on the subject

I wondered if you had considered an option that is more
"postfix-y" and therefore more compact. For example, consider:

(~A & B & C) | ((D & ~E) | F)

where the letters are, in effect, offsets to conditions. That could be encoded with
something more like the existing condition set as:

A~B&C&DE~&F||

where ~ is offset==1, | is offset==2 and & is offset==3 (or whatever). Basically
just fold the parse tree structure into an array. I'm not absolutely sure of it, but
I think that would mean that the "operators" have to be unary or binary -- that
you can't do "A~BC&" with the "&" collecting the previous entries up because the
format as a whole would be ambiguous.

Anyway, I'm not sure if the positives outweigh the negatives but I think it's
worth either switching to something like this or coming up with an argument
against it in case someone else brings it up.

This will clearly work -- lots of systems are implemented similar to this. I'm still not sure if the positives outweigh the negatives. Behdad believes it will reduce sharing; I think that while that is true in an abstract sense that representing things this way will still be more compact, even if there sharing opportunities.

nedley · 2024-04-18T22:55:50Z

Deprecated or not, the behavior when combined with ConditionSet should be defined.

behdad · 2024-04-18T22:57:59Z

Deprecated or not, the behavior when combined with ConditionSet should be defined.

The behavior is well-defined. The ConditionSet is AND of conditions, and will remain so.

behdad · 2024-04-18T22:58:28Z

By deprecate I just mean that in new tables, instead of using a ConditionSet, we can use just a Condition.

skef · 2024-04-18T23:01:54Z

If we go the direction of this condition tree, the proposal on the mailing list was to:

Update the VARC condition spec and the (new, so easily alterable) LookupVarriation record to just point to a (possibly tree-) condition, and then
Leave the ConditionSet in the FeatureVariation record with the presumption that eventually it may just be used to point to a tree-condition.

So if someone wants to do a hybrid thing, such as AND two tree-conditions in the place where a condition set is still used, they're free to do so and the meaning is well-understood.

behdad · 2024-04-18T23:04:58Z

I'm still not sure if the positives outweigh the negatives.

One reason I don't like this design is that it needs a custom compiler, whereas in the other design the existing table packers do the job.

Also we would have to define a stack limit.

skef · 2024-04-18T23:09:32Z

Also we would have to define a stack limit.

Is there really a substantial difference between the depth of a postfix operator expression and the depth of an offset-based tree structure? We don't seem to care about limits on the latter ...

behdad · 2024-04-18T23:11:54Z

Also we would have to define a stack limit.

Is there really a substantial difference between the depth of a postfix operator expression and the depth of an offset-based tree structure? We don't seem to care about limits on the latter ...

Fair. In actual implementation we'll probably do a work-counter in either case.

nedley · 2024-04-18T23:55:58Z

The behavior is well-defined. The ConditionSet is AND of conditions, and will remain so.

I’m probably missing something, then: how is ConditionAnd different from ConditionSet?

behdad · 2024-04-18T23:57:37Z

I’m probably missing something, then: how is ConditionAnd different from ConditionSet?

ConditionAnd has a format number at the beginning to differentiate it from other types. ConditionSet has no version/format and as such cannot be modified.

skef · 2024-04-19T00:00:57Z

Functionally, a ConditionSet doesn't identify itself or otherwise act like a condition, so it can't serve the job of ANDing the conditions in any other position in an expression tree besides the root.

nedley · 2024-04-19T00:06:10Z

I guess the mention of deprecation threw me off, then. So this is proposing additional formats for use within a ConditionSet of a FeatureVariationRecord?

skef · 2024-04-19T00:11:07Z

Each of these would be a new condition format, so instead of the current working draft proposal of format 1 (the current), format 2 (its negation), format 3 (condition value), and format 4 (negated condition value), we would have

the current, axis-range condition
condition value
NEG (with one "child" condition)
AND (with an arbitrary number of child conditions)
OR (with an arbitrary number of child conditions)

behdad · 2024-04-19T01:00:49Z

I guess the mention of deprecation threw me off, then.

It will be obviated by the ConditionAnd.

So this is proposing additional formats for use within a ConditionSet of a FeatureVariationRecord?

Right. And in new places we need conditions, like the VARC table.

simoncozens · 2024-04-19T20:36:54Z

My 2c:

Skef's "RPN calculator" approach is cute and of course would work. And I can see that as an extension of the ConditionSet table it is the most minimal change - we keep the idea of an array of conditions, and just sprinkle some magic elements inside the array - but it just doesn't sit well with me. Minimally intrusive, maximally compatible changes are good for when you're keeping things broadly the same, but if you're going to make them different, you should make use of the flexibility that a clean break allows.
What I mean by that is that magical anything always starts off seeming like a good idea but always comes back to bite you in the end. In this case, the use of small-integer values in offset fields to mean an operation instead of an offset is just asking for trouble; or more specifically, asking for special-casing behaviour in parsing that particular table that it would be better to avoid. I don't think we have any other magical offset values elsewhere in the spec? (I hope not.) My hope is that as OFF develops it gets more regular to parse, not less.
Within COLRv1 paint trees, we have the situation where "single-argument" paint tables (such as PaintGlyph and PaintSolid) have AND-like behaviour, in that they form a mask over their children. So Behdad's approach is not a dissimilar situation. Anything where there are obvious parallels between different parts of the spec helps to minimize the cognitive load; anything which adds new concepts increases cognitive load.
In the case of VARC and FeatureVariations, having a "breadth-first" tree structure instead of a "depth-first" RPN allows a renderer to potentially short-circuit the condition evaluation. As soon as I hit a subtable with a ConditionAnd and something's not true, we're done. So potentially there's a micro-optimization there.

If it's a fine distinction between the two ideas, as I think it is, I would prefer the one which feels cleaner - explicit subtables.

behdad · 2024-04-19T20:49:36Z

Another issue with using magical offset values is that old code will misparse the data.

skef · 2024-04-19T21:21:09Z

Offset 0 is at least somewhat magical in some contexts. I thought I recalled one or two other cases with non-zero offsets not treated as offsets but I'm not recalling where at present.

I should note that there are two ideas being conflated here. One is using a list to represent the expression rather than a tree structure for the operators. The other is re-using the existing condition-set, which has no versioning, for that purpose. I'm only suggesting the former, largely on the grounds of compactness and, in some senses but not others, simplicity. Along the lines of what Behdad has suggested for the separate operators, we could just have a new versioned condition type that contains the list and treats those offsets as special, and then there are no versioning issues with the existing condition set.

skef · 2024-04-19T21:29:04Z

I suppose the argument here is going to boil down to "This is like COLR v1 so it's better from a simplicity standpoint". That will probably win the day, but I don't personally think COLR v1 is a good example we should be emulating. "Let's have dozens of tiny subtables pointing at each other" is a huge pain in the ass when it comes to some things like dynamic subsetting, and means that everyone's code has to protect against cycles (another difference between these two proposals). It's more general but less compact than some alternative, which is strange coming from Google-aligned contexts that otherwise advocate for space savings above all other considerations as if that was just an obvious conclusion.

It's there, we'll live with it, but I'm not eager to reproduce that approach in other contexts.

behdad · 2024-04-19T21:31:17Z

I suppose the argument here is going to boil down to "This is like COLR v1 so it's better from a simplicity standpoint". That will probably win the day, but I don't personally think COLR v1 is a good example we should be emulating. "Let's have dozens of tiny subtables pointing at each other" is a huge pain in the ass when it comes to some things like dynamic subsetting, and means that everyone's code has to protect against cycles (another difference between these two proposals). It's more general but less compact than some alternative, which is strange coming from Google-aligned contexts that otherwise advocate for space savings above all other considerations as if that was just an obvious conclusion.

Cycles are not possible because offsets are always positive.

Note how negating an entire condition expression can be done with just a few bytes with table-sharing, which is not possible with the stack-machine.

behdad · 2024-04-19T21:39:44Z

If we were to go the stack-machine route, I suggest using:

struct {
  uint8 format;
  uint24 offset;
}

instead of magic offset numbers.

behdad · 2024-04-19T21:40:43Z

Also, it can become prefix, to allow short-circuiting as Simon suggested. And the 24bits for the AND/OR formats can be used to encode number of operands.

skef · 2024-04-20T01:02:04Z

I just chose postfix because it tends to be more familiar. If there are engineering reasons to do prefix it should be prefix.

Lorp · 2024-04-20T01:03:09Z

This RPN idea is interesting, but I agree with @simoncozens about the additional cognitive load. The small offset coding, the stack itself, and the fact that AND and OR require strictly 2 operands, all feel novel and not in a good way. Smaller size may be true for random condition formulae, on average, but it isn’t always true: for example an OR of 10 conditions, which I think is a reasonable use case (e.g. simpler forms being used in multiple corners of the design space).

behdad · 2024-04-21T18:57:33Z

Also, it can become prefix, to allow short-circuiting as Simon suggested.

Not quite. One still has to parse and skip over short-circuited items, unless we're at the top-level.

behdad · 2024-04-21T18:58:53Z

Based on the original desire (negating an entire ConditionSet) I suggest we stick to my design, which allows negation with 5 bytes. In Skef's proposal it would be 4 bytes plus a copy of the ConditionSet to be negated. So I don't understand why it's desirable.

skef · 2024-04-22T05:39:29Z

Alright, consider my concerns addressed.

behdad · 2024-04-22T16:19:55Z

Thanks Skef.

What's procedurally the next step? @davelab6 @liamquin

With more Condition formats. harfbuzz/boring-expansion-spec@df7dcc4 harfbuzz/boring-expansion-spec#147

behdad · 2024-04-22T20:53:50Z

Okay, thanks. I've gone ahead and updated VARC.md and fonttools & harfbuzz varc-table branches.

skef · 2024-04-23T02:26:55Z

@behdad could we just do away with empty ConditionAnd and ConditionOr tables? I can't think of any function they would serve and what I'm seeing in the proposal about them is counter-intuitive.

behdad · 2024-04-23T02:28:29Z

@behdad could we just do away with empty ConditionAnd and ConditionOr tables? I can't think of any function they would serve and what I'm seeing in the proposal about them is counter-intuitive.

It's just a clarification. From the definitions it follows what the empty ones should do. I just don't want anyone to implement them otherwise. But I'm fine removing those.

skef · 2024-04-23T02:38:09Z

I'm confused about empty ||s being false. If anything I would have expected empty &&s to be false. Maybe this is some degenerative logic thing I've forgotten about.

Oh, I guess I see it based on the particular language you've chosen. Still, I'd rather just prohibit it. It doesn't help anything.

behdad · 2024-04-23T02:40:51Z

I'm just not a fan of prohibiting. But am fine if others want it. In our implementation we'll happily accept them and there's no special-casing needed. That said, I would hate it if ots started rejecting them.

A subsetter should be smart enough to not generate those empty rules. I just don't see what the prohibition offers.

skef · 2024-04-23T02:46:35Z

I don't like the old-style-Google-interview question aspect of making an empty OR false. Very elegant, also a subtle detail inviting problems.

"If they had read the spec carefully they would have properly understood this elegant and completely useless aspect of it, so it's their fault."

skef · 2024-04-23T02:47:46Z

I would object less if all empty lists were treated as true, but I presume the objection is that it goes against the language.

behdad · 2024-04-23T02:48:44Z

An empty OR cannot be true, because there does NOT exist any true value in its list.

skef · 2024-04-23T02:49:43Z

Did you read my messages?

skef · 2024-04-23T02:52:52Z

Anyway, an empty OR can absolutely be treated as true by stipulation. "it offends my sensibilities" != "cannot".

skef · 2024-04-23T02:55:18Z

Anyway, if we're doing this formal correctness in the face of questionable data thing you're going to need a note in the condition section about offset 0 being true. Can't just note that at the "edges".

skef · 2024-04-23T03:12:58Z

(OK, backing up once again, I would have to think though the situations likely to lead to empty tables to determine what the right guidance for empty tables should be, or if there even is a place for consistent guidance. For example, one might be starting from some existing boolean expression, stripping terms from it, and either not noticing that one wound up with an empty term or not knowing what to do about that. That could happen when subsetting, for example. The desirable outcome would be having the degenerative result retain consistency among potentially distributed boolean expressions, so e.g. you don't wind up with two or zero of some element type when there should be one.

Maybe that winds up being equivalent to following the formal definitions. Maybe it's the opposite. Maybe there is no consistent way of achieving that result. I can't think these things through that fast.)

behdad · 2024-04-23T03:21:04Z

Maybe that winds up being equivalent to following the formal definitions.

That is my understanding.

Here's some Python code to back my opinion:

from operator import and_, or_
from functools import reduce

If you do:

conditions = []
print(reduce(and_, conditions))
print(reduce(or_, conditions))

you get an exception:

TypeError: reduce() of empty iterable with no initial value

We can add an initial value to reduce, to make the empty container work. The correct initial value, for normal cases to work, is True for and_, and False for or_:

conditions = []
print(reduce(and_, conditions, True))
print(reduce(or_, conditions, False))

If you do this, you will see that it prints True, then False. This is not arbitrary IMO.

skef · 2024-04-23T03:26:36Z

I have to write updated language for Liam right now so I can't think about this tonight.

liamquin · 2024-04-23T03:41:33Z

in formal logic (first order predicate calculus) which is the basis for and/or/not,
given a collection C of conditions, a conjunction (and) over C is true if there is no c in C such that c evaluates to false.
If C is empty, there is indeed no c in C that is false, so an empty and should be true.

A disjunction (or) over C is true if there is some c in C that is true. If there is no c in C (because C is empty), then the condition is not satisfied and the or is false.

I can write text like that in the draft if it helps.

skef · 2024-04-23T03:55:45Z

I've studied formal logic. It's also true in formal logic that a contradiction can be used to prove anything. And "but" means "and". These don't seem relevant to the present domain, the question is whether what we're discussing is.

We should be basing our spec on what is likely to yield the best results in practical situations. I'm 60% convinced that the formal definitions will yield that result having now thought about it for maybe an hour, but I'll note that we seem to be exploring that because I pointed it out. Before that we were arguing about language.

liamquin · 2024-04-23T03:59:11Z

I didn’t mean to imply that i thought you were unfamiliar with logic (although i didn't write my comment very well, sorry), but rather to explore more formal wording.

skef · 2024-04-23T03:59:57Z

Another wrinkle: If I were facing the subsetting problem I posed earlier I would probably avoid reducing any elements of an expression and simply replace elements with "true" for "false" depending on what I was eliminating. That arguably calls for clean expressions of those, which perhaps should be documented. I have suggested moving the idea that a 0 offset is equivalent to truth into the condition section. So a negation with a zero offset could be "false". Maybe that's sufficient, especially given that conditions can be shared. Since 0 is a magic value we also have the option of making offset 1 be false. But I understand that many people consider 0 to be more validly magical than other values, hence "NULL" in C and such.

skef · 2024-04-23T04:01:06Z

Regardless, I think there should be some documented way of expressing true and false for these reasons that doesn't involve cobbling up a particular format 1 or 2 condition that happens to have the right property.

simoncozens · 2024-04-23T05:53:07Z

I think we can and should stipulate what the default fallback for combinatorial booleans is, and I'm going to argue that it should be False.

For consistency, I agree with Skef that "Empty-and is false and empty-or is true" is just too cute. The empty list is inherently falsey; a NULL offset signifies a lack of something. If you want an explicit "always true" value, I don't think there's a need for a magical 1 offset - just negate the empty list.

I think also for semantic reasons, having false as the fallback for these things makes sense. Think of Feature Variation tables - they are exceptions to the rule, the conditions when something changes from the norm. When does the dollar glyph switch to a simpler form? You have to tell me when. If you don't tell me when, it doesn't. "It changes when or([wght>600, wdth<50])" gives the conditions when it changes. "It changes when or([])" means "I have not given you conditions when it changes", which means it doesn't change. So an empty condition, or an empty-and or an empty-or, means "do nothing".

skef · 2024-04-23T05:56:29Z

Another note: I mentioned the problem of expression cycles in this model, especially given the expressed (although not documented) goal of sharing. Behdad argued that that's dealt with by offset direction restrictions. I saw nothing about that in the draft.

simoncozens · 2024-04-23T06:13:27Z

Do we need to put anything in the spec about cycles? Offsets are already typed as unsigned. I don't see a need to explicitly point out all the things that you just logically can't do - in a commentary perhaps, but not in a spec.

skef · 2024-04-23T06:30:15Z

Well, that's a good point. Sorry, I just hadn't absorbed the implications of that earlier comment.

With more Condition formats. harfbuzz/boring-expansion-spec@df7dcc4 harfbuzz/boring-expansion-spec#147

behdad added a commit to fonttools/fonttools that referenced this issue Apr 22, 2024

[varc] Use Condition instead of ConditionSet

7193680

With more Condition formats. harfbuzz/boring-expansion-spec@df7dcc4 harfbuzz/boring-expansion-spec#147

behdad added a commit to harfbuzz/harfbuzz that referenced this issue Apr 22, 2024

[varc] Use Condition instead of ConditionSet

6129c72

With more Condition formats. harfbuzz/boring-expansion-spec@df7dcc4 harfbuzz/boring-expansion-spec#147

behdad added a commit to fonttools/fonttools that referenced this issue May 23, 2024

[varc] Use Condition instead of ConditionSet

973dc5c

With more Condition formats. harfbuzz/boring-expansion-spec@df7dcc4 harfbuzz/boring-expansion-spec#147

Generalizing Condition / ConditionSet #147

Generalizing Condition / ConditionSet #147

Comments

behdad commented Apr 18, 2024 • edited Loading

behdad commented Apr 18, 2024

skef commented Apr 18, 2024

nedley commented Apr 18, 2024

behdad commented Apr 18, 2024

behdad commented Apr 18, 2024

skef commented Apr 18, 2024

behdad commented Apr 18, 2024

skef commented Apr 18, 2024

behdad commented Apr 18, 2024

nedley commented Apr 18, 2024 • edited Loading

behdad commented Apr 18, 2024

skef commented Apr 19, 2024

nedley commented Apr 19, 2024

skef commented Apr 19, 2024

behdad commented Apr 19, 2024

simoncozens commented Apr 19, 2024

behdad commented Apr 19, 2024

skef commented Apr 19, 2024

skef commented Apr 19, 2024

behdad commented Apr 19, 2024

behdad commented Apr 19, 2024

behdad commented Apr 19, 2024

skef commented Apr 20, 2024

Lorp commented Apr 20, 2024

behdad commented Apr 21, 2024

behdad commented Apr 21, 2024 • edited Loading

skef commented Apr 22, 2024

behdad commented Apr 22, 2024

behdad commented Apr 22, 2024

skef commented Apr 23, 2024

behdad commented Apr 23, 2024

skef commented Apr 23, 2024 • edited Loading

behdad commented Apr 23, 2024

skef commented Apr 23, 2024

skef commented Apr 23, 2024

behdad commented Apr 23, 2024

skef commented Apr 23, 2024

skef commented Apr 23, 2024

skef commented Apr 23, 2024

skef commented Apr 23, 2024

behdad commented Apr 23, 2024

skef commented Apr 23, 2024

liamquin commented Apr 23, 2024

skef commented Apr 23, 2024

liamquin commented Apr 23, 2024

skef commented Apr 23, 2024

skef commented Apr 23, 2024

simoncozens commented Apr 23, 2024 • edited Loading

skef commented Apr 23, 2024

simoncozens commented Apr 23, 2024

skef commented Apr 23, 2024

Generalizing `Condition` / `ConditionSet` #147

Generalizing `Condition` / `ConditionSet` #147

behdad commented Apr 18, 2024 •

edited

Loading

nedley commented Apr 18, 2024 •

edited

Loading

behdad commented Apr 21, 2024 •

edited

Loading

skef commented Apr 23, 2024 •

edited

Loading

simoncozens commented Apr 23, 2024 •

edited

Loading