Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalizing Condition / ConditionSet #147

Open
behdad opened this issue Apr 18, 2024 · 51 comments
Open

Generalizing Condition / ConditionSet #147

behdad opened this issue Apr 18, 2024 · 51 comments

Comments

@behdad
Copy link
Member

behdad commented Apr 18, 2024

Now that VARC table has conditional components, it's desirable to be able to negate an entire ConditionSet. My proposal is to make ConditionSet a legacy thing, and add Conditions that perform AND, OR, and NEGATE bool operators. This is a natural design in style of GSUB/GPOS, and allows for automatic byte-sharing. To quote from my email to the MPEG list a few days ago:

struct ConditionAnd
{
  uint16 format; // 2
  uint8 conditionCount; // Number of conditions for this conjunction expression.
  Offset24To<Condition> conditionOffsets[conditionCount];
}

struct ConditionOr
{
  uint16 format; // 3
  uint8 conditionCount; // Number of conditions for this disjunction expression.
  Offset24To<Condition> conditionOffsets[conditionCount];
}

struct ConditionNegate
{
  uint16 format; // 4
  Offset24To<Condition> condition;
}

@skef has other ideas. I let him explain himself.

@behdad
Copy link
Member Author

behdad commented Apr 18, 2024

cc @nedley @PeterCon

@skef
Copy link
Contributor

skef commented Apr 18, 2024

This was my email on the subject

I wondered if you had considered an option that is more
"postfix-y" and therefore more compact. For example, consider:

(~A & B & C) | ((D & ~E) | F)

where the letters are, in effect, offsets to conditions. That could be encoded with
something more like the existing condition set as:

A~B&C&DE~&F||

where ~ is offset==1, | is offset==2 and & is offset==3 (or whatever). Basically
just fold the parse tree structure into an array. I'm not absolutely sure of it, but
I think that would mean that the "operators" have to be unary or binary -- that
you can't do "A~BC&" with the "&" collecting the previous entries up because the
format as a whole would be ambiguous.

Anyway, I'm not sure if the positives outweigh the negatives but I think it's
worth either switching to something like this or coming up with an argument
against it in case someone else brings it up.

This will clearly work -- lots of systems are implemented similar to this. I'm still not sure if the positives outweigh the negatives. Behdad believes it will reduce sharing; I think that while that is true in an abstract sense that representing things this way will still be more compact, even if there sharing opportunities.

@nedley
Copy link

nedley commented Apr 18, 2024

Deprecated or not, the behavior when combined with ConditionSet should be defined.

@behdad
Copy link
Member Author

behdad commented Apr 18, 2024

Deprecated or not, the behavior when combined with ConditionSet should be defined.

The behavior is well-defined. The ConditionSet is AND of conditions, and will remain so.

@behdad
Copy link
Member Author

behdad commented Apr 18, 2024

By deprecate I just mean that in new tables, instead of using a ConditionSet, we can use just a Condition.

@skef
Copy link
Contributor

skef commented Apr 18, 2024

If we go the direction of this condition tree, the proposal on the mailing list was to:

  1. Update the VARC condition spec and the (new, so easily alterable) LookupVarriation record to just point to a (possibly tree-) condition, and then
  2. Leave the ConditionSet in the FeatureVariation record with the presumption that eventually it may just be used to point to a tree-condition.

So if someone wants to do a hybrid thing, such as AND two tree-conditions in the place where a condition set is still used, they're free to do so and the meaning is well-understood.

@behdad
Copy link
Member Author

behdad commented Apr 18, 2024

I'm still not sure if the positives outweigh the negatives.

One reason I don't like this design is that it needs a custom compiler, whereas in the other design the existing table packers do the job.

Also we would have to define a stack limit.

@skef
Copy link
Contributor

skef commented Apr 18, 2024

Also we would have to define a stack limit.

Is there really a substantial difference between the depth of a postfix operator expression and the depth of an offset-based tree structure? We don't seem to care about limits on the latter ...

@behdad
Copy link
Member Author

behdad commented Apr 18, 2024

Also we would have to define a stack limit.

Is there really a substantial difference between the depth of a postfix operator expression and the depth of an offset-based tree structure? We don't seem to care about limits on the latter ...

Fair. In actual implementation we'll probably do a work-counter in either case.

@nedley
Copy link

nedley commented Apr 18, 2024

The behavior is well-defined. The ConditionSet is AND of conditions, and will remain so.

I’m probably missing something, then: how is ConditionAnd different from ConditionSet?

@behdad
Copy link
Member Author

behdad commented Apr 18, 2024

I’m probably missing something, then: how is ConditionAnd different from ConditionSet?

ConditionAnd has a format number at the beginning to differentiate it from other types. ConditionSet has no version/format and as such cannot be modified.

@skef
Copy link
Contributor

skef commented Apr 19, 2024

Functionally, a ConditionSet doesn't identify itself or otherwise act like a condition, so it can't serve the job of ANDing the conditions in any other position in an expression tree besides the root.

@nedley
Copy link

nedley commented Apr 19, 2024

I guess the mention of deprecation threw me off, then. So this is proposing additional formats for use within a ConditionSet of a FeatureVariationRecord?

@skef
Copy link
Contributor

skef commented Apr 19, 2024

Each of these would be a new condition format, so instead of the current working draft proposal of format 1 (the current), format 2 (its negation), format 3 (condition value), and format 4 (negated condition value), we would have

  1. the current, axis-range condition
  2. condition value
  3. NEG (with one "child" condition)
  4. AND (with an arbitrary number of child conditions)
  5. OR (with an arbitrary number of child conditions)

@behdad
Copy link
Member Author

behdad commented Apr 19, 2024

I guess the mention of deprecation threw me off, then.

It will be obviated by the ConditionAnd.

So this is proposing additional formats for use within a ConditionSet of a FeatureVariationRecord?

Right. And in new places we need conditions, like the VARC table.

@simoncozens
Copy link

My 2c:

  • Skef's "RPN calculator" approach is cute and of course would work. And I can see that as an extension of the ConditionSet table it is the most minimal change - we keep the idea of an array of conditions, and just sprinkle some magic elements inside the array - but it just doesn't sit well with me. Minimally intrusive, maximally compatible changes are good for when you're keeping things broadly the same, but if you're going to make them different, you should make use of the flexibility that a clean break allows.
  • What I mean by that is that magical anything always starts off seeming like a good idea but always comes back to bite you in the end. In this case, the use of small-integer values in offset fields to mean an operation instead of an offset is just asking for trouble; or more specifically, asking for special-casing behaviour in parsing that particular table that it would be better to avoid. I don't think we have any other magical offset values elsewhere in the spec? (I hope not.) My hope is that as OFF develops it gets more regular to parse, not less.
  • Within COLRv1 paint trees, we have the situation where "single-argument" paint tables (such as PaintGlyph and PaintSolid) have AND-like behaviour, in that they form a mask over their children. So Behdad's approach is not a dissimilar situation. Anything where there are obvious parallels between different parts of the spec helps to minimize the cognitive load; anything which adds new concepts increases cognitive load.
  • In the case of VARC and FeatureVariations, having a "breadth-first" tree structure instead of a "depth-first" RPN allows a renderer to potentially short-circuit the condition evaluation. As soon as I hit a subtable with a ConditionAnd and something's not true, we're done. So potentially there's a micro-optimization there.

If it's a fine distinction between the two ideas, as I think it is, I would prefer the one which feels cleaner - explicit subtables.

@behdad
Copy link
Member Author

behdad commented Apr 19, 2024

Another issue with using magical offset values is that old code will misparse the data.

@skef
Copy link
Contributor

skef commented Apr 19, 2024

Offset 0 is at least somewhat magical in some contexts. I thought I recalled one or two other cases with non-zero offsets not treated as offsets but I'm not recalling where at present.

I should note that there are two ideas being conflated here. One is using a list to represent the expression rather than a tree structure for the operators. The other is re-using the existing condition-set, which has no versioning, for that purpose. I'm only suggesting the former, largely on the grounds of compactness and, in some senses but not others, simplicity. Along the lines of what Behdad has suggested for the separate operators, we could just have a new versioned condition type that contains the list and treats those offsets as special, and then there are no versioning issues with the existing condition set.

@skef
Copy link
Contributor

skef commented Apr 19, 2024

I suppose the argument here is going to boil down to "This is like COLR v1 so it's better from a simplicity standpoint". That will probably win the day, but I don't personally think COLR v1 is a good example we should be emulating. "Let's have dozens of tiny subtables pointing at each other" is a huge pain in the ass when it comes to some things like dynamic subsetting, and means that everyone's code has to protect against cycles (another difference between these two proposals). It's more general but less compact than some alternative, which is strange coming from Google-aligned contexts that otherwise advocate for space savings above all other considerations as if that was just an obvious conclusion.

It's there, we'll live with it, but I'm not eager to reproduce that approach in other contexts.

@behdad
Copy link
Member Author

behdad commented Apr 19, 2024

I suppose the argument here is going to boil down to "This is like COLR v1 so it's better from a simplicity standpoint". That will probably win the day, but I don't personally think COLR v1 is a good example we should be emulating. "Let's have dozens of tiny subtables pointing at each other" is a huge pain in the ass when it comes to some things like dynamic subsetting, and means that everyone's code has to protect against cycles (another difference between these two proposals). It's more general but less compact than some alternative, which is strange coming from Google-aligned contexts that otherwise advocate for space savings above all other considerations as if that was just an obvious conclusion.

Cycles are not possible because offsets are always positive.

Note how negating an entire condition expression can be done with just a few bytes with table-sharing, which is not possible with the stack-machine.

@behdad
Copy link
Member Author

behdad commented Apr 19, 2024

If we were to go the stack-machine route, I suggest using:

struct {
  uint8 format;
  uint24 offset;
}

instead of magic offset numbers.

@behdad
Copy link
Member Author

behdad commented Apr 19, 2024

Also, it can become prefix, to allow short-circuiting as Simon suggested. And the 24bits for the AND/OR formats can be used to encode number of operands.

@skef
Copy link
Contributor

skef commented Apr 20, 2024

I just chose postfix because it tends to be more familiar. If there are engineering reasons to do prefix it should be prefix.

@Lorp
Copy link
Collaborator

Lorp commented Apr 20, 2024

This RPN idea is interesting, but I agree with @simoncozens about the additional cognitive load. The small offset coding, the stack itself, and the fact that AND and OR require strictly 2 operands, all feel novel and not in a good way. Smaller size may be true for random condition formulae, on average, but it isn’t always true: for example an OR of 10 conditions, which I think is a reasonable use case (e.g. simpler forms being used in multiple corners of the design space).

@behdad
Copy link
Member Author

behdad commented Apr 21, 2024

Also, it can become prefix, to allow short-circuiting as Simon suggested.

Not quite. One still has to parse and skip over short-circuited items, unless we're at the top-level.

@behdad
Copy link
Member Author

behdad commented Apr 21, 2024

Based on the original desire (negating an entire ConditionSet) I suggest we stick to my design, which allows negation with 5 bytes. In Skef's proposal it would be 4 bytes plus a copy of the ConditionSet to be negated. So I don't understand why it's desirable.

@skef
Copy link
Contributor

skef commented Apr 22, 2024

Alright, consider my concerns addressed.

@behdad
Copy link
Member Author

behdad commented Apr 22, 2024

Thanks Skef.

What's procedurally the next step? @davelab6 @liamquin

behdad added a commit to fonttools/fonttools that referenced this issue Apr 22, 2024
behdad added a commit to harfbuzz/harfbuzz that referenced this issue Apr 22, 2024
@behdad
Copy link
Member Author

behdad commented Apr 22, 2024

Okay, thanks. I've gone ahead and updated VARC.md and fonttools & harfbuzz varc-table branches.

@skef
Copy link
Contributor

skef commented Apr 23, 2024

@behdad could we just do away with empty ConditionAnd and ConditionOr tables? I can't think of any function they would serve and what I'm seeing in the proposal about them is counter-intuitive.

@behdad
Copy link
Member Author

behdad commented Apr 23, 2024

@behdad could we just do away with empty ConditionAnd and ConditionOr tables? I can't think of any function they would serve and what I'm seeing in the proposal about them is counter-intuitive.

It's just a clarification. From the definitions it follows what the empty ones should do. I just don't want anyone to implement them otherwise. But I'm fine removing those.

@skef
Copy link
Contributor

skef commented Apr 23, 2024

I'm confused about empty ||s being false. If anything I would have expected empty &&s to be false. Maybe this is some degenerative logic thing I've forgotten about.

Oh, I guess I see it based on the particular language you've chosen. Still, I'd rather just prohibit it. It doesn't help anything.

@behdad
Copy link
Member Author

behdad commented Apr 23, 2024

I'm just not a fan of prohibiting. But am fine if others want it. In our implementation we'll happily accept them and there's no special-casing needed. That said, I would hate it if ots started rejecting them.

A subsetter should be smart enough to not generate those empty rules. I just don't see what the prohibition offers.

@skef
Copy link
Contributor

skef commented Apr 23, 2024

I don't like the old-style-Google-interview question aspect of making an empty OR false. Very elegant, also a subtle detail inviting problems.

"If they had read the spec carefully they would have properly understood this elegant and completely useless aspect of it, so it's their fault."

@skef
Copy link
Contributor

skef commented Apr 23, 2024

I would object less if all empty lists were treated as true, but I presume the objection is that it goes against the language.

@behdad
Copy link
Member Author

behdad commented Apr 23, 2024

An empty OR cannot be true, because there does NOT exist any true value in its list.

@skef
Copy link
Contributor

skef commented Apr 23, 2024

Did you read my messages?

@skef
Copy link
Contributor

skef commented Apr 23, 2024

Anyway, an empty OR can absolutely be treated as true by stipulation. "it offends my sensibilities" != "cannot".

@skef
Copy link
Contributor

skef commented Apr 23, 2024

Anyway, if we're doing this formal correctness in the face of questionable data thing you're going to need a note in the condition section about offset 0 being true. Can't just note that at the "edges".

@skef
Copy link
Contributor

skef commented Apr 23, 2024

(OK, backing up once again, I would have to think though the situations likely to lead to empty tables to determine what the right guidance for empty tables should be, or if there even is a place for consistent guidance. For example, one might be starting from some existing boolean expression, stripping terms from it, and either not noticing that one wound up with an empty term or not knowing what to do about that. That could happen when subsetting, for example. The desirable outcome would be having the degenerative result retain consistency among potentially distributed boolean expressions, so e.g. you don't wind up with two or zero of some element type when there should be one.

Maybe that winds up being equivalent to following the formal definitions. Maybe it's the opposite. Maybe there is no consistent way of achieving that result. I can't think these things through that fast.)

@behdad
Copy link
Member Author

behdad commented Apr 23, 2024

Maybe that winds up being equivalent to following the formal definitions.

That is my understanding.

Here's some Python code to back my opinion:

from operator import and_, or_
from functools import reduce

If you do:

conditions = []
print(reduce(and_, conditions))
print(reduce(or_, conditions))

you get an exception:

TypeError: reduce() of empty iterable with no initial value

We can add an initial value to reduce, to make the empty container work. The correct initial value, for normal cases to work, is True for and_, and False for or_:

conditions = []
print(reduce(and_, conditions, True))
print(reduce(or_, conditions, False))

If you do this, you will see that it prints True, then False. This is not arbitrary IMO.

@skef
Copy link
Contributor

skef commented Apr 23, 2024

I have to write updated language for Liam right now so I can't think about this tonight.

@liamquin
Copy link
Collaborator

in formal logic (first order predicate calculus) which is the basis for and/or/not,
given a collection C of conditions, a conjunction (and) over C is true if there is no c in C such that c evaluates to false.
If C is empty, there is indeed no c in C that is false, so an empty and should be true.

A disjunction (or) over C is true if there is some c in C that is true. If there is no c in C (because C is empty), then the condition is not satisfied and the or is false.

I can write text like that in the draft if it helps.

@skef
Copy link
Contributor

skef commented Apr 23, 2024

I've studied formal logic. It's also true in formal logic that a contradiction can be used to prove anything. And "but" means "and". These don't seem relevant to the present domain, the question is whether what we're discussing is.

We should be basing our spec on what is likely to yield the best results in practical situations. I'm 60% convinced that the formal definitions will yield that result having now thought about it for maybe an hour, but I'll note that we seem to be exploring that because I pointed it out. Before that we were arguing about language.

@liamquin
Copy link
Collaborator

I didn’t mean to imply that i thought you were unfamiliar with logic (although i didn't write my comment very well, sorry), but rather to explore more formal wording.

@skef
Copy link
Contributor

skef commented Apr 23, 2024

Another wrinkle: If I were facing the subsetting problem I posed earlier I would probably avoid reducing any elements of an expression and simply replace elements with "true" for "false" depending on what I was eliminating. That arguably calls for clean expressions of those, which perhaps should be documented. I have suggested moving the idea that a 0 offset is equivalent to truth into the condition section. So a negation with a zero offset could be "false". Maybe that's sufficient, especially given that conditions can be shared. Since 0 is a magic value we also have the option of making offset 1 be false. But I understand that many people consider 0 to be more validly magical than other values, hence "NULL" in C and such.

@skef
Copy link
Contributor

skef commented Apr 23, 2024

Regardless, I think there should be some documented way of expressing true and false for these reasons that doesn't involve cobbling up a particular format 1 or 2 condition that happens to have the right property.

@simoncozens
Copy link

simoncozens commented Apr 23, 2024

I think we can and should stipulate what the default fallback for combinatorial booleans is, and I'm going to argue that it should be False.

For consistency, I agree with Skef that "Empty-and is false and empty-or is true" is just too cute. The empty list is inherently falsey; a NULL offset signifies a lack of something. If you want an explicit "always true" value, I don't think there's a need for a magical 1 offset - just negate the empty list.

I think also for semantic reasons, having false as the fallback for these things makes sense. Think of Feature Variation tables - they are exceptions to the rule, the conditions when something changes from the norm. When does the dollar glyph switch to a simpler form? You have to tell me when. If you don't tell me when, it doesn't. "It changes when or([wght>600, wdth<50])" gives the conditions when it changes. "It changes when or([])" means "I have not given you conditions when it changes", which means it doesn't change. So an empty condition, or an empty-and or an empty-or, means "do nothing".

@skef
Copy link
Contributor

skef commented Apr 23, 2024

Another note: I mentioned the problem of expression cycles in this model, especially given the expressed (although not documented) goal of sharing. Behdad argued that that's dealt with by offset direction restrictions. I saw nothing about that in the draft.

@simoncozens
Copy link

Do we need to put anything in the spec about cycles? Offsets are already typed as unsigned. I don't see a need to explicitly point out all the things that you just logically can't do - in a commentary perhaps, but not in a spec.

@skef
Copy link
Contributor

skef commented Apr 23, 2024

Well, that's a good point. Sorry, I just hadn't absorbed the implications of that earlier comment.

behdad added a commit to fonttools/fonttools that referenced this issue May 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants