harfbuzz: mapping from language.Language to OpenType tags seems to be case sensitive #150

dominikh · 2024-03-27T03:09:31Z

Looking at tagsFromComplexLanguage in ot_language_table.go, the input to the function seems to keep whatever case the user set for SegmentProperties.Language, for example zh-Hant-HK. However, the function's implementation is case sensitive and doesn't normalize the case, so checks like if langMatches(langStr[1:], "h-hant-hk") { never match.
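
To illustrate the mismatch, here is a minimal, self-contained sketch. The langMatches function below is a hypothetical stand-in modeled on HarfBuzz's lang_matches helper (a prefix match that must end at the end of the string or at a '-'); it is not the library's actual code. Matching the raw user input against the lowercase spec fails, while matching the lowercased input succeeds.

```go
package main

import (
	"fmt"
	"strings"
)

// langMatches is a hypothetical stand-in for the helper used in
// tagsFromComplexLanguage: it reports whether lang starts with spec and the
// match ends either at the end of lang or at a '-' subtag separator.
// The comparison is byte-wise, and therefore case sensitive.
func langMatches(lang, spec string) bool {
	if !strings.HasPrefix(lang, spec) {
		return false
	}
	return len(lang) == len(spec) || lang[len(spec)] == '-'
}

func main() {
	raw := "zh-Hant-HK"               // as set by the user
	canonical := strings.ToLower(raw) // "zh-hant-hk"

	// The table code dispatches on the first byte, then matches the rest.
	fmt.Println(langMatches(raw[1:], "h-hant-hk"))       // false: "Hant" != "hant"
	fmt.Println(langMatches(canonical[1:], "h-hant-hk")) // true
}
```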


dominikh · 2024-03-27T03:18:36Z

This is probably user-error and the intended usage of language.Language is to use language.NewLanguage to canonicalize the string before assigning it to SegmentProperties.Language. Unfortunately, it's easy to misuse this API. Feel free to close this issue, however.
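
A minimal sketch of why canonicalizing first matters. The canonicalize function here is a hypothetical stand-in for language.NewLanguage, assuming it lowercases ASCII letters, maps '_' to '-', and drops other characters; the package documentation is the authority on the exact rules. Because Language is a defined string type, a string constant converts to it directly, so nothing stops the raw, non-canonical form from being stored.

```go
package main

import "fmt"

// Language mirrors the defined string type discussed in this issue.
type Language string

// canonicalize is a hypothetical sketch of tag canonicalization, assuming it
// lowercases ASCII letters, maps '_' to '-', and drops any other byte that is
// not a letter, digit or '-'.
func canonicalize(s string) Language {
	out := make([]byte, 0, len(s))
	for i := 0; i < len(s); i++ {
		b := s[i]
		switch {
		case 'A' <= b && b <= 'Z':
			out = append(out, b+('a'-'A'))
		case 'a' <= b && b <= 'z', '0' <= b && b <= '9', b == '-':
			out = append(out, b)
		case b == '_':
			out = append(out, '-')
		}
	}
	return Language(out)
}

func main() {
	// A string constant converts to Language without going through the constructor.
	bad := Language("zh-Hant-HK")
	good := canonicalize("zh_Hant_HK")
	fmt.Println(bad)  // zh-Hant-HK
	fmt.Println(good) // zh-hant-hk
}
```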

benoitkugler · 2024-03-27T14:53:30Z

> This is probably user-error and the intended usage of language.Language is to use language.NewLanguage to canonicalize the string before assigning it to SegmentProperties.Language. Unfortunately, it's easy to misuse this API. Feel free to close this issue, however.

Precisely.
I had hoped that using a defined type was a strong enough hint. Perhaps the documentation of the Language type could state that using the constructor is mandatory?

dominikh · 2024-03-27T15:40:36Z

How do you feel about something like this instead?

```go
type Language struct {
    lang string // unexported, so only NewLanguage can set it
}

func NewLanguage(lang string) Language { ... } // canonicalizes lang

func (l Language) String() string { return l.lang }
```

The upside is that users are forced to go through NewLanguage to create languages. The downside is that manipulating languages, e.g. via slicing, becomes less convenient for everyone, and more costly for users, since they'd have to go through NewLanguage again. That said, most users arguably shouldn't need to manipulate languages in arbitrary ways anyway.
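
A runnable sketch of the proposed opaque type, with an assumed body for NewLanguage (lowercasing and mapping '_' to '-'; a real constructor may do more). The hypothetical Primary helper shows the cost described above: trimming a tag means slicing String() and passing the result back through NewLanguage.

```go
package main

import (
	"fmt"
	"strings"
)

// Language is an opaque wrapper: code outside the defining package cannot set
// lang directly, so every value comes from NewLanguage (or is the zero value).
type Language struct {
	lang string
}

// NewLanguage is a hypothetical constructor; its canonicalization here
// (lowercasing and mapping '_' to '-') is an assumption for illustration.
func NewLanguage(lang string) Language {
	return Language{lang: strings.ReplaceAll(strings.ToLower(lang), "_", "-")}
}

func (l Language) String() string { return l.lang }

// Primary is a hypothetical helper returning the primary language subtag.
// From another package, callers would have to slice String() and re-wrap the
// result with NewLanguage, paying for canonicalization a second time.
func Primary(l Language) Language {
	s := l.String()
	if i := strings.IndexByte(s, '-'); i >= 0 {
		s = s[:i]
	}
	return NewLanguage(s)
}

func main() {
	l := NewLanguage("zh_Hant_HK")
	fmt.Println(l)          // zh-hant-hk
	fmt.Println(Primary(l)) // zh
}
```

One side effect of the struct approach: Language{} stays available as an explicit "unset" value, and values remain comparable with ==, so existing equality checks keep working.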

benoitkugler · 2024-03-27T16:30:03Z

> How do you feel about something like this instead?

Looks great. It would not be backward compatible though, so I'm not sure how much this change is worth doing. @andydotxyz @whereswaldon what do you think ?

dominikh · 2024-03-27T16:43:03Z

As an aside (which I'm happy to split into its own issue if need be), the handling of language.Script is also case-sensitive, but language.ParseScript doesn't do any normalization. So shaping with the script Arab doesn't work correctly, but shaping with arab does. That's particularly confusing because script names are preferably written in title-case.
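
A sketch of case-insensitive script parsing. The parseScriptTag function is hypothetical, not language.ParseScript, and it assumes (following the lowercase convention mentioned in this thread) that scripts are stored as lowercase four-letter ISO 15924 codes packed big-endian into a uint32. Normalizing each byte before packing makes Arab, arab and ARAB map to the same tag.

```go
package main

import (
	"errors"
	"fmt"
)

// parseScriptTag is a hypothetical sketch: it accepts a four-letter ISO 15924
// script code in any ASCII case and packs its lowercased bytes into a
// big-endian uint32, assuming scripts are stored as lowercase tags.
func parseScriptTag(s string) (uint32, error) {
	if len(s) != 4 {
		return 0, errors.New("script code must have exactly 4 letters")
	}
	var tag uint32
	for i := 0; i < 4; i++ {
		c := s[i]
		switch {
		case 'A' <= c && c <= 'Z':
			c += 'a' - 'A' // fold to lowercase
		case 'a' <= c && c <= 'z':
			// already lowercase
		default:
			return 0, fmt.Errorf("invalid script code %q", s)
		}
		tag = tag<<8 | uint32(c)
	}
	return tag, nil
}

func main() {
	for _, in := range []string{"Arab", "arab", "ARAB"} {
		tag, err := parseScriptTag(in)
		fmt.Printf("%s -> %#x %v\n", in, tag, err)
	}
}
```

Folding to lowercase only matches the storage convention assumed here; if tags were stored in title case, the same per-byte loop could uppercase the first letter instead.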

benoitkugler · 2024-03-27T18:39:55Z

> As an aside (which I'm happy to split into its own issue if need be), the handling of language.Script is also case-sensitive, but language.ParseScript doesn't do any normalization. So shaping with the script Arab doesn't work correctly, but shaping with arab does. That's particularly confusing because script names are preferably written in title-case.

Yes, I encountered this issue as well. I think I kept the lowercase convention because it simplifies the interaction with fonts at some point. Not sure though, I'll take a deeper look. language.ParseScript should definitely do the appropriate normalisation.

whereswaldon · 2024-03-28T14:39:45Z

> Looks great. It would not be backward compatible though, so I'm not sure how much this change is worth doing. @andydotxyz @whereswaldon what do you think ?

I'm okay with forcing language to be constructed in the interest of eliminating mistakes. I guess it's up to @andydotxyz

andydotxyz · 2024-04-02T13:46:06Z

> > Looks great. It would not be backward compatible though, so I'm not sure how much this change is worth doing. @andydotxyz @whereswaldon what do you think ?
>
> I'm okay with forcing language to be constructed in the interest of eliminating mistakes. I guess it's up to @andydotxyz

Agreed. Sorry for having been out of contact last week.

benoitkugler mentioned this issue Mar 29, 2024

[language] Better script casing #152 (Closed)
