fontscan: how to combine ResolveFace and ResolveFaceForLang? #139

dominikh · 2024-03-14T00:49:30Z

ResolveFace returns the first face that covers a given rune, while ResolveFaceForLang returns the first face that covers a given language. But how do I find the first face that covers a given rune in a given language?

For example, we might have two fonts cn0-4 and cn4-8 that cover disjoint sets of runes for Traditional Chinese, and two fonts jp0-4 and jp4-8 that cover the same runes as the Chinese fonts, but for Japanese, registered in the order cn0-4, cn4-8, jp0-4, jp4-8.

I cannot just look for "rune 5", nor for "japanese" to find jp4-8. The first search would find cn4-8, and the second search would find jp0-4.
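To make the example concrete, here is a minimal toy model in Go. It is only a sketch: the `face` type, the `lang` field, and all three lookup functions are hypothetical and are not the fontscan API. It assumes each face knows which language it targets.

```go
package main

import "fmt"

// face is a hypothetical stand-in for a registered font face.
// It is not a fontscan type.
type face struct {
	name   string
	lang   string // language the face targets (assumed to be known)
	lo, hi rune   // covered runes: lo <= r < hi
}

func (f face) covers(r rune) bool { return f.lo <= r && r < f.hi }

// registered returns the four faces from the example, in registration order.
func registered() []face {
	return []face{
		{"cn0-4", "zh-Hant", 0, 4},
		{"cn4-8", "zh-Hant", 4, 8},
		{"jp0-4", "ja", 0, 4},
		{"jp4-8", "ja", 4, 8},
	}
}

// firstForRune models a rune-only lookup: the first face covering r.
func firstForRune(faces []face, r rune) string {
	for _, f := range faces {
		if f.covers(r) {
			return f.name
		}
	}
	return ""
}

// firstForLang models a language-only lookup: the first face for lang.
func firstForLang(faces []face, lang string) string {
	for _, f := range faces {
		if f.lang == lang {
			return f.name
		}
	}
	return ""
}

// firstForRuneAndLang models the combined query asked about here: prefer a
// face that covers r and targets lang, otherwise fall back to the first
// face that covers r at all.
func firstForRuneAndLang(faces []face, r rune, lang string) string {
	fallback := ""
	for _, f := range faces {
		if !f.covers(r) {
			continue
		}
		if f.lang == lang {
			return f.name
		}
		if fallback == "" {
			fallback = f.name
		}
	}
	return fallback
}

func main() {
	faces := registered()
	fmt.Println(firstForRune(faces, 5))              // cn4-8
	fmt.Println(firstForLang(faces, "ja"))           // jp0-4
	fmt.Println(firstForRuneAndLang(faces, 5, "ja")) // jp4-8
}
```

The fallback to any covering face is a design choice of this sketch; a real implementation might prefer to report that no face matches the language.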

This also impacts shaping.SplitByFace, which currently discards language information.


dominikh · 2024-03-14T03:20:10Z

(It's probably also worth documenting that ResolveFaceForLang just maps languages to rune sets and uses those for the lookup; it doesn't consult the metadata of the fonts. The ideal combination of resolving by rune and by language should probably also consult the fonts' locl (localized forms) OpenType feature; though really, segmenting by face should use grapheme clusters, not individual runes, and also handle Unicode normalization, etc.)
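A small sketch of why a lookup based purely on rune sets cannot tell the fonts apart. Everything here is hypothetical: the `langRunes` table, the `face` type, and the two-rune sets are toy assumptions, not fontscan's data. It only models the behavior described above, where a language is turned into a set of required runes and font metadata is ignored.

```go
package main

import "fmt"

// face is a toy face that only knows which runes it covers.
type face struct {
	name  string
	runes map[rune]bool
}

// langRunes is a hypothetical language -> required runes table.
// Both languages need the same unified Han code points.
var langRunes = map[string][]rune{
	"zh-Hant": {'直', '骨'},
	"ja":      {'直', '骨'},
}

// coversAll reports whether f covers every rune in rs.
func coversAll(f face, rs []rune) bool {
	for _, r := range rs {
		if !f.runes[r] {
			return false
		}
	}
	return true
}

// firstForLang returns the first face covering the runes required by lang.
// It never looks at which language the face was designed for.
func firstForLang(faces []face, lang string) string {
	rs, ok := langRunes[lang]
	if !ok {
		return ""
	}
	for _, f := range faces {
		if coversAll(f, rs) {
			return f.name
		}
	}
	return ""
}

func main() {
	han := map[rune]bool{'直': true, '骨': true}
	faces := []face{{"cn", han}, {"jp", han}}
	// Prints "cn": the Chinese face is registered first and covers the runes.
	fmt.Println(firstForLang(faces, "ja"))
}
```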

benoitkugler · 2024-03-18T15:11:56Z

Hmm, I was not aware that the same Unicode code point may have different glyph presentations depending on the language. Do you have examples of fonts and languages that behave this way?

Perhaps this issue would be resolved by rules (like the ones used by fontconfig) such as "for given language and family, use this family instead of that one" (related to #82).
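A rule of that shape could be sketched as follows. This is not fontconfig syntax and not an existing fontscan feature; the `rule` type, the `substitute` function, and the family names are placeholders chosen for illustration.

```go
package main

import "fmt"

// rule says: for this language and requested family, use another family.
type rule struct {
	lang   string
	family string
	use    string
}

// substitute returns the replacement family from the first matching rule,
// or the requested family unchanged when no rule applies.
func substitute(rules []rule, lang, family string) string {
	for _, r := range rules {
		if r.lang == lang && r.family == family {
			return r.use
		}
	}
	return family
}

func main() {
	// "Example Sans" and its variants are made-up family names.
	rules := []rule{
		{lang: "ja", family: "Example Sans", use: "Example Sans JP"},
		{lang: "zh-Hant", family: "Example Sans", use: "Example Sans TC"},
	}
	fmt.Println(substitute(rules, "ja", "Example Sans")) // Example Sans JP
	fmt.Println(substitute(rules, "fr", "Example Sans")) // Example Sans
}
```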

It's true that the segmentation process is limited, because we rely on HarfBuzz for normalization and cluster handling, which is a rather complex topic. I'm not sure how hard it would be to extract the HarfBuzz logic and apply it during segmentation.

dominikh · 2024-03-18T15:30:30Z

> Hmm, I was not aware that the same Unicode code point may have different glyph presentations depending on the language. Do you have examples of fonts and languages that behave this way?

The most famous example is Han unification (https://en.wikipedia.org/wiki/Han_unification). It also sometimes happens for different languages written in Cyrillic. For Han, if you're not using a pan-CJK font like Noto Sans CJK, you will have different fonts for Japanese Kanji and Chinese Han. There are even regional variants: mainland China, Taiwan, Hong Kong, and Singapore each use slightly different forms for the same code points.
