fontscan: how to combine ResolveFace and ResolveFaceForLang? #139

Open
dominikh opened this issue Mar 14, 2024 · 3 comments

Comments

@dominikh
Contributor

ResolveFace returns the first face that covers a given rune, while ResolveFaceForLang returns the first face that covers a given language. But how do I find the first face that covers a given rune in a given language?

For example, we might have two fonts cn0-4 and cn4-8 that cover disjoint sets of runes for Traditional Chinese, and two fonts jp0-4 and jp4-8 that cover the same runes as the Chinese fonts, but for Japanese, registered in the order cn0-4, cn4-8, jp0-4, jp4-8.

I cannot just look for "rune 5", nor for "japanese" to find jp4-8. The first search would find cn4-8, and the second search would find jp0-4.
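To make the scenario concrete, here is a minimal, self-contained sketch of the three lookups. Everything in it (the `face` type, `byRune`, `byLang`, `byRuneAndLang`, the small integer "runes") is hypothetical and only models the example above; none of it is fontscan's actual API, and the fallback to a language-agnostic match is just one possible policy.

```go
package main

import "fmt"

// face is a hypothetical stand-in for a registered font face.
// It is NOT a fontscan type; it only models the example above.
type face struct {
	name  string
	lang  string        // language the face was designed for
	runes map[rune]bool // runes the face covers
}

// newFace builds a face covering the half-open rune range [lo, hi).
func newFace(name, lang string, lo, hi rune) face {
	f := face{name: name, lang: lang, runes: map[rune]bool{}}
	for r := lo; r < hi; r++ {
		f.runes[r] = true
	}
	return f
}

// exampleFaces returns the four fonts in their registration order.
func exampleFaces() []face {
	return []face{
		newFace("cn0-4", "zh", 0, 4),
		newFace("cn4-8", "zh", 4, 8),
		newFace("jp0-4", "ja", 0, 4),
		newFace("jp4-8", "ja", 4, 8),
	}
}

// byRune returns the first face covering r, ignoring language.
func byRune(faces []face, r rune) (string, bool) {
	for _, f := range faces {
		if f.runes[r] {
			return f.name, true
		}
	}
	return "", false
}

// byLang returns the first face for lang, ignoring rune coverage.
func byLang(faces []face, lang string) (string, bool) {
	for _, f := range faces {
		if f.lang == lang {
			return f.name, true
		}
	}
	return "", false
}

// byRuneAndLang returns the first face that both targets lang and covers r.
// If there is none, it falls back to a rune-only match (an assumed policy).
func byRuneAndLang(faces []face, r rune, lang string) (string, bool) {
	for _, f := range faces {
		if f.lang == lang && f.runes[r] {
			return f.name, true
		}
	}
	return byRune(faces, r)
}

func main() {
	faces := exampleFaces()
	n, _ := byRune(faces, 5)
	fmt.Println(n) // cn4-8
	n, _ = byLang(faces, "ja")
	fmt.Println(n) // jp0-4
	n, _ = byRuneAndLang(faces, 5, "ja")
	fmt.Println(n) // jp4-8
}
```

Neither single-criterion lookup can reach jp4-8; only the combined predicate does, which is the behavior being asked for.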

This also impacts shaping.SplitByFace, which currently discards language information.

@dominikh
Contributor Author

(It's probably also worth documenting that ResolveFaceForLang just maps languages to rune sets and uses those for the lookup; it doesn't consult the fonts' metadata. An ideal combination of resolving by rune and language should probably also consult the OpenType locl feature. Really, though, segmenting by face should operate on grapheme clusters, not individual runes, and should also handle Unicode normalization, etc.)

@benoitkugler
Contributor

Hmm, I was not aware that the same Unicode code point may have different glyph presentations depending on the language. Do you have some examples of fonts and languages that exhibit this behavior?

Perhaps this issue would be resolved by rules (like the ones used by fontconfig) such as "for given language and family, use this family instead of that one" (related to #82).
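As a rough illustration of what such a rule could look like, here is a minimal sketch. The `rule` type, `applyRules`, and the family names are all hypothetical; this is neither fontconfig's configuration format nor an existing fontscan feature, just the "for this language and family, use that family instead" idea expressed in Go.

```go
package main

import "fmt"

// rule is a hypothetical substitution rule: when text in lang requests
// family, use replacement instead.
type rule struct {
	lang        string
	family      string
	replacement string
}

// applyRules returns the family to use for (lang, family). The first
// matching rule wins; without a match the requested family is kept.
func applyRules(rules []rule, lang, family string) string {
	for _, r := range rules {
		if r.lang == lang && r.family == family {
			return r.replacement
		}
	}
	return family
}

func main() {
	// Made-up family names, for illustration only.
	rules := []rule{
		{lang: "ja", family: "cn-sans", replacement: "jp-sans"},
	}
	fmt.Println(applyRules(rules, "ja", "cn-sans")) // jp-sans
	fmt.Println(applyRules(rules, "zh", "cn-sans")) // cn-sans
}
```

Rules of this kind operate on whole families, so they would still need to be paired with a per-rune coverage check to handle fonts that split a script across several files.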

It's true that the segmentation process is limited, because we rely on HarfBuzz for normalization and cluster handling, which is a rather complex topic. I'm not sure how hard it would be to extract the HarfBuzz logic and apply it during segmentation.

@dominikh
Contributor Author

dominikh commented Mar 18, 2024

> Hmm, I was not aware that the same Unicode code point may have different glyph presentations depending on the language. Do you have some examples of fonts and languages that exhibit this behavior?

The most famous example is Han unification (https://en.wikipedia.org/wiki/Han_unification). It also sometimes happens between different languages written in Cyrillic. For Han, unless you're using a pan-CJK font like Noto Sans CJK, you will have different fonts for Japanese kanji and Chinese Han characters. There are even regional differences: mainland China, Taiwan, Hong Kong, and Singapore each use slightly different forms for the same code points.
