fontscan: how to combine ResolveFace and ResolveFaceForLang? #139
(It's probably also worth documenting that ResolveFaceForLang just maps languages to rune sets and uses those for the lookup; it doesn't consult the metadata of the fonts. The ideal combination of resolving by rune and face should probably also consult the
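A minimal sketch of why a purely rune-set-based language lookup cannot tell two fonts apart when they cover the same runes. Everything here is hypothetical and is not the fontscan API: `face`, `langRunes`, and `resolveForLang` are illustrative names, the sample runes per language are made up, and both faces are assumed to cover identical runes, as in the example in this issue.

```go
package main

import "fmt"

// face is a hypothetical font face: a name and the set of runes it covers.
// Note that it carries no language metadata at all.
type face struct {
	name     string
	coverage map[rune]bool
}

// langRunes is a hypothetical language -> sample-runes table.
// Han ideographs are unified in Unicode, so the entries overlap.
var langRunes = map[string][]rune{
	"ja":      {'あ', '漢'},
	"zh-Hant": {'漢'},
}

// coversAll reports whether f covers every rune in rs.
func coversAll(f face, rs []rune) bool {
	for _, r := range rs {
		if !f.coverage[r] {
			return false
		}
	}
	return true
}

// resolveForLang maps lang to its sample runes and returns the first face
// covering all of them; it never checks which language a face targets.
func resolveForLang(faces []face, lang string) string {
	rs, ok := langRunes[lang]
	if !ok {
		return ""
	}
	for _, f := range faces {
		if coversAll(f, rs) {
			return f.name
		}
	}
	return ""
}

func main() {
	// Assumption: both faces cover exactly the same runes.
	cov := map[rune]bool{'あ': true, '漢': true}
	faces := []face{
		{"chinese-font", cov},
		{"japanese-font", cov},
	}
	// The first-registered face wins, even for Japanese.
	fmt.Println(resolveForLang(faces, "ja"))
}
```

Because the lookup only compares coverage, registration order decides the result; some per-face language information would be needed to prefer `japanese-font`.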
Hmm, I was not aware that the same Unicode code point may have different glyph presentations depending on the language. Do you have some examples of fonts and languages that have this behavior? Perhaps this issue could be resolved by rules (like the ones used by fontconfig) such as "for a given language and family, use this family instead of that one" (related to #82). It's true that the segmentation process is limited, because we rely on Harfbuzz for normalization and cluster handling, which is a rather complex topic. I'm not sure how hard it would be to extract the Harfbuzz logic and apply it during segmentation.
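A minimal sketch of the kind of fontconfig-style rule described above ("for a given language and family, use this family instead of that one"). The `ruleKey` type, the `substitutions` table, and the `substitute` function are hypothetical and not part of fontscan; the two example rules are made up for illustration.

```go
package main

import "fmt"

// ruleKey identifies a (language, requested family) pair.
type ruleKey struct {
	lang, family string
}

// substitutions is a hypothetical rule table: for this language and this
// requested family, use the mapped family instead.
var substitutions = map[ruleKey]string{
	{"ja", "Noto Sans SC"}:      "Noto Sans JP",
	{"zh-Hant", "Noto Sans JP"}: "Noto Sans TC",
}

// substitute returns the replacement family for (lang, family), or the
// requested family unchanged when no rule matches.
func substitute(lang, family string) string {
	if alt, ok := substitutions[ruleKey{lang, family}]; ok {
		return alt
	}
	return family
}

func main() {
	fmt.Println(substitute("ja", "Noto Sans SC"))
	fmt.Println(substitute("ko", "Noto Sans SC"))
}
```

Such a table would run before the family lookup, so it only helps when the language of a text run is known at that point.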
The most famous example is https://en.wikipedia.org/wiki/Han_unification. It also sometimes happens for different languages using Cyrillic. For Han, if you're not using a pan-CJK font like Noto Sans CJK, you will have different fonts for Japanese Kanji and Chinese Han. There are even regional variants: mainland China, Taiwan, Hong Kong, and Singapore each have slight differences for the same code points.
`ResolveFace` returns the first face that covers a given rune, while `ResolveFaceForLang` returns the first face that covers a given language. But how do I find the first face that covers a given rune in a given language?

For example, we might have two fonts `cn0-4` and `cn4-8` that cover disjoint sets of runes for Traditional Chinese, and two fonts `jp0-4` and `jp4-8` that cover the same runes as the Chinese fonts, but for Japanese, registered in the order `cn0-4`, `cn4-8`, `jp0-4`, `jp4-8`.

I cannot just look for "rune 5", nor for "Japanese", to find `jp4-8`. The first search would find `cn4-8`, and the second search would find `jp0-4`.

This also impacts `shaping.SplitByFace`, which currently discards language information.
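A minimal sketch of the combined lookup asked for above, modelled on the example's four fonts. This is not the fontscan API: `face`, `byRune`, `byLang`, and `byRuneAndLang` are hypothetical names, each face is assumed to cover the half-open range `[lo, hi)` and to carry a single language tag, and languages are compared by exact string match rather than proper BCP 47 matching.

```go
package main

import "fmt"

// face is a hypothetical stand-in for a font face: a name, the half-open
// rune range [lo, hi) it covers, and the language it is designed for.
type face struct {
	name   string
	lo, hi rune
	lang   string
}

func (f face) covers(r rune) bool { return r >= f.lo && r < f.hi }

// Faces registered in the order given in the example.
var faces = []face{
	{"cn0-4", 0, 4, "zh-Hant"},
	{"cn4-8", 4, 8, "zh-Hant"},
	{"jp0-4", 0, 4, "ja"},
	{"jp4-8", 4, 8, "ja"},
}

// byRune models a rune-only lookup: the first face covering r.
func byRune(r rune) string {
	for _, f := range faces {
		if f.covers(r) {
			return f.name
		}
	}
	return ""
}

// byLang models a language-only lookup: the first face tagged with lang.
func byLang(lang string) string {
	for _, f := range faces {
		if f.lang == lang {
			return f.name
		}
	}
	return ""
}

// byRuneAndLang returns the first face that both covers r and is tagged
// with lang, falling back to the rune-only result if none matches both.
func byRuneAndLang(r rune, lang string) string {
	for _, f := range faces {
		if f.covers(r) && f.lang == lang {
			return f.name
		}
	}
	return byRune(r)
}

func main() {
	fmt.Println(byRune(5))              // cn4-8
	fmt.Println(byLang("ja"))           // jp0-4
	fmt.Println(byRuneAndLang(5, "ja")) // jp4-8
}
```

The fallback to the rune-only result is one possible design choice: it keeps text renderable when no face is tagged with the requested language, at the cost of silently using another language's glyph forms.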