ParseTTC duplicates work for tables shared between fonts #147
Comments
Huh. We could do it, but it would involve a new API for loading collections, since we would need to track the shared tables. And the
It seems worth doing given the potential savings. We could still offer the simpler, less performant API for use cases that don't need the extra complexity.
Which is IMO warranted anyway, to make it easier to load fonts from a collection on demand instead of all at once. I'm currently tinkering on such a new API; I can send an RFC PR in a couple days if you'd like.

Edit: I take that back. The parsing of some tables depends on other tables, which makes it harder to implement table reuse cleanly, as different tables would need different cache keys to encode the dependencies. Being on the "receiving end" of trying to implement it, I'd probably want to see some stats as to how often large tables get reused. My intuition tells me that this is only really the case for CJK fonts with language defaults. Most uses of collections vary fonts by weight, width, slant, etc., which all require unique glyphs.
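To make the cache-key concern concrete, here is a minimal sketch of what such a key could look like. It assumes a parsed table is identified by its own location in the file plus the locations of the tables its parser reads from (for example, hmtx needs values from hhea and maxp). All names here (`loc`, `cacheKey`, `tableCache`) and the offsets are hypothetical and not part of the existing API.

```go
package main

import "fmt"

// loc is where a table's bytes live in the collection file.
type loc struct{ Offset, Length uint32 }

// cacheKey identifies one parse result: the table itself plus the tables its
// parser depends on. A fixed-size array keeps the key comparable, so it can
// be used directly as a map key. Two dependencies is an arbitrary choice for
// this sketch.
type cacheKey struct {
	Table loc
	Deps  [2]loc
}

// tableCache memoizes parse results and counts how often parsing really ran.
type tableCache struct {
	parsed map[cacheKey]any
	parses int
}

func newTableCache() *tableCache {
	return &tableCache{parsed: make(map[cacheKey]any)}
}

// get returns the cached result for key, calling parse only on a miss.
func (c *tableCache) get(key cacheKey, parse func() any) any {
	if v, ok := c.parsed[key]; ok {
		return v
	}
	c.parses++
	v := parse()
	c.parsed[key] = v
	return v
}

func main() {
	c := newTableCache()
	hmtx := loc{Offset: 1000, Length: 400} // made-up location

	// Two fonts share hmtx and both of its dependencies.
	shared := cacheKey{Table: hmtx, Deps: [2]loc{{200, 36}, {300, 6}}}
	// A third font points at the same hmtx bytes but a different hhea,
	// so its parse result cannot be reused.
	other := cacheKey{Table: hmtx, Deps: [2]loc{{240, 36}, {300, 6}}}

	for _, k := range []cacheKey{shared, shared, other} {
		c.get(k, func() any { return "parsed hmtx" })
	}
	fmt.Println("parses:", c.parses)
}
```

With three lookups, the cache runs the parser twice: once for the shared key and once for the font with a different dependency. The cost is that every parser has to declare its dependencies up front, which is the extra complexity mentioned above.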
Here are some numbers to illustrate @dominikh's point:

/usr/share/fonts/opentype/noto/NotoSansCJK-Bold.ttc: 10 faces
/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc: 10 faces
/usr/share/fonts/opentype/noto/NotoSerifCJK-Bold.ttc: 5 faces
/usr/share/fonts/opentype/noto/NotoSerifCJK-Regular.ttc: 5 faces

(I've not found any other collections on my system, though.)

Perhaps a first step would be to only consider the CFF, CFF2, and glyf tables (which are by far the heaviest ones)?
I appreciate that this is complex, but I agree that a collections-based API may be a good thing, so we can lazily load less than a full collection. I recently found that many OSes provide all languages in a single file, including all script-based glyphs, which means big files and not particularly fast parses.
In an OpenType font collection, some tables might be referenced by multiple fonts. For example, in the Noto Sans CJK font collection, all fonts refer to the same CFF2 table (and several others, but the CFF2 table is by far the largest). However, ParseTTC treats each font as an individual object, loading and parsing the same tables repeatedly. For Noto Sans CJK, this results in a 5x increase in I/O and processing time, loading the 30 MB CFF2 table five times, once per font.
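As an illustration of how shared tables can be detected, here is a rough sketch that walks the TTC header and each font's table directory (layout as described in the OpenType specification) and groups table records by their file offset and length. It is independent of ParseTTC: `sharedTables`, `tableRef`, and the synthetic test collection are names and data made up for this example.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// tableRef identifies a table by its location in the file. Two fonts whose
// directories contain the same tableRef share that table's bytes.
type tableRef struct {
	Tag    string
	Offset uint32
	Length uint32
}

// sharedTables returns, for each table location, the indices of the fonts
// in the collection that reference it.
func sharedTables(data []byte) (map[tableRef][]int, error) {
	if len(data) < 12 || string(data[:4]) != "ttcf" {
		return nil, errors.New("not a font collection")
	}
	numFonts := binary.BigEndian.Uint32(data[8:12])
	if uint64(len(data)) < 12+4*uint64(numFonts) {
		return nil, errors.New("truncated TTC header")
	}
	users := make(map[tableRef][]int)
	for i := 0; i < int(numFonts); i++ {
		// Offset of this font's table directory, from the start of the file.
		dir := uint64(binary.BigEndian.Uint32(data[12+4*i:]))
		if dir+12 > uint64(len(data)) {
			return nil, errors.New("truncated table directory")
		}
		numTables := uint64(binary.BigEndian.Uint16(data[dir+4:]))
		if dir+12+16*numTables > uint64(len(data)) {
			return nil, errors.New("truncated table records")
		}
		for t := uint64(0); t < numTables; t++ {
			rec := data[dir+12+16*t:] // tag, checksum, offset, length
			ref := tableRef{
				Tag:    string(rec[:4]),
				Offset: binary.BigEndian.Uint32(rec[8:]),
				Length: binary.BigEndian.Uint32(rec[12:]),
			}
			users[ref] = append(users[ref], i)
		}
	}
	return users, nil
}

func appendRecord(buf []byte, tag string, offset, length uint32) []byte {
	buf = append(buf, tag...)
	buf = binary.BigEndian.AppendUint32(buf, 0) // checksum, unused here
	buf = binary.BigEndian.AppendUint32(buf, offset)
	return binary.BigEndian.AppendUint32(buf, length)
}

// buildTestTTC assembles a tiny synthetic collection: two fonts that point at
// the same "CFF2" record and each have their own "name" record.
func buildTestTTC() []byte {
	be := binary.BigEndian
	buf := append([]byte{}, "ttcf"...)
	buf = be.AppendUint16(buf, 1)  // majorVersion
	buf = be.AppendUint16(buf, 0)  // minorVersion
	buf = be.AppendUint32(buf, 2)  // numFonts
	buf = be.AppendUint32(buf, 20) // directory of font 0 (after 20-byte header)
	buf = be.AppendUint32(buf, 64) // directory of font 1 (20 + 44)
	for font := uint32(0); font < 2; font++ {
		buf = be.AppendUint32(buf, 0x4F54544F) // sfntVersion 'OTTO'
		buf = be.AppendUint16(buf, 2)          // numTables
		buf = append(buf, 0, 0, 0, 0, 0, 0)    // searchRange, entrySelector, rangeShift
		buf = appendRecord(buf, "CFF2", 108, 8)
		buf = appendRecord(buf, "name", 116+4*font, 4)
	}
	return append(buf, make([]byte, 16)...) // placeholder table data
}

func main() {
	users, err := sharedTables(buildTestTTC())
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for ref, fonts := range users {
		if len(fonts) > 1 {
			fmt.Printf("%s at offset %d (%d bytes) is shared by fonts %v\n",
				ref.Tag, ref.Offset, ref.Length, fonts)
		}
	}
}
```

A loader could use such a map to read and parse each distinct location once and hand the result to every font that references it, subject to the caveat that some tables can only be parsed together with others.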
I'm not sure that the ParseTTC API is a good idea in the first place (we may only ever want one font from the collection), but if it is to exist, it should at least exploit data deduplication.