ParseTTC duplicates work for tables shared between fonts #147

Open
dominikh opened this issue Mar 25, 2024 · 5 comments

@dominikh
Contributor

In an OpenType font collection, some tables might be referenced by multiple fonts. For example, in the Noto Sans CJK font collection, all fonts refer to the same CFF2 table (and several others, but the CFF2 table is by far the largest). However, ParseTTC treats each font as an individual object, loading and parsing the same tables repeatedly. For Noto Sans CJK, this results in a 5x increase in I/O and processing time, loading the 30 MB CFF2 table five times, once per font.

I'm not sure that the ParseTTC API is a good idea in the first place (we may only ever want one font from the collection), but if it is to exist, it should at least exploit data deduplication.

@benoitkugler
Contributor

Huh..

We could do it, but it would involve a new API for loading collections, since we would need to track the shared tables. And the NewFont constructor would have to be adapted quite heavily..

@whereswaldon
Member

It seems worth doing given the potential savings. We could (potentially) still offer the simpler, less performant API for use cases that don't need the extra complexity.

@dominikh
Contributor Author

dominikh commented Mar 29, 2024

> but it would involve a new API for loading collections

Which is IMO warranted, anyway, to make it easier to load fonts from a collection on demand, instead of all at once.

I'm currently tinkering with such a new API; I can send an RFC PR in a couple of days if you'd like.

Edit: I take that back. The parsing of some tables depends on other tables, which makes it harder to implement table reuse cleanly, as different tables would need different cache keys to encode the dependencies. Being on the "receiving end" of trying to implement it, I'd probably want to see some stats as to how often large tables get reused. My intuition tells me that this is only really the case for CJK fonts with language defaults. Most uses of collections vary fonts by weight, width, slant, etc, which all require unique glyphs.
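
To make the cache-key problem concrete, here is a minimal sketch in Go. It is not code from this library: `tableRef`, `cacheKey`, `tableCache` and `hmtxKey` are hypothetical names. It uses `hmtx` as the example, since its layout depends on `numberOfHMetrics` from `hhea` and `numGlyphs` from `maxp`, so a key built from the `hmtx` offset alone would not be safe.

```go
package main

import "fmt"

// tableRef names one physical table in the collection file.
type tableRef struct {
	Tag    string
	Offset uint32
}

// cacheKey is the table being parsed plus the tables its parser reads.
// A fixed-size array keeps the struct comparable, so it can be a map key.
type cacheKey struct {
	Table tableRef
	Deps  [2]tableRef
}

// tableCache memoizes parsed tables; Parses counts real parser invocations.
type tableCache struct {
	parsed map[cacheKey]any
	Parses int
}

func newTableCache() *tableCache {
	return &tableCache{parsed: make(map[cacheKey]any)}
}

// get returns the cached value for key, calling parse only on a miss.
func (c *tableCache) get(key cacheKey, parse func() any) any {
	if v, ok := c.parsed[key]; ok {
		return v
	}
	c.Parses++
	v := parse()
	c.parsed[key] = v
	return v
}

// hmtxKey builds the key for an hmtx table: its layout depends on
// numberOfHMetrics (from hhea) and numGlyphs (from maxp).
func hmtxKey(hmtx, hhea, maxp uint32) cacheKey {
	return cacheKey{
		Table: tableRef{"hmtx", hmtx},
		Deps:  [2]tableRef{{"hhea", hhea}, {"maxp", maxp}},
	}
}

func main() {
	cache := newTableCache()
	parse := func() any { return "parsed hmtx" } // stand-in for the real parser

	cache.get(hmtxKey(1000, 200, 300), parse) // font 0: miss
	cache.get(hmtxKey(1000, 200, 300), parse) // font 1: same tables, hit
	cache.get(hmtxKey(1000, 250, 300), parse) // font 2: different hhea, miss

	fmt.Println("parser calls:", cache.Parses) // prints "parser calls: 2"
}
```

Each table type needs its own key builder listing its dependencies, which is the per-table bookkeeping that makes a clean implementation harder.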

@benoitkugler
Contributor

Here are some numbers to illustrate @dominikh's point.

/usr/share/fonts/opentype/noto/NotoSansCJK-Bold.ttc 10 faces
CFF : 16023 KB -> used 10 times
hmtx : 262 KB -> used 10 times
vmtx : 261 KB -> used 10 times
VORG : 0 KB -> used 10 times
BASE : 0 KB -> used 10 times
vhea : 0 KB -> used 10 times
hhea : 0 KB -> used 10 times
post : 0 KB -> used 10 times
GDEF : 0 KB -> used 10 times
maxp : 0 KB -> used 10 times
OS/2 : 0 KB -> used 6 times
OS/2 : 0 KB -> used 4 times
GSUB : 177 KB -> used 2 times
GSUB : 171 KB -> used 2 times
GSUB : 167 KB -> used 2 times
GSUB : 166 KB -> used 2 times
GSUB : 166 KB -> used 2 times

/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc 10 faces
CFF : 15458 KB -> used 10 times
hmtx : 262 KB -> used 10 times
vmtx : 261 KB -> used 10 times
VORG : 0 KB -> used 10 times
BASE : 0 KB -> used 10 times
hhea : 0 KB -> used 10 times
vhea : 0 KB -> used 10 times
post : 0 KB -> used 10 times
GDEF : 0 KB -> used 10 times
maxp : 0 KB -> used 10 times
OS/2 : 0 KB -> used 6 times
OS/2 : 0 KB -> used 4 times
GSUB : 177 KB -> used 2 times
GSUB : 171 KB -> used 2 times
GSUB : 167 KB -> used 2 times
GSUB : 166 KB -> used 2 times
GSUB : 166 KB -> used 2 times

/usr/share/fonts/opentype/noto/NotoSerifCJK-Bold.ttc 5 faces
CFF : 24427 KB -> used 5 times
hmtx : 261 KB -> used 5 times
vmtx : 261 KB -> used 5 times
VORG : 0 KB -> used 5 times
BASE : 0 KB -> used 5 times
hhea : 0 KB -> used 5 times
vhea : 0 KB -> used 5 times
post : 0 KB -> used 5 times
GDEF : 0 KB -> used 5 times
maxp : 0 KB -> used 5 times
OS/2 : 0 KB -> used 3 times
OS/2 : 0 KB -> used 2 times

/usr/share/fonts/opentype/noto/NotoSerifCJK-Regular.ttc 5 faces
CFF : 23442 KB -> used 5 times
hmtx : 261 KB -> used 5 times
vmtx : 261 KB -> used 5 times
VORG : 1 KB -> used 5 times
BASE : 0 KB -> used 5 times
vhea : 0 KB -> used 5 times
hhea : 0 KB -> used 5 times
post : 0 KB -> used 5 times
GDEF : 0 KB -> used 5 times
maxp : 0 KB -> used 5 times
OS/2 : 0 KB -> used 3 times
OS/2 : 0 KB -> used 2 times

(I've not found any other collections on my system though.)
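
For anyone who wants to gather similar numbers on their own collections, here is a rough sketch in Go. It is not necessarily how the figures above were produced. It assumes the standard TTC layout from the OpenType spec: a `ttcf` header with a list of table directory offsets, then 16-byte table records. Two fonts are counted as sharing a table when their records have the same tag, offset and length. `tableKey` and `sharedTables` are names made up for this example.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"os"
)

// tableKey identifies one physical table inside the file. Two fonts share a
// table when their directory records carry the same tag, offset and length.
type tableKey struct {
	Tag    string
	Offset uint32
	Length uint32
}

// sharedTables walks the TTC header and each font's table directory and
// returns, for every physical table, how many fonts reference it.
func sharedTables(data []byte) (map[tableKey]int, error) {
	size := uint64(len(data))
	if size < 12 || string(data[:4]) != "ttcf" {
		return nil, errors.New("not an OpenType collection")
	}
	numFonts := uint64(binary.BigEndian.Uint32(data[8:]))
	if 12+4*numFonts > size {
		return nil, errors.New("truncated collection header")
	}
	counts := make(map[tableKey]int)
	for i := uint64(0); i < numFonts; i++ {
		dir := uint64(binary.BigEndian.Uint32(data[12+4*i:]))
		if dir+12 > size {
			return nil, errors.New("table directory out of bounds")
		}
		// numTables sits after the 4-byte sfnt version.
		numTables := uint64(binary.BigEndian.Uint16(data[dir+4:]))
		if dir+12+16*numTables > size {
			return nil, errors.New("table records out of bounds")
		}
		for j := uint64(0); j < numTables; j++ {
			rec := data[dir+12+16*j:] // tag, checksum, offset, length
			counts[tableKey{
				Tag:    string(rec[:4]),
				Offset: binary.BigEndian.Uint32(rec[8:]),
				Length: binary.BigEndian.Uint32(rec[12:]),
			}]++
		}
	}
	return counts, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: ttcstats FILE.ttc")
		return
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Println(err)
		return
	}
	counts, err := sharedTables(data)
	if err != nil {
		fmt.Println(err)
		return
	}
	for key, n := range counts {
		if n > 1 { // only report tables referenced by more than one font
			fmt.Printf("%s : %d KB -> used %d times\n", key.Tag, key.Length/1024, n)
		}
	}
}
```

Keying on the offset rather than the tag is what lets a tag such as OS/2 show up on several lines with different use counts.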

Perhaps a first step would be to only consider the CFF, CFF2, and glyf tables (which are by far the heaviest ones)?
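
A minimal sketch of what that first step could look like, in Go, under some assumptions: it deduplicates only the raw bytes, keyed by file offset, and leaves parsing alone, which sidesteps the inter-table dependencies (for example `glyf` needing `loca`). `rawCache`, `heavyTags` and `memFile` are hypothetical names, and `memFile` is just an in-memory stand-in for the font file.

```go
package main

import (
	"fmt"
	"io"
)

// heavyTags lists the glyph-outline tables worth deduplicating first.
var heavyTags = map[string]bool{"CFF ": true, "CFF2": true, "glyf": true}

// memFile is an in-memory stand-in for the font file on disk.
type memFile []byte

func (m memFile) ReadAt(p []byte, off int64) (int, error) {
	if off < 0 || off >= int64(len(m)) {
		return 0, io.EOF
	}
	n := copy(p, m[off:])
	if n < len(p) {
		return n, io.EOF
	}
	return n, nil
}

// rawCache reads heavy tables once per file offset; Reads counts actual reads.
type rawCache struct {
	src      io.ReaderAt
	byOffset map[uint32][]byte
	Reads    int
}

func newRawCache(src io.ReaderAt) *rawCache {
	return &rawCache{src: src, byOffset: make(map[uint32][]byte)}
}

// load returns the bytes of one table. Heavy tables are cached by offset and
// the returned slice is shared between fonts, so callers must not modify it.
func (c *rawCache) load(tag string, offset, length uint32) ([]byte, error) {
	if heavyTags[tag] {
		if b, ok := c.byOffset[offset]; ok {
			return b, nil
		}
	}
	buf := make([]byte, length)
	if n, err := c.src.ReadAt(buf, int64(offset)); n < len(buf) {
		return nil, err // short read: ReadAt guarantees a non-nil error here
	}
	c.Reads++
	if heavyTags[tag] {
		c.byOffset[offset] = buf
	}
	return buf, nil
}

func main() {
	cache := newRawCache(make(memFile, 64))
	for font := 0; font < 5; font++ {
		if _, err := cache.load("CFF ", 16, 32); err != nil {
			fmt.Println(err)
			return
		}
		if _, err := cache.load("OS/2", 48, 8); err != nil {
			fmt.Println(err)
			return
		}
	}
	fmt.Println("reads:", cache.Reads) // prints "reads: 6" (1 CFF + 5 OS/2)
}
```

This only saves I/O and memory; each font would still parse the shared bytes itself unless the parsed result is cached as well.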

@andydotxyz
Contributor

> Which is IMO warranted, anyway, to make it easier to load fonts from a collection on demand, instead of all at once.

I appreciate that this is complex, but I agree that a collection-based API may be a good thing, so we can lazily load less than a full collection.

I recently found that many OSes ship all languages in a single file that includes the glyphs for every script, which means big files and not particularly fast parses.
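
As a rough illustration of the on-demand loading discussed above, here is a sketch in Go of what a lazily loading collection handle might look like. Everything in it is hypothetical: `Collection`, `NewCollection`, `Font` and the `parse` callback are placeholder names, not the library's API.

```go
package main

import (
	"errors"
	"fmt"
)

// Font is a placeholder for a fully parsed font.
type Font struct{ Index int }

// Collection is a hypothetical handle on a font collection that parses
// member fonts only when they are first requested.
type Collection struct {
	numFonts int
	parse    func(index int) (*Font, error) // stand-in for the real per-font parser
	loaded   map[int]*Font
}

func NewCollection(numFonts int, parse func(int) (*Font, error)) *Collection {
	return &Collection{numFonts: numFonts, parse: parse, loaded: make(map[int]*Font)}
}

// NumFonts reports how many fonts the collection holds without parsing any.
func (c *Collection) NumFonts() int { return c.numFonts }

// Font parses font i on first use and returns the memoized result afterwards.
func (c *Collection) Font(i int) (*Font, error) {
	if i < 0 || i >= c.numFonts {
		return nil, errors.New("font index out of range")
	}
	if f, ok := c.loaded[i]; ok {
		return f, nil
	}
	f, err := c.parse(i)
	if err != nil {
		return nil, err
	}
	c.loaded[i] = f
	return f, nil
}

func main() {
	calls := 0
	col := NewCollection(10, func(i int) (*Font, error) {
		calls++
		return &Font{Index: i}, nil
	})
	col.Font(3)
	col.Font(3) // second request is served from the memoized result
	fmt.Println("fonts parsed:", calls, "of", col.NumFonts()) // prints "fonts parsed: 1 of 10"
}
```

A handle like this would also be a natural owner for any shared-table cache, since it outlives the individual fonts.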
