Replace all regexes in TextFormat.Tokenizer with direct char scanning.
The JVM regex engine allocates garbage on every match (especially when calling Matcher.usePattern!). Since there are expected to be a lot of tokens, this caused substantial GC overhead.

Direct char scanning also opens the possibility of other optimizations that aren't possible with regexes. For example:
- direct reads from a char[]
- streaming tokenization (rather than reading the complete source text)

PiperOrigin-RevId: 687635759
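The message gives the motivation but not the scanner itself. As a rough illustration of what "direct char scanning" can look like, here is a minimal sketch of a hand-written tokenizer that walks a char[] with an index instead of running a Matcher per token. The class name CharScanTokenizer and its token rules (identifiers, loosely scanned numbers, quoted strings, `#` comments, single-character symbols) are hypothetical assumptions for illustration; this is not the actual TextFormat.Tokenizer code from the commit.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical regex-free tokenizer for a text-format-like syntax.
 * Assumed token rules; NOT protobuf's TextFormat.Tokenizer.
 */
final class CharScanTokenizer {
  private final char[] text; // scanned directly by index: no Matcher, no per-match state
  private int pos = 0;

  CharScanTokenizer(char[] text) {
    this.text = text;
  }

  /** Convenience helper: tokenizes a whole string into token texts. */
  static List<String> tokenize(String s) {
    CharScanTokenizer t = new CharScanTokenizer(s.toCharArray());
    List<String> tokens = new ArrayList<>();
    String tok;
    while ((tok = t.next()) != null) {
      tokens.add(tok);
    }
    return tokens;
  }

  /** Returns the next token, or null at end of input. */
  String next() {
    skipWhitespaceAndComments();
    if (pos >= text.length) {
      return null;
    }
    int start = pos;
    char c = text[pos];
    if (isIdentStart(c)) {
      // Identifier: [A-Za-z_][A-Za-z0-9_]*
      do { pos++; } while (pos < text.length && isIdentPart(text[pos]));
    } else if (isDigit(c)
        || ((c == '-' || c == '.') && pos + 1 < text.length && isDigit(text[pos + 1]))) {
      // Loose number: leading digit, '-' or '.', then letters, digits and '.'
      // (covers 12, -1.5e3, 0x1F; signed exponents like 1e-5 are not handled)
      pos++;
      while (pos < text.length && (isIdentPart(text[pos]) || text[pos] == '.')) pos++;
    } else if (c == '"' || c == '\'') {
      // Quoted string: scan to the matching unescaped quote, stopping at a newline
      pos++;
      while (pos < text.length && text[pos] != c && text[pos] != '\n') {
        if (text[pos] == '\\' && pos + 1 < text.length) pos++; // skip the escaped char
        pos++;
      }
      if (pos < text.length && text[pos] == c) pos++; // consume the closing quote
    } else {
      pos++; // any other character is a single-character symbol token
    }
    // The only allocation per token is the returned String itself.
    return new String(text, start, pos - start);
  }

  private void skipWhitespaceAndComments() {
    while (pos < text.length) {
      char c = text[pos];
      if (c == '#') {
        while (pos < text.length && text[pos] != '\n') pos++; // comment runs to end of line
      } else if (Character.isWhitespace(c)) {
        pos++;
      } else {
        return;
      }
    }
  }

  private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

  private static boolean isIdentStart(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
  }

  private static boolean isIdentPart(char c) { return isIdentStart(c) || isDigit(c); }
}
```

For example, `CharScanTokenizer.tokenize("x: -1.5e3 y: 0x1F")` yields `x`, `:`, `-1.5e3`, `y`, `:`, `0x1F`. Each token costs a few bounds-checked loops and one String, with no Matcher to reset or swap patterns on. Because the scanner only ever looks at `text[pos]` and `text[pos + 1]`, the same loop could in principle be adapted to read from a refillable buffer, in the spirit of the streaming idea above.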
1 parent 47613cf, commit 739b531
Showing 2 changed files with 176 additions and 65 deletions.