Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace slow string concatenation with faster StringBuilder (for long text content) #323

Merged
merged 2 commits into from
Dec 6, 2018

Conversation

ujay68
Copy link

@ujay68 ujay68 commented Nov 20, 2018

I found that small change to make a huge performance difference in some scenarios. In one case of processing a sample XML file of about 4 MB into POJOs, the CPU time sank by a factor of 100, from around 5–10 seconds to a few tens of milliseconds on a recent MacBook Pro with Azul OpenJDK 8.

@cowtowncoder
Copy link
Member

First of all: thank you for providing the patch.

Now... I believe that this can be useful for specific use case of larger content. But my concern is that for also common case of shorter text segments it leads to unnecessary construction of StringBuilder, slowing down other types of usage, where there would be 2 additional copies (one to populate StringBuilder, and with Java 8+, to construct String with copy of contained characters).

So it'd be good to have best of both worlds. Not sure what a good way would be : jackson-core does have TextBuffer which does something like that, but might not be optimal here. But the idea is that for first segment, String should be retained, and only if more segments are encountered, builder would be constructed.

… don't allocate StringBuilder until the second string
@ujay68
Copy link
Author

ujay68 commented Nov 20, 2018

Thanks for considering. Your point could be true, of course, even if current JVMs are very good at handling short-lived objects. As a suggestion, I've just pushed a second commit where the StringBuilder is not allocated until a second string segment needs to be appended. This keeps the magnitude of the initial optimization. In the simple case of only one string segment, it adds the (probably negligible) overhead of a single .toString() call to a CharSequence object that already is a String. Better this way? (We could go on with more micro-optimizations like tracing which type chars currently has with a boolean and saving some cycles with boolean comparisons instead of instanceof etc. but I'm not sure if that's worth it. Do you have a policy here on such matters?)

@cowtowncoder
Copy link
Member

@ujay68 Yes, I think this is better. As you say there would be room for further optimization, but I agree that difference is likely negligible and relative simplicity is better here.
Ideally it'd be great to have more performance tests, and if you did want to pursue that, there's:

https://github.com/FasterXML/jackson-benchmarks

which only has really one test, focused on "small" data. Or, rather, test I usually run (and works across formats) is like that -- there are some alternate input files too that I probably should use more often.
Anyway it would actually be interested to have alternative tests, runs, and see how they were.
I probably won't have much time but I could help if you were interested.

On short term, I'd be happy to merge PR. The one thing that is needed (if not done yet) is CLA, from:

https://github.com/FasterXML/jackson/contributor-agreement.pdf

(or Corporate-CLA)

Usual method is to print, fill & sign, scan, email to info at fasterxml dot com.
Once this is done, I can merge any and all contributions on Jackson projects.

Looking forward to merging this, thanks!

@cowtowncoder cowtowncoder merged commit 415f6da into FasterXML:master Dec 6, 2018
@cowtowncoder cowtowncoder modified the milestones: 2.9,8, 2.9.8 Dec 6, 2018
@cowtowncoder cowtowncoder changed the title Replaced slow string concatenation with faster StringBuilder Replace slow string concatenation with faster StringBuilder (for long text content) Dec 6, 2018
cowtowncoder added a commit that referenced this pull request Dec 6, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants