Performance improvement 3: implement string cache #196
This replaces #195. The two PRs are not compatible with each other (though the general idea from #195 can be implemented later, on top of the cache).
The idea is very simple: "just" use a string cache, and when the same string is requested twice, return the cached result.
There are a few implementation problems that make it more complicated than necessary:
- std::vector<PrimitiveQuery> is passed as a parameter instead of just PrimitiveQuery. That makes the code a bit uglier.
- std::vector<PrimitiveQuery> must be usable as a key in an STL container, so we must give PrimitiveQuery a < operator.

Nevertheless, the results look good:
https://github.com/msm-code/ursa-bench/blob/master/results/3_cache_hdd_all.txt
https://github.com/msm-code/ursa-bench/blob/master/results/hdd_all.html
http://65.21.130.153:8000/hdd_all.html
Where it helps, it speeds things up by 10%-50%. On the whole corpus this gives us a 20% speedup, which is not game-changing, but not terrible either.
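
For readers who have not opened the diff, here is a minimal sketch of the caching idea and of why PrimitiveQuery needs a < operator. It is not the code from this PR: the fields of PrimitiveQuery, the FileIds result type, and the QueryCache class are hypothetical stand-ins chosen for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical stand-in: the real PrimitiveQuery may have different fields.
struct PrimitiveQuery {
    int kind;
    std::uint32_t trigram;

    // std::map needs an ordering on its key. std::vector already compares
    // lexicographically, but only if the element type provides operator<.
    bool operator<(const PrimitiveQuery &other) const {
        return std::tie(kind, trigram) < std::tie(other.kind, other.trigram);
    }
};

// Hypothetical result type: a list of matching file ids.
using FileIds = std::vector<std::uint32_t>;

class QueryCache {
  public:
    // Returns the cached result for `key`; calls `compute` only on a miss.
    template <typename Compute>
    const FileIds &get_or_compute(const std::vector<PrimitiveQuery> &key,
                                  Compute compute) {
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            ++misses_;
            it = cache_.emplace(key, compute(key)).first;
        }
        return it->second;
    }

    std::size_t misses() const { return misses_; }

  private:
    std::map<std::vector<PrimitiveQuery>, FileIds> cache_;
    std::size_t misses_ = 0;
};
```

With this shape, a second lookup with the same std::vector<PrimitiveQuery> skips the expensive computation entirely. An unordered container would also work, but it would require a hash for the vector instead of a < operator.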