Small slowdown in tagging performance after moving to the Solr 7.4 built-in tagger handler #86

Open
simonatdrg opened this issue Oct 4, 2018 · 1 comment

Comments

@simonatdrg

I've moved our tagging server from a Solr 6.5.1 instance running the SolrTextTagger code from GitHub to the built-in tagger handler in Solr 7.4.0. The metrics we collect for bulk tagging indicate a small slowdown as a result, on the order of 0.0005 seconds per HTTP call to the tagger from our Python tagging application. While this isn't exactly earth-shaking, it does add around 45 minutes to an 11-hour index generation job (which runs overnight, admittedly).
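For a sense of scale, the figures quoted above can be sanity-checked with a little arithmetic. This is only a rough sketch: it assumes the whole 45 minutes comes from a uniform per-call overhead and that 11 hours is the baseline job length, neither of which is stated explicitly. The function names are made up for illustration.

```python
# Back-of-the-envelope check of the numbers quoted in this issue.
# Assumptions (not confirmed in the thread): the extra 45 minutes is caused
# entirely by a uniform per-call overhead, and 11 hours is the baseline
# duration of the job before the slowdown.

EXTRA_SECONDS_PER_CALL = 0.0005   # reported slowdown per HTTP call
EXTRA_MINUTES_PER_JOB = 45        # reported extra time per index job
BASELINE_JOB_HOURS = 11           # reported job length


def implied_calls(extra_minutes: float, extra_seconds_per_call: float) -> float:
    """Number of tagger calls that would account for the extra time."""
    return extra_minutes * 60 / extra_seconds_per_call


def relative_slowdown(extra_minutes: float, baseline_hours: float) -> float:
    """Extra time as a fraction of the baseline job duration."""
    return extra_minutes / (baseline_hours * 60)


calls = implied_calls(EXTRA_MINUTES_PER_JOB, EXTRA_SECONDS_PER_CALL)
slowdown = relative_slowdown(EXTRA_MINUTES_PER_JOB, BASELINE_JOB_HOURS)

print(f"implied tagger calls per job: {calls:,.0f}")
print(f"relative slowdown of the job: {slowdown:.1%}")
```

Under those assumptions the job makes on the order of a few million tagger calls, and the overall slowdown is a single-digit percentage of the run time.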

Nothing else has changed in our framework (same hardware, Java version, Python version, and tagging dictionary), and it's already been optimized like crazy to minimize the number of tagger calls required, so I'm curious what you think the cause might be: the changes needed to port the Tagger to Solr 7.4 (you mentioned the move to FST50 postings)? Possible changes to the Jetty version? Or something else?

@dsmiley
Member

dsmiley commented Oct 5, 2018

This is very likely the change in postings format from “Memory” to “FST50”. Memory still exists. I have ideas on how to resurrect a memory codec equivalent but no time for that.
