Distributed Requests #58

Open
MartinLoeper opened this issue Sep 26, 2016 · 1 comment

@MartinLoeper

Hi,

I have a question concerning SolrCloud.
Is the TaggerRequestHandler capable of performing distributed requests over multiple shards?
I know that the standard Solr select handler does, and that this can be controlled with the shards query parameter.

Thanks, Martin

@dsmiley
Member

dsmiley commented Sep 26, 2016

The TaggerRequestHandler does not support sharded/distributed requests. I was about to write that it'll never happen, but I suppose I can fathom how that might work in a reasonable manner. Nonetheless, I have no plans to work on that.

Despite the single-shard limitation right now, the current Tagger design inherits the flexibility/configurability of Lucene/Solr. So you can put a crazy amount of documents into the one shard (over a billion) and this should work. Once you get over tens of millions of documents, the recommendations in the instructions here will need to be modified. For example, doing optimize=true is no longer sensible, though you might want to merge to a small segment count. You might also want to tweak the Solr configuration of Lucene segment merging to produce more segments. And unless you have gobs of memory, at crazy high doc counts you should remove postingsFormat="memory".

These things will reduce tagger speed. But there would surely be overhead in a distributed search too, which this doesn't support.
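As a rough illustration of "merge to a small segment count", here is a minimal Python sketch that builds an update request using Solr's optimize and maxSegments update parameters. The base URL, the core name "tags", and the target of 8 segments are assumptions made for the example, not recommendations from this thread.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def merge_segments_url(core_url: str, max_segments: int) -> str:
    """Build an update URL asking Solr to merge down to at most max_segments."""
    if max_segments < 1:
        raise ValueError("max_segments must be at least 1")
    # optimize + maxSegments requests a partial merge rather than a full
    # merge down to a single segment.
    params = urlencode({"optimize": "true", "maxSegments": max_segments})
    return f"{core_url.rstrip('/')}/update?{params}"


def merge_segments(core_url: str, max_segments: int) -> int:
    """Send the merge request and return the HTTP status code."""
    with urlopen(merge_segments_url(core_url, max_segments)) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical core location; adjust to your own setup.
    print(merge_segments_url("http://localhost:8983/solr/tags", 8))
```

A sensible segment count depends on index size and hardware, so treat the number as something to tune rather than a fixed value.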

Another possibility, a "poor man's sharding", would be to divide the tag document set into shards (perhaps by some sensible grouping if you have a taxonomy/categories) and then issue requests to each shard in parallel. It's then up to you to combine the results and deconflict the overlaps.
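The client-side approach described above could be sketched roughly as follows in Python. Everything here is an assumption for illustration: each shard is represented by a hypothetical callable that takes the text and returns tags as (start_offset, end_offset, ids) tuples, which is not the Tagger's actual response format, and the deconflicting policy (union the ids of identical spans, then drop spans nested inside a longer one) is just one possible choice.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Assumed tag shape for this sketch: (start_offset, end_offset, ids).
Tag = Tuple[int, int, List[str]]
# Hypothetical per-shard client: takes the text, returns that shard's tags.
ShardTagger = Callable[[str], List[Tag]]


def deconflict(tags: Iterable[Tag]) -> List[Tag]:
    """Union ids for identical spans, then drop spans nested in another span."""
    by_span: Dict[Tuple[int, int], Set[str]] = {}
    for start, end, ids in tags:
        by_span.setdefault((start, end), set()).update(ids)

    spans = list(by_span)
    kept = [
        (start, end)
        for (start, end) in spans
        if not any(
            (o_start, o_end) != (start, end)
            and o_start <= start
            and end <= o_end
            for (o_start, o_end) in spans
        )
    ]
    return [(s, e, sorted(by_span[(s, e)])) for (s, e) in sorted(kept)]


def tag_across_shards(text: str, shards: Iterable[ShardTagger]) -> List[Tag]:
    """Send the same text to every shard in parallel and merge the results."""
    shard_list = list(shards)
    with ThreadPoolExecutor(max_workers=max(1, len(shard_list))) as pool:
        per_shard = list(pool.map(lambda tagger: tagger(text), shard_list))
    return deconflict(tag for tags in per_shard for tag in tags)
```

In practice each callable would wrap an HTTP request to one core's tagger handler and convert the response into this tuple shape. If you rely on the Tagger's own overlap handling within each shard, the merged result can still contain cross-shard overlaps, which is why the deconflicting step runs again on the combined list.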
