The update_document performs poorly if called multiple times #531

Open
nijel opened this issue Jan 19, 2019 · 0 comments

nijel (Collaborator) commented Jan 19, 2019

The IndexWriter.update_document method performs really poorly when invoked several times on an index with a unique field. The reason is that it constructs a new searcher for every updated document, which in turn means the index is read from disk once per update:

    unique_fields = self._unique_fields(fields)
    if unique_fields:
        with self.searcher() as s:
            uniqueterms = [(name, fields[name]) for name in unique_fields]
            docs = s._find_unique(uniqueterms)
            for docnum in docs:
                self.delete_document(docnum)
    # Add the given fields
    self.add_document(**fields)

One option to consider is creating a single searcher per writer (lazily, the first time it is needed) and closing it when the writer is closed.
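To make the proposal concrete, here is a rough sketch of the lazy, per-writer searcher written as a wrapper around an existing writer. `LazySearcherWriter` and its helper methods are hypothetical names, not part of the library. It only calls the methods that appear in the excerpt above, plus `commit()`, `cancel()` and the searcher's `close()`, and it assumes that reusing one searcher for the writer's lifetime gives the same results as opening a fresh one per call.

```python
class LazySearcherWriter:
    """Hypothetical wrapper that reuses one searcher across update_document calls."""

    def __init__(self, writer):
        self._writer = writer
        self._searcher = None  # opened lazily on first use

    def _get_searcher(self):
        # Open the searcher only once, the first time an update needs it.
        if self._searcher is None:
            self._searcher = self._writer.searcher()
        return self._searcher

    def _close_searcher(self):
        if self._searcher is not None:
            self._searcher.close()
            self._searcher = None

    def update_document(self, **fields):
        # Same logic as the excerpt above, minus the per-call searcher.
        unique_fields = self._writer._unique_fields(fields)
        if unique_fields:
            s = self._get_searcher()
            uniqueterms = [(name, fields[name]) for name in unique_fields]
            for docnum in s._find_unique(uniqueterms):
                self._writer.delete_document(docnum)
        # Add the given fields
        self._writer.add_document(**fields)

    def commit(self, *args, **kwargs):
        # Release the cached searcher when the writer is finished.
        self._close_searcher()
        return self._writer.commit(*args, **kwargs)

    def cancel(self):
        self._close_searcher()
        return self._writer.cancel()
```

Inside the library the same idea would presumably live in the writer itself rather than in a wrapper; the wrapper form is used here only so the sketch does not have to guess at writer internals.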

For now this can be worked around by copying the update_document logic and constructing the searcher outside the loop over documents. For example, I've done this in WeblateOrg/weblate@91cb357
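A minimal sketch of that kind of workaround, for illustration only (it is not copied from the linked commit). `update_documents` is a hypothetical helper name, and `documents` is assumed to be an iterable of field dicts. The body reuses the private `_unique_fields` and `_find_unique` calls from the excerpt above, so it may break if those internals change.

```python
def update_documents(writer, documents):
    """Update many documents while opening the searcher only once."""
    with writer.searcher() as s:
        for fields in documents:
            # Delete any existing documents matching the unique terms.
            unique_fields = writer._unique_fields(fields)
            if unique_fields:
                uniqueterms = [(name, fields[name]) for name in unique_fields]
                for docnum in s._find_unique(uniqueterms):
                    writer.delete_document(docnum)
            # Add the given fields
            writer.add_document(**fields)
```

The caller would then invoke `update_documents(writer, docs)` once per batch and call `writer.commit()` afterwards, instead of calling `writer.update_document(**fields)` for each document.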

nijel changed the title from "The update_document performs poorely if called multiple times" to "The update_document performs poorly if called multiple times" on Jan 26, 2019