Add Vespa and rework Document Indices #317

Merged

yuhongsun96 merged 11 commits into main from index-consolidation

Aug 24, 2023

Contributor

yuhongsun96 commented Aug 19, 2023

No description provided.

vercel bot commented Aug 19, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated (UTC)
internal-search	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	Aug 24, 2023 6:39am

yuhongsun96 requested a review from Weves

August 19, 2023 04:27

vercel bot deployed to Preview

August 19, 2023 04:28

View deployment

vercel bot deployed to Preview

August 19, 2023 04:30

View deployment

yuhongsun96 marked this pull request as draft

August 19, 2023 04:30

vercel bot deployed to Preview

August 19, 2023 05:50

View deployment

vercel bot deployed to Preview

August 23, 2023 19:18

View deployment

yuhongsun96 force-pushed the index-consolidation branch from be1ad24 to 858ccda Compare

August 23, 2023 19:25

vercel bot deployed to Preview

August 23, 2023 19:25

View deployment

yuhongsun96 marked this pull request as ready for review

August 23, 2023 19:31

vercel bot deployed to Preview

August 23, 2023 19:32

View deployment

Weves reviewed


backend/alembic/versions/8aabb57f3b49_restructure_document_indices.py Outdated

    
    def upgrade() -> None:
        op.drop_table("chunk")
        op.drop_index(

Contributor

Weves Aug 23, 2023

Hmm, why is this here?

backend/alembic/versions/8aabb57f3b49_restructure_document_indices.py Outdated

    
    def downgrade() -> None:
        op.create_index(

Contributor

Weves Aug 23, 2023

Same as above

backend/danswer/configs/app_configs.py

    
    DOCUMENT_INDEX_NAME = "danswer_index"  # Shared by vector/keyword indices
    # Vespa is now the default document index store for both keyword and vector
    DOCUMENT_INDEX_TYPE = os.environ.get(
        "DOCUMENT_INDEX_TYPE", DocumentIndexType.SPLIT.value

Contributor

Weves Aug 23, 2023

Is this intended? It seems to contradict the comment above.

Contributor Author

yuhongsun96 Aug 24, 2023

Originally this PR was supposed to swap to Vespa, but that's a separate change now.
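For illustration, the kind of mismatch flagged in this thread (a comment naming one default while the code falls back to another) can be sketched as below. This is a minimal sketch, not the project's code: the enum member values, the `COMBINED` member, and the `resolve_index_type` helper are assumptions made for the example. Only `DOCUMENT_INDEX_TYPE` and `DocumentIndexType.SPLIT` appear in the diff.

```python
import os
from enum import Enum
from typing import Mapping, Optional


class DocumentIndexType(str, Enum):
    # Member values (and the COMBINED member) are assumptions for illustration.
    COMBINED = "combined"  # a single store serving keyword and vector search
    SPLIT = "split"  # separate keyword and vector stores


def resolve_index_type(env: Optional[Mapping[str, str]] = None) -> DocumentIndexType:
    """Hypothetical helper: read DOCUMENT_INDEX_TYPE, falling back to SPLIT."""
    env = os.environ if env is None else env
    raw = env.get("DOCUMENT_INDEX_TYPE", DocumentIndexType.SPLIT.value)
    # Enum lookup by value raises ValueError for an unrecognized setting.
    return DocumentIndexType(raw)
```

Resolving the setting in one place like this makes the effective default easy to check against any nearby comment.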

backend/danswer/datastores/qdrant/store.py

    
    chunk_ids = _get_points_from_document_ids(
        doc_id_batch, self.collection, self.client
    )
    self.client.set_payload(

Contributor

Weves Aug 23, 2023

Note: we will potentially be updating quite a lot more than _BATCH_SIZE points here. Is that intended?

Contributor Author

yuhongsun96 Aug 24, 2023

_BATCH_SIZE is quite conservative, and this is a tiny request in terms of packet size so I figure it shouldn't be an issue. Plus this won't really be used anymore anyway...
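The concern in this thread is that a batch of document IDs can expand into many more chunk (point) IDs than the batch size. If a cap per request were wanted, one option would be to re-batch the expanded IDs before sending them. The sketch below is illustrative only: `batched` is a hypothetical helper, not something in the PR, and the merged code keeps the simpler approach described in the reply.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start : start + batch_size]
```

Each yielded slice could then be passed to a single update or delete call, so that no request touches more than `batch_size` points.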

backend/danswer/datastores/qdrant/store.py

    
    chunk_ids = _get_points_from_document_ids(
        doc_id_batch, self.collection, self.client
    )
    self.client.delete(

Contributor

Weves Aug 23, 2023

same comment as above

backend/danswer/datastores/vespa/store.py (resolved)

backend/danswer/datastores/vespa/store.py

    
    if should_delete_doc:
        # Processing the first chunk of the doc and the doc exists
        deletion_success = _delete_vespa_doc_chunks(document.id)
        if not deletion_success:

Contributor

Weves Aug 23, 2023

nit: prefer to fail loudly if the delete fails (and add retries to the delete)
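One way to read this suggestion is a small retry wrapper that raises once the attempts are exhausted. The sketch below assumes only what the snippet shows, namely that the delete call returns a truthy value on success. `delete_with_retries`, its parameters, and the choice of `RuntimeError` are hypothetical and not taken from the PR.

```python
import time
from typing import Callable


def delete_with_retries(
    delete_fn: Callable[[str], bool],
    doc_id: str,
    attempts: int = 3,
    delay_seconds: float = 0.0,
) -> None:
    """Call delete_fn(doc_id) up to `attempts` times; raise if it never succeeds."""
    for attempt in range(1, attempts + 1):
        if delete_fn(doc_id):
            return
        if attempt < attempts:
            # Pause before the next attempt (no pause after the last one).
            time.sleep(delay_seconds)
    raise RuntimeError(
        f"Failed to delete chunks for document {doc_id!r} after {attempts} attempts"
    )
```

Raising here stops indexing from continuing on top of stale chunks, which is the "fail loudly" behavior the comment asks for.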

backend/danswer/datastores/vespa/store.py

    
    for update_request in update_requests:
        if update_request.boost is None and update_request.allowed_users is None:
            logger.error("Update request received but nothing to update")

Contributor

Weves Aug 23, 2023

Should this `continue` here?
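The point of the question is that, without a `continue`, the loop would go on to process a request that has nothing to update. A minimal sketch of the guarded loop follows. `UpdateRequest` here is a hypothetical stand-in with only the two fields visible in the snippet plus an assumed `document_id`, and `build_update_fields` is an invented helper, not the PR's code.

```python
import logging
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple

logger = logging.getLogger(__name__)


@dataclass
class UpdateRequest:
    # Hypothetical stand-in; the real class in the PR may differ.
    document_id: str
    boost: Optional[int] = None
    allowed_users: Optional[List[str]] = None


def build_update_fields(
    update_requests: List[UpdateRequest],
) -> List[Tuple[str, Dict[str, Any]]]:
    """Collect the fields to update per document, skipping empty requests."""
    updates: List[Tuple[str, Dict[str, Any]]] = []
    for update_request in update_requests:
        if update_request.boost is None and update_request.allowed_users is None:
            logger.error("Update request received but nothing to update")
            continue  # skip this request instead of falling through
        fields: Dict[str, Any] = {}
        if update_request.boost is not None:
            fields["boost"] = update_request.boost
        if update_request.allowed_users is not None:
            fields["allowed_users"] = update_request.allowed_users
        updates.append((update_request.document_id, fields))
    return updates
```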

backend/danswer/datastores/vespa/store.py

    
    url = f"{DOCUMENT_ID_ENDPOINT}/{doc_chunk_id}"
    res = requests.put(url, headers=json_header, json=update_dict)
    if res.status_code != 200:

Contributor

Weves Aug 23, 2023

Also fail loudly here if the update fails, to avoid marking the delete as successful without actually updating all permissions.
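As a sketch of what failing loudly could look like for per-chunk updates, the loop below raises on the first non-200 status so that a partial update cannot be mistaken for success. To keep the example self-contained it takes the HTTP call as a parameter: `put_fn`, `update_all_chunks`, and `UpdateFailedError` are hypothetical names, not part of the PR.

```python
from typing import Callable, Iterable


class UpdateFailedError(RuntimeError):
    """Raised when a chunk update does not return HTTP 200."""


def update_all_chunks(
    chunk_ids: Iterable[str], put_fn: Callable[[str], int]
) -> None:
    """Update every chunk; stop and raise on the first failed request.

    `put_fn(chunk_id)` is assumed to perform the update and return the
    HTTP status code of the response.
    """
    for chunk_id in chunk_ids:
        status_code = put_fn(chunk_id)
        if status_code != 200:
            raise UpdateFailedError(
                f"Update of chunk {chunk_id!r} failed with status {status_code}"
            )
```

With the `requests` library, calling `res.raise_for_status()` is a related option: it raises `requests.HTTPError` for 4xx and 5xx responses.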

backend/danswer/datastores/vespa/store.py

    
    def delete(self, doc_ids: list[str]) -> None:
        logger.info(f"Deleting {len(doc_ids)} documents from Vespa")
        for doc_id in doc_ids:

Contributor

Weves Aug 23, 2023

this should probably also use _delete_vespa_doc_chunks?

yuhongsun96 added 11 commits

August 23, 2023 23:28


          document indices not fixed yet

face76d


          mypy issues fixed, logic still wrong

208f1fe


          test time

c650e08


          tiny cleanup

39c9c47


          more tidy up

ac68cda


          works

46c956f


          need verify delete

985cdab


          PR time

c37a3b0


          rebase and cleaned up

9cbf25d


          newlines

a37cace


          pr changes

8a83f83

yuhongsun96 force-pushed the index-consolidation branch from 9e2477f to 8a83f83 Compare

August 24, 2023 06:38

vercel bot deployed to Preview

August 24, 2023 06:39

View deployment

yuhongsun96 merged commit 8159fdc into main

4 of 5 checks passed

yuhongsun96 deleted the index-consolidation branch

August 24, 2023 15:46

