Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vectorstores: add mongovector #1005

Merged
merged 9 commits into from
Sep 13, 2024
Merged

vectorstores: add mongovector #1005

merged 9 commits into from
Sep 13, 2024

Conversation

prestonvasquez
Copy link
Contributor

@prestonvasquez prestonvasquez commented Sep 6, 2024

resolves #700
GODRIVER-3305

The goal of this PR is to provide a way for users to read and write to an Atlas cluster as a vector database using the MongoDB Go Driver.

A Store should be a wrapper for mongo.Collection, since adding and searching vectors is collection-specific. In this case AddDocuments() and SimilaritySearch() become analogues for Collection.InsertMany() and Collection.Aggregate(). This also keeps our implementation inline with the Python driver implementation.

Because a single collection can have multiple vector search indexes, we must provide an option to use a specific embedding when adding a document. However, to ensure there is at least a default behavior [where a user does not have to include an optional embedder] we should require that a default embedder be included at construction.

We've chosen to use vectorstores.Options.NameSpace to allow users to change the index on a per-operation basis.

This PR suggests mocking the embedding model to the specifications outlined in this blog post: https://dev.to/prestonvasquez/mocking-an-llm-embedder-targeting-mongodb-atlas-1glp

Integration testing will require setting up a free tier Atlas Cluster. The first time the tests run, the vector search indexes will be created. This takes a few minutes. Subsequent tests to the same cluster will not have this requirement.

PR Checklist

  • Read the Contributing documentation.
  • Read the Code of conduct documentation.
  • Name your Pull Request title clearly, concisely, and prefixed with the name of the primarily affected package you changed according to Good commit messages (such as memory: add interfaces for X, Y or util: add whizzbang helpers).
  • Check that there isn't already a PR that solves the problem the same way to avoid creating a duplicate.
  • Provide a description in this PR that addresses what the PR is solving, or reference the issue that it solves (e.g. Fixes #123).
  • Describes the source of new concepts.
  • References existing implementations as appropriate.
  • Contains test coverage for new functions.
  • Passes all golangci-lint checks.

@prestonvasquez prestonvasquez marked this pull request as ready for review September 9, 2024 22:58
@prestonvasquez
Copy link
Contributor Author

prestonvasquez commented Sep 9, 2024

@prestonvasquez
Copy link
Contributor Author

@tmc @eliben This is ready for review.

@tmc
Copy link
Owner

tmc commented Sep 12, 2024

Sorry about the delay here, I'll very likely be able to review this today and have been meaning to cut a new release so this should make it in soon!

@tmc
Copy link
Owner

tmc commented Sep 13, 2024

Fantastic change! I'm curious what the plan around the go.mongodb.org/mongo-driver/v2 beta tag is -- I don't love depending on a not-declared-stable package version.

@@ -0,0 +1,207 @@
package mongovector
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move these mocks into a separate subpackage (mocks? mongovectormocks?)

Copy link
Owner

@tmc tmc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM! have some small nits/asks.

Can we get an example (or two? or three?) together to showcase this?

@@ -0,0 +1,250 @@
package mongovector
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please include a doc.go file in new packages with a package-level comment

"go.mongodb.org/mongo-driver/v2/mongo/options"
)

// Run the test without setting up the test space.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is/are the implications of this flag in terms of latency?

flag.Parse()

defer func() {
os.Exit(m.Run())
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is pretty odd IMO, can we skip this defer and just wrap the setup in a conditional vs the early return?

@tmc
Copy link
Owner

tmc commented Sep 13, 2024

we should also get 'go test' to just work via a testcontainer container

@tmc
Copy link
Owner

tmc commented Sep 13, 2024

we should also get 'go test' to just work via a testcontainer container

Oh I didn't realize this was leveraging a feature not in the F/OSS version of mongo -- is there a reasonable way to fake this so we don't take on a network dependency in CI?

I see now that you gave a good amount of that context above. I'm happy getting this in with these tests skipped and we chat about how to improve that.

@tmc tmc enabled auto-merge (squash) September 13, 2024 02:59
@tmc tmc merged commit 66d7710 into tmc:main Sep 13, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Integration with mongodb for vectorstore
2 participants