community: CrateDB: Vector Store #27710

Closed

amotl (Contributor) commented Oct 29, 2024

About

Status

  • We consider the patch ready for review and merging, with a few spots to be handled in a later iteration.
  • Please let us know if you would like any other details addressed before the initial merge.
  • A few backlog items have been collected here: Backlog for GA crate-workbench/langchain#30.

Sandbox

A short walkthrough of how to run the software tests on your workstation.

# Terminal 1: start a single-node CrateDB instance (runs in the foreground).
docker run --rm -it --name=cratedb \
  --publish=4200:4200 --publish=5432:5432 --env=CRATE_HEAP_SIZE=2g \
  crate:latest -Cdiscovery.type=single-node

# Terminal 2: fetch the branch, create a virtualenv, and install dependencies.
git clone https://github.com/crate-workbench/langchain.git --branch=cratedb-up/1/vector-store
cd langchain
uv venv
source .venv/bin/activate
cd libs/community
uv pip install --upgrade --prerelease=allow --editable=. poetry sqlalchemy-cratedb
poetry install --no-interaction --no-ansi --with dev,test,test_integration

# Run the CrateDB vector store integration tests.
pytest -vvv tests/integration_tests/vectorstores/test_cratedb.py

Trivia

The CrateDB implementation is heavily based on PGVector's, with a few adjustments. Related generalizations and improvements to PGVector have already been submitted separately.

@amotl amotl force-pushed the cratedb-up/1/vector-store branch 2 times, most recently from d13f281 to 46750b6 Compare October 29, 2024 20:46

@surister surister left a comment

LGTM

@amotl amotl force-pushed the cratedb-up/1/vector-store branch 2 times, most recently from 7ff1319 to 1ee02dd Compare November 11, 2024 04:55

@BaurzhanSakhariev BaurzhanSakhariev left a comment

Thanks!

amotl (Contributor, Author) commented Nov 18, 2024

Hi @eyurtsev. We think our patches are ready to be merged: this one, as well as GH-27711 and GH-27712. May we humbly ask you to have a look?

@kneth kneth left a comment

LGTM

@amotl amotl force-pushed the cratedb-up/1/vector-store branch from 1ee02dd to 16ca3be Compare November 20, 2024 16:20

Before, the adapter used CrateDB's built-in `_score` field for ranking.
Now, it uses the dedicated `vector_similarity()` function to compute the
similarity between two vectors.

We don't need anything on top of it, i.e. we don't need this function and
should instead use the value from CrateDB as is.

The similarity is already in the (0, 1] interval, and dividing by math.sqrt(2)
won't normalize it but will return a wrong result; for example, 1 will become
approximately 0.707.
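
To illustrate the point about rescaling, here is a minimal sketch. The `bounded_similarity` helper is a hypothetical stand-in for any score that already lies in (0, 1]; it is an assumption made for this example and is not taken from the patch or claimed to be CrateDB's formula. The sketch only shows that dividing such a score by `math.sqrt(2)` shrinks the range instead of normalizing it.

```python
import math


def bounded_similarity(distance: float) -> float:
    """Illustrative score in (0, 1]; an assumption, not CrateDB's formula."""
    return 1.0 / (1.0 + distance * distance)


def divided_by_sqrt2(score: float) -> float:
    """The extra rescaling step that the commit message argues against."""
    return score / math.sqrt(2)


# Compare the score as is with the rescaled variant for a few distances.
for distance in (0.0, 0.5, 1.0, 2.0):
    score = bounded_similarity(distance)
    print(
        f"distance={distance:.1f}  "
        f"score={score:.4f}  "
        f"rescaled={divided_by_sqrt2(score):.4f}"
    )
```

In this sketch, identical vectors (distance 0) score 1.0 as is, but only about 0.707 after the division, so a perfect match could never reach the top of the range.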

amotl (Contributor, Author) commented Dec 9, 2024

Dear @eyurtsev,

may I humbly ask you if you could afford a few cycles to review our patches? Thanks in advance!

With kind regards,
Andreas.

@efriis efriis self-assigned this Dec 12, 2024

efriis (Member) commented Dec 12, 2024

Hey! This adds a net-new community integration or feature, which has been replaced by dedicated integration packages. I'll close this PR if it's ok with you, and would recommend reopening with just docs updates, as well as registering your package in libs/packages.yml! We'll be able to review simple PRs that only modify these two things much faster :)

Here's the guide, and if you have questions, feel free to leave them in the comments on those pages so others can see them! https://python.langchain.com/docs/contributing/how_to/integrations/

This will pair very nicely with the variety of integrations you're working on at the moment!

Will leave this PR open to discuss and link back here when closing the other ones (to make sure we're discussing in one place)

amotl (Contributor, Author) commented Dec 12, 2024

Hi Erick,

thanks for your reply. So, we will conceive and publish a dedicated Python package langchain-cratedb then, including the relevant code? Sure, we can do that.

> Will leave this PR open to discuss and link back here when closing the other ones (to make sure we're discussing in one place)

I think it would be consistent to close this PR as well, and then continue the discussion in a separate, dedicated issue that accompanies the genesis of langchain-cratedb. We are very much looking forward to your assistance with any questions we may have along the way. 🍀

I will open the other issue when it is time to start the discussion, i.e. when we have something working to show. Do you agree with this approach?

With kind regards,
Andreas.

efriis (Member) commented Dec 13, 2024

sounds like a great approach!

@efriis efriis closed this Dec 13, 2024
ccurme pushed a commit that referenced this pull request Dec 23, 2024
…"provider" documentation (#28877)

Hi Erick. Coming back from a previous attempt, we now made a separate
package for the CrateDB adapter, called `langchain-cratedb`, as advised.
Other than registering the package within `libs/packages.yml`, this
patch includes a minimal amount of documentation to accompany the advent
of this new package. Let us know about any mistakes we made, or changes
you would like to see. Thanks, Andreas.

## About
- **Description:** Register a new database adapter package,
`langchain-cratedb`, providing traditional vector store, document
loader, and chat message history features for a start.
- **Addressed to:** @efriis, @eyurtsev
- **References:** GH-27710
- **Preview:** [Providers » More »
CrateDB](https://langchain-git-fork-crate-workbench-register-la-4bf945-langchain.vercel.app/docs/integrations/providers/cratedb/)

## Status
- **PyPI:** https://pypi.org/project/langchain-cratedb/
- **GitHub:** https://github.com/crate/langchain-cratedb
- **Documentation (CrateDB):**
https://cratedb.com/docs/guide/integrate/langchain/
- **Documentation (LangChain):** _This PR._

## Backlog?
Is this applicable for this kind of patch?
> - [ ] **Add tests and docs**: If you're adding a new integration, please include
>   1. a test for the integration, preferably unit tests that do not rely on network access,
>   2. an example notebook showing its use. It lives in `docs/docs/integrations` directory.

## Q&A
1. Notebooks that use the LangChain CrateDB adapter are currently at
[CrateDB LangChain
Examples](https://github.com/crate/cratedb-examples/tree/main/topic/machine-learning/llm-langchain),
and the documentation refers to them. Because they are derived from very
old blueprints dating back to LangChain 0.0.x, we think they need a
refresh before being added to `docs/docs/integrations`. Would it be
acceptable to merge this minimal package registration + documentation
patch, which already includes valid code snippets in `cratedb.mdx`, and
add corresponding notebooks in a subsequent patch?

2. How can the package be added to the table of _Integration
Packages_ listed on the [documentation entrypoint page about
Providers](https://python.langchain.com/docs/integrations/providers/)?

/cc Please also review, @ckurze, @wierdvanderhaar, @kneth,
@simonprickett, if you can find the time. Thanks!

Labels

  • community: Related to langchain-community
  • size:XXL: This PR changes 1000+ lines, ignoring generated files.
  • Ɑ: vector store: Related to vector store module