Experiment with different ways to represent datasets #30

Open
PGijsbers opened this issue Jul 25, 2024 · 0 comments

PGijsbers commented Jul 25, 2024

To embed datasets and store them in our vector database, we first convert them to documents (read: strings). This can be done in many different ways, and we have a lot of metadata about each dataset (title, description, the data itself, qualities, features, ...). How best to "textify" a dataset and its metadata into a string that improves its discoverability during semantic search is an open question.
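
To make the comparison concrete, here is a minimal sketch of two candidate representations that could be tried against each other. Everything in it is an assumption for illustration: the `DatasetMetadata` class, its field names, the `textify_plain` and `textify_labelled` helpers, and the sample values are hypothetical and do not reflect the project's actual schema or code.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    # Hypothetical container; the fields mirror the metadata named above,
    # not an actual schema from this project.
    title: str
    description: str
    features: list[str] = field(default_factory=list)
    qualities: dict[str, float] = field(default_factory=dict)


def textify_plain(ds: DatasetMetadata) -> str:
    """Variant A: title and description only, as one sentence-like string."""
    return f"{ds.title}. {ds.description}"


def textify_labelled(ds: DatasetMetadata) -> str:
    """Variant B: each available field on its own labelled line."""
    lines = [f"Title: {ds.title}", f"Description: {ds.description}"]
    if ds.features:
        lines.append("Features: " + ", ".join(ds.features))
    if ds.qualities:
        # Sort keys so the same dataset always yields the same document.
        quals = ", ".join(f"{k}={v:g}" for k, v in sorted(ds.qualities.items()))
        lines.append("Qualities: " + quals)
    return "\n".join(lines)


# Illustrative values only.
example = DatasetMetadata(
    title="iris",
    description="Measurements of three iris species.",
    features=["sepallength", "sepalwidth", "petallength", "petalwidth", "class"],
    qualities={"NumberOfInstances": 150, "NumberOfClasses": 3},
)

print(textify_plain(example))
print(textify_labelled(example))
```

Each variant produces a different document for the same dataset, so both could be embedded and compared on the same set of search queries. Keeping the output deterministic (sorted keys, fixed field order) avoids re-embedding a dataset whose metadata has not changed.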

@PGijsbers PGijsbers added this to the Search milestone Jul 25, 2024