-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] Generate IDs when not given in add #2699
[ENH] Generate IDs when not given in add #2699
Conversation
Reviewer ChecklistPlease leverage this checklist to ensure your code review is thorough before approving Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
|
Please tag your PR title with one of: [ENH | BUG | DOC | TST | BLD | PERF | TYP | CLN | CHORE]. See https://docs.trychroma.com/contributing#contributing-code-and-ideas |
Warning This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite. Learn more about stacking. Join @spikechroma and the rest of your teammates on Graphite |
4a5b473
to
6eb54ca
Compare
d5d1da4
to
68f2e25
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed offline about moving the ID generation down into the server logic, and whether that would mean having to move the validators further down.
I think the right thing is to move both down into the server; we may want to make the validation logic change before making this one. Let's discuss further what the right answer might be.
chromadb/api/types.py
Outdated
@@ -292,6 +296,10 @@ def validate_ids(ids: IDs) -> IDs: | |||
for id_ in ids: | |||
if not isinstance(id_, str): | |||
raise ValueError(f"Expected ID to be a str, got {id_}") | |||
|
|||
if len(id_) == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the fence about this still.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can discuss offline
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The empty string is a valid identifier, and would still be unique - I have no idea why anyone would want to do this, but let's let them.
@@ -181,7 +181,7 @@ def _unpack_embedding_set( | |||
|
|||
def _validate_embedding_set( | |||
self, | |||
ids: IDs, | |||
ids: Optional[IDs], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think ids
can be Optional
here, since validate_ids
fails on None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true. however, if we are generating ids on the server level, ids should be allowed to be optional here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed offline about moving the ID generation down into the server logic, and whether that would mean having to move the validators further down.
I think the right thing is to move both down into the server; we may want to make the validation logic change before making this one. Let's discuss further what the right answer might be.
68f2e25
to
1872593
Compare
5364f1e
to
64645ef
Compare
chromadb/test/property/strategies.py
Outdated
@@ -458,17 +458,26 @@ def recordsets( | |||
num_unique_metadata: Optional[int] = None, | |||
min_metadata_size: int = 0, | |||
max_metadata_size: Optional[int] = None, | |||
# ids can only be optional for add operations | |||
for_add: bool = False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this parameter name is confusing and underspecified - can we clarify it or somehow make this less option-heavy?
dc1bb7e
to
6d4bf70
Compare
ba22f97
to
303d36c
Compare
chromadb/server/fastapi/__init__.py
Outdated
@staticmethod | ||
def _generate_ids_when_not_present( | ||
ids: Optional[List[str]], | ||
documents: Optional[List[Optional[str]]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: why do you need to pass docs/urs/embeddings into this, it makes more sense to me that this would take the value N that it needs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if this change is partways (or if I have missed something), but as is this change can't be merged as its broken on local (not client/server or "single-node") chroma. Putting the id generation in server/fastapi is not the right way to get this to be uniform across local/deployed single node. We should put the change at the level of "ServerAPI" so that it runs at the conceptual server layer. I am happy to go over the code architecture tomorrow to make this clearer / discuss more.
I also think we need more testing against this functionality since its additive and doesn't break existing tests. The property test modifications are good, and I think basic point unit tests are warranted too.
This stack addresses #164 in part, where we agreed validation duplication is preferred. |
303d36c
to
c0a537a
Compare
c0a537a
to
84c08d4
Compare
47ac7ef
to
05c8b6e
Compare
cea408e
to
4e27cd5
Compare
d676e9b
to
b726474
Compare
Description of changes
Summarize the changes made by this PR.
Test plan
How are these changes tested?
pytest
for python,yarn test
for js,cargo test
for rustDocumentation Changes
Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs repository?