Further posthog improvements (and a little .gitignore) (#1222)
## Description of changes

*Summarize the changes made by this PR.*

- Improvements & Bug fixes
  - Increase `add()` batch size.
  - Flatten server context so we can do things like group by version.
  - Add a few things to `.gitignore` which seem to be created by test runs.
- New functionality
  - Batch `query()` calls (batch size = 20, where we started for `add()`).
  - Change the `collection.get()` event so its fields are `int` instead of `bool`, since I imagine we'll eventually batch it as well.

## Test plan

*How are these changes tested?*

- [x] Tests pass locally with `pytest` for python, `yarn test` for js
- [x] Tested locally by printing posthog events:

```python
>>> import chromadb
>>> chroma_client = chromadb.Client()
bf9b885c-b86e-4194-97b4-d9701d293cce ClientStartEvent {'batch_size': 1, 'chroma_version': '0.4.14', 'server_context': 'None', 'chroma_api_impl': 'chromadb.api.segment.SegmentAPI', 'is_persistent': False, 'chroma_server_ssl_enabled': False}
>>> collection = chroma_client.create_collection(name="my_collection")
bf9b885c-b86e-4194-97b4-d9701d293cce ClientCreateCollectionEvent {'batch_size': 1, 'collection_uuid': '50de2cd2-06ba-442d-82c1-ae0d94b620e4', 'embedding_function': 'ONNXMiniLM_L6_V2', 'chroma_version': '0.4.14', 'server_context': 'None', 'chroma_api_impl': 'chromadb.api.segment.SegmentAPI', 'is_persistent': False, 'chroma_server_ssl_enabled': False}
>>> collection.add(
...     documents=["This is a document", "This is another document"],
...     metadatas=[{"source": "my_source"}, {"source": "my_source"}],
...     ids=["id1", "id2"]
... )
bf9b885c-b86e-4194-97b4-d9701d293cce CollectionAddEvent {'batch_size': 1, 'collection_uuid': '50de2cd2-06ba-442d-82c1-ae0d94b620e4', 'add_amount': 2, 'with_documents': 2, 'with_metadata': 2, 'chroma_version': '0.4.14', 'server_context': 'None', 'chroma_api_impl': 'chromadb.api.segment.SegmentAPI', 'is_persistent': False, 'chroma_server_ssl_enabled': False}
>>> for i in range(41):
...     results = collection.query(
...         query_texts=["This is a query document"],
...         n_results=2
...     )
...
bf9b885c-b86e-4194-97b4-d9701d293cce CollectionQueryEvent {'batch_size': 1, 'collection_uuid': '50de2cd2-06ba-442d-82c1-ae0d94b620e4', 'query_amount': 1, 'with_metadata_filter': 1, 'with_document_filter': 1, 'n_results': 2, 'include_metadatas': 1, 'include_documents': 1, 'include_distances': 1, 'chroma_version': '0.4.14', 'server_context': 'None', 'chroma_api_impl': 'chromadb.api.segment.SegmentAPI', 'is_persistent': False, 'chroma_server_ssl_enabled': False}
bf9b885c-b86e-4194-97b4-d9701d293cce CollectionQueryEvent {'batch_size': 20, 'collection_uuid': '50de2cd2-06ba-442d-82c1-ae0d94b620e4', 'query_amount': 20, 'with_metadata_filter': 20, 'with_document_filter': 20, 'n_results': 40, 'include_metadatas': 20, 'include_documents': 20, 'include_distances': 20, 'chroma_version': '0.4.14', 'server_context': 'None', 'chroma_api_impl': 'chromadb.api.segment.SegmentAPI', 'is_persistent': False, 'chroma_server_ssl_enabled': False}
bf9b885c-b86e-4194-97b4-d9701d293cce CollectionQueryEvent {'batch_size': 20, 'collection_uuid': '50de2cd2-06ba-442d-82c1-ae0d94b620e4', 'query_amount': 20, 'with_metadata_filter': 20, 'with_document_filter': 20, 'n_results': 40, 'include_metadatas': 20, 'include_documents': 20, 'include_distances': 20, 'chroma_version': '0.4.14', 'server_context': 'None', 'chroma_api_impl': 'chromadb.api.segment.SegmentAPI', 'is_persistent': False, 'chroma_server_ssl_enabled': False}
>>> for i in range(100):
...     collection.add(documents=[str(i)], ids=[str(i)])
...
bf9b885c-b86e-4194-97b4-d9701d293cce CollectionAddEvent {'batch_size': 100, 'collection_uuid': '50de2cd2-06ba-442d-82c1-ae0d94b620e4', 'add_amount': 100, 'with_documents': 100, 'with_metadata': 0, 'chroma_version': '0.4.14', 'server_context': 'None', 'chroma_api_impl': 'chromadb.api.segment.SegmentAPI', 'is_persistent': False, 'chroma_server_ssl_enabled': False}
```

Also confirmed that `collection.get()` emits an event every time it's called. It also emits a lot of data, so I'll elide it from this PR description.

## Documentation Changes

*Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the [docs repository](https://github.com/chroma-core/docs)?*

No docs change needed -- we're not collecting anything new or changing anything significant about how we collect.
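The batching described in this PR, and the event counts in the test log, can be illustrated with a minimal sketch. In the log, 41 `query()` calls produced three events, with `batch_size` 1, 20 and 20. Everything below is hypothetical: `QueryEvent`, `Batcher` and `to_properties` are illustrative names, not chromadb's telemetry classes.

The sketch makes two assumptions:

- The first event of each type is sent immediately.
- Later events are merged by summing their `int` counters until `max_batch_size` is reached.

This appears consistent with the 1/20/20 pattern in the log. The sketch also shows why `int` counters suit batching: 20 queries with `n_results=2` sum to `n_results: 40`. A flat properties dict puts context such as `chroma_version` directly next to the event fields.

```python
from dataclasses import dataclass, fields, replace
from typing import Callable, Dict, List, Set


# Hypothetical event, loosely modeled on the CollectionQueryEvent fields in the
# log above; this is NOT chromadb's actual event class.
@dataclass(frozen=True)
class QueryEvent:
    collection_uuid: str
    query_amount: int
    with_metadata_filter: int  # int rather than bool so batches hold counts
    n_results: int
    batch_size: int = 1
    max_batch_size: int = 20

    @property
    def batch_key(self) -> str:
        # Only events of the same type for the same collection are merged.
        return f"{type(self).__name__}:{self.collection_uuid}"

    def batch(self, other: "QueryEvent") -> "QueryEvent":
        # Sum every counter field; keep the identifying fields from self.
        summed = {
            f.name: getattr(self, f.name) + getattr(other, f.name)
            for f in fields(self)
            if f.name not in ("collection_uuid", "max_batch_size")
        }
        return replace(self, **summed)


class Batcher:
    """Hypothetical batcher: the first event of each key is sent at once,
    later events accumulate until max_batch_size is reached."""

    def __init__(self, send: Callable[[QueryEvent], None]) -> None:
        self._send = send
        self._seen: Set[str] = set()
        self._pending: Dict[str, QueryEvent] = {}

    def capture(self, event: QueryEvent) -> None:
        key = event.batch_key
        if event.max_batch_size == 1 or key not in self._seen:
            self._seen.add(key)
            self._send(event)
            return
        if key in self._pending:
            event = self._pending[key].batch(event)
        if event.batch_size >= event.max_batch_size:
            self._send(event)
            self._pending.pop(key, None)
        else:
            self._pending[key] = event


def to_properties(event: QueryEvent, context: Dict[str, object]) -> Dict[str, object]:
    # Flatten: context keys sit beside the event's own fields rather than under
    # a nested key, so an analytics tool can group by e.g. chroma_version.
    props: Dict[str, object] = {
        f.name: getattr(event, f.name)
        for f in fields(event)
        if f.name != "max_batch_size"
    }
    props.update(context)
    return props


# Usage: mirror the 41 query() calls from the test log.
sent: List[QueryEvent] = []
batcher = Batcher(sent.append)
for _ in range(41):
    batcher.capture(
        QueryEvent("50de2cd2", query_amount=1, with_metadata_filter=1, n_results=2)
    )

print([e.batch_size for e in sent])  # [1, 20, 20]
print([e.n_results for e in sent])   # [2, 40, 40]
print(to_properties(sent[1], {"chroma_version": "0.4.14"})["chroma_version"])
```

Events still pending when the process exits would need a final flush, which this sketch leaves out.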