From 85f125ad44d97c12590ded741ec809dbdadbac94 Mon Sep 17 00:00:00 2001 From: Ben Eggers <64657842+beggers@users.noreply.github.com> Date: Fri, 13 Oct 2023 08:26:35 -0700 Subject: [PATCH] Observability cip (#1219) ## Description of changes - New functionality - A CIP proposing adding OpenTelemetry observability to Chroma. ## Test plan *How are these changes tested?* - [x] Tests pass locally with `pytest` for python, `yarn test` for js --- docs/CIP_6_OpenTelemetry_Monitoring.md | 41 ++++++++++++++++++++++++++ 1 file changed, 41 insertions(+) create mode 100644 docs/CIP_6_OpenTelemetry_Monitoring.md diff --git a/docs/CIP_6_OpenTelemetry_Monitoring.md b/docs/CIP_6_OpenTelemetry_Monitoring.md new file mode 100644 index 00000000000..a212b8e6c49 --- /dev/null +++ b/docs/CIP_6_OpenTelemetry_Monitoring.md @@ -0,0 +1,41 @@ +# CIP 6: OpenTelemetry Monitoring + +## **Status** + +Current status: `Under Discussion` + +## **Motivation** + +Chroma currently has very little observability, only offering basic logging. Using Chroma in a high-performance production context requires the ability to understand how Chroma is behaving and responding to requests. + +## **Public Interfaces** + +The changes will affect the following: + +- Logging output +- Several new CLI flags + +## **Proposed Changes** + +We propose to instrument Chroma with [OpenTelemetry](https://opentelemetry.io/docs/instrumentation/python/) (OTel), the most prevalent open-source observability standard. OTel's Python libraries are considered stable for traces and metrics. We will create several layers of observability, configurable with command-line flags. + +- Chroma's default behavior will remain the same: events will be logged to the console with configurable severity levels. +- We will add a flag, `--opentelemetry-mode={api, sdk}` to instruct Chroma to export OTel data in either [API or SDK mode](https://stackoverflow.com/questions/72963553/opentelemetry-api-vs-sdk). +- We will add another flag, `--opentelemtry-detail={partial, full}`, to specify the level of detail desired from OTel. + - With `partial` detail, Chroma's top-level API calls will produce a single span. This mode is suitable for end-users of Chroma who are not intimately familiar with its operation but use it as part of their production system. + - `full` detail will emit spans for Chroma's sub-operations, enabling Chroma maintainers to monitor performance and diagnose issues. +- For now Chroma's OTel integrations will need to be specified with environment variables. As the [OTel file configuration project](https://github.com/MrAlias/otel-schema/pull/44) matures we will integrate support for file-based OTel configuration. + +## **Compatibility, Deprecation, and Migration Plan** + +This change adds no new default-on functionality. + +## **Test Plan** + +Observability logic and output will be tested on both single-node and distributed Chroma to confirm that metrics are exported properly and traces correctly identify parent spans across function and service boundaries. + +## **Rejected Alternatives** + +### Prometheus metrics + +Prometheus metrics offer similar OSS functionality to OTel. However the Prometheus standard is older and belongs to a single open-source project; OTel is designed for long-term cross-compatibility between *all* observability backends. As such, OTel output can easily be ingested by Prometheus users so there is no loss of functionality or compatibility. \ No newline at end of file