Path is not handled per the specification for OTLP/HTTP #94

Open
smoyer64 opened this issue Nov 27, 2023 · 5 comments
Labels
status: help wanted Seeking more eyes and hands. type: bug Something isn't working

Comments

@smoyer64
Contributor

Per the OpenTelemetry exporter specifications, the OTEL_EXPORTER_OTLP_ENDPOINT environment variable, as well as the more specific OTEL_EXPORTER_OTLP_TRACES_ENDPOINT, OTEL_EXPORTER_OTLP_METRICS_ENDPOINT and OTEL_EXPORTER_OTLP_LOGS_ENDPOINT environment variables, are URLs. The specification for the configuration options reads as follows:

  • Endpoint (OTLP/HTTP): Target URL to which the exporter is going to send spans or metrics. The endpoint MUST be a valid URL with scheme (http or https) and host, MAY contain a port, SHOULD contain a path and MUST NOT contain other parts (such as query string or fragment). A scheme of https indicates a secure connection. When using OTEL_EXPORTER_OTLP_ENDPOINT, exporters MUST construct per-signal URLs as described below. The per-signal endpoint configuration options take precedence and can be used to override this behavior (the URL is used as-is for them, without any modifications). See the OTLP Specification for more details.

The specification for OTLP/HTTP endpoint URLs also contains a section that provides specific instructions for how the paths should be constructed if they're missing. Since this specification is polyglot, there is no discussion about how the opentelemetry-go library might effect these behaviors.

Now for the problem ... the otlpmetrichttp.WithEndpoint() and otlptracehttp.WithEndpoint() Option functions accept host or host:port strings, not URLs. At some point, the otelconfig library was adapted to strip the URL Scheme but the Path is not processed at all. Since the otelconfig library uses these functions whether the value is provided by an Option or an environment variable, the Path is included and the error occurs when it's parsed as part of the Port string.

There are three ways this could be solved:

  1. Process the path per the rules in the specification and call otlpmetrichttp.WithURLPath() and/or otlptracehttp.WithURLPath() as needed when the exporter(s) are instantiated. In this case, the WithExporterEndpoint() function would continue to take a URL.

  2. Add a WithExporterURLPath(), WithMetricsExporterURLPath() and WithTracesExporterURLPath() to mimic the use of the underlying opentelemetry-go library. The environment variables would still have to be parsed but the rules for sending paths are pretty clear.

  3. If the generic or signal-specific end-points are provided via environment variables, don't add WithEndpoint(), WithInsecure(), WithTLSClientConfig() and WithURLPath() to the options when constructing the exporters. Since the otelconfig environment variables for the endpoints have the same names as those in the opentelemetry-go library, let that library do the work! For those that are configuring OTEL programmatically, the functions in bullet two above would still be beneficial.

Feel free to assign this to me if you can provide guidance on which of the options above would be best.

Versions

v1.13.0

Steps to reproduce

The following test provided at https://github.com/selesy/otel-config-go/blob/99aa2a853854e0e2eae464bd34d89cc4a7c08d79/otelconfig/pipelines/metrics_test.go#L10-L19 fails with the message:

parse "https://https:%2F%2Fotlp-gateway-prod-us-east-0.grafana.net%2Fotlp/v1/metrics": invalid port ":%2F%2Fotlp-gateway-prod-us-east-0.grafana.net%2Fotlp" after host

Additional context

There is a real-world example where this fails. If a developer is testing against Grafana cloud and using this service as the OTEL collector, the generic end-point URL has the path /otlp as described in the documentation at https://grafana.com/docs/grafana-cloud/send-data/otlp/send-data-otlp/#push-directly-from-applications-using-the-opentelemetry-sdks. This is a really convenient way to test that an application is fully instrumented without running the services locally.

@smoyer64 smoyer64 added the type: bug Something isn't working label Nov 27, 2023
@robbkidd
Member

Hi, @smoyer64! Thanks for finding and opening this issue!

We're inclined philosophically towards Option 3: let upstream logic figure out as much auto-config as possible. I agree that Option 2 is valuable for developers configuring the SDK programmatically. That could be included in a PR to close this issue or in a follow-up PR.

In reviewing this issue and what it would take to implement Option 3, I have some vague concerns about how to add this better and correct behavior while maintaining backwards compatibility for the current users of otelconfig.

@robbkidd robbkidd added the status: oncall Flagged for awareness from Honeycomb Telemetry Oncall label Nov 30, 2023
@smoyer64
Contributor Author

smoyer64 commented Dec 4, 2023

I think it's pretty easy to accomplish this without breaking existing users - how confident are we that the unit and smoke tests cover the current user scenarios? My very crudely patched version simply adds the WithURLPath() option when the exporter is created. Per the specification, if this value isn't set, then /v1/<signal> is used. So I simply set it to /otlp/v1/<signal>.

The bigger issue is that the ENDPOINT environment variables are supposed to be URLs but the WithEndpoint() options are supposed to be host:ports. I think that a host:port can be parsed as a URL, so one option for keeping the existing behavior (is this #4?) is to process the environment variables if they exist, break them into pieces and set all the values using options. The problem I had with certain combinations of environment variables is that, for those that use the same name in otelconfig and the Go OTEL SDK, the SDK also reads the variables - you can't do something other than what's expected in otelconfig without creating conflicts.

@JamieDanielson
Contributor

Sorry for the delay on this.

If possible we'd like to keep the existing functions to avoid any possible breakage for current users of the library. If we can use the existing functions but extend them to also be able to handle a more spec-compliant provided endpoint, we would prefer that option.

If that's not possible, then it may be best to add additional spec-compliant options but I think we want to avoid adding too much extra code surface at this time.

Feel free to open a PR and we can take a look at it. Thanks!

@JamieDanielson JamieDanielson added status: help wanted Seeking more eyes and hands. and removed status: oncall Flagged for awareness from Honeycomb Telemetry Oncall labels Jan 10, 2024
@smoyer64
Contributor Author

I looked at this further and believe that's possible, but I also re-read the specification and am leaning towards creating a fork that doesn't intercept environment variables at all, with the exception of those that are required to actually choose the signal and protocol. There's a lot of duplication (causing conflicts with the specification) between this library and the underlying library. According to the specification, an environment variable should only be overridden if a value is provided by an option.

This library currently interprets the specified environment variables incorrectly and then sets options based on that incorrect interpretation. This leads to instances where you can set environment variables in a way that should be valid according to opentelemetry-go but fails to parse in otel-config-go. And in some cases, those same variables can be set in a way that's acceptable to this library and then fail in opentelemetry-go.

I'd argue that even breaking changes are worth it in the long run if it aligns the behavior of this library and the underlying OTEL library. How would you feel about a v2.0?

@MikeGoldsmith MikeGoldsmith self-assigned this Jan 16, 2024
@MikeGoldsmith
Contributor

MikeGoldsmith commented Jan 19, 2024

Hey @smoyer64 - thanks for your help and thoughts on this.

I'm not against a v2 but would prefer to avoid one where possible. v2+ Go packages have had very mixed success.

The intent of this package was to add an easy-to-use configuration layer on top of the OTel SDK because it was not the easiest to use. We did offer to donate this package to OTel but it didn't quite happen for a few reasons.

With the aim of making this library ease configuration, I see deviations from spec and unintuitive configuration outcomes as bugs. I would love to look into whether we could fix the library, and what potential breaking changes that might incur.


4 participants