
Adding observability support #61

Closed
calebschoepp opened this issue Apr 2, 2024 · 9 comments
Labels
enhancement New feature or request

Comments

@calebschoepp
Contributor

Observability in shim

Spin 2.4 introduced basic OpenTelemetry observability support to Spin. More specifically, it added support for the tracing OTEL signal: traces are emitted when apps are triggered or host components are called.

I would like containerd-shim-spin to have this observability support too. We must use the spin-telemetry crate to set up this support:

let _telemetry_guard = spin_telemetry::init(version!().to_string())?;

Under the hood, spin_telemetry::init sets up a tracing subscriber that writes logs to STDERR and exports traces to a configured OTEL exporter.
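For context, here is a rough sketch of that setup using only vanilla tracing-subscriber APIs. The function name init_tracing_sketch is made up, and the real crate additionally attaches a tracing-opentelemetry layer wired to the exporter configured via OTEL_EXPORTER_OTLP_ENDPOINT, which is omitted here:

use tracing_subscriber::prelude::*;
use tracing_subscriber::{fmt, EnvFilter};

// Hypothetical approximation of what spin_telemetry::init wires up; the
// OTLP trace exporter layer is left out because its setup is crate-specific.
fn init_tracing_sketch() {
    // Format layer that writes human-readable events to STDERR.
    let stderr_layer = fmt::layer().with_writer(std::io::stderr);

    tracing_subscriber::registry()
        .with(EnvFilter::from_default_env())
        .with(stderr_layer)
        // With tracing-subscriber's default `tracing-log` feature enabled,
        // .init() also installs a global `log` logger; that detail matters
        // for the panic described below.
        .init();
}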

The integration doesn't work

Out of the box, integrating spin-telemetry with containerd-shim-spin panics:

$ sudo ctr run --rm --net-host  --env RUST_BACKTRACE=full --env OTEL_EXPORTER_OTLP_ENDPOINT=http://host.lima.internal:4318 --runtime io.containerd.spin.v2 ghcr.io/radu-matei/perftest:v1 perf bogus-arg
thread '<unnamed>' panicked at /home/caleb.linux/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tracing-subscriber-0.3.18/src/util.rs:91:14:
failed to set global default subscriber: SetLoggerError(())
stack backtrace:
   0:     0xaaaab728c600 - <unknown>
< snipped>
  15:     0xaaaab59469c0 - <unknown>

After lots of digging (and help from @kate-goldenring, @vdice, and @radu-matei 🙏) it was determined that the issue is that we are trying to set a global logger twice. The obvious place where it is set is here in the containerd-shim. The less obvious place where it is set is here in tracing-subscriber.

It turns out that spin-telemetry uses tracing-subscriber, which by default has its tracing-log feature turned on. This feature enables a compatibility layer in which tracing picks up any log messages and re-emits them as tracing events. Setting up that compatibility layer inevitably boils down to configuring a global logger.
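A minimal repro of that clash, assuming nothing beyond the log and tracing-log crates (ShimLogger is a hypothetical stand-in for whatever logger the containerd shim installs):

use log::{LevelFilter, Log, Metadata, Record};

// Hypothetical stand-in for the logger the containerd shim installs at startup.
struct ShimLogger;

impl Log for ShimLogger {
    fn enabled(&self, _: &Metadata) -> bool { true }
    fn log(&self, record: &Record) { eprintln!("{}: {}", record.level(), record.args()); }
    fn flush(&self) {}
}

static SHIM_LOGGER: ShimLogger = ShimLogger;

fn main() {
    // First global logger: roughly what the shim's logging setup amounts to.
    log::set_logger(&SHIM_LOGGER).expect("first set_logger succeeds");
    log::set_max_level(LevelFilter::Info);

    // Second global logger: what tracing-subscriber's tracing-log feature does
    // (via tracing_log::LogTracer) when spin_telemetry::init() runs. Since
    // log::set_logger may only succeed once per process, this fails with the
    // SetLoggerError that the panic above is wrapping.
    assert!(tracing_log::LogTracer::init().is_err());
}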

It should also be noted that the trigger-sqs crate also depends on tracing-subscriber and configures a global logger.

Possible fixes

Turn off the containerd-shim logger

One obvious fix is to just turn off the containerd-shim logger:

let shim_config = Config {
    default_log_level: "error".to_string(),
    // Skip the shim library's own global logger setup so that
    // spin_telemetry::init() is the only thing calling log::set_logger.
    no_setup_logger: true,
    ..Default::default()
};

This works, but it has the obvious problem that we no longer get the shim's own logs. Oddly enough, with this approach some of the containerd-wasm-shim logs do get picked up by the tracing subscriber that spin-telemetry sets up. In other words, some of the shim logs are emitted as app logs.

An example of this approach is found in #60.

Disable tracing-log feature in upstream crates

As seen in this Spin PR and this trigger-sqs PR, we can turn off the tracing-log feature and avoid the panic. This seems like a good fix, but it has a key issue: it turns out Spin relies on the tracing-log feature to pick up any logs emitted from within Spin or from upstream dependencies via tracing::log::* or log::*. Put simply, if we turn off the feature we will lose some logging from Spin that we don't want to lose.
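For illustration, this is the kind of manifest change it would mean in those upstream crates; the version and the exact list of features to keep are assumptions and would need to match what each crate actually uses:

# Opt out of tracing-subscriber's default features (which include tracing-log)
# and re-enable only the ones the crate needs. The feature list shown is a guess.
[dependencies]
tracing-subscriber = { version = "0.3", default-features = false, features = ["env-filter", "fmt", "ansi", "smallvec", "std"] }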

An example of this approach is found in #59.

Convert all of containerd-shim-spin, runwasi, and containerd-shim to use tracing instead of log

This is extremely far-fetched, but I thought it was worth mentioning because it would technically solve our problem.

Another option I'm not thinking of

I'm all ears 👂

Going forward

I've been working on this for a while and feel like I have hit an impasse. I am only trying to integrate observability into the shim, but it feels like I have tapped into a much larger issue of logging in the shim. I would appreciate input and direction from the maintainers to help drive this forward.

@Mossaka
Member

Mossaka commented Apr 2, 2024

Hey, thanks for writing up the detailed report of the issue. I really appreciate the assiduous work you've done on making Spin more observable.

FWIW, I did some investigation on adding OpenTelemetry and tracing to runwasi: containerd/runwasi#10, so the last point about converting all upstream projects to use tracing isn't far-fetched, and I believe that's something we should do. With that said, I'd love to collaborate with you all on this work, and I am happy to contribute to both runwasi and rust-extensions to add tracing support. I am happy to talk about how we can parallelize the work if you're interested 🙌.

@squillace

I'd love to standardize the shims on a runwasi solution! Let's make sure we ping the owners of the downstream shims to let them know and get their feedback.

@radu-matei
Member

Thanks for the write-up, @calebschoepp.
@Mossaka awesome to hear the openness to collaborate on this!

Zooming in on one of the things you mentioned there, @calebschoepp:

> In other words some of the shim logs are emitted as app logs.

Does that mean we could surface some of the shim logs (which are runtime logs, some that might be very useful when debugging apps) to the pod logs?

@Mossaka added the enhancement label Apr 5, 2024
@calebschoepp
Contributor Author

> FWIW, I did some investigation on adding OpenTelemetry and tracing to runwasi: containerd/runwasi#10, so the last point about converting all upstream projects to use tracing isn't far-fetched, and I believe that's something we should do. With that said, I'd love to collaborate with you all on this work, and I am happy to contribute to both runwasi and rust-extensions to add tracing support. I am happy to talk about how we can parallelize the work if you're interested 🙌.

Nice, I'm happy to collab on this too. A couple of thoughts and questions for you:

  1. How long do we think it would take to get tracing into everything upstream? If it is going to take a while, maybe I should try to land observability in the interim by disabling the tracing-log feature.
  2. We'd have to make sure that tracing makes it everywhere; a single log init somewhere and we're still screwed. I'm honestly pretty shaky on how shim-spin, runwasi, containerd, and everything else fit together logging-wise, so more clarity here would be good.
  3. Would containerd want to set its own tracing subscriber? Because that would be a problem too and would conflict with spin_telemetry::init().

@calebschoepp
Contributor Author

> Does that mean we could surface some of the shim logs (which are runtime logs, some that might be very useful when debugging apps) to the pod logs?

Yes, I think that is a possibility.

@jsturtevant
Contributor

@calebschoepp It looks like @Mossaka wired up the tracing subscriber in a slightly different way, not using the init() function, and it doesn't seem to interact with the default logger:

https://github.com/containerd/runwasi/pull/582/files#diff-529b294a55113158c10faa8011a513bfd0359d9fa7ba001d1c5549237df52853R77-R79

Would that be something worth trying, or do you think we will still be missing some of the logs because "It turns out Spin relies on the tracing-log feature to pick up any logs emitted from within Spin or in upstream dependencies via tracing::log::* or log::*"?

@calebschoepp
Contributor Author

@jsturtevant That's an interesting idea, but I don't think it is actually any different in the end. From reading the docs, it appears that init is just a convenience wrapper (via an extension trait) around the set_global_default call @Mossaka is making.

It sort of skirts the issue of the conflicting global loggers because it doesn't set up a global logger for the tracing-log compatibility layer. But we're still left with the issue of Spin relying on that compatibility layer. I need to do more investigating into how much we rely on it.
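To make the distinction concrete, here is a rough sketch (not the shim's or runwasi's actual code) of the two initialization paths:

use tracing::subscriber::SetGlobalDefaultError;
use tracing_subscriber::prelude::*;

// Roughly the runwasi approach: only the tracing dispatcher is installed, so
// the log crate's global logger is left alone and nothing conflicts.
fn init_via_set_global_default() -> Result<(), SetGlobalDefaultError> {
    let subscriber = tracing_subscriber::registry().with(tracing_subscriber::fmt::layer());
    tracing::subscriber::set_global_default(subscriber)
}

// Roughly what spin_telemetry's path amounts to: .init() performs the same
// set_global_default call, but with the tracing-log feature enabled it also
// installs a LogTracer via log::set_logger, which is both the call that
// collides with the shim's logger and the piece Spin relies on to capture
// log::* output as tracing events.
fn init_via_subscriber_init_ext() {
    tracing_subscriber::registry()
        .with(tracing_subscriber::fmt::layer())
        .init();
}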

I still think the easy way out is to turn off the tracing-log compat layer, and that the cleanest way out is to convert all the upstream libraries to tracing.

@calebschoepp
Contributor Author

After some reflection and a conversation with @radu-matei, I've decided that the most straightforward way to land o11y support in SpinKube is by disabling the tracing-log feature in upstream crates. I'll get some PRs up for that. The bottom line was that we're really not losing that many logs by doing it this way.

I still think we should standardize on tracing in all the libraries though fwiw.

@calebschoepp
Contributor Author

o11y support has landed in the shim so I'm closing this. Config work is being tracked here.
