Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve journald support #411

Open
tallclair opened this issue Jul 20, 2024 · 10 comments
Open

Improve journald support #411

tallclair opened this issue Jul 20, 2024 · 10 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@tallclair
Copy link
Member

/kind feature

Problem

Running Kubelet as a systemd service logging to journald is not ideal today:

Jun 17 12:46:23 gke-test-default-r55g kubelet[118528]: I0617 12:46:23.668491  118528 kubelet.go:2447] "SyncLoop (probe)" probe="liveness" status="unhealthy" pod="default/test"
Jun 17 12:46:24 gke-test-default-r55g kubelet[118528]: E0617 12:46:24.369353  118528 kubelet.go:1875] "Unable to attach or mount volumes for pod; skipping pod" err="unmounted volumes=[data], unattached volumes=[], failed to process volumes=[]: timed out waiting for the condition" pod="default/test"
  1. Logs redundant information with the journal log:
    • date / time (0617 12:46:23.668491)
    • PID (118528)
  2. Severity level is not parsed by journald

The redundant information can be suppressed with the skip_headers flag, but that also suppresses the severity & code location.

Proposal

The simplest option would be to add a journald option that suppresses the redundant headers, and sets the severity prefix in the journald format (https://www.freedesktop.org/software/systemd/man/latest/sd-daemon.html).

Alternatively, klog could write to journald directly, e.g. something like https://git.frostfs.info/TrueCloudLab/zapjournald, but I don't know enough about the tradeoffs here. I suggest starting with the first option.

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 20, 2024
@pohly
Copy link

pohly commented Jul 21, 2024

With structured, contextual logging, Kubernetes follows the logr model which only has "info" and "error" severity. But I suppose that difference would be nice to represent.

Do you care about writing JSON output to the journald log?

@tallclair
Copy link
Member Author

Do you care about writing JSON output to the journald log?

Good question. I think I'd still expect the redundant information to be suppressed in this case, but I think some follow-up investigation is needed to see how the logs are typically ingested.

@dgrisonnet
Copy link
Member

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 25, 2024
@huww98
Copy link

huww98 commented Aug 15, 2024

Journald itself also stores structured logs (please see journalctl -o json), I think we can map logr fields into journald fields, logr severity to journald priority (to get colorful output with journalctl) if we write to journald socket directly. I can help with this, if you agree. I've merged journald support to ceph before.

But different from logr, journalctl does not show extra fields after message with its default output format. So maybe we should still format the message ourself. Maybe we can also replicate some common fields (or just all the fields) to journald fields for efficient filtering? e.g. how about enabling journalctl -u kubelet POD=default/test or journalctl CODE_FILE=kubelet.go

@pohly
Copy link

pohly commented Aug 19, 2024

if we write to journald socket directly

This could be useful, in particular when it ensures that multi-line log entries are treated as one entry. I'm a bit worried that this might happen in text mode with multi-line text values. The text logger does a single os.Stderr.Write, but that does not necessarily mean that journald can read the entire entry with a single read. Does it assume "one line = one entry" or use SOCK_DGRAM?

Maybe we can also replicate some common fields (or just all the fields) to journald fields for efficient filtering?

On caveat is that we only have recommendations for common fields, but no schema and no enforcement of those recommendations other than code reviews. Having said that, some fields like "pod" are used fairly consistently, so exposing that in journald could be useful.

Either way, this changes externally visible behavior of kubelet and potentially other components. We also need to consider how to roll it out. In other words, do we need a KEP? @tallclair, what do you think? Is it worth asking in a SIG Node meeting as this is most useful for kubelet?

@huww98
Copy link

huww98 commented Aug 19, 2024

The text logger does a single os.Stderr.Write, but that does not necessarily mean that journald can read the entire entry with a single read. Does it assume "one line = one entry" or use SOCK_DGRAM?

journald has its own protocol. we can write the total length of the message before the message text, so that multi-line log can be recognized as one entry.

@pohly
Copy link

pohly commented Aug 19, 2024

Suppose we don't use that protocol. How does systemd capture the stderr output from a process, i.e. what are the message boundaries in that case?

@huww98
Copy link

huww98 commented Aug 19, 2024

I think it will just use \n as the separator.

@pohly
Copy link

pohly commented Aug 20, 2024

Then we shouldn't remove the klog header while using stderr because the message boundaries could get lost.

@tallclair
Copy link
Member Author

Yes, I can follow up with with sig-node.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

No branches or pull requests

5 participants