
Qdisc collector does not expose queues with a parent #3088

Open
bh-tt opened this issue Aug 20, 2024 · 1 comment

bh-tt commented Aug 20, 2024

Host operating system: output of uname -a

Linux k8s-secnet-node6 6.1.0-23-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.99-1 (2024-07-15) x86_64 GNU/Linux

node_exporter version: output of node_exporter --version

node_exporter, version 1.8.2 (branch: HEAD, revision: f1e0e8360aa60b6cb5e5cc1560bed348fc2c1895)
  build user:       root@e8029641b208
  build date:       20240806-20:45:43
  go version:       go1.21.13
  platform:         linux/amd64
  tags:             unknown

node_exporter command line flags

--path.procfs=/host/proc --path.sysfs=/host/sys --web.listen-address=0.0.0.0:9100 --collector.qdisc     

node_exporter log output

not relevant

Are you running node_exporter in Docker?

yes, we have correctly exposed the host /proc and /sys.

What did you do that produced an error?

We enabled qdisc metrics to correlate a networking issue with packet drops in an eBPF program, but it turns out that node_exporter only exposes metrics for qdiscs that have no parent. With tc -s qdisc show we see a lot of packet drops on eBPF qdiscs (type clsact), which have a parent qdisc defined. Because node_exporter does not expose these, it is very hard to correlate our networking issues with the packet drops there. Looking at the implementation, this is expected, since node_exporter skips all qdiscs that have a parent.

See

if msg.Parent != 0 {

What did you expect to see?

I expected to see metrics for all qdiscs on the host, leaving any cardinality concerns to users. This collector is disabled by default anyway.

What did you see instead?

I saw only metrics for the root qdisc, which in this case is not that relevant.

Possible fixes

  • also show metrics for non-root qdiscs (remove the condition in the linked function)
  • possibly do this based on a CLI flag or environment variable?
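
A rough sketch of how the two options above could fit together, not the actual node_exporter code. The qdiscInfo type, the selectQdiscs helper, the include-child-qdiscs flag name, and the sample values are all hypothetical and made up for illustration; the standard library flag package is used only to keep the example self-contained. The only part taken from the collector is the msg.Parent != 0 check.

```go
package main

import (
	"flag"
	"fmt"
)

// qdiscInfo is a hypothetical stand-in for the per-qdisc message the
// collector iterates over. Only the fields needed for the sketch are included.
type qdiscInfo struct {
	IfaceName string
	Kind      string
	Parent    uint32 // 0 is treated as "no parent", mirroring the msg.Parent check
	Drops     uint32
}

// selectQdiscs returns the qdiscs that would be exported. With
// includeChildren set to false it keeps today's behaviour and skips every
// qdisc that has a parent; with true it keeps all of them.
func selectQdiscs(msgs []qdiscInfo, includeChildren bool) []qdiscInfo {
	var out []qdiscInfo
	for _, msg := range msgs {
		if msg.Parent != 0 && !includeChildren {
			continue
		}
		out = append(out, msg)
	}
	return out
}

func main() {
	// Hypothetical flag name; defaults to the current root-only behaviour.
	includeChildren := flag.Bool("include-child-qdiscs", false,
		"also export qdiscs that have a parent")
	flag.Parse()

	// Made-up sample data, not real output from any host.
	sample := []qdiscInfo{
		{IfaceName: "eth0", Kind: "mq", Parent: 0, Drops: 0},
		{IfaceName: "eth0", Kind: "clsact", Parent: 0xfffffff1, Drops: 42},
	}

	for _, q := range selectQdiscs(sample, *includeChildren) {
		fmt.Printf("device=%s kind=%s parent=%#x drops=%d\n",
			q.IfaceName, q.Kind, q.Parent, q.Drops)
	}
}
```

Keeping the default at false would leave existing dashboards unchanged, and anyone who opts in takes on the extra series themselves.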
@discordianfish
Member

Yeah, dunno why we only expose the root level, probably to ensure the metrics can be summed up without separating root and child queues. Any suggestions on how to handle this?
