Add support for the PCIe TX Throughput and RX Throughput metrics #127

Open
sbates130272 opened this issue Dec 12, 2023 · 1 comment

@sbates130272

Is your feature request related to a problem? Please describe.

Since the Maxwell architecture, NVIDIA GPUs have included hardware counters that track traffic on both the incoming and outgoing PCIe links. Exposing these counters through the exporter would be very useful when monitoring GPUs in an AI/ML fleet.

Describe the solution you'd like

Update the exporter code to support the TX Throughput and RX Throughput fields reported by the nvidia-smi tool. This should probably be gated on a check of the GPU architecture to avoid errors on pre-Maxwell GPUs.
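
To make the idea concrete, here is a rough sketch of how the two values could be pulled out of nvidia-smi text output. It is only an illustration, not the exporter's code: it assumes the output contains lines of the form `Tx Throughput : <n> KB/s` and `Rx Throughput : <n> KB/s` for a single GPU, and that unsupported GPUs report something like `N/A`. The sample text, the function name `parse_pcie_throughput`, and the result keys are all hypothetical, and whether the exporter's existing query mode exposes these fields would need to be checked.

```python
import re
from typing import Dict, Optional

# Hypothetical sample, assumed to resemble the PCI section of nvidia-smi
# text output for one GPU. Real output may differ between driver versions.
SAMPLE = """\
    PCI
        Bus Id                            : 00000000:01:00.0
        Tx Throughput                     : 1500 KB/s
        Rx Throughput                     : N/A
"""

# Matches "Tx Throughput : <value>" or "Rx Throughput : <value>" on one line.
_LINE = re.compile(r"^\s*(Tx|Rx) Throughput\s*:\s*(.+?)\s*$", re.MULTILINE)


def parse_pcie_throughput(text: str) -> Dict[str, Optional[int]]:
    """Return TX/RX throughput in KB/s, or None where no number is reported."""
    result: Dict[str, Optional[int]] = {"tx_kb_per_s": None, "rx_kb_per_s": None}
    for direction, value in _LINE.findall(text):
        number = re.fullmatch(r"(\d+) KB/s", value)
        key = f"{direction.lower()}_kb_per_s"
        # Anything that is not "<digits> KB/s" (for example "N/A") maps to None,
        # so GPUs without the counters yield a missing value instead of an error.
        result[key] = int(number.group(1)) if number else None
    return result


if __name__ == "__main__":
    print(parse_pcie_throughput(SAMPLE))
```

Treating a non-numeric value as "metric absent" is one way to handle pre-Maxwell GPUs without a separate architecture check, though an explicit check may still be preferable.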

Describe alternatives you've considered

There are no other solutions as clean as this one. I don't see anyone wanting to write a second exporter just for these metrics, and adding more calls to nvidia-smi is probably not a wise move at the system level.

Additional context

The fields in question are discussed in the nvidia-smi documentation. Once this feature is merged, we could update the Grafana dashboard to include counters and gauges for PCIe traffic.

@utkuozdemir
Owner

Hi, thank you for the suggestion. Lately I don't find any time to maintain the project, and I don't think it's gonna change anytime soon. But a PR would be more than welcome, if you'd be interested.
