DCGM needs a restart after creating MIGs via instaslice, and container mapping for MIG usage is not exported #288

Open
bharathappali opened this issue Nov 28, 2024 · 2 comments

Comments

@bharathappali

When I tried to track MIG usage (for MIGs created via instaslice) with the DCGM metric DCGM_FI_DEV_FB_USED, the query returned no data.

After I restarted DCGM, I was able to see the MIG usage, but the labels exported_pod, exported_container and exported_namespace were not available. MIGs created via the MIG manager have these labels exported, along with the details of the container that was consuming the MIG.

In the case of Instaslice, we need to query the allocation section of the Instaslice object, extract the GPU UUID, and then query the DCGM metrics whose GPU UUID label matches that value in order to record the GPU usage of the container.
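The workaround described above amounts to a join on GPU UUID. Below is a minimal sketch of that join in Python, not the actual Instaslice or DCGM API. It assumes the allocation entries have already been read from the Instaslice object and flattened into dicts with the hypothetical keys `gpuUUID`, `namespace`, `podName` and `containerName`, and that the DCGM_FI_DEV_FB_USED samples have already been fetched as dicts with a `labels` map and a `value`. The label name `UUID` is also an assumption and should be adjusted to whatever the exporter actually emits.

```python
from collections import defaultdict

# Assumed name of the metric label that carries the GPU UUID.
UUID_LABEL = "UUID"


def map_usage_to_containers(allocations, samples, uuid_label=UUID_LABEL):
    """Join allocation records to metric samples on GPU UUID.

    allocations: iterable of dicts with the (hypothetical) keys
        "gpuUUID", "namespace", "podName", "containerName".
    samples: iterable of dicts shaped like {"labels": {...}, "value": float}.
    Returns {(namespace, pod, container): [values, ...]}.
    """
    # Index metric values by the GPU UUID found in their labels.
    by_uuid = defaultdict(list)
    for sample in samples:
        uuid = sample["labels"].get(uuid_label)
        if uuid is not None:
            by_uuid[uuid].append(sample["value"])

    # Attach the matching values to each container named in an allocation.
    usage = {}
    for alloc in allocations:
        key = (alloc["namespace"], alloc["podName"], alloc["containerName"])
        usage[key] = list(by_uuid.get(alloc["gpuUUID"], []))
    return usage


# Made-up example data, for illustration only.
example_allocations = [
    {"gpuUUID": "GPU-aaaa", "namespace": "demo",
     "podName": "pod-a", "containerName": "main"},
]
example_samples = [
    {"labels": {"UUID": "GPU-aaaa"}, "value": 1024.0},
    {"labels": {"UUID": "GPU-bbbb"}, "value": 0.0},
]
example_usage = map_usage_to_containers(example_allocations, example_samples)
```

If several MIGs on one GPU report the same GPU UUID, this join alone cannot tell them apart, and an additional instance-level label would be needed as part of the key.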

I will add more context and my findings shortly in the comments on this issue.

@bharathappali
Author

Instaslice object:

Screenshot from 2024-12-01 17-06-56

@bharathappali
Author

The metrics dashboard shows "no data points found" for the MIG profile:
Screenshot from 2024-12-01 17-07-44
