most ratio metrics are zeroes #128

Open
pschonmann opened this issue Dec 29, 2023 · 3 comments

@pschonmann

Describe the bug
Some ratio metrics are zeroes

[image]

Expected behavior
Just show me graphs with values, not zeroes

Note: the UUID string was replaced.

Console output
Some metrics are zero, for example:
nvidia_smi_fan_speed_ratio{uuid="MY_UUID_STRING"} 0.3
nvidia_smi_utilization_decoder_ratio{uuid="MY_UUID_STRING"} 0
nvidia_smi_utilization_encoder_ratio{uuid="MY_UUID_STRING"} 0
nvidia_smi_utilization_gpu_ratio{uuid="MY_UUID_STRING"} 0
nvidia_smi_utilization_jpeg_ratio{uuid="MY_UUID_STRING"} 0
nvidia_smi_utilization_memory_ratio{uuid="MY_UUID_STRING"} 0
nvidia_smi_utilization_ofa_ratio{uuid="MY_UUID_STRING"} 0
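
To see at a glance which of the exported ratio metrics are zero, output like the above can be parsed with a few lines of Python. This is only a rough sketch: the names `parse_metrics` and `zero_ratio_metrics` are made up for illustration, the sample text is the output pasted above (with the placeholder UUID), and labels are ignored, so on a multi-GPU host only the last value per metric name is kept.

```python
import re

# Sample copied from the console output in this report (placeholder UUID).
SAMPLE = """\
nvidia_smi_fan_speed_ratio{uuid="MY_UUID_STRING"} 0.3
nvidia_smi_utilization_decoder_ratio{uuid="MY_UUID_STRING"} 0
nvidia_smi_utilization_encoder_ratio{uuid="MY_UUID_STRING"} 0
nvidia_smi_utilization_gpu_ratio{uuid="MY_UUID_STRING"} 0
nvidia_smi_utilization_jpeg_ratio{uuid="MY_UUID_STRING"} 0
nvidia_smi_utilization_memory_ratio{uuid="MY_UUID_STRING"} 0
nvidia_smi_utilization_ofa_ratio{uuid="MY_UUID_STRING"} 0
"""

# Matches lines of the form: metric_name{labels} value
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$'
)


def parse_metrics(text):
    """Return {metric name: float value}; labels are ignored (hypothetical helper)."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match:
            metrics[match.group("name")] = float(match.group("value"))
    return metrics


def zero_ratio_metrics(text):
    """Return the sorted names of *_ratio metrics whose value is exactly 0."""
    return sorted(
        name
        for name, value in parse_metrics(text).items()
        if name.endswith("_ratio") and value == 0.0
    )


if __name__ == "__main__":
    for name in zero_ratio_metrics(SAMPLE):
        print(name)
```

For the sample above, this lists the six utilization metrics and leaves out the fan speed, which is the only non-zero ratio in the output.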

Model and Version

  • GPU Model: NVIDIA RTX A6000
  • App version and architecture: 6.1.0-13-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.55-1 (2023-09-29) x86_64 GNU/Linux
  • Installation method: binary
  • Operating System: Debian 12
  • Nvidia GPU driver version: ii nvidia-driver 545.23.06-1 amd64 NVIDIA metapackage

Additional context

Fri Dec 29 13:09:22 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.06              Driver Version: 545.23.06    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A6000               On  | 00000000:00:10.0 Off |                  Off |
| 30%   33C    P8              20W / 300W |  47383MiB / 49140MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A   2375961      C   /opt/conda/bin/python3.9                  47374MiB |
+---------------------------------------------------------------------------------------+
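
One way to cross-check the exported values against the driver is to query nvidia-smi directly and convert its percentages to ratios. The sketch below assumes the exporter's ratio metrics are simply percentages divided by 100 (which matches the fan speed in this report: 30% in nvidia-smi, 0.3 in the metric); that is an assumption, not something confirmed from the exporter's source. The helper names `parse_query_csv` and `query_nvidia_smi` are hypothetical.

```python
import subprocess

# Fields requested from nvidia-smi; the first one identifies the GPU.
QUERY_FIELDS = ["uuid", "fan.speed", "utilization.gpu", "utilization.memory"]


def parse_query_csv(csv_text):
    """Parse `--format=csv,noheader,nounits` output into {uuid: {field: ratio}}.

    Percent values are divided by 100 (assumed mapping to the *_ratio metrics).
    Non-numeric readings such as "[N/A]" become None.
    """
    result = {}
    for line in csv_text.strip().splitlines():
        parts = [part.strip() for part in line.split(",")]
        uuid, values = parts[0], parts[1:]
        ratios = {}
        for field, raw in zip(QUERY_FIELDS[1:], values):
            try:
                ratios[field] = float(raw) / 100.0
            except ValueError:
                ratios[field] = None
        result[uuid] = ratios
    return result


def query_nvidia_smi():
    """Run nvidia-smi and return the parsed ratios (requires nvidia-smi on PATH)."""
    completed = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader,nounits",
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_query_csv(completed.stdout)
```

Calling `query_nvidia_smi()` on the affected host while a workload is running, and comparing the result with the exporter's metrics for the same UUID, would show whether the zeroes come from the driver or from the exporter.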
@pschonmann
Author

I used this dashboard:
https://grafana.com/grafana/dashboards/14574-nvidia-gpu-metrics/

@CPUtester5465

@pschonmann In my case, the fix from jina-ai/clip-as-service#254 solved it
(just run sudo nvidia-smi -pm 1 on the instance)

@pschonmann
Author

@pschonmann In my case, the fix from jina-ai/clip-as-service#254 solved it (just run sudo nvidia-smi -pm 1 on the instance)

It doesn't help, because persistence mode is already enabled:

Persistence mode is already Enabled for GPU 00000000:01:00.0.
Persistence mode is already Enabled for GPU 00000000:02:00.0.
Persistence mode is already Enabled for GPU 00000000:03:00.0.
Persistence mode is already Enabled for GPU 00000000:04:00.0.
