PermissionError: [Errno 1] Operation not permitted #160

Open
kcsf opened this issue May 24, 2024 · 9 comments

@kcsf

kcsf commented May 24, 2024

(rickslab-gpu-utils-env) cg@gpu-49-59:~$ sudo gpu-ls
Error: Invalid icon path
Detected GPUs: AMD: 1
AMD: amdgpu version: 1:6.0.60002-1718217.22.04
AMD: Wattman features enabled: 0xfffd7fff
Warning: Can not read parameter: loading, disabling for this GPU: 0
Warning: Can not read parameter: mem_loading, disabling for this GPU: 0
Warning: Can not read parameter: sclk_ps, disabling for this GPU: 0
Warning: Can not read parameter: mclk_ps, disabling for this GPU: 0
Warning: Can not read parameter: ppm, disabling for this GPU: 0
Warning: Can not read parameter: power_dpm_force, disabling for this GPU: 0
Warning: Can not read parameter: power_cap_range, disabling for this GPU: 0
Warning: Can not read parameter: power, disabling for this GPU: 0
Warning: Can not read parameter: power_cap, disabling for this GPU: 0
Warning: Can not read parameter: temperatures, disabling for this GPU: 0
Warning: Can not read parameter: voltages, disabling for this GPU: 0
Warning: Can not read parameter: frequencies, disabling for this GPU: 0
Warning: Can not read parameter: fan_speed_range, disabling for this GPU: 0
Warning: Can not read parameter: fan_pwm_range, disabling for this GPU: 0
Warning: Can not read parameter: fan_enable, disabling for this GPU: 0
Warning: Can not read parameter: fan_target, disabling for this GPU: 0
Warning: Can not read parameter: fan_speed, disabling for this GPU: 0
Warning: Can not read parameter: pwm_mode, disabling for this GPU: 0
Warning: Can not read parameter: fan_pwm, disabling for this GPU: 0
1 total GPUs, 1 rw, 0 r-only, 0 w-only

Traceback (most recent call last):
  File "/usr/bin/gpu-ls", line 154, in <module>
    main()
  File "/usr/bin/gpu-ls", line 138, in main
    gpu_list.read_gpu_pstates()
  File "/usr/lib/python3/dist-packages/GPUmodules/GPUmodule.py", line 2136, in read_gpu_pstates
    gpu.read_gpu_pstates()
  File "/usr/lib/python3/dist-packages/GPUmodules/GPUmodule.py", line 1061, in read_gpu_pstates
    for line in card_file:
PermissionError: [Errno 1] Operation not permitted

@Ricks-Lab
Owner

Which distro are you using? The driver files are normally world readable.
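The "world readable" claim can be checked directly from the file mode bits. The following is a minimal sketch, not part of gpu-utils: the helper names `is_world_readable` and `report_modes` are hypothetical, and which attribute files exist depends on the kernel and the GPU.

```python
import os
import stat


def is_world_readable(path):
    """Return True if the 'other' read permission bit is set on path."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IROTH)


def report_modes(device_dir, names):
    """Map each attribute name to True/False, or None if the file is missing."""
    result = {}
    for name in names:
        path = os.path.join(device_dir, name)
        try:
            result[name] = is_world_readable(path)
        except FileNotFoundError:
            result[name] = None
    return result
```

For example, `report_modes("/sys/class/drm/card0/device", ["pp_od_clk_voltage", "pp_dpm_sclk"])` would show whether those two amdgpu attribute files carry the world-read bit on a given system. Note that a set read bit only says the file may be opened; the driver can still refuse the read itself.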

@Ricks-Lab
Owner

Also, I do not recommend running with sudo. Only writing to driver files requires root permissions, and gpu-pac is the only utility that writes to these files. By default it creates bash files that you can execute yourself with sudo. If you use the --execute_pac option, it will execute the bash script with sudo, which will prompt you for credentials at the command line.

It would also be helpful to execute with --debug option and post the log file contents here. Feel free to delete any details from the logfile that you do not want to make public.

@kcsf
Author

kcsf commented Jun 10, 2024

Hi Rick!
Thank you so much for your prompt response. I got busy and neglected to follow up. Now, of course, it's rather urgent that I knock the power usage on these GPUs down from 100 watts to 80.

Here's some info:

cg@gpu-13-23:~$ pip list | grep rickslab-gpu-utils
rickslab-gpu-utils 3.6.0
cg@gpu-13-23:~$ pip3 list | grep rickslab-gpu-utils
rickslab-gpu-utils 3.6.0
cg@gpu-13-23:~$ dpkg -l | grep gpu-utils
ii rickslab-gpu-utils 3.6.0-2 all AMD GPU performance adjustment and monitoring
cg@gpu-13-23:~$ gpu-ls --debug
Error: Invalid icon path
Ubuntu: Validated
Traceback (most recent call last):
  File "/usr/bin/gpu-ls", line 154, in <module>
    main()
  File "/usr/bin/gpu-ls", line 102, in main
    gpu_list.set_gpu_list(clinfo_flag=True)
  File "/usr/lib/python3/dist-packages/GPUmodules/GPUmodule.py", line 1885, in set_gpu_list
    pp_od_file_details = file_ptr.read()
PermissionError: [Errno 1] Operation not permitted
cg@gpu-13-23:~$ gpu-ls
Error: Invalid icon path
Detected GPUs: AMD: 1
AMD: amdgpu version: 1:6.0.60002-1718217.22.04
AMD: Wattman features enabled: 0xfffd7fff
Warning: Can not read parameter: loading, disabling for this GPU: 0
Warning: Can not read parameter: mem_loading, disabling for this GPU: 0
Warning: Can not read parameter: sclk_ps, disabling for this GPU: 0
Warning: Can not read parameter: mclk_ps, disabling for this GPU: 0
Warning: Can not read parameter: ppm, disabling for this GPU: 0
Warning: Can not read parameter: power_dpm_force, disabling for this GPU: 0
Warning: Can not read parameter: power_cap_range, disabling for this GPU: 0
Warning: Can not read parameter: power, disabling for this GPU: 0
Warning: Can not read parameter: power_cap, disabling for this GPU: 0
Warning: Can not read parameter: temperatures, disabling for this GPU: 0
Warning: Can not read parameter: voltages, disabling for this GPU: 0
Warning: Can not read parameter: frequencies, disabling for this GPU: 0
Warning: Can not read parameter: fan_speed_range, disabling for this GPU: 0
Warning: Can not read parameter: fan_pwm_range, disabling for this GPU: 0
Warning: Can not read parameter: fan_enable, disabling for this GPU: 0
Warning: Can not read parameter: fan_target, disabling for this GPU: 0
Warning: Can not read parameter: fan_speed, disabling for this GPU: 0
Warning: Can not read parameter: pwm_mode, disabling for this GPU: 0
Warning: Can not read parameter: fan_pwm, disabling for this GPU: 0
1 total GPUs, 1 rw, 0 r-only, 0 w-only

Traceback (most recent call last):
  File "/usr/bin/gpu-ls", line 154, in <module>
    main()
  File "/usr/bin/gpu-ls", line 138, in main
    gpu_list.read_gpu_pstates()
  File "/usr/lib/python3/dist-packages/GPUmodules/GPUmodule.py", line 2136, in read_gpu_pstates
    gpu.read_gpu_pstates()
  File "/usr/lib/python3/dist-packages/GPUmodules/GPUmodule.py", line 1061, in read_gpu_pstates
    for line in card_file:
PermissionError: [Errno 1] Operation not permitted
cg@gpu-13-23:~$ sudo gpu-ls --debug
Error: Invalid icon path
Ubuntu: Validated
Traceback (most recent call last):
  File "/usr/bin/gpu-ls", line 154, in <module>
    main()
  File "/usr/bin/gpu-ls", line 102, in main
    gpu_list.set_gpu_list(clinfo_flag=True)
  File "/usr/lib/python3/dist-packages/GPUmodules/GPUmodule.py", line 1885, in set_gpu_list
    pp_od_file_details = file_ptr.read()
PermissionError: [Errno 1] Operation not permitted
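Both tracebacks fail on a read (`file_ptr.read()` and `for line in card_file:`), so the file was opened successfully before the error was raised. Python raises `PermissionError` for both EPERM (errno 1) and EACCES (errno 13). One plausible reading is that errno 1 at read time comes from the driver refusing the operation rather than from file mode bits, which would fit with sudo making no difference. The sketch below separates the two steps; `probe_read` is a hypothetical helper, not a gpu-utils function.

```python
import errno


def probe_read(path):
    """Open and read path, reporting which step failed and with which errno."""
    try:
        handle = open(path, "rb")
    except OSError as exc:
        # Failing here usually points at file mode bits or a missing file.
        return ("open_failed", exc.errno)
    with handle:
        try:
            handle.read()
        except OSError as exc:
            # Failing here means the open was allowed but the read was refused.
            return ("read_failed", exc.errno)
    return ("ok", None)


def is_eperm(result):
    """True if a probe_read result carries errno 1 (EPERM)."""
    return result[1] == errno.EPERM
```

Running `probe_read("/sys/class/drm/card0/device/pp_od_clk_voltage")` on the affected machine would be expected to return a `read_failed` result if the interpretation above is right.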

@Ricks-Lab
Owner

When the --debug option is used, a log file should be produced. Can you paste its contents here?

Also, can you upgrade to the latest version? I recently released 3.9.0 to PyPI.

@Ricks-Lab Ricks-Lab self-assigned this Jun 10, 2024
@kcsf
Author

kcsf commented Jun 10, 2024

OK, I upgraded to 3.9.

Now I'm getting this:
`cg@gpu-24-34:~$ gpu-ls --debug
Ubuntu: Validated
HW Exception by GPU node-1 (Agent handle: 0x5e41c0b8f730) reason :GPU Hang
Error: system support issue for 01:00.0: [[Errno 1] Operation not permitted]
Detected GPUs: AMD: 1
AMD: amdgpu version: 1:6.0.60002-1718217.22.04
AMD: Wattman features enabled: 0xfffd7fff
Error: System support issue for GPU [01:00.0]
Error: System support issue for GPU [01:00.0]
Error: System support issue for GPU [01:00.0]
Error: System support issue for GPU [01:00.0]
Error: System support issue for GPU [01:00.0]
Error: System support issue for GPU [01:00.0]
Error: System support issue for GPU [01:00.0]
Error: System support issue for GPU [01:00.0]
Error: System support issue for GPU [01:00.0]
Error: System support issue for GPU [01:00.0]

read_time_val: 10-Jun-2024 13:59:15

model_display: True: Cyan Skillfish
loading: True: None
mem_loading: True: None
mem_vram_usage: True: 0.06260871887207031
mem_gtt_usage: True: 0.2832306048274743
power: True: None
power_cap: True: None
energy: True: 0.0
temp_val: True: None
vddgfx_val: True: nan
fan_pwm: True: None
sclk_f_val: True: None
sclk_ps_val: True:
mclk_f_val: True: None
mclk_ps_val: True:
ppm: True:

Total of 1 GPU: 0 are rw, 1 is r-only, and 0 are w-only

Card Number: 0
Vendor: AMD
Readable: True
Writable: False
Compute: False
Device ID: {'device': '0x13fe', 'subsystem_device': '0x0000', 'subsystem_vendor': '0x1022', 'vendor': '0x1002'}
Decoded Device ID: Cyan Skillfish
Card Model: Advanced Micro Devices, Inc. [AMD/ATI] Cyan Skillfish
Display Card Model: Cyan Skillfish
PCIe ID: 01:00.0
Link Speed: 16.0 GT/s PCIe
Link Width: 16
##################################################
Driver: amdgpu
vBIOS Version: 113-AMDRBN-003
Compute Platform: None
GPU Type: Modern
HWmon: /sys/class/drm/card0/device/hwmon/hwmon0
Card Path: /sys/class/drm/card0/device
System Card Path: /sys/devices/pci0000:00/0000:00:08.1/0000:01:00.0
##################################################
##################################################
Current GTT Memory Usage (%): 0.283
Current GTT Memory Used (GB): 0.011
Total GTT Memory (GB): 3.738
Current VRAM Usage (%): 0.063
Current VRAM Used (GB): 0.005
Total VRAM (GB): 8.000
Critical Temps (C): {}
Vddgfx Offset (mV): 0
Vddgfx Offset Range (mV): [-25, 25]
##################################################
Disabled Parameters: pp_od_clk_voltage, sclk_f_range, mclk_f_range, vddc_range,
pp_features, unique_id, loading, mem_loading,
sclk_ps, mclk_ps, pstates, ppm,
power_dpm_force, power_dpm_state, power_cap_range, power,
power_cap, temperatures, voltages, frequencies,
fan_speed_range, fan_pwm_range, fan_enable, fan_target,
fan_speed, pwm_mode, fan_pwm

`

@kcsf
Author

kcsf commented Jun 10, 2024

gpu-utils_debug-log.txt

Am I able to control the GPU speed and/or power use yet, or is there more troubleshooting to do?
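For reference, on amdgpu cards where the hwmon power files are accessible, the power cap is exposed as `power1_cap` in microwatts, with `power1_cap_min` and `power1_cap_max` giving the allowed range. The sketch below shows the arithmetic for an 80 W cap under that assumption. The function names are hypothetical and this is not how gpu-pac is implemented. The output above lists `power_cap` among the disabled parameters, so these files may not be usable on this particular card.

```python
import os

MICROWATTS_PER_WATT = 1_000_000


def read_int(path):
    """Read a single integer from a sysfs-style text file."""
    with open(path) as handle:
        return int(handle.read().strip())


def plan_power_cap(hwmon_dir, watts):
    """Return the microwatt value for power1_cap after a range check."""
    target = int(round(watts * MICROWATTS_PER_WATT))
    low = read_int(os.path.join(hwmon_dir, "power1_cap_min"))
    high = read_int(os.path.join(hwmon_dir, "power1_cap_max"))
    if not low <= target <= high:
        raise ValueError(f"{watts} W is outside the range {low}..{high} uW")
    return target


def apply_power_cap(hwmon_dir, watts):
    """Write the new cap to power1_cap; on real hardware this needs root."""
    target = plan_power_cap(hwmon_dir, watts)
    with open(os.path.join(hwmon_dir, "power1_cap"), "w") as handle:
        handle.write(str(target))
    return target
```

With the HWmon path reported above, the call would be `apply_power_cap("/sys/class/drm/card0/device/hwmon/hwmon0", 80)`. Keeping the range check in a separate `plan_power_cap` step lets the value be inspected before anything is written.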

@Ricks-Lab
Owner

I am running Ubuntu 22.04 on two systems and do not see the issue of driver files not being readable. This is possibly a driver or hardware issue, or the feature definition may be different for newer GPUs. I suggest changing the Wattman feature mask reported in your output (AMD: Wattman features enabled: 0xfffd7fff) to 0xffffffff.
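To see what that change amounts to, the two masks can be compared bit by bit. This sketch assumes the value gpu-ls prints is the amdgpu `ppfeaturemask` module parameter, which is usually readable at `/sys/module/amdgpu/parameters/ppfeaturemask` and set with `amdgpu.ppfeaturemask=...` on the kernel command line. The helper names are hypothetical, and what each bit controls depends on the kernel version.

```python
def parse_mask(text):
    """Parse a feature mask given as hex ('0x...') or decimal text."""
    return int(text.strip(), 0) & 0xFFFFFFFF


def differing_bits(current, proposed):
    """Return the bit positions (0 = least significant) where the masks differ."""
    diff = current ^ proposed
    return [bit for bit in range(32) if (diff >> bit) & 1]


current = parse_mask("0xfffd7fff")
proposed = parse_mask("0xffffffff")
print(differing_bits(current, proposed))  # [15, 17]
```

So the suggested mask differs from the current one only in bits 15 and 17, both of which are cleared in 0xfffd7fff.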

Here is what ChatGPT has to say:

The error message you're encountering indicates a hardware exception caused by a GPU hang. This can be due to several factors, including hardware failures, driver issues, or system configuration problems. Here's a step-by-step guide to troubleshoot and address this issue:

1. Check System Logs:
   - Look into system logs for more detailed error messages. On Linux, you can use dmesg or check /var/log/syslog or /var/log/messages.
2. Update GPU Drivers:
   - Ensure that your GPU drivers are up to date. You can download the latest drivers from the GPU manufacturer's website (NVIDIA, AMD, etc.).
3. Check Hardware:
   - Ensure that the GPU is properly seated in its slot and that all power connectors are securely attached.
   - Monitor the GPU temperature to ensure it is not overheating. You can use tools like nvidia-smi for NVIDIA GPUs or radeontop for AMD GPUs.
4. Test GPU on Another System:
   - If possible, test the GPU on a different system to rule out hardware failure.
5. Verify System Configuration:
   - Ensure that your system’s power supply is adequate for the GPU.
   - Check for BIOS/UEFI updates for your motherboard and apply them if necessary.
   - Disable any overclocking settings and see if the problem persists.
6. Check Permissions:
   - The error message "Operation not permitted" suggests there might be a permissions issue. Make sure you have the necessary permissions to access the GPU. Running the operation as root or with sudo might help.
7. Consult Documentation:
   - Refer to the documentation for your specific GPU and system for any known issues or configuration tips.
8. Contact Support:
   - If the problem persists, consider reaching out to the GPU manufacturer’s support or your system’s support service for further assistance.

By systematically going through these steps, you should be able to identify and resolve the issue causing the GPU hang.

@kcsf
Author

kcsf commented Jun 12, 2024

Dang. I've tried most of that. It's a BC-250 (re-purposed PS5 card).
There are no BIOS updates for it. The only thing I can think to try is updating the kernel and OS to 24.04, but it took me a long time to find an old kernel that worked in the first place.

Any ideas or suggestions would be much appreciated.

@Ricks-Lab
Owner

I really doubt that any of this would be enabled for PS5 hardware.
