Skip to content

A small program that uses the NVIDIA Management Library to control the GPU independent of OS or display server. This project is not provided or endorsed by NVIDIA

License

Notifications You must be signed in to change notification settings

HackTestes/NVML-GPU-Control

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

39 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

NVML GPU Control

This is a small program that uses the NVIDIA Management Library (NVML) to monitor GPU temperature and set fan speed. NVML is being used, because it is OS and display sever agnostic (that means it doesn't depend on X11 or Windows). Another important reason is that the official NVIDIA tool (NVIDIA smi) does not currently support fan control.

Disclaimer

  • This project is NOT endorsed or sponsored by NVIDIA
  • This project is independent

Supported hardware

  • Any NVIDIA CUDA supported card with a driver higher or equal to version 520

Dependencies

To use it, you must have installed:

  • NVIDIA's proprietary drivers (>= v520)
  • Python 3
  • nvidia-ml-py (current version used: 12.535.133)

You will also need admin/root privileges to be able to set the fan speed.

Why I am creating this project?

Because of multiple reasons:

  1. NVIDIA smi doesn't change fan speed
  2. Can't use nvidia-settings under Wayland to control the fans
  3. GeForce Experience needs internet to work and it's pretty bad

Now that NVIDIA added the functions to work on any CUDA supported card on drivers equal or higher than v520 (see Change Log here), it is possible to control GeForce cards' fans through NVML! This means that I can get perfect Wayland support as well, since NVML doesn't depend on a display server.

NVIDIA NVML change logs NVIDIA NVML change logs

Installation

Note: you may need to adapt the path of some of the commands

  1. Clone the repository
git clone https://github.com/HackTestes/NVML-GPU-Control NVML_GPU_Control

The next part requires admin/root permissions

  1. Create a new folder for the scripts
# Windows
mkdir 'C:\Program Files\User_NVIDIA_GPU_Control\'

# Linux
sudo mkdir '/usr/bin/User_NVIDIA_GPU_Control/'
  1. Copy the scripts files from the repository to the new directory
# Windows
cp 'C:\Path_to_the_repository\NVML_GPU_Control\src\*' 'C:\Program Files\User_NVIDIA_GPU_Control\'

# Linux
sudo cp -r /Path_to_the_repository/NVML_GPU_Control/src/* /usr/bin/User_NVIDIA_GPU_Control/

Additional notes: you may also need to install the library as admin or install it as a normal user and then lock the files(change the permissions and take ownership as root/admin).

Uninstall

You only need to remove the directory (BE EXTRA CAREFUL WITH THE RM COMMAND). You can also use the GUI to simply delete the directory if you find that easier and safer.

Useful docs (read before running the commands):

# Windows - you can run first with the -WhatIf parameter to test 
Remove-Item -Confirm -Force -Recurse -Path 'C:\Program Files\User_NVIDIA_GPU_Control\'

# Linux
rm --interactive --preserve-root -R '/usr/bin/User_NVIDIA_GPU_Control'

How to use

  • Make sure to run with the working directory being the .\src
cd ./src
  • You must first list all cards that are connected, so you can get the name or UUID
python.exe ./nvml_gpu_control.py list
  • Then you can select a target by name
python.exe ./nvml_gpu_control.py fan-control -n 'NVIDIA GeForce RTX 4080'
  • And the fan speed for each temperature level
sudo python.exe ./nvml_gpu_control.py fan-control -n 'NVIDIA GeForce RTX 4080' -sp '10:35,20:50,30:50,35:100'
  • You could also use the --dry-run for testing!
python.exe ./nvml_gpu_control.py fan-control -n 'NVIDIA GeForce RTX 4080' -sp '10:35,20:50,30:50,35:100' --dry-run
  • You can also revert to the original state
python.exe ./nvml_gpu_control.py fan-policy --auto -n 'NVIDIA GeForce RTX 4080'

Note that it does not current support fan curve (or linear progression), so it works on levels. Each level the temperature is verified against the configuration (higher or equal) and then set properly. Also, each temperature associated with speed is ordered automatically. (think of it as a staircase graph)

Temp : speed(%)

1. 40 : 100 (>=40°C - 100%)

2. 30 : 50 (>=30°C - 50%)

3. 20 : 30 (>=20°C - 30%)

4. Default speed (DS)

___________________________

41°C - 100%

21°C - 30%

19°C - Default speed

Usage docs

python.exe .\nvml_gpu_control.py <ACTION> <OPTIONS>

ACTIONS
    help
          Display help text

    list
          List all available GPUs connected to the system by printing its name and UUID

    fan-control
          Monitor and controls the fan speed of the selected card (you must select a target card)

    fan-info
          Shows information about fan speed

    fan-policy <--auto|--manual>
          Changes the fan control policy to automatic (vBIOS controlled) or manual. Note that when the fan speed is changed, the NVML library automatically changes this setting to manual. This setting is useful to change the GPU back to its original state

    fan-policy-info
          Shows information about the current fan policy

    power-limit-info
          Shows information about the power limit of the selected GPU

    power-control
          Controls the power limit of the selected GPU. It runs in a loop by default, but can run once using the --single-use option

    thresholds-info
          Shows information about temperature thresholds in dregrees Celsius of the selected GPU.

    temp-control
          Controls the temperature thresholds configuration of the selected GPU. It runs in a loop by default, but can run once using the --single-use option

    control-all
         Allows the use of all controls in a single command/loop


OPTIONS

    --name OR -n <GPU_NAME>
          Select a target GPU by its name. Note: UUID has preference over name

    --uuid OR -id <GPU_UUID>
          Select a target GPU by its Universally Unique IDentifier (UUID). Note: UUID has preference over name

    --time-interval OR -ti <TIME_SECONDS>
          Time period to wait before probing the GPU again. Works for all actions that run in a loop

    --dry-run OR -dr
          Run the program, but don't change/set anything. Useful for testing the behavior of the program

    --speed-pair OR -sp <TEMP_CELSIUS:SPEED_PERCENTAGE,TEMP_CELSIUS:SPEED_PERCENTAGE...>
          A comma separated list of pairs of temperature in celsius and the fan speed in % (temp:speed) defining basic settings for a fan curve

    --default-speed OR -ds <FAN_SPEED_PERCENTAGE>
          Set a default speed for when there is no match for the fan curve settings

    --manual
          Sets the fan policy to manual

    --auto
          Sets the fan policy to automatic (vBIOS controlled)

    --power-limit OR -pl <POWER_LIMIT_WATTS>
          Sets the power limit of the GPU in watts

    --acoustic-temp-limit OR -tl <TEMPERATURE_CELSIUS>
          Sets the acoustic threshold in celsious (note that this is the same temperature limit used by GeForce Experience)

    --single-use OR -su
          Makes some actions work only once insted of in a loop. This option is valid for: temp-control and power-control

Running tests
python.exe ./src/tests.py -b

Setting up services or tasks (under development)

This section will present some simple commands to setup services or tasks that start as admin and run the configured program with the configured settings. You should secure the files under an admin only folder, so only authorized programs can modify the scripts (and DON'T use SUID in Linux).

Windows

Please, check Microsoft's documentation:

Since this program does not implement the service API, it will be using scheduled tasks to run at startup. There will be presented a GUI and a command line guide to how to do the setup:

GUI
  1. Make sure to have the script files at a path only accessible to admin users. This guide will be using C:\Program Files\User_NVIDIA_GPU_Control\

  2. Open Task Scheduler as an admin (you might need to select a admin user)

  3. Click on create task (do not confuse it for the create simple task)

TaskScheduler_CreateTask

  1. General tab -> Write the service name. This guide will use: User NVIDIA GPU Control Task

  2. General tab -> Write a description. This guide will use: This task runs a daemon at startup responsible for controling NVIDIA GPUs' fans and power

  3. General tab -> Mark the box containing Run whether the user is logged or not

  4. General tab -> Mark the box containing Do not store password

  5. General tab -> Mark the box containing Run with highest privileges

TaskScheduler_GeneralTab

  1. Triggers tab -> Create a new trigger and change the Begin the task to At Startup (make sure to leave the Enabled box marked)

TaskScheduler_TriggersTab TaskScheduler_TriggersTab_2

  1. Actions tab -> Create a new action and select the action Start a program

  2. Actions tab -> In the Program/script put the path of the python executable. This guide wil use "C:\Program Files\Python312\python.exe" (Note that some python versions may have a different directory name and make sure only admin users can change the executable and the folder) - the double quotes are necessary

  3. Actions tab -> In the Add arguments (optional), add the script path and the desired settings. This guide will use the following args: "C:\Program Files\User_NVIDIA_GPU_Control\nvml_gpu_control.py" "fan-control" "-n" "NVIDIA GeForce RTX 4080" "-sp" "10:0,20:50,35:100"

or

"C:\Program Files\User_NVIDIA_GPU_Control\nvml_gpu_control.py" "control-all" "-n" "NVIDIA GeForce RTX 4080" "-pl" "305" "-tl" "65" "-sp" "10:0,20:50,35:100"
  1. Actions tab -> In the Start in (optional), add the script path directory. This guide will use the following args: C:\Program Files\User_NVIDIA_GPU_Control

TaskScheduler_ActionsTab

  1. Conditions tab -> Leave all boxes UNmarked

TaskScheduler_ConditionsTab

  1. Settings tab -> Mark the box in Allow task to be run on demand

  2. Settings tab -> UNmark the box in Stop task if it runs longer than

  3. Settings tab -> Mark the box in If the running task does not end when requested, force it to stop

  4. Settings tab -> In the If the task is already running, then the following rule applies, select the Do not start a new instance

TaskScheduler_SettingsTab

Command line (Not recommended and untested)

Some users might find easier to simply run a command, however, it is important to warn about two things:

  1. The command line utility has less features than the GUI version;

  2. If you are unsure of what the command does, please check MS's documentation before running it (especially because you must run it with admin permissions)

  3. Open a terminal with admin permissions

  4. Write the following command: schtasks /create /tn 'User NVIDIA GPU Control Task' /tr 'C:\Program Files\Python312\python.exe C:\Program Files\User_NVIDIA_GPU_Control\nvml_gpu_control.py fan-control -t "NVIDIA GeForce RTX 4080" -sp "10:0,20:47,30:50,35:100"' /sc ONSTART /np /rl HIGHEST

Another formatting

schtasks /create
/tn 'User NVIDIA GPU Control Task'
/tr 'C:\Program Files\Python312\python.exe C:\Program Files\User_NVIDIA_GPU_Control\nvml_gpu_control.py fan-control -n "NVIDIA GeForce RTX 4080" -sp "10:0,20:47,30:50,35:100"'
/sc ONSTART
/np
/rl HIGHEST

One of the limitations involve not being able to change the start working directory, so some paths in the scripts might break. Overall, I do not recommend this approach on Windows, users should opt for the GUI method.

Linux (systemd)

This section will show how to install a global (system wide) systemd service in Ubuntu and enable it, so very time the computer starts the control will resume their work.

Systemd service
  1. Take a look at the systemd service at linux_config/unofficial-gpu-nvml-control.service. Change the GPU name and the settings to the desired configuration (Note: you can use the UUID as well).

  2. Copy the unit file into /etc/systemd/system/ (needs root)

sudo cp ./linux_config/unofficial-gpu-nvml-control.service /etc/systemd/system/
  1. Enable the service (needs root)
sudo systemctl enable unofficial-gpu-nvml-control.service
  1. Start the service (needs root)
sudo systemctl start unofficial-gpu-nvml-control.service
  1. Troubleshoot if needed (get the stdout from the service)
sudo journalctl -u unofficial-gpu-nvml-control.service

Reload service

sudo systemctl daemon-reload

Security considerations

Windows

  1. Having an admin prompt under the same desktop

    An opened prompt under the same desktop can receive key command from non-privileged programs, allowing any program to escalate to admin. To mitigate this it is necessary to restrict all other programs with a UI limit JobObject, create the window under a new desktop or not create any windows on the desktop (this is how it is done under the guide).

  2. Programs that start automatically as admin must be secured against writes

    The scripts and the executables can only be written by admin users, otherwise, another program may overwrite them and gain admin rights on the machine. Please, verify the permissions set on the python executable and on the scripts (this also applies to the library nvidia-ml-py).

Linux

  1. Having an admin prompt under the same desktop (X11)

    This is a similar risk to the Windows counterpart, especially on X11/Xorg. So, if you use X11, you must create a new session under a new TTY to create an admin window; but if you use Wayland, it already isolates windows by default.

  2. Programs that start automatically as admin must be secured against writes

    Same as Windows. All of the executables and scripts must be accessible only to the root user (UID 0). I recommend to install the pynvml library with the distro's package manager.

Roadmap (features to be added)

Must have

  • Fan control

  • Select GPU by name

  • Display fan speed per controller

  • Control fan policy

  • Select GPU by UUID (allows users to control more than 1 GPU individually that shares the same model - e.g. 2 RTXs 4080)

  • Run at startup with necessary permissions (Windows and Linux) - Windows already works

  • Power limit control

  • Temperature threshold control

  • Enable all controls

  • Help action must not require NVML initialization

Can consider (nice to have)

  • Logging to file option (with message size limit) -> user can spawn another instance with the same arguments and pass the --dry-run option as it should mirror the output of the privileged one

  • Temperature curves (linear, quadratic, logarithmic...) -> might be unnecessary as users can generate all speed points elsewhere and just pass it as arguments

Support

I will be supporting this program as long as I have NVIDIA GPUs (especially because I am also dogfooding it). Don't expect new features as it has everything currently I need, but you can suggest new features that you think is useful (note that the focus is energy and temperature control to increase stability). You can expect however bug fixes from me so my project remains compatible with the latest versions of NVML.

If I loose the need for this software (aka change my hardware), I will make sure to update this notice.

Contribute

Just a few guidelines and style decisions:

  • variable_name

  • function_name

  • ObjectOrClassName

  • Other dependencies are DISALLOWED, I want to limit the dependencies as a security measure (just remember the xz incident). You are free to try to convince me, but your contribution will most likely be rejected

  • Code should be testable, so please include unit tests to your code. If you think that certain parts are just too hard to make tests, include a juntification

About

A small program that uses the NVIDIA Management Library to control the GPU independent of OS or display server. This project is not provided or endorsed by NVIDIA

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages