
Gazebo in Windows Docker cannot use Nvidia GPU, falls back to using CPU. #2595

Open
3 tasks done
toenails6 opened this issue Sep 5, 2024 · 51 comments
Labels
bug Something isn't working

Comments

@toenails6 commented Sep 5, 2024

Environment

  • OS Version: Win11 Docker WSL2 (dockerfile given near the bottom)
  • Binary build, installed via ros-jazzy-ros-gz.
    • Generally, mention all circumstances that might affect rendering capabilities:
      • running in Docker/Singularity
      • using VirtualGL, XVFB, Xdummy, XVNC or other indirect rendering utilities
    • Rendering system info:
      • On Linux, provide the outputs of the following commands:
        LANG=C lspci -nn | grep VGA  # might require installing pciutils
        echo "$DISPLAY"
        LANG=C glxinfo -B | grep -i '\(direct rendering\|opengl\|profile\)'  # might require installing mesa-utils package
        ps aux | grep Xorg
        sudo env LANG=C X -version  # if you don't have root access, try to tell the version of Xorg e.g. via package manager
        No output
        host.docker.internal:0.0
        direct rendering: Yes
            Preferred profile: core (0x1)
            Max core profile version: 4.5
            Max compat profile version: 4.5
            Max GLES1 profile version: 1.1
            Max GLES[23] profile version: 3.2
        OpenGL vendor string: Mesa
        OpenGL renderer string: llvmpipe (LLVM 17.0.6, 256 bits)
        OpenGL core profile version string: 4.5 (Core Profile) Mesa 24.2.1 - kisak-mesa PPA
        OpenGL core profile shading language version string: 4.50
        OpenGL core profile context flags: (none)
        OpenGL core profile profile mask: core profile
        OpenGL version string: 4.5 (Compatibility Profile) Mesa 24.2.1 - kisak-mesa PPA
        OpenGL shading language version string: 4.50
        OpenGL context flags: (none)
        OpenGL profile mask: compatibility profile
        OpenGL ES profile version string: OpenGL ES 3.2 Mesa 24.2.1 - kisak-mesa PPA
        OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
        
        root 635 0.0 0.0 3956 2020 pts/0 S+ 03:25 0:00 grep --color=auto Xorg
        env: ‘X’: No such file or directory
  • Please attach the ogre.log or ogre2.log file from ~/.gz/rendering
03:19:58: Creating resource group General
03:19:58: Creating resource group Internal
03:19:58: Creating resource group Autodetect
03:19:58: SceneManagerFactory for type 'DefaultSceneManager' registered.
03:19:58: Registering ResourceManager for type Material
03:19:58: Registering ResourceManager for type Mesh
03:19:58: Registering ResourceManager for type Mesh2
03:19:58: Registering ResourceManager for type OldSkeleton

env: ‘X’: No such file or directory

Description

  • Expected behavior:
    Gazebo process should show up in nvidia-smi, and CPU usage should not be so high.
    There should not be any errors or warnings when launching Gazebo.
  • Actual behavior:
    The following errors show up.
[INFO] [ruby $(which gz) sim-1]: process started with pid [703]
[ruby $(which gz) sim-1] libEGL warning: MESA-LOADER: egl: failed to open vgem: driver not built!
[ruby $(which gz) sim-1] 
[ruby $(which gz) sim-1] libEGL warning: NEEDS EXTENSION: falling back to kms_swrast
Gazebo does not use the GPU according to nvidia-smi, the frame rates are pretty low, and the output is obviously not high-definition.

nvidia-smi does not show Gazebo process when Gazebo is active.

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.01             Driver Version: 537.70       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060        On  | 00000000:01:00.0  On |                  N/A |
| 37%   37C    P5              16W / 170W |   1806MiB / 12288MiB |     16%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A        30      G   /Xwayland                                 N/A      |
+---------------------------------------------------------------------------------------+

Steps to reproduce

  1. docker run -it --privileged --name DELTA --gpus all ros2stable
  2. ros2 launch ros_gz_sim gz_sim.launch.py gz_args:=empty.sdf

Output

My Dockerfile:

# Set ROS distribution. 
ARG ROS_DISTRO=jazzy

# Use official image. 
FROM ros:${ROS_DISTRO}-ros-base

# ROS2 installs. 
RUN apt update && \
    apt install -y software-properties-common && \
    apt install -y curl && \
    apt install -y wget
RUN apt install -y ros-${ROS_DISTRO}-turtlesim
RUN apt install -y ros-${ROS_DISTRO}-rqt*
RUN apt install -y ros-${ROS_DISTRO}-ros-gz

# Install MESA graphics utility tools. 
RUN apt install -y mesa-utils
RUN apt install -y ffmpeg libsm6 libxext6
RUN apt install -y libgl1-mesa-dev libosmesa6-dev

# Display environment setup. 
ENV DISPLAY=host.docker.internal:0.0
ENV QT_X11_NO_MITSHM=1
ENV NVIDIA_DRIVER_CAPABILITIES=all
RUN echo "export XDG_RUNTIME_DIR=/usr/local/xdg" >> ~/.bashrc
# RUN echo "export RUNLEVEL=3" >> ~/.bashrc

# Use Stable version of MESA. 
RUN add-apt-repository ppa:kisak/kisak-mesa && \
    apt update && \
    apt install -y mesa-utils && \
    apt install -y ffmpeg libsm6 libxext6 && \
    apt install -y libgl1-mesa-dev libosmesa6-dev && \
    apt upgrade -y

# Check versions. 
RUN apt update && \
    apt upgrade -y

# Source ROS2 for default terminal. 
RUN echo "source /opt/ros/${ROS_DISTRO}/setup.bash" >> ~/.bashrc

(screenshots omitted)

toenails6 added the bug label on Sep 5, 2024
@toenails6 (Author) commented Sep 5, 2024

Sorry if I missed anything; I don't fully understand all of this. If more information is needed, please let me know, and I will do my best to provide it.
This seems to be a persistent issue: many people have hit different variants of it, and I have tried a whole lot of things, none of which work.

I tried Gazebo Garden as suggested in #920; that also does not work.
I also tried ROS2 Humble and Iron; neither works.

@traversaro (Contributor)

llvmpipe is already listed in your glxinfo output, which suggests that something is wrong independently of Gazebo. Can you run any program at all that uses the GPU in WSL2? Note that Docker could play a role here, so you may want to start outside Docker, in the plain WSL2 instance, and move into Docker only once that works.

@toenails6 (Author)

Nothing can access the GPU in this setup.
I have heard that this is a WSL issue, and some people have seen similar behavior on WSL.
The Docker arguments I used should pass all GPU capabilities to the container, as many others have done, so at least it should not be the docker run command.
I am going to try WSL2 directly to make sure, but the odds are slim.

@toenails6 (Author)

Yeah, not surprisingly, it fails.
(screenshots omitted)
Better frame rates, but nvidia-smi does not show Gazebo. The better frame rate is likely due to WSL2 being native, but still:
Gazebo is not using the GPU, and the errors mentioned above still appear when launching it.

@toenails6 (Author)

Wait, hang on, this is a little different. Let me try kisak Mesa.

@toenails6 (Author) commented Sep 5, 2024

Hmm, interesting; not sure what this is.
Still no Gazebo in nvidia-smi, though.
(screenshot omitted)

@toenails6 (Author)

Well, kisak Mesa is generally considered to be the solution, but it does not work here.
A lot of people recommend kisak Mesa, for example in the following link:
https://forums.developer.nvidia.com/t/enabling-nvidia-support-for-vulkan-on-ubuntu-22-04-through-wsl2/244176/3
Since that does not work, I don't know what will.

@traversaro (Contributor)

Sorry, can you run LANG=C glxinfo -B | grep -i '\(direct rendering\|opengl\|profile\)' outside of Docker? Perhaps the problem is that you are using the integrated card instead of the Nvidia one.

@toenails6 (Author)

I don't have an integrated GPU, so I don't think that's the case. By outside docker do you mean the wsl2 just like above?
I'll try it as soon as I can.

@traversaro (Contributor)

By outside docker do you mean the wsl2 just like above?

Yes, I would first try to get everything working in WSL2, and only then add the extra Docker complexity. Anyhow, a few interesting documentation entry points:

@bperseghetti (Collaborator) commented Sep 5, 2024

I don't have an integrated GPU, so I don't think that's the case. By outside docker do you mean the wsl2 just like above? I'll try it as soon as I can.

Yeah, I would be interested in seeing all of this run outside Docker on your WSL2 instance. I'm not aware of anyone running Gazebo simulation in Docker on WSL2. I sort of view WSL2 as its own form of a "Docker interactive Linux shell for Windows". Getting true full GPU enablement in Docker is already a pain even on native Linux, and it occasionally breaks with mismatched drivers and runtimes. Unless you work for Docker and your job relies on making it work in Docker, I would never suggest it if your main objective is to run a simulator/simulation. I almost view it as running a Docker CE instance inside an existing Docker CE instance.

Also, from some Google searches and the official Docker documentation, GPU paravirtualization in WSL2 appears to be supported only through Docker Desktop; Docker does not officially state support for GPU paravirtualization with Docker CE on WSL2.

@bperseghetti (Collaborator)

Although if, after testing it on "native WSL2", you are really dead set on using Docker, maybe look at this from Microsoft; it looks like there might be a way: https://learn.microsoft.com/en-us/windows/wsl/tutorials/gpu-compute

@bperseghetti (Collaborator)

Also, the Intel i7-11700 that you are showing has a UHD 750 integrated GPU, so you will probably need to specify which GPU you want to use regardless: https://learn.microsoft.com/en-us/windows/wsl/tutorials/gpu-compute#multiple-gpus

@traversaro (Contributor)

Also, the Intel i7-11700 that you are showing has a UHD 750 integrated GPU, so you will probably need to specify which GPU you want to use regardless: https://learn.microsoft.com/en-us/windows/wsl/tutorials/gpu-compute#multiple-gpus

If that is the case, export MESA_D3D12_DEFAULT_ADAPTER_NAME=NVIDIA as documented in https://github.com/microsoft/wslg/wiki/GPU-selection-in-WSLg should be sufficient to tell the d3d12 driver to expose the NVIDIA GPU in Direct3D12 rather than the integrated Intel one.

@bperseghetti (Collaborator)

Also, the Intel i7-11700 that you are showing has a UHD 750 integrated GPU, so you will probably need to specify which GPU you want to use regardless: https://learn.microsoft.com/en-us/windows/wsl/tutorials/gpu-compute#multiple-gpus

If that is the case, export MESA_D3D12_DEFAULT_ADAPTER_NAME=NVIDIA as documented in https://github.com/microsoft/wslg/wiki/GPU-selection-in-WSLg should be sufficient to tell the d3d12 driver to expose the NVIDIA GPU in Direct3D12 rather than the integrated Intel one.

Interesting; the link you shared says:

"On newer version of MESA, the d3d12 backend will always default to the first enumerated integrated GPU if no user selection is specified. So on a laptop like the Surface Laptop Studio above the default GPU will be the Intel GPU and the user has to manually opt-in for the NVIDIA GPU if they so desire. This is done to avoid accidentally waking up the more powerful (and power hungry) GPU unless this is what the user wants as this has an impact on battery life."

@bperseghetti (Collaborator)

@toenails6 let us know when you get a chance to test that, so we can close this issue if it's just a matter of telling WSL which GPU to use.

@toenails6 (Author) commented Sep 5, 2024

Hmm, that's a lot of info; I'll have to read through it later. Anyhow, the following is the output of LANG=C glxinfo -B | grep -i '\(direct rendering\|opengl\|profile\)':

WARNING: dzn is not a conformant Vulkan implementation, testing use only.
WARNING: Some incorrect rendering might occur because the selected Vulkan device (Microsoft Direct3D12 (NVIDIA GeForce RTX 3060)) doesn't support base Zink requirements: feats.features.logicOp have_EXT_custom_border_color have_EXT_line_rasterization
direct rendering: Yes
    Preferred profile: core (0x1)
    Max core profile version: 4.6
    Max compat profile version: 4.6
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.1
OpenGL vendor string: Microsoft Corporation
OpenGL renderer string: D3D12 (NVIDIA GeForce RTX 3060)
OpenGL core profile version string: 4.6 (Core Profile) Mesa 24.2.1 - kisak-mesa PPA
OpenGL core profile shading language version string: 4.60
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL version string: 4.6 (Compatibility Profile) Mesa 24.2.1 - kisak-mesa PPA
OpenGL shading language version string: 4.60
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile
OpenGL ES profile version string: OpenGL ES 3.1 Mesa 24.2.1 - kisak-mesa PPA
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.10

Directly on WSL2.

@toenails6 (Author) commented Sep 5, 2024

Side note: my motivation behind all this is to run ROS2-Gazebo for RL on Windows (preferably via Docker) without having to dual boot.
I'm a grad student, so there are other things I have to do on Windows with my computer, and running Linux natively interferes with that.
I see many people attempting the same across many forums in recent years, but I do not know whether they succeeded. Many of them do report some variant of graphics issues.

I saw someone reportedly succeeding with ros-humble-ros-gzgarden in one of the GitHub posts mentioned above, so I thought I would give this a few more chances.

Right now it seems that even if this has a chance of working, its reliability is very questionable, as any small change or update seems to break things easily.

I don't think running ROS2-Gazebo on either WSL2 or Docker on Windows is a niche use case, and it has been many years. I'm not much of an expert in different systems, but I would say this would be very good functionality, to say the least. Is it true that this simply cannot work?

toenails6 changed the title from "Gazebo cannot use Nvidia GPU, falls back to using CPU." to "Gazebo in Windows Docker cannot use Nvidia GPU, falls back to using CPU." on Sep 5, 2024
@toenails6 (Author)

If this really cannot work that way, I'll just go back to dual booting and leave this post as a warning to others who would attempt ROS2-Gazebo on Windows Docker.

@toenails6 (Author)

And to clarify, just in case of misunderstanding: I'm not trying to run Docker inside WSL2.
In the best case, Windows Docker Desktop would create containers running all of my simulation needs.
I mentioned WSL2 in the description because Docker Desktop uses WSL2 as its engine.
I could also scrape by if it is only runnable directly on WSL2.
Worst case, it's back to the same old dual boot.

@toenails6 (Author) commented Sep 5, 2024

I tried actually running a ball sim in Docker, and performance is not the best but is actually doable:
(screenshots omitted)

Docker shows CPU usage at 46.12%, and Task Manager shows GPU usage at 21%.
The sim's real time factor bottoms out around 90%, which is not bad (pretty good, actually). FPS is on the low side at ~20 FPS.
I get a similar real time factor with WSL2, but almost triple the frame rate (I don't mind the frame rate; I just need a good real time factor, so both WSL2 and Docker meet my standards in this sim).
nvidia-smi still does not show a Gazebo process, but the GPU is somewhat being used. I'm not sure whether it is being used at full capacity.
Does this mean nvidia-smi is not trustworthy, or does nvidia-smi just not show Gazebo?
Is the GPU being used correctly? Can it be used more?
I guess this has a chance of working in its current state, but I am not entirely sure, as I have not tried any complex sims yet.
I will also try all this without kisak Mesa if I have the time. Using the stock apt Mesa release gives slightly different warnings (vgem and whatnot).

@toenails6 (Author)

Docker with the stock apt Mesa gives the same performance, just different errors and warnings:
(screenshot omitted)

[INFO] [ruby $(which gz) sim-1]: process started with pid [70]
[ruby $(which gz) sim-1] MESA: error: ZINK: failed to choose pdev
[ruby $(which gz) sim-1] glx: failed to create drisw screen
[ruby $(which gz) sim-1] libEGL warning: MESA-LOADER: failed to open vgem: /usr/lib/dri/vgem_dri.so: cannot open shared object file: No such file or directory (search paths /usr/lib/x86_64-linux-gnu/dri:\$${ORIGIN}/dri:/usr/lib/dri, suffix _dri)
[ruby $(which gz) sim-1] 
[ruby $(which gz) sim-1] libEGL warning: MESA-LOADER: failed to open vgem: /usr/lib/dri/vgem_dri.so: cannot open shared object file: No such file or directory (search paths /usr/lib/x86_64-linux-gnu/dri:\$${ORIGIN}/dri:/usr/lib/dri, suffix _dri)
[ruby $(which gz) sim-1]
[ruby $(which gz) sim-1] libEGL warning: NEEDS EXTENSION: falling back to kms_swrast

@toenails6 (Author)

I don't know what to think here; have I inadvertently succeeded without knowing it was a success?
Are those libEGL warnings simply to be disregarded?

@bperseghetti (Collaborator)

For mine on native Linux, nvidia-smi shows:
(screenshot omitted)
No clue whether nvidia-smi even works properly in WSL, or in Docker for that matter.

@bperseghetti (Collaborator)

I guess I'm confused as to why you'd use Docker at all: if you want to do RL research, why not just do it all natively in WSL2? Also, it looks like it's "running"; maybe stress test it with a more difficult world, like the depot world?

@toenails6 (Author)

Docker increases portability, and its volume system helps with code archiving. But yes, I could just use WSL2 natively.
And yes, I will definitely stress test it at some point.

@toenails6 (Author)

I don't have anything ready right out of the box, so stress testing may have to wait a while.

@toenails6 (Author)

I would say the GPU is not being used to its full potential; otherwise, CPU usage would likely be a bit lower. But I am not sure.

@traversaro (Contributor)

I do not think nvidia-smi shows processes that are using the GPU on Windows, at least at first glance (I tried with glxgears; it is typically easier to debug this kind of thing with a simpler program first).

However, to make sure that the GPU is actually used for rendering, you can compare the CPU usage between:

LIBGL_ALWAYS_SOFTWARE=true gz sim

(which forces CPU rendering) and:

LIBGL_ALWAYS_SOFTWARE=false gz sim

which instead uses the GPU if available. In my case, LIBGL_ALWAYS_SOFTWARE=true gz sim reaches ~400% CPU usage, while LIBGL_ALWAYS_SOFTWARE=false gz sim stays around ~100% (note that with top, to get the total gz-sim usage you need to sum the CPU usage of the two ruby processes, assuming no other ruby is running on the system).

If LIBGL_ALWAYS_SOFTWARE=true gz sim uses noticeably more CPU, then I think we can be quite sure that you are using the GPU. At that point, you can try to achieve something similar in Docker by following the docs in https://github.com/microsoft/wslg/blob/main/samples/container/Containers.md, in particular the "Containerized applications access to the vGPU" section. As in that example, you can use glxinfo -B | grep "renderer string" to check whether the "OpenGL renderer string" is "D3D12 (NVIDIA GeForce RTX 3060)" (hardware accelerated) or "llvmpipe (LLVM 17.0.6, 256 bits)" (rendering done on the CPU).
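As a complement to the renderer-string check, the test can be scripted; this is a sketch, and the classify_renderer helper is hypothetical (not part of Gazebo or Mesa tooling). It simply flags Mesa's common software-rendering backends:

```shell
#!/bin/sh
# Hypothetical helper: classify an "OpenGL renderer string" line as software
# or hardware rendering. llvmpipe/softpipe/swrast are Mesa's CPU fallbacks.
classify_renderer() {
    case "$1" in
        *llvmpipe*|*softpipe*|*swrast*) echo "software" ;;
        *)                              echo "hardware" ;;
    esac
}

# Typical interactive usage (requires glxinfo from mesa-utils):
#   classify_renderer "$(glxinfo -B | grep 'OpenGL renderer string')"

# The two renderer strings seen in this issue:
classify_renderer "OpenGL renderer string: llvmpipe (LLVM 17.0.6, 256 bits)"  # prints "software"
classify_renderer "OpenGL renderer string: D3D12 (NVIDIA GeForce RTX 3060)"   # prints "hardware"
```

A "software" result corresponds to the high-CPU LIBGL_ALWAYS_SOFTWARE=true behavior described above.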

@toenails6 (Author)

I will do this as soon as possible, thank you!

@toenails6 (Author)

Well, I am seeing similar CPU usage in both cases, so no, the GPU was not actually being used; my worries were not paranoia.
I will keep checking the other GitHub link you provided.

@toenails6 (Author)

The mentioned GitHub post describes how to use Docker to containerize applications inside WSLg, which is different from my situation.
glxinfo -B | grep "renderer string" shows:

MESA: error: ZINK: failed to choose pdev
glx: failed to create drisw screen
OpenGL renderer string: llvmpipe (LLVM 17.0.6, 256 bits)

so it likely attempts to use the Nvidia GPU, fails for some reason, and then falls back to CPU rendering.
Some people suggested kisak Mesa, but GPU passthrough from Windows to the Docker container fails whichever version of Mesa is used.

@traversaro (Contributor)

What is your Dockerfile now? The Zink error appears even for non-Docker WSLg, so I do not think it is relevant.

@toenails6 (Author)

I am currently not using Kisak MESA:

# Set ROS distribution. 
ARG ROS_DISTRO=jazzy

# Use official image. 
FROM ros:${ROS_DISTRO}-ros-base

# ROS2 installs. 
RUN apt update && \
    apt install -y software-properties-common && \
    apt install -y curl && \
    apt install -y wget && \
    apt install -y nano && \
    apt install -y pciutils
RUN apt install -y ros-${ROS_DISTRO}-turtlesim
RUN apt install -y ros-${ROS_DISTRO}-rqt*
RUN apt install -y ros-${ROS_DISTRO}-ros-gz*

# Install MESA virtual acceleration drivers and graphics utility tools. 
RUN apt install -y mesa-va-drivers
RUN apt install -y vainfo
RUN apt install -y mesa-utils
RUN apt install -y ffmpeg libsm6 libxext6
RUN apt install -y libgl1-mesa-dev libosmesa6-dev

# Display environment setup. 
ENV DISPLAY=host.docker.internal:0.0
ENV QT_X11_NO_MITSHM=1
ENV NVIDIA_DRIVER_CAPABILITIES=all
RUN echo "export XDG_RUNTIME_DIR=/usr/local/xdg" >> ~/.bashrc
# RUN echo "export RUNLEVEL=3" >> ~/.bashrc

# Use Stable version of MESA. 
# RUN add-apt-repository ppa:kisak/kisak-mesa && \
#     apt update && \
#     apt install -y mesa-utils && \
#     apt install -y ffmpeg libsm6 libxext6 && \
#     apt install -y libgl1-mesa-dev libosmesa6-dev && \
#     apt upgrade -y

# Check versions. 
RUN apt update && \
    apt upgrade -y

# Source ROS2 for default terminal. 
RUN echo "source /opt/ros/${ROS_DISTRO}/setup.bash" >> ~/.bashrc

@traversaro (Contributor) commented Sep 6, 2024

The mentioned GitHub post describes how to use docker to containerize applications inside WSLg, which is different from my situation.

I am not sure what you meant here: the section "Containerized applications access to the vGPU" in https://github.com/microsoft/wslg/blob/main/samples%2Fcontainer%2FContainers.md describes how to ensure that a Docker process uses the physical GPU. How is this different from your use case?

@traversaro (Contributor)

I am currently not using Kisak MESA:

# Set ROS distribution. 
ARG ROS_DISTRO=jazzy

# Use official image. 
FROM ros:${ROS_DISTRO}-ros-base

# ROS2 installs. 
RUN apt update && \
    apt install -y software-properties-common && \
    apt install -y curl && \
    apt install -y wget && \
    apt install -y nano && \
    apt install -y pciutils
RUN apt install -y ros-${ROS_DISTRO}-turtlesim
RUN apt install -y ros-${ROS_DISTRO}-rqt*
RUN apt install -y ros-${ROS_DISTRO}-ros-gz*

# Install MESA virtual acceleration drivers and graphics utility tools. 
RUN apt install -y mesa-va-drivers
RUN apt install -y vainfo
RUN apt install -y mesa-utils
RUN apt install -y ffmpeg libsm6 libxext6
RUN apt install -y libgl1-mesa-dev libosmesa6-dev

# Display environment setup. 
ENV DISPLAY=host.docker.internal:0.0
ENV QT_X11_NO_MITSHM=1
ENV NVIDIA_DRIVER_CAPABILITIES=all
RUN echo "export XDG_RUNTIME_DIR=/usr/local/xdg" >> ~/.bashrc
# RUN echo "export RUNLEVEL=3" >> ~/.bashrc

# Use Stable version of MESA. 
# RUN add-apt-repository ppa:kisak/kisak-mesa && \
#     apt update && \
#     apt install -y mesa-utils && \
#     apt install -y ffmpeg libsm6 libxext6 && \
#     apt install -y libgl1-mesa-dev libosmesa6-dev && \
#     apt upgrade -y

# Check versions. 
RUN apt update && \
    apt upgrade -y

# Source ROS2 for default terminal. 
RUN echo "source /opt/ros/${ROS_DISTRO}/setup.bash" >> ~/.bashrc

Are you mapping the directories and setting LD_LIBRARY_PATH=/usr/lib/wsl/lib as documented in the wslg container docs?

@toenails6 (Author)

That GitHub post describes the use of containers within WSLg: basically running Docker within WSLg for containerization purposes. For them, the GPU is first passed to WSLg, then to the Docker containers within WSLg.
This assumes the GPU passthrough to WSLg works in the first place, but for me that first step is already the problematic one.
I think the situations are fundamentally different.
The Docker image in that post needs the WSL mappings because it is WSL that passes GPU access to the containers.
Again, that assumes WSL has access.

I am trying to run Docker directly on Windows, and Docker Desktop on Windows uses WSL2 as its engine. That first step, WSL2 correctly accessing the GPU, is my problem.

@toenails6 (Author)

sudo docker build -t videoaccel -f Dockerfile.videoaccel .
sudo docker run -it -v /tmp/.X11-unix:/tmp/.X11-unix -v /mnt/wslg:/mnt/wslg \
    -v /usr/lib/wsl:/usr/lib/wsl --device=/dev/dxg -e DISPLAY=$DISPLAY \
    --device /dev/dri/card0 --device /dev/dri/renderD128 \
    -e WAYLAND_DISPLAY=$WAYLAND_DISPLAY -e XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR \
    -e PULSE_SERVER=$PULSE_SERVER --gpus all videoaccel

These docker build and run commands are clearly Linux commands, not Windows commands.

@toenails6 (Author)

To clarify again:
The post https://github.com/microsoft/wslg/blob/main/samples%2Fcontainer%2FContainers.md describes running Docker containers within WSLg using the vGPU provided by WSLg.

My problem is getting Docker containers, which use WSL2 as their engine, to receive a vGPU correctly from Windows.

@bperseghetti (Collaborator)

To clarify again:
The post https://github.com/microsoft/wslg/blob/main/samples%2Fcontainer%2FContainers.md describes having Docker containers within WSLg using the vGPU provided by WSLg.

My problem is trying to get Docker images, which use WSL2 as their engines, to have a vGPU correctly from windows.

And have you tried https://docs.docker.com/desktop/gpu/ ?

If it's a Docker-GPU-to-WSL2 issue, that sounds like a problem outside of Gazebo (more a problem with Docker, Windows, and Nvidia not playing nicely). Considering it works without all that on "native WSL2", I'm leaning towards this being an unsupported edge case.

@toenails6 (Author)

Right now it does not work with native WSL2 either.
Apparently it sometimes worked in the past, then later broke after updates.
Others have tried GPU-dependent applications on native WSL2 and hit similar issues:
https://forums.developer.nvidia.com/t/vgem-dri-so-file-is-missing-libegl-warning/244383/3
Many people say this is an inherent WSL issue with no solution as of yet.

@toenails6 (Author)

I just wanted to give this a shot, but it seems that I will have to fall back to dual booting.

@bperseghetti (Collaborator)

Or maybe weigh just running it in WSL2 without Docker Desktop.

@toenails6 (Author)

I have tried it, and I posted some results above.
Native WSL2 has the same issues as Docker Desktop.
Docker has these issues because it uses WSL2 as its engine on Windows; the issues stem from WSL2.

@toenails6 (Author)

The Nvidia forum thread mentioned above used WSL2, and they have the same problems.
https://forums.developer.nvidia.com/t/vgem-dri-so-file-is-missing-libegl-warning/244383/3

@traversaro (Contributor) commented Sep 7, 2024

Sorry, there is something I am missing. In #2595 (comment) you reported that the "OpenGL renderer string" in the glxinfo output mentioned d3d12; that is why I assumed the problem with WSL (without Docker) was solved and you were trying to get it working in Docker. Now instead (still WSL with no Docker) glxinfo -B reports llvmpipe (i.e. software rendering). Did you change something in your system?

@toenails6 (Author)

Hmm, there was actually something I changed: I stopped using kisak Mesa, because kisak Mesa does not quite work either.
But even so, the kisak Mesa drivers fail and fall back to the CPU instead of the GPU.

@traversaro (Contributor)

I would first focus on making sure that glxinfo -B lists "OpenGL renderer string: D3D12 (NVIDIA GeForce RTX 3060)". Once that works, there is hope the GPU will work; otherwise not. However, if you are on Ubuntu 24.04, the d3d12 driver provided by the default Mesa should work fine. Are you sure you do not have anything strange set in your environment variables, such as LIBGL_ALWAYS_SOFTWARE=1? If you are in doubt and you can, it may be an option to reinstall the Ubuntu 24.04 WSL image to start from a clean slate.
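One quick way to audit the environment for such settings is to loop over the Mesa/GL variables that commonly force software rendering or change GPU selection. This is a best-effort sketch; the variable list is not exhaustive:

```shell
#!/bin/sh
# Print rendering-related environment variables so that stray settings
# (e.g. LIBGL_ALWAYS_SOFTWARE=1) are easy to spot; unset ones show "(unset)".
for var in LIBGL_ALWAYS_SOFTWARE LIBGL_ALWAYS_INDIRECT \
           GALLIUM_DRIVER MESA_LOADER_DRIVER_OVERRIDE \
           MESA_D3D12_DEFAULT_ADAPTER_NAME; do
    printf '%s=%s\n' "$var" "$(printenv "$var" || echo '(unset)')"
done
```

Anything reported other than "(unset)" is worth double-checking against the WSLg defaults.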

@toenails6 (Author)

I checked CPU and GPU usage with and without the GPU for a Gazebo sim on WSL2, after falling back to kisak Mesa:
(screenshots omitted)

CPU usage is around 33% while using the GPU, with the GPU at 24%; CPU usage goes to 60% when specifying not to use the GPU, with GPU usage then around 2%.
Frame rates are also obviously different.
The terminal logs look strange, but the results are clear:
it seems that native WSL2 is actually using the GPU and that I was mistaken before.
Again, the warnings given in the logs do not seem to mean that the GPU was not used.
Hmm, it seems this does work on Windows after all; that is incredibly good news.
Although now this is clearly a Docker issue. I will have to look into it when I have the time.

@toenails6 (Author)

Thank you all for the help; the commands and packages you referred me to were incredibly useful!
I would like to keep this issue open for the time being, until I have figured out how to do this in Docker.
Maybe someone will find this post useful one day.

@samanipour

The following solution worked for me.

Problem description

About a week ago, I started using Gazebo Harmonic and had the same problem: it would not use the Nvidia GeForce GPU (in my case, a GTX 1050), and the load was on the CPU during rendering. I also tested the dual-boot solution. It gave me better FPS, but the problem remained the same, with all the load on the CPU, so I finally decided to go back to Ubuntu 24.04 on WSL2.

My solution (especially for those who must or want to use Windows)

Before starting, I should mention that I tested Gazebo Harmonic and Gazebo Fortress on Windows WSL2, on dual-booted Ubuntu (both Jammy 22.04 and Noble 24.04), and with the Windows binary packages provided by conda and conda-forge.
Based on these experiences, I think the best solution is the following combination:

  • Ubuntu Noble 24.04 on Windows WSL2
  • Gazebo Fortress
  • ROS2 Jazzy (if you are a ROS2 user)

Solution Steps

  1. Install the proper Nvidia driver for your graphics card on Windows (https://www.nvidia.com/en-us/geforce/drivers). According to official Microsoft and Nvidia documentation, you do not need to, and should not, install the Nvidia driver inside the Linux WSL2 distribution, because WSL uses the host graphics driver from Windows.

  2. Install WSL2 and then Ubuntu 24.04 (https://learn.microsoft.com/en-us/windows/wsl/install)

  3. Install Mesa package:
    sudo apt install mesa-utils

  4. Change the Mesa profile to use the Nvidia graphics card instead of the integrated Intel GPU:
    export MESA_D3D12_DEFAULT_ADAPTER_NAME=NVIDIA
    Then check that it correctly identifies your graphics card model:
    glxinfo -B

  5. Install the Gazebo Ubuntu binary package as described in the documentation (https://gazebosim.org/docs/fortress/install_ubuntu/).

  6. Make Gazebo use the graphics card during rendering:
    export LIBGL_ALWAYS_SOFTWARE=false

  7. Start the Gazebo simulation, pick and run one of the examples:
    gz sim

  8. Run the Gazebo simulation and check the graphics card load in the Linux or Windows task manager.
    Check in Linux:
    watch -n0.1 nvidia-smi
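For convenience, steps 3-8 above can be condensed into one shell sketch. The apt/glxinfo/gz/nvidia-smi commands are left as comments since they need an interactive WSL2 session; the adapter name "NVIDIA" follows the wslg GPU-selection wiki linked earlier:

```shell
#!/bin/sh
# Condensed sketch of steps 3-8 (Ubuntu 24.04 under WSL2; steps 1-2,
# the Windows-side driver and WSL install, are assumed already done).

# Step 3 (run once, interactively):
#   sudo apt install -y mesa-utils

# Step 4: make Mesa's d3d12 backend pick the NVIDIA adapter instead of
# an integrated GPU.
export MESA_D3D12_DEFAULT_ADAPTER_NAME=NVIDIA

# Step 6: make sure software rendering is not forced.
export LIBGL_ALWAYS_SOFTWARE=false

# Steps 5/7/8 verification (interactive):
#   glxinfo -B | grep "OpenGL renderer string"   # expect "D3D12 (NVIDIA ...)"
#   gz sim &
#   watch -n0.1 nvidia-smi

echo "$MESA_D3D12_DEFAULT_ADAPTER_NAME $LIBGL_ALWAYS_SOFTWARE"
```

Putting the two exports in ~/.bashrc makes the selection persistent across WSL2 sessions.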
