Crashes on startup on PRIME-enabled hardware that's configured to use the onboard Nvidia GPU, not the Intel one #88

davidbuzz · 2024-02-22T21:01:00Z

... seems to specifically choose the incorrect (Intel) video card even though the 'prime-select query' output says 'nvidia'.

Temporary workaround: switch your PRIME configuration to use the weaker 'intel' video card and reboot, and then it won't crash.
run: sudo prime-select intel
reboot

Edit: vulkaninfo

davidbuzz · 2024-02-22T21:02:52Z

Discussed with @kvark on Discord; #84 and #86 were unsuccessful attempts at improving/fixing this issue.

kvark · 2024-02-23T05:27:38Z

Based on the Discord discussion, even vkcube doesn't work on that setup when using the Intel GPU.
There are a lot of Nvidia-related issues (or instances of a single issue?): NVIDIA/open-gpu-kernel-modules#317, gfx-rs/wgpu#4775, NVIDIA/egl-wayland#72, and others.
Could you share the info about your driver version and X11/Wayland environment?

davidbuzz · 2024-02-23T23:20:58Z

Setup 1:
My default/preferred configuration is 'sudo prime-select nvidia', which results in 'vulkaninfo' showing 3 devices: the Nvidia GPU, the Intel GPU, and the Mesa software renderer (llvmpipe). In this configuration blade/Zed doesn't work, as it keeps incorrectly choosing the Intel hardware.
vkcube --gpu_number 0
[this is the Nvidia hardware] vkcube runs great, but blade and Zed refuse to use this device.
[because 'prime-select query' shows it's using nvidia]
vkcube --gpu_number 1
[this is the Intel hardware] vkcube crashes in this configuration, and blade and Zed crash in this configuration too.

Setup 2:
After running 'sudo prime-select intel' and rebooting, vkcube works, but it's entirely ignoring the Nvidia hardware at that point. 'vulkaninfo' now shows only 2 devices (the Nvidia hardware is no longer in the list, leaving the Intel and llvmpipe GPUs).
vkcube --gpu_number 0
Selected GPU 0: Intel(R) UHD Graphics (CML GT2), type: IntegratedGpu
[device 0 in this configuration is Intel, as the Nvidia device has gone away as a result of 'prime-select intel' above]
vkcube --gpu_number 1
Selected GPU 1: llvmpipe (LLVM 15.0.7, 256 bits), type: Cpu
[device 1 is the Mesa software renderer, llvmpipe] and it works too

In this 'Setup 2', Zed and the blade 'bunnymark' example both work great, no issue here, but that's not how most people with a PRIME setup and Nvidia hardware are going to be using it.

Summary:
In both these configurations blade/Zed appears to try to use the Intel hardware (Adapter "Intel(R) UHD Graphics (CML GT2)"), and obviously it shouldn't.

davidbuzz · 2024-02-24T04:22:31Z

$ uname -a
Linux buzzlap 6.5.0-21-generic #21-Ubuntu SMP PREEMPT_DYNAMIC Wed Feb 7 14:17:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=23.10
DISTRIB_CODENAME=mantic
DISTRIB_DESCRIPTION="Ubuntu 23.10"

$ nvidia-smi

Sat Feb 24 14:20:53 2024       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.223.02   Driver Version: 470.223.02   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro T2000 wi...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   76C    P5     8W /  N/A |    796MiB /  3914MiB |     14%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      4989      G   /usr/lib/xorg/Xorg                402MiB |
|    0   N/A  N/A      6567      G   /usr/bin/gnome-shell              155MiB |
|    0   N/A  N/A      6594      G   ...mviewer/tv_bin/TeamViewer        1MiB |
|    0   N/A  N/A      7288      G   ...RendererForSitePerProcess       34MiB |
|    0   N/A  N/A     11137      G   ...9/usr/lib/firefox/firefox      199MiB |
+-----------------------------------------------------------------------------+

davidbuzz · 2024-02-24T22:48:25Z

A note to say that updating the Nvidia driver to 550 didn't magically make it work, but it did change a few things: the number of devices reported in the 'vulkaninfo' output is now 4 for me (one Intel, two Nvidia, and the Mesa driver), and vkcube now seems to run no matter which of the 4 devices I choose.
'sudo prime-select nvidia' was also run after the driver change.
vkcube --gpu_number 0
vkcube --gpu_number 1
vkcube --gpu_number 2
vkcube --gpu_number 3
[ these all run... but blade crashes with a different validation error now]

davidbuzz · 2024-02-24T23:02:23Z

vkcube, without explicitly choosing a device, chooses the Nvidia hardware... let me see...

The ordering of the GPUs output by these two Vulkan commands is different: vulkaninfo reports GPU0 as Intel, while vkcube, when specifying a GPU, reports that 'GPU0' is the Nvidia. So their "get a list of Vulkan devices" code is using a different ordering/numbering/indexing?

vulkaninfo --summary | egrep '(GPU|deviceName)'
GPU0:
    deviceType         = PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU
    deviceName         = Intel(R) UHD Graphics (CML GT2)
GPU1:
    deviceType         = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
    deviceName         = Quadro T2000 with Max-Q Design
GPU2:
    deviceType         = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
    deviceName         = Quadro T2000 with Max-Q Design
GPU3:
    deviceName         = llvmpipe (LLVM 15.0.7, 256 bits)

$ vkcube --gpu_number 0
Selected GPU 0: Quadro T2000 with Max-Q Design, type: DiscreteGpu

$ vkcube --gpu_number 1
Selected GPU 1: Quadro T2000 with Max-Q Design, type: DiscreteGpu

$ vkcube --gpu_number 2
Selected GPU 2: Intel(R) UHD Graphics (CML GT2), type: IntegratedGpu

$ vkcube --gpu_number 3
Selected GPU 3: llvmpipe (LLVM 15.0.7, 256 bits), type: Cpu

Is that important?
I think you want to do whatever vkcube is doing, not what vulkaninfo is doing.
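
A minimal sketch of one way to sidestep the ordering question: rank devices by type instead of trusting the index. The assumption here, not confirmed against vkcube's source, is that vkcube prefers discrete over integrated over CPU devices when no --gpu_number is given, which would match the output above. `Gpu`, `DeviceType`, `rank`, and `pick_preferred` are hypothetical names for illustration, not blade's API.

```rust
/// Device categories, mirroring Vulkan's VkPhysicalDeviceType values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DeviceType {
    Discrete,
    Integrated,
    Virtual,
    Cpu,
    Other,
}

/// Hypothetical, simplified stand-in for an enumerated physical device.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Gpu {
    name: &'static str,
    device_type: DeviceType,
}

/// Lower rank means more preferred. This ordering is an assumption.
fn rank(device_type: DeviceType) -> u8 {
    match device_type {
        DeviceType::Discrete => 0,
        DeviceType::Integrated => 1,
        DeviceType::Virtual => 2,
        DeviceType::Cpu => 3,
        DeviceType::Other => 4,
    }
}

/// Picks the most preferred device regardless of enumeration order.
/// Ties keep the earliest enumerated device, because `min_by_key`
/// returns the first minimum it sees.
fn pick_preferred(gpus: &[Gpu]) -> Option<&Gpu> {
    gpus.iter().min_by_key(|gpu| rank(gpu.device_type))
}
```

Fed the four devices in the order vulkaninfo lists them (Intel first), this still selects the Quadro; with only the Intel and llvmpipe devices present, as in Setup 2, it falls back to Intel.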

kvark · 2024-02-27T06:13:53Z

Yes, we want to do what vkcube is doing, if possible.

> [ these all run... but blade crashes with a different validation error now]

Please post the exact error

davidbuzz · 2024-03-03T02:59:31Z

command: blade]$ RUST_LOG=blade_graphics=debug RUST_BACKTRACE=1 cargo run --example bunnymark > buzz.550.bunnymark.validation2.error.txt 2>&1

output:

   Compiling blade-graphics v0.3.0 (/home/buzz/blade/blade-graphics)
   Compiling blade-render v0.2.0 (/home/buzz/blade/blade-render)
   Compiling blade-egui v0.2.0 (/home/buzz/blade/blade-egui)
   Compiling blade v0.2.0 (/home/buzz/blade)
    Finished dev [unoptimized + debuginfo] target(s) in 13.66s
     Running `target/debug/examples/bunnymark`
[2024-03-03T02:55:57Z INFO  blade_graphics::hal::init] Adapter "Intel(R) UHD Graphics (CML GT2)"
[2024-03-03T02:55:57Z INFO  blade_graphics::hal::init] No ray tracing extensions are supported
[2024-03-03T02:55:57Z DEBUG blade_graphics::hal::init] Adapter AdapterCapabilities {
        api_version: 4206847,
        properties: PhysicalDeviceProperties {
            api_version: 4206847,
            driver_version: 96477185,
            vendor_id: 32902,
            device_id: 39876,
            device_type: INTEGRATED_GPU,
            device_name: "Intel(R) UHD Graphics (CML GT2)",
            pipeline_cache_uuid: [
                160,
                145,
                135,
                32,
                32,
                75,
                120,
                124,
                207,
                186,
                129,
                7,
                1,
                126,
                156,
                91,
            ],
            limits: PhysicalDeviceLimits {
                max_image_dimension1_d: 16384,
                max_image_dimension2_d: 16384,
                max_image_dimension3_d: 2048,
                max_image_dimension_cube: 16384,
                max_image_array_layers: 2048,
                max_texel_buffer_elements: 134217728,
                max_uniform_buffer_range: 134217728,
                max_storage_buffer_range: 4294967295,
                max_push_constants_size: 128,
                max_memory_allocation_count: 4294967295,
                max_sampler_allocation_count: 65536,
                buffer_image_granularity: 1,
                sparse_address_space_size: 0,
                max_bound_descriptor_sets: 8,
                max_per_stage_descriptor_samplers: 65535,
                max_per_stage_descriptor_uniform_buffers: 64,
                max_per_stage_descriptor_storage_buffers: 65535,
                max_per_stage_descriptor_sampled_images: 65535,
                max_per_stage_descriptor_storage_images: 65535,
                max_per_stage_descriptor_input_attachments: 64,
                max_per_stage_resources: 4294967295,
                max_descriptor_set_samplers: 393210,
                max_descriptor_set_uniform_buffers: 384,
                max_descriptor_set_uniform_buffers_dynamic: 8,
                max_descriptor_set_storage_buffers: 393210,
                max_descriptor_set_storage_buffers_dynamic: 8,
                max_descriptor_set_sampled_images: 393210,
                max_descriptor_set_storage_images: 393210,
                max_descriptor_set_input_attachments: 256,
                max_vertex_input_attributes: 29,
                max_vertex_input_bindings: 31,
                max_vertex_input_attribute_offset: 2047,
                max_vertex_input_binding_stride: 4095,
                max_vertex_output_components: 128,
                max_tessellation_generation_level: 64,
                max_tessellation_patch_size: 32,
                max_tessellation_control_per_vertex_input_components: 128,
                max_tessellation_control_per_vertex_output_components: 128,
                max_tessellation_control_per_patch_output_components: 128,
                max_tessellation_control_total_output_components: 2048,
                max_tessellation_evaluation_input_components: 128,
                max_tessellation_evaluation_output_components: 128,
                max_geometry_shader_invocations: 32,
                max_geometry_input_components: 128,
                max_geometry_output_components: 128,
                max_geometry_output_vertices: 256,
                max_geometry_total_output_components: 1024,
                max_fragment_input_components: 116,
                max_fragment_output_attachments: 8,
                max_fragment_dual_src_attachments: 1,
                max_fragment_combined_output_resources: 131078,
                max_compute_shared_memory_size: 65536,
                max_compute_work_group_count: [
                    65535,
                    65535,
                    65535,
                ],
                max_compute_work_group_invocations: 1024,
                max_compute_work_group_size: [
                    1024,
                    1024,
                    1024,
                ],
                sub_pixel_precision_bits: 8,
                sub_texel_precision_bits: 8,
                mipmap_precision_bits: 8,
                max_draw_indexed_index_value: 4294967295,
                max_draw_indirect_count: 4294967295,
                max_sampler_lod_bias: 16.0,
                max_sampler_anisotropy: 16.0,
                max_viewports: 16,
                max_viewport_dimensions: [
                    16384,
                    16384,
                ],
                viewport_bounds_range: [
                    -32768.0,
                    32767.0,
                ],
                viewport_sub_pixel_bits: 13,
                min_memory_map_alignment: 4096,
                min_texel_buffer_offset_alignment: 16,
                min_uniform_buffer_offset_alignment: 64,
                min_storage_buffer_offset_alignment: 4,
                min_texel_offset: -8,
                max_texel_offset: 7,
                min_texel_gather_offset: -32,
                max_texel_gather_offset: 31,
                min_interpolation_offset: -0.5,
                max_interpolation_offset: 0.4375,
                sub_pixel_interpolation_offset_bits: 4,
                max_framebuffer_width: 16384,
                max_framebuffer_height: 16384,
                max_framebuffer_layers: 2048,
                framebuffer_color_sample_counts: TYPE_1 | TYPE_2 | TYPE_4 | TYPE_8 | TYPE_16,
                framebuffer_depth_sample_counts: TYPE_1 | TYPE_2 | TYPE_4 | TYPE_8 | TYPE_16,
                framebuffer_stencil_sample_counts: TYPE_1 | TYPE_2 | TYPE_4 | TYPE_8 | TYPE_16,
                framebuffer_no_attachments_sample_counts: TYPE_1 | TYPE_2 | TYPE_4 | TYPE_8 | TYPE_16,
                max_color_attachments: 8,
                sampled_image_color_sample_counts: TYPE_1 | TYPE_2 | TYPE_4 | TYPE_8 | TYPE_16,
                sampled_image_integer_sample_counts: TYPE_1 | TYPE_2 | TYPE_4 | TYPE_8 | TYPE_16,
                sampled_image_depth_sample_counts: TYPE_1 | TYPE_2 | TYPE_4 | TYPE_8 | TYPE_16,
                sampled_image_stencil_sample_counts: TYPE_1 | TYPE_2 | TYPE_4 | TYPE_8 | TYPE_16,
                storage_image_sample_counts: TYPE_1,
                max_sample_mask_words: 1,
                timestamp_compute_and_graphics: 1,
                timestamp_period: 83.333336,
                max_clip_distances: 8,
                max_cull_distances: 8,
                max_combined_clip_and_cull_distances: 8,
                discrete_queue_priorities: 2,
                point_size_range: [
                    0.125,
                    255.875,
                ],
                line_width_range: [
                    0.0,
                    8.0,
                ],
                point_size_granularity: 0.125,
                line_width_granularity: 0.0078125,
                strict_lines: 0,
                standard_sample_locations: 1,
                optimal_buffer_copy_offset_alignment: 128,
                optimal_buffer_copy_row_pitch_alignment: 128,
                non_coherent_atom_size: 64,
            },
            sparse_properties: PhysicalDeviceSparseProperties {
                residency_standard2_d_block_shape: 0,
                residency_standard2_d_multisample_block_shape: 0,
                residency_standard3_d_block_shape: 0,
                residency_aligned_mip_size: 0,
                residency_non_resident_strict: 0,
            },
        },
        queue_family_index: 0,
        layered: false,
        ray_tracing: false,
        buffer_marker: false,
        shader_info: false,
    }
[2024-03-03T02:55:57Z INFO  blade_graphics::hal::resource] Creating texture 0x84c0580000000017 of size 1x1x1 and format Rgba8Unorm, name 'texutre', handle 0
[2024-03-03T02:55:57Z INFO  blade_graphics::hal::resource] Creating buffer 0x95a125000000001a of size 4, name 'staging', handle 1
[2024-03-03T02:55:58Z INFO  blade_graphics::hal::resource] Destroying buffer 0x95a125000000001a, handle 1
SYNC-HAZARD-WRITE-AFTER-WRITE(ERROR / SPEC): msgNum: 1544472022 - Validation Error: [ SYNC-HAZARD-WRITE-AFTER-WRITE ] Object 0: handle = 0xf443490000000006, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x5c0ec5d6 | vkCmdPipelineBarrier():  Hazard WRITE_AFTER_WRITE for image barrier 0 VkImage 0xf443490000000006[]. Access info (usage: SYNC_IMAGE_LAYOUT_TRANSITION, prior_usage: SYNC_COLOR_ATTACHMENT_OUTPUT_COLOR_ATTACHMENT_WRITE, write_barriers: 0, command: vkCmdEndRenderingKHR, seq_no: 5, reset_no: 1).
    Objects: 1
        [0] 0xf443490000000006, type: 10, name: NULL
SYNC-HAZARD-WRITE-AFTER-WRITE(ERROR / SPEC): msgNum: 1544472022 - Validation Error: [ SYNC-HAZARD-WRITE-AFTER-WRITE ] Object 0: handle = 0xcb3ee80000000007, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x5c0ec5d6 | vkCmdPipelineBarrier():  Hazard WRITE_AFTER_WRITE for image barrier 0 VkImage 0xcb3ee80000000007[]. Access info (usage: SYNC_IMAGE_LAYOUT_TRANSITION, prior_usage: SYNC_COLOR_ATTACHMENT_OUTPUT_COLOR_ATTACHMENT_WRITE, write_barriers: 0, command: vkCmdEndRenderingKHR, seq_no: 5, reset_no: 3).
    Objects: 1
        [0] 0xcb3ee80000000007, type: 10, name: NULL
SYNC-HAZARD-WRITE-AFTER-WRITE(ERROR / SPEC): msgNum: 1544472022 - Validation Error: [ SYNC-HAZARD-WRITE-AFTER-WRITE ] Object 0: handle = 0xead9370000000008, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x5c0ec5d6 | vkCmdPipelineBarrier():  Hazard WRITE_AFTER_WRITE for image barrier 0 VkImage 0xead9370000000008[]. Access info (usage: SYNC_IMAGE_LAYOUT_TRANSITION, prior_usage: SYNC_COLOR_ATTACHMENT_OUTPUT_COLOR_ATTACHMENT_WRITE, write_barriers: 0, command: vkCmdEndRenderingKHR, seq_no: 5, reset_no: 3).
    Objects: 1
        [0] 0xead9370000000008, type: 10, name: NULL
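
For anyone reading the adapter dump above, the packed integers can be decoded by hand. The sketch below assumes the standard Vulkan version packing (7-bit major, 10-bit minor, 12-bit patch) and that Mesa packs `driver_version` the same way, which is driver-specific and not guaranteed for other vendors. `decode_vk_version` and `vendor_name` are hypothetical helper names.

```rust
/// Splits a packed Vulkan version number into (major, minor, patch).
/// Bit layout: 3 bits variant, 7 bits major, 10 bits minor, 12 bits patch.
fn decode_vk_version(packed: u32) -> (u32, u32, u32) {
    let major = (packed >> 22) & 0x7F;
    let minor = (packed >> 12) & 0x3FF;
    let patch = packed & 0xFFF;
    (major, minor, patch)
}

/// Maps a few well-known PCI vendor IDs to readable names.
/// The list is deliberately short and not exhaustive.
fn vendor_name(vendor_id: u32) -> &'static str {
    match vendor_id {
        0x8086 => "Intel",
        0x10DE => "NVIDIA",
        0x1002 => "AMD",
        _ => "unknown",
    }
}
```

With the values from the log, `decode_vk_version(4206847)` gives (1, 3, 255), `decode_vk_version(96477185)` gives (23, 2, 1), and `vendor_name(32902)` gives "Intel" (32902 is 0x8086), which agrees with the adapter name printed on the first log line.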

kvark · 2024-03-04T05:09:53Z

pretty sure it's https://gitlab.freedesktop.org/mesa/mesa/-/issues/4688
see also - gfx-rs/wgpu#1898

flukejones · 2024-03-10T03:10:32Z

What is the exact situation here? Is it:

Xorg is used and configured to use dgpu as primary?
A laptop which has a MUX switch?
Or something else? There is a rather large lack of information here.

I had a similar issue with wgpu (gfx-rs/wgpu#4110), solved by checking the Mesa version and disabling it if the version is less than 21.2.

(I wrote and maintain https://gitlab.com/asus-linux/supergfxctl/, so I have a fairly decent understanding of the hardware level but not so much actual use)
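
The kind of version check described above might look roughly like the following. This is a sketch, not the code from the wgpu change: it assumes the driver info string has the form `Mesa <major>.<minor>...`, and `parse_mesa_version`, `leading_number`, and `mesa_older_than` are hypothetical names.

```rust
/// Parses the leading run of ASCII digits in `s`, if there is one.
fn leading_number(s: &str) -> Option<u32> {
    let digits: String = s.chars().take_while(|c| c.is_ascii_digit()).collect();
    digits.parse().ok()
}

/// Extracts (major, minor) from a driver info string such as
/// "Mesa 23.2.1-1ubuntu3.1". Returns None for non-Mesa drivers
/// or strings that do not match the assumed format.
fn parse_mesa_version(driver_info: &str) -> Option<(u32, u32)> {
    let (_, rest) = driver_info.split_once("Mesa ")?;
    let mut parts = rest.split('.');
    let major = leading_number(parts.next()?)?;
    let minor = leading_number(parts.next()?)?;
    Some((major, minor))
}

/// Returns Some(true) when the Mesa version is strictly below `min`,
/// Some(false) when it is at or above it, and None when unknown.
fn mesa_older_than(driver_info: &str, min: (u32, u32)) -> Option<bool> {
    // Tuples compare lexicographically: major first, then minor.
    parse_mesa_version(driver_info).map(|version| version < min)
}
```

Returning `None` for an unparseable string leaves the caller to decide whether an unknown driver should be treated as old or new, rather than guessing inside the helper.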

flukejones · 2024-03-11T04:52:32Z

@davidbuzz I need to know more about this. I noticed:

You are using Xorg
You say you "configured to use the onboard nvidia not the intel"

This to me implies that you are using xorg-dgpu mode, something that is pretty much a hack and not necessary these days. As a result, I think some incorrect assumptions have been made.

Can you please verify for me that under a Wayland session blade works perfectly fine without the blocking commit? Given that the Linux world is likely going to be defaulting to Wayland by the end of the year, if not by this quarter, the resulting knee-capping of everyone because of this one unique and not very well supported use case isn't justified.

davidbuzz · 2024-03-11T14:37:48Z

@flukejones ... it's a nice Dell laptop with both Intel graphics and Nvidia graphics. It's an integration that at the hardware level is called 'Nvidia Optimus', and the software/switcher/etc. is called 'Nvidia Prime'. [Google both of those for more]
This is one of the available things you can do, i.e. "which video card do I want to use by default, for all apps I launch unless changed":
'sudo prime-select intel'
'sudo prime-select nvidia'

The Nvidia card being more powerful, and this laptop always being on power and doing some pretty busy stuff, I keep the Nvidia active and in use all the time by running the second of those commands and just leaving it like that.

flukejones · 2024-03-11T18:04:32Z

Right. So xorg configured to use nvidia as primary is the entire cause of the issue you had.

Any workaround needs to check just that one thing, not chop everything off at the knees for everybody else.

I suggest you give KDE 6 Wayland a try if you can; it works very, very well on hybrid setups.

kvark · 2024-03-12T05:32:45Z

@flukejones thanks for your input!
Looks like @davidbuzz's driver is much newer than 21.2 (see vulkaninfo in the issue description):

driverInfo = Mesa 23.2.1-1ubuntu3.1

So it would not make sense to try to port your wgpu PR here. Or at least, it wouldn't help this issue in particular.

> Right. So xorg configured to use nvidia as primary is the entire cause of the issue you had.
> Any workaround needs to check just that one thing, not chop everything off at the knees for everybody else.

Any idea how to detect this specifically (i.e. without chopping everything off at the knees)?
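
For reference, the coarse rule under discussion (going only by the title of #92, "Disable presentation on Intel when Nvidia is detected") can be sketched as below. This is an illustration under assumptions, not the code that was merged: `Adapter`, `can_present`, and `apply_hybrid_workaround` are hypothetical names, and only the two PCI vendor IDs are real constants.

```rust
const VENDOR_INTEL: u32 = 0x8086;
const VENDOR_NVIDIA: u32 = 0x10DE;

/// Hypothetical adapter record; not blade's actual type.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Adapter {
    name: &'static str,
    vendor_id: u32,
    can_present: bool,
}

/// If any Nvidia adapter is present, marks every Intel adapter as
/// unable to present. Systems without an Nvidia adapter are untouched.
fn apply_hybrid_workaround(adapters: &mut [Adapter]) {
    let has_nvidia = adapters.iter().any(|a| a.vendor_id == VENDOR_NVIDIA);
    if !has_nvidia {
        return;
    }
    for adapter in adapters.iter_mut().filter(|a| a.vendor_id == VENDOR_INTEL) {
        adapter.can_present = false;
    }
}
```

A rule like this fires on every Intel plus Nvidia hybrid system, which is exactly the objection quoted above. Narrowing it to the "Xorg with Nvidia as primary" case would need an extra condition that is not shown here, because the thread does not say how to detect it.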

flukejones · 2024-03-12T08:05:42Z

Let's keep new discussion on the linked issue :)

kvark added the "type: bug" label Feb 23, 2024

kvark mentioned this issue Mar 4, 2024: Disable presentation on Intel when Nvidia is detected #92 (Merged)

kvark closed this as completed in #92 Mar 4, 2024

flukejones mentioned this issue Mar 11, 2024: Don't block use of Intel on Nvidia hybrid systems. #93 (Closed)

kvark mentioned this issue May 10, 2024: [linux] Vulkan ERROR_INITIALIZATION_FAILED zed-industries/zed#8168 (Open, 1 task)

kvark mentioned this issue Jul 16, 2024: Panics in submit when running under X11 #139 (Open)
