Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should descriptor_pool_info include FREE_DESCRIPTOR_SET flag? #117

Closed
dannycoates opened this issue May 14, 2024 · 12 comments
Closed

Should descriptor_pool_info include FREE_DESCRIPTOR_SET flag? #117

dannycoates opened this issue May 14, 2024 · 12 comments

Comments

@dannycoates
Copy link

From zed-industries/zed#10018 (comment)

@kvark
Copy link
Owner

kvark commented May 15, 2024

VK_DESCRIPTOR_POOL_CREATE_FREE_DESCRIPTOR_SET_BIT specifies that descriptor sets can return their individual allocations to the pool, i.e. all of vkAllocateDescriptorSets, vkFreeDescriptorSets, and vkResetDescriptorPool are allowed. Otherwise, descriptor sets allocated from the pool must not be individually freed back to the pool, i.e. only vkAllocateDescriptorSets and vkResetDescriptorPool are allowed.

We aren't doing any vkFreeDescriptorSets so this flag is meaningless.
In Blade, descriptor pools are assigned to command encoders, and are reset when the encoders are reset.

This looks like one of the following things to me:

  1. A driver bug. Intel Vulkan drivers aren't expecting our usage pattern. We'll need to reach out to them and possibly land a vendor-specific workaround in Blade, such as specifying VK_DESCRIPTOR_POOL_CREATE_FREE_DESCRIPTOR_SET_BIT
  2. We are doing something wrong. Could you run with Vulkan Configurator that enables the Validation and see if anything pops up when it runs out of pool memory?

@progmboy
Copy link

progmboy commented May 27, 2024

After several hours of debugging, I can confirm that this issue only occurs with Intel's drivers.

filename: igvk64.dll
fileversion: 31.0.101.4502
filedescription: Vulkan(R) Driver for Intel(R) Graphics Accelerator
__int64 __fastcall vkDescriptorAllocatePoolSet(__int64 a1, __int64 a2, __int64 a3)
{
	...
	// Check descriptor_pool->dword424 is bigger than set.
	// the throw VK_ERROR_OUT_OF_POOL_MEMORY error.
	//
	// When this error occurs.
	// 0:000> dd 0x00000212bdec4878+424 l1
	// 00000212`bdec4c9c  0000ea60
	// 0:000> dd 0x00000212bdec4878+420 l1
	// 00000212`bdec4c98  0000ea60
	//
	if ( v11 && (unsigned int)(descriptor_pool->dword424 + 1) > descriptor_pool->dword420 )
        {
            sub_180470550(&v36, 20LL);
            v37[0] = v36;
            CxxThrowException(v37, (_ThrowInfo *)&_TI1_AUException_VK__);
        }
	...

	// here inc dword424 
        *(_DWORD *)(v14 + 68) = descriptor_pool->dword424++;
        v30 = v12;
        v20 = v11;
        v21 = playout;
}
__int64 __fastcall vkResetDescriptorPool(__int64 hDevice, struct_descriptor_pool *descriptor_pool)
{
	// 
	// VK_DESCRIPTOR_POOL_CREATE_FREE_DESCRIPTOR_SET_BIT = 1
	// defalut descriptor_pool->flags == 0
	// 
	if ( descriptor_pool->flags ){
		...
	
		//
		//  void __fastcall sub_18039C3B0(struct_descriptor_pool *descriptor_pool, __int64 a2)
		//  {
		//	 ...
		//	 // ONLY this way to dec descriptor_pool->dword424
		//      descriptor_pool->dword424 -= *(_DWORD *)(a2 + 68) != -1;
  		//      v16 = *(unsigned int *)(a2 + 68);
		//      ...
		// 	}
		//
	
		sub_18039C3B0(descriptor_pool, v11);
		...
	}else{
		// NERVER DEC descriptor_pool->dword424 
	}
}

@kvark
Copy link
Owner

kvark commented May 28, 2024

nice work, @progmboy ! Would you be able to file this upstream to Intel folks? Seems rather serious.

side note, things are going to be slightly better with #118 now

@skejeton
Copy link
Contributor

skejeton commented Jun 8, 2024

@kvark Does adding FREE_DESCRIPTOR_SET flag affect other platforms? From Vulkan specification:

VK_DESCRIPTOR_POOL_CREATE_FREE_DESCRIPTOR_SET_BIT specifies that descriptor sets can return their individual allocations to the pool, i.e. all of vkAllocateDescriptorSets, vkFreeDescriptorSets, and vkResetDescriptorPool are allowed. Otherwise, descriptor sets allocated from the pool must not be individually freed back to the pool, i.e. only vkAllocateDescriptorSets and vkResetDescriptorPool are allowed.

Even though blade only releases entire pools, it seems like this flag only extends you to allow to release each set individually. If this fixes the Intel driver bug, and doesn't meddle with other graphics cards, I think it'd be a good solution.

@kvark
Copy link
Owner

kvark commented Jun 9, 2024

I suspect it may change the allocation strategy for the descriptors, thus making it slower for us to allocate new ones. And this would be potentially significant since Blade relies on allocating descriptors many times per frame, potentially thousands of times.
For this reason, and the fact that the current code agrees with Vulkan spec, I wouldn't want to put this workaround unconditionally. I am, however, open to put it explicitly as a workaround for a specific platform bug.
Looks to be affecting Windows Intel 11th gen? That's a pretty popular config...

@skejeton
Copy link
Contributor

skejeton commented Jun 9, 2024

Yeah sounds good.

@skejeton
Copy link
Contributor

skejeton commented Jun 9, 2024

Interestingly, someone else is having this issue but on an AMD GPU: zed-industries/zed#12561 (comment)
I'm not sure if these are related.

@shenjackyuanjie
Copy link

image

more about the GPU
it's an RX 580 8G
with the latest driver of 24.3.1

image

@kvark
Copy link
Owner

kvark commented Jun 10, 2024

Uhh, that's rather unfortunate. Maybe it's just a Windows thing in general?

@shenjackyuanjie
Copy link

Uhh, that's rather unfortunate. Maybe it's just a Windows thing in general?

but there is some good news, on zed's last commit, it looks fine

e829a8c3b0564bc902cc0d0a530099be7f0f036e

@kvark
Copy link
Owner

kvark commented Jun 10, 2024

ah, ok, phew, so it was addressed by #118

skejeton added a commit to skejeton/blade that referenced this issue Jun 11, 2024
kvark pushed a commit that referenced this issue Jun 13, 2024
This will detect an Intel graphics card and apply the fix for leaky
descriptor pools.
@kvark
Copy link
Owner

kvark commented Jul 9, 2024

Closed by #129

@kvark kvark closed this as completed Jul 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants