-
Notifications
You must be signed in to change notification settings - Fork 57
CTS dEQP issue fix based on mesa 19.0.5 #116
base: master
Are you sure you want to change the base?
Conversation
Reviewed-by: Alejandro Piñeiro <[email protected]> (cherry picked from commit 30b548f)
Fixes: 19064b8 "nir: Add a pass for gathering transform feedback info" Reviewed-by: Alejandro Piñeiro <[email protected]> (cherry picked from commit 8f0fe71)
Instead of going to all the work of to combine them into one array, just make two arrays and use location_frac to colocate them within CLIP0. Then the back-end can sort things out and stack them on top of each other. Thanks to ef99f4c, we also don't need to set compact anymore. Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> (cherry picked from commit 4e69fba) Conflicts resolved by Dylan Conflicts: src/compiler/nir/nir_lower_clip_cull_distance_arrays.c
We needed to better handle cases where a chunk of a variable starts at some non-zero location_frac and rolls over into the next slot but may not be more than 4 dwords. For example, if gl_CullDistance is an array of 3 things and has location_frac = 2, it will span across two vec4s but is not, itself, bigger than a vec4. If you ignore the clip/cull special case, it's not allowed to happen for anything else because the only things that can span more than one slot is dvec3 and dvec4 and they're both bigger than a vec4. The current code uses this attrib_slot thing where we count attribute slots and iterate over them. However, that doesn't work in the case above because gl_CullDistance will have an attrib_slot count of 1 even though it does span two slots. We could fix this by adjusting attrib_slot but we already have comp_mask and it's easier to just handle it that way. Reviewed-by: Alejandro Piñeiro <[email protected]> (cherry picked from commit 558c314)
This makes us properly handle gl_ClipDistance and gl_CullDistance. Fixes: 19064b8 "nir: Add a pass for gathering transform feedback info" Reviewed-by: Alejandro Piñeiro <[email protected]> (cherry picked from commit 1a93fc3)
Needed for https://gitlab.freedesktop.org/mesa/mesa/merge_requests/248 . That MR keeps the clip and cull arrays split. So we have to handle - compact arrays with location_frac != 0 - VARYING_SLOT_CLIP_DIST1 Reviewed-by: Samuel Pitoiset <[email protected]> (cherry picked from commit 1ef2855)
Copy+paste error. It was supposed to test cull and not clip. Fixes: 4e69fba "nir: Rewrite lower_clip_cull_distance_arrays..." Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109717 Reviewed-by: Lionel Landwerlin <[email protected]> (cherry picked from commit f98fd9d)
nir_lower_clip_cull_distance_arrays() marks the combined clip/cull distance array as compact. However, when translating in from GLSL or SPIR-V, we were not marking the original float[] arrays as compact. We should do so. That way, we can detect these corner cases properly. Reviewed-by: Timothy Arceri <[email protected]> (cherry picked from commit ef99f4c) Reviewed-by: Jason Ekstrand <[email protected]>
The cherry-pick dropped a chunk. Fixes: e735173 "radv: Fix float16 interpolation set up." Reviewed-by: Samuel Pitoiset <[email protected]>
spirv_to_nir can generate input/output variables which are illegal for the current shader stage, which would cause nir_validate_shader to balk. After my recent commit to start decorating arrays as compact, dEQP-VK.spirv_assembly.instruction.graphics.module.same_module started hitting validation errors due to outputs in a TCS (not intended for the TCS at all) not being per-vertex arrays. Thanks to Jason Ekstrand for suggesting this approach. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109573 Fixes: ef99f4c compiler: Mark clip/cull distance arrays as compact before lowering. Reviewed-by: Juan A. Suarez <[email protected]> (cherry picked from commit 6775665)
Fix the logic for buffer full check on alloc. This patch just takes the fix Nicolai attached to the bug report and updates it to work on master. Fixes: e0f0d36 ("radeonsi: factor si_query_buffer logic out of si_query_hw") Reviewed-by: Marek Olšák <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109561 (cherry picked from commit 603206d)
Fixes following valgrind warning: ==27561== Conditional jump or move depends on uninitialised value(s) ==27561== at 0x667856B: value_set_ssa_components (nir_opt_copy_prop_vars.c:78) ==27561== by 0x667A1C4: copy_prop_vars_block (nir_opt_copy_prop_vars.c:797) Fixes: 62332d1 "nir: Add a local variable-based copy propagation pass" Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> (cherry picked from commit 22267fe)
We shouldn't increment the buffer list pointers twice. This fixes some crashes with new CTS dEQP-VK.binding_model.descriptor_copy.*. Cc: 18.3 19.0 <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> (cherry picked from commit 9256e0a)
Sampler descriptors don't have a buffer list. This fixes some crashes with new CTS dEQP-VK.binding_model.descriptor_copy.*.sampler_*. Cc: 18.3 19.0 <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> (cherry picked from commit 4924dfc)
Earlier commit introduced support for haiku yet did not properly annotate the loader/xmlconfig dependencies. Thus we ended up adding inc_loader for each !haiku platform - see 659910e 9a96bf0 c731508 ec6cb01. One piece remained though - the wayland platform. Hence the following would fail: meson -Dgallium-drivers=etnaviv -Ddri-drivers=''\ -Dtools=etnaviv -Dplatforms=wayland -Dglx=disabled \ build/ Cc: Alexander von Gluck IV <[email protected]> Reported-by: Boris Brezillon <[email protected]> Fixes: 834d221 ("meson: Add Haiku platform support v4") Signed-off-by: Emil Velikov <[email protected]> Tested-by: Boris Brezillon <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Reviewed-by: Dylan Baker <[email protected]> (cherry picked from commit f0a7b46)
Calculating the scissor rectangle fields with the y flipped (0 on top) can generate negative values that will cause assertion failure later on as the scissor fields are all unsigned. We must clamp the bbox values again to make sure they don't exceed the fb_height. Also fixed a calculation error. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108999 https://bugs.freedesktop.org/show_bug.cgi?id=109594 v2: - I initially clamped the values inside the if (Y is flipped) case and I made a mistake in the calculation: the clamp of the bbox[2] should be a check if (bbox[2] >= fbheight) bbox[2] = fbheight - 1 instead and I shouldn't have changed the ScissorRectangleYMax calculation. As the fixed code is equivalent with using CLAMP instead of MAX2 at the top of the function when bbox[2] and bbox[3] are calculated, and the 2nd is more clear, I replaced it. (Nanley Chery) v3: - Reversed the CLAMP change in bbox[3] as the API guarantees that the viewport height is positive. (Nanley Chery) v4: - Added nomination for the mesa-stable branch and the link to the second bugzilla bug (Nanley Chery) CC: <[email protected]> Tested-by: Paul Chelombitko <[email protected]> Reviewed-by: Nanley Chery <[email protected]> (cherry picked from commit fd37a19)
If no framebuffer is bound, get the number of samples and the image format from the render pass. This fixes new CTS dEQP-VK.geometry.layered.*.secondary_cmd_buffer. Cc: 18.3 19.0 <[email protected]> Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> (cherry picked from commit 5671f38) Conflicts resolved by Dylan Conflicts: src/amd/vulkan/radv_meta_clear.c
Seems like dxvk used integer builtins without setting the flat interpolation decoration. I believe in the current spec the app is required to set these, but in the meantime to avoid breaking things in stable releases (and so close to release for 19.0), only expand the interpolation to float16 and struct (which cannot be builtins as our spirv parser lowers the builtin block). Fixes: f324784 "radv: Allow interpolation on non-float types." Reviewed-by: Samuel Pitoiset <[email protected]> (cherry picked from commit c011047)
This fixes a failed assertion in glDeleteLists() for the following case: list = glGenLists(1); glDeleteLists(list, 1); when those are the first display list commands issued by the application. When we generate display lists, we plug in empty lists created with the make_list() helper. This function uses the OPCODE_END_OF_LIST opcode but does not call dlist_alloc() which would set the InstSize[OPCODE_END_OF_LIST] element to non-zero. When the empty list was deleted, we failed the InstSize[opcode] > 0 assertion. Typically, display lists are created with glNewList/glEndList so we set InstSize[OPCODE_END_OF_LIST] = 1 in dlist_alloc(). That's why this bug wasn't found before. To fix this failure, simply initialize the InstSize[OPCODE_END_OF_LIST] element in make_list(). The game oolite was hitting this. Fixes: OoliteProject/oolite#325 Reviewed-by: Marek Olšák <[email protected]> (cherry picked from commit 6dabcb5)
This change applies the workaround suggested by Bill Deegan on the affected SCons versions. It also adds a comment with the URL explaining why we were using customizing the decider and max_drift in the first place, as I had forgotten all about it. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109443 Tested-by: [email protected] Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Brian Paul <[email protected]>
There is only room for 3 vertices now (RECT has 3 vertices). Fixes: 6ef7700 Signed-off-by: Jonathan Marek <[email protected]> (cherry picked from commit 357313a)
Fixes: cb2322c Signed-off-by: Jonathan Marek <[email protected]> (cherry picked from commit 8eca6df)
In freedreno_gmem.c, gmem_align of 0x8000 is used. Alignment used here should be the same. Fixes: 912a9c8 Signed-off-by: Jonathan Marek <[email protected]> (cherry picked from commit 4f23767)
Fixes: 3a273a4 Signed-off-by: Jonathan Marek <[email protected]> (cherry picked from commit 6c0fefb)
Now that freedreno has create_with_modifiers(), this "hack" is needed to make some cases work. Copied from vc4. Fixes: 41ddf1d Signed-off-by: Jonathan Marek <[email protected]> (cherry picked from commit e3591b0)
The optimization in 4cd1a0b introduced a replacement of : cmp(8).z.f0.0 vgrf11.y:D, vgrf10.xxxx:D, vgrf2.xyyy:D ... cmp(8).nz.f0.0 null.x:D, vgrf11.yyyy:D, 0D By : cmp(8).z.f0.0 vgrf15.x:D, vgrf10.xxxx:D, vgrf2.yyyy:D ... mov(8) vgrf11.y:D, vgrf15.yyyy:D The first cmp instruction is storing in x while the second mov is sourcing from y. We need to take into account where the replacement on the scan_inst destination is going to store thing so that the replacement mov can source things from the correct location. Signed-off-by: Lionel Landwerlin <[email protected]> Fixes: 4cd1a0b ("i965/vec4: Propagate conditional modifiers from more compares to other compares") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109759 Reviewed-by: Ian Romanick <[email protected]> (cherry picked from commit 6e18414)
Added check for higher compat profile being allowed before assigning certain extensions. Fixes: 272fe94 (mesa: enable ARB_texture_buffer_* extensions in the Compatibility profile) Signed-off-by: Danylo Piliaiev <[email protected]> Signed-off-by: Yevhenii Kolesnikov <[email protected]> Reviewed-by: Timothy Arceri <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107052 (cherry picked from commit 07f4b4e)
Some types of params such as some builtins are always padded. We need to keep track of this so we can restore the list correctly. Here we also remove a couple of cache entries that are not actually required as they get rebuilt by the _mesa_add_parameter() calls. This patch fixes a bunch of arb_texture_multisample and arb_sample_shading piglit tests for the radeonsi NIR backend. Fixes: edded12 ("mesa: rework ParameterList to allow packing") Reviewed-by: Marek Olšák <[email protected]> (cherry picked from commit 7536af6)
call XShmDetach to allow X server to free shared memory Fixes: bcd80be "drisw/glx: use XShm if possible" Signed-off-by: Ray Zhang <[email protected]> Reviewed-by: Dave Airlie <[email protected]> (cherry picked from commit b344e32)
To avoid blocking other EGL calls, release the display mutex before calling update_buffers(), which will call droid_window_dequeue_buffer(). This patch fixes some failure cases in android graphics cts test. Signed-off-by: Min He <[email protected]> Signed-off-by: Chenglei Ren <[email protected]>
This fixes crash due to NULL window when swap interval is set for pbuffer surface. Jira: 61995 Test: CtsDisplayTestCases pass Signed-off-by: samiuddi <[email protected]> (am from https://patchwork.freedesktop.org/patch/235697/)
Signed-off-by: Kalyan Kondapally <[email protected]>
GLSL ES 320 technically allows #line to have arbitrary expression trees rather than integer literal constants, unlike the C and C++ preprocessor. This is likely a completely unused feature that does not make sense. However, Android irritatingly mandates this useless behavior, so this patch implements a hack to try and support it. We handle a single expression: #line <line number expression> but we avoid handling the double expression: #line <line number expression> <source string expression> because this is an ambiguous grammar. Instead, we handle the case that wraps both in parenthesis, which is actually well defined: #line (<line number expression>) (<source string expression>) With this change following tests pass: dEQP-GLES3.functional.shaders.preprocessor.builtin.line_expression_vertex dEQP-GLES3.functional.shaders.preprocessor.builtin.line_expression_fragment dEQP-GLES3.functional.shaders.preprocessor.builtin.line_and_file_expression_vertex dEQP-GLES3.functional.shaders.preprocessor.builtin.line_and_file_expression_fragment Signed-off-by: Tapani Pälli <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]> BUG=b:33352633 BUG=b:33247335 TEST=affected tests passing on CTS 7.1_r1 sentry Change-Id: I7afbbb386bd4a582e3f241014a83eaccad1d50d9 Reviewed-on: https://chromium-review.googlesource.com/427305 Tested-by: Haixia Shi <[email protected]> Reviewed-by: Ilja H. Friedel <[email protected]> Commit-Queue: Haixia Shi <[email protected]> Trybot-Ready: Haixia Shi <[email protected]>
GPA requires a null renderer query which disables all rendering. This feels fairly at odds with the spirit of the INTEL_performance_query extension. Note: Considering the INTEL_blackhole_render implementation(https://www. khronos.org/registry/OpenGL/extensions/INTEL/INTEL_blackhole_render .txt, https://patchwork.freedesktop.org/series/40035/)need test case changes, and also need time to review in upstream, we keep this patch firstly for urgent project milestone. Test: Pass mdapi test_GfxDrv_DriverAcceptance test case GfxDrv_DriverAcceptanceQuery.GL_NULL_HARDWARE and has no reg issue Signed-off-by: Landwerlin, Lionel <[email protected]>
This change makes following test pass: dEQP-VK.api.info.device.extensions Originally-from: Tapani Pälli <[email protected]> Test: [CTS 9.0_r8] dEQP-VK.api.info.device.extensions Signed-off-by: Kevin Strasser <[email protected]>
(cover letter https://patchwork.freedesktop.org/series/51006/) FROMLIST: i965: SIMD32 heuristics debug flag Added a new DEBUG_HEUR32 flag to INTEL_DEBUG flags for enabling SIMD32 selection heuristics. (am from https://patchwork.freedesktop.org/patch/256764/) FROMLIST: i965: SIMD32 heuristics control data Added a new structure for holding SIMD32 heuristics control data. The control data itself will be fetched from drirc. (am from https://patchwork.freedesktop.org/patch/256806/) FROMLIST: i965: SIMD32 heuristics control data from drirc To be able to test the heuristics with different parameters, they can be controlled via environment variables through drirc. (am from https://patchwork.freedesktop.org/patch/256788/) FROMLIST: mesa: Helper functions for counting set bits in a mask (am from https://patchwork.freedesktop.org/patch/256765/) FROMLIST: i965/fs: Save the instruction count of each dispatch width The SIMD32 selection heuristics will use this information for deciding whether SIMD32 shaders should be used. (am from https://patchwork.freedesktop.org/patch/256793/) FROMLIST: i965/fs: SIMD32 selection heuristic based on grouped texture fetches The function goes through the compiled shader and checks how many grouped texture fetches there are. This is a simple heuristic which gets rid of most of the regressions when enabling SIMD32 shaders but still retains some of the benefits. (am from https://patchwork.freedesktop.org/patch/256798/) FROMLIST: i965/fs: Enable all SIMD32 heuristics There are three simple heuristics for SIMD32 shader enabling: - How many MRTs does the shader write into? - How many grouped texture fetches does the shader have? - How many instructions does the SIMD32 shader have compared to the SIMD16 shader? For testing purposes, the heuristics can be controlled via these environment variables: simd32_heuristic_mrt_check - Enables MRT write check - Default: true simd32_heuristic_max_mrts - How many MRT writes the heuristic allows - Default: 1 simd32_heuristic_grouped_check - Enables grouped texture fetch check - Default: true simd32_heuristic_grouped_sends - How many grouped texture fetches the heuristic allows - Default: 6 simd32_heuristic_inst_check - Enables SIMD32 vs. SIMD16 instruction count check - Default: true simd32_heuristic_inst_ratio - SIMD32 vs. SIMD16 instruction count ratio the heuristic allows - Default: 2.3 SIMD32 shaders will not be compiled also when SIMD16 compilation fails or spills. (am from https://patchwork.freedesktop.org/patch/256766/)
This is needed to be in agreement with spec requirements: KhronosGroup/OpenGL-API#46 Piers Daniell: "We discussed this in the OpenGL/ES working group meeting and agreed that eliminating unused elements from the interface block array is not desirable. There is no statement in the spec that this takes place and it would be highly implementation dependent if it happens. If the application has an "interface" in the shader they need to match up with the API it would be quite confusing to have the binding point get compacted. So the answer is no, the binding points aren't affected by unused elements in the interface block array." Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109532 Reported-By: Ilia Mirkin <[email protected]> Signed-off-by: Andrii Simiklit <[email protected]> TEST=[CTS 9.0r6} dEQP-GLES31.functional.ssbo.layout.random.all_per_block_buffers#18 (am from https://gitlab.freedesktop.org/mesa/mesa/merge_requests/332) Signed-off-by: Kevin Strasser <[email protected]>
…of ssbo/ubo This is needed to fix these tests: piglit.spec.arb_shader_storage_buffer_object.compiler.unused-array-element_frag piglit.spec.arb_shader_storage_buffer_object.compiler.unused-array-element_comp Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109532 Reported-By: Ilia Mirkin <[email protected]> Signed-off-by: Andrii Simiklit <[email protected]> TEST=[CTS 9.0r6} dEQP-GLES31.functional.ssbo.layout.random.all_per_block_buffers#18 (am from https://gitlab.freedesktop.org/mesa/mesa/merge_requests/332) Signed-off-by: Kevin Strasser <[email protected]>
… build The libmesa_anv_gen* modules require anv_extensions.h, patch makes sure it gets generated as a dependency before building them. Signed-off-by: Chenglei Ren <[email protected]> Reviewed-by: Tapani Pälli <[email protected]> Cc: <[email protected]> (cherry picked from commit 13b38ca)
This reverts commit ca36eb1. This changes help pass dEQP-VK.spirv_assembly.instruction.* time out issue. Test: No regression on CTS test Signed-off-by: Chenglei Ren <[email protected]>
As some extension has been added to CTS white list, now we could add them back. Test: No regression on CTS test Signed-off-by: Chenglei Ren <[email protected]>
On low powerful platform(APL), ssbo & ubo scalar needs much more time to finish. This will cause CTS test cases time out. Here we disable VK_EXT_scalar_block_layout to pass them. Test: No regression on CTS test Signed-off-by: Chenglei Ren <[email protected]>
While the number of ACPs is generally not huge compared to the number of blocks, 16 does seem a bit small. Bumping it to 64 takes the execution time of the piglit vs-isnan-dvec test from about 1:18.1 on an unoptimized debug build (what we run in CI) with NIR_VALIDATE=0 to about 1:16.2. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
If the destination of an ACP entry exists only within this block, then there's no need to keep it for dataflow analysis. We can delete it from the out_acp table and avoid growing the bitsets any bigger than we absolutely have to. This reduces the maximum number of global ACP entries in the vs-isnan-dvec with software fp64 on Kaby Lake from 8630 to 3942 and takes the execution time of the piglit vs-isnan-dvec test from about 1:16.2 on an unoptimized debug build (what we run in CI) with NIR_VALIDATE=0 to about 56.4 seconds. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
…ction In order to set up KILL sets, the dataflow code was walking the entire array of ACPs for every instruction. If you assume the number of ACPs increases roughly with the number of instructions, this is O(n^2). As it turns out, regions_overlap() is not nearly as cheap as one would like and shows up as a significant chunk on perf traces. This commit changes things around and instead first builds an array of exec_lists which it uses like a hash table (keyed off ACP source or destination) similar to what's done in the rest of the copy-prop code. By first walking the list of ACPs and populating the table and then walking instructions and only looking at ACPs which probably have the same VGRF number, we can reduce the complexity to O(n). This takes the execution time of the piglit vs-isnan-dvec test from about 56.4 seconds on an unoptimized debug build (what we run in CI) with NIR_VALIDATE=0 to about 38.7 seconds. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Autobuild started from pull-request-changes on this PR.
FAILURE: CheckBug Bad comments/Bugs
For more information, see: /absp/builders/celadon-autobuild/builds/914
@renchenglei FYI, for issue INTERNAL: Revert "Revert "intel/compiler: More peephole select"" there is the real fix now in Mesa master branch. It is commit f4ef34f, "intel/fs: Add an UNDEF instruction to avoid excess live ranges". I think we should use that commit (if feasible) instead of revert. |
@tpalli, thanks a lot for the info! Let me have a try, :) |
We may have already covered this somewhere, but for the vk extension stuff can we test the api level in the script? It seems like the extension whitelist for Android is set for each release, we might as well do the same in Mesa. |
Thanks @strassek, that make sense! Recently, I am blocked by some other work, once done, I will take a look this. |
We tried this commit(f4ef34f), there seems regression with this new commit. |
a69fa2f
to
6959b05
Compare
We have upgrade our mesa to 19.0.5. Based on the new code, we have some CTS issues, the following patches could help fix some of them.
@tpalli @strassek @js0701 @YuanjunHuang Please help take a review of them.
Kevin, let's freeze our code to 19.0.5 and figure out issue based on 19.0.5. I will monitor upstream branch, if there is any fix for current issue. I will share the patch and back port them to our github. We have full CTS dEQP and GFX test on them. With following changes, no regression is found.
Bug track:
Bug ID OAM-80938, fix patch 3b7825a
Bug ID OAM-80942, fix patch 480e1cc
Bug ID OAM-80943, fix patch efac387; ef1e217; 99274f6
Kevin, please feel free to reject my PR, and reorg or re-back port some of them from upstream.