This repository has been archived by the owner on Jan 7, 2023. It is now read-only.

[DO NOT MERGE] Try for OAM-80992 random building issue #117

Open
wants to merge 378 commits into
base: master

Conversation

renchenglei
Contributor

We are seeing a random Vulkan (vk) build failure on Android Q; please refer
to the following error log.

https://buildbot.sh.intel.com/abspb/builders/build-q/builds/13888/steps/build_all/logs/stdio

flto and others added 30 commits March 5, 2019 12:03
Fixes: cb2322c

Signed-off-by: Jonathan Marek <[email protected]>
(cherry picked from commit 8eca6df)
In freedreno_gmem.c, gmem_align of 0x8000 is used. Alignment used here
should be the same.

Fixes: 912a9c8

Signed-off-by: Jonathan Marek <[email protected]>
(cherry picked from commit 4f23767)
Fixes: 3a273a4

Signed-off-by: Jonathan Marek <[email protected]>
(cherry picked from commit 6c0fefb)
Now that freedreno has create_with_modifiers(), this "hack" is needed to
make some cases work. Copied from vc4.

Fixes: 41ddf1d

Signed-off-by: Jonathan Marek <[email protected]>
(cherry picked from commit e3591b0)
The optimization in 4cd1a0b introduced a replacement of :

cmp(8).z.f0.0 vgrf11.y:D, vgrf10.xxxx:D, vgrf2.xyyy:D
...
cmp(8).nz.f0.0 null.x:D, vgrf11.yyyy:D, 0D

By :

cmp(8).z.f0.0 vgrf15.x:D, vgrf10.xxxx:D, vgrf2.yyyy:D
...
mov(8) vgrf11.y:D, vgrf15.yyyy:D

The first cmp instruction is storing in x while the second mov is
sourcing from y. We need to take into account where the replacement on
the scan_inst destination is going to store things so that the
replacement mov can source things from the correct location.

Signed-off-by: Lionel Landwerlin <[email protected]>
Fixes: 4cd1a0b ("i965/vec4: Propagate conditional modifiers from more compares to other compares")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109759
Reviewed-by: Ian Romanick <[email protected]>
(cherry picked from commit 6e18414)
Added check for higher compat profile being allowed
before assigning certain extensions.

Fixes: 272fe94 (mesa: enable ARB_texture_buffer_* extensions in the Compatibility profile)

Signed-off-by: Danylo Piliaiev <[email protected]>
Signed-off-by: Yevhenii Kolesnikov <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107052
(cherry picked from commit 07f4b4e)
Some types of params such as some builtins are always padded. We
need to keep track of this so we can restore the list correctly.

Here we also remove a couple of cache entries that are not actually
required as they get rebuilt by the _mesa_add_parameter() calls.

This patch fixes a bunch of arb_texture_multisample and
arb_sample_shading piglit tests for the radeonsi NIR backend.

Fixes: edded12 ("mesa: rework ParameterList to allow packing")

Reviewed-by: Marek Olšák <[email protected]>
(cherry picked from commit 7536af6)
call XShmDetach to allow X server to free shared memory

Fixes: bcd80be "drisw/glx: use XShm if possible"
Signed-off-by: Ray Zhang <[email protected]>
Reviewed-by: Dave Airlie <[email protected]>
(cherry picked from commit b344e32)
Avoids regression on:

KHR-GLES*.core.tessellation_shader.single.xfb_captures_data_from_correct_stage

that is uncovered by the following patch.

"glsl: fix recording of variables for XFB in TCS shaders"

v2: Rebased over glsl: fix recording of variables for XFB in TCS shaders
v3: Move this patch before "glsl: fix recording of variables for XFB in TCS
    shaders" to avoid temporary regressions. (Ilia Mirkin)

Cc: 19.0 <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
(cherry picked from commit bf1f494)
This is purely for conformance, since it's not actually possible to do
XFB on TCS output varyings. However we do have to make sure we record
the names correctly, and this removes an extra level of array-ness from
the names in question.

Fixes KHR-GL45.tessellation_shader.single.xfb_captures_data_from_correct_stage

v2: Add comment to the new program_resource_visitor::process function.
    (Ilia Mirkin)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108457
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: 19.0 <[email protected]>
Reviewed-by: Timothy Arceri <[email protected]>
(cherry picked from commit 4eec3a2)
Fix anv_entrypoints.{c,h} and anv_extensions.{c,h} missing dependencies
Rename the variable labels according to targets and python scripts
Align the building rules as per Automake for simplification

Fixes building errors during rebuilds due to missing dependencies

(v2) Fixed a missing $(VULKAN_API_XML) reference

Fixes: 9a508b7 ("android: anv/extensions: fix generated sources build")
Fixes: dd088d4 ("anv/extensions: Generate a header file with extension tables")
Signed-off-by: Mauro Rossi <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
Cc: "19.0" <[email protected]>
(cherry picked from commit 14e7e26)
Fixes undefined reference building errors for XML_* functions

Signed-off-by: Mauro Rossi <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
Cc: "19.0" <[email protected]>
(cherry picked from commit ec0f465)
Check GetWindowInfo and ignore the computed sizes
if there is an error.

Fixes a regression caused by earlier commit when
using old wine gallium nine patches.

Should also address a crash at window destruction.

Related issues:
 iXit/Mesa-3D#331
 iXit/Mesa-3D#332

Cc: [email protected]
Fixes: 2318ca6 ("st/nine: Handle window resize when a presentation
buffer is used")

Signed-off-by: Axel Davy <[email protected]>
(cherry picked from commit 86666f0)
Apparently instead of returning error when passing
a quality level different than 0 for
D3DMULTISAMPLE_NONE, we should pass.

Fixes: iXit/Mesa-3D#340

Cc: [email protected]

Signed-off-by: Axel Davy <[email protected]>
(cherry picked from commit 1d363d4)
…port

We were accidentally not counting those surfaces

Fixes: ddc4069 "anv: Implement VK_KHR_maintenance3"
Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
(cherry picked from commit 5049fbd)
No idea how this fell through the cracks besides the fact that the
sampler bound at 0 almost always works and the CTS isn't amazing.  In
any case, this appears to have been broken for almost forever.

Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
Cc: [email protected]
(cherry picked from commit ca295dd)
When we have a larger sampler index, we get into the "high sampler"
scenario and need an instruction header.  Even in SIMD8, this pushes the
instruction over the sampler message size maximum of 11 registers.
Instead, we have to lower TXD to TXL.

Fixes: cb98e07 "intel/fs: Support min_lod parameters on texture..."
Reviewed-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
(cherry picked from commit 5c96120)
This does not seem to fix anything ATM but is the right thing to do.

Signed-off-by: Tapani Pälli <[email protected]>
Fixes: f3e91e7 ("anv: add nir lowering pass for ycbcr textures")
Reviewed-by: Lionel Landwerlin <[email protected]>
(cherry picked from commit 33bf3d5)
This function was never used, and isn't properly guarded by HAVE_LIBDRM,
breaking the build on systems that don't have libdrm.

Let's just remove it.

Fixes: 7552fcb "egl: add base EGL_EXT_device_base implementation"
Reported-by: Timo Aaltonen <[email protected]>
Signed-off-by: Eric Engestrom <[email protected]>
Acked-by: Emil Velikov <[email protected]>
(cherry picked from commit bcc4bfc)
If alignment is 0, offsets returned by
radv_cmd_buffer_upload_alloc() are always 0. These two
virtual addresses were pointing at the same location.

Cc: 18.3 19.0 <[email protected]>
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
(cherry picked from commit c2a1486)
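
As a rough illustration of the radv alignment fix above, the sketch below assumes the allocator rounds its running offset up with the usual power-of-two mask trick. `align_u32` and `UploadBuffer` are hypothetical Python stand-ins, not the actual radv code; they only show why an alignment of 0 collapses every returned offset to 0.

```python
MASK32 = 0xFFFFFFFF


def align_u32(value: int, alignment: int) -> int:
    """Round value up to a multiple of alignment using the mask trick.

    Assumes 32-bit unsigned arithmetic. With alignment == 0 the mask
    ~(alignment - 1) becomes 0, so the result is always 0.
    """
    return ((value + alignment - 1) & MASK32) & (~(alignment - 1) & MASK32)


class UploadBuffer:
    """Hypothetical stand-in for a command buffer upload area."""

    def __init__(self) -> None:
        self.offset = 0

    def alloc(self, size: int, alignment: int) -> int:
        # Align the running offset, hand it out, then advance past the block.
        start = align_u32(self.offset, alignment)
        self.offset = start + size
        return start


broken = UploadBuffer()
first, second = broken.alloc(12, 0), broken.alloc(12, 0)
print(first, second)  # both allocations land on the same offset

fixed = UploadBuffer()
print(fixed.alloc(12, 8), fixed.alloc(12, 8))  # distinct offsets
```

With a non-zero power-of-two alignment the two allocations no longer alias, which matches the symptom described in the commit message.
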
We can't pull it from the variable type because it might be an array of
blocks and not just the one block.  While we're here, throw in some
error checking.

Reviewed-by: Lionel Landwerlin <[email protected]>
Cc: [email protected]
(cherry picked from commit f1dbc7e)
Fixes: 6ac2d16 ("i965/fs: Fix extract_i8/u8 to a 64-bit destination")
Reviewed-by: Jason Ekstrand <[email protected]>
(cherry picked from commit 4aaf139)
In the old code, we would generate the exact same instruction for
extract_u8(some_u64, 0) and extract_u8(some_u64, 1).  The mask-a-word
trick only works for even numbered bytes.

This fixes the (new) piglit test
tests/spec/arb_gpu_shader_int64/execution/fs-ushr-and-mask.shader_test.

v2: Use a SHR instead of an AND.  This saves an instruction compared to
using two moves.  Suggested by Jason.

Fixes: 6ac2d16 ("i965/fs: Fix extract_i8/u8 to a 64-bit destination")
Reviewed-by: Jason Ekstrand <[email protected]>
(cherry picked from commit 55e6454)
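
The even/odd byte problem described above can be sketched with plain integer arithmetic. This is not the i965 code generator: `word_of`, `extract_u8_old`, and `extract_u8_fixed` are hypothetical names, and the model assumes the 64-bit value is addressed as four 16-bit words.

```python
def word_of(value: int, index: int) -> int:
    """Return 16-bit word number index (0 = least significant) of a 64-bit value."""
    return (value >> (16 * index)) & 0xFFFF


def extract_u8_old(value: int, byte: int) -> int:
    """Mask-a-word trick: select the containing word, then AND with 0xFF.

    This always yields the LOW byte of the word, so bytes 2k and 2k+1
    produce the same result.
    """
    return word_of(value, byte // 2) & 0xFF


def extract_u8_fixed(value: int, byte: int) -> int:
    """Use a shift right for odd bytes and keep the mask for even bytes."""
    word = word_of(value, byte // 2)
    return (word >> 8) if byte % 2 else (word & 0xFF)


some_u64 = 0x1122334455667788
# Old behaviour: byte 0 and byte 1 are indistinguishable.
print(hex(extract_u8_old(some_u64, 0)), hex(extract_u8_old(some_u64, 1)))
# Fixed behaviour: the odd byte comes from the upper half of the word.
print(hex(extract_u8_fixed(some_u64, 0)), hex(extract_u8_fixed(some_u64, 1)))
```

Because the selected word is only 16 bits wide, shifting right by 8 already discards the low byte, so no extra mask is needed for the odd case.
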
This variable is now unused, so let's remove it.

Fixes: db77573 (virgl: modify how we handle GL_MAP_FLUSH_EXPLICIT_BIT)
Reviewed-by: Gurchetan Singh <[email protected]>
(cherry picked from commit 44620d4)
Pull the common code out of the two entrypoints into the helper which
fetches the push descriptor set for us.  Now that it does more than just
get a thing, call it anv_cmd_buffer_push_descriptor_set.

Cc: "19.0" <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
Technically, descriptor set layouts aren't required to survive past the
function they're passed into so we need to reference them.

Cc: "19.0" <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
This fixes a rendering issue where UBO updates aren't always picked
up by drawing calls.  This issue affected the Webots robotics
simulator.  VMware bug 2175527.

Testing Done: Webots replay, piglit, misc Linux games

Reviewed-by: Thomas Hellstrom <[email protected]>
(cherry picked from commit d4381cf)
With createImage(), the caller was expected to set a SHARED flag if they
needed the ability to get a GEM handle.  DRI3, wayland, and gbm all set
it, EGL_MESA_drm_image passes it through, and surfaceless doesn't need it
because there's no way to request a handle.

With the new createImageWithModifiers() DRI method to replace it, the
expectation is that you'll always be able to share the buffer, so the flag
is unnecessary in its arguments.  However, we do need to tell gallium
about this expectation.

Without this, kmscube's modifiers path using
gbm_bo_create_with_modifiers(&modifier, 1) instead of
gbm_bo_create(SCANOUT | SHARED) will call the driver's resource_create()
function with PIPE_BIND_SHARED unset, so the driver (particularly
renderonly drivers) may allocate in such a way that it can't return an
answer from gbm_bo_get_handle().  I used to have a hack in v3d using
count==1 && modifier==LINEAR to indicate that you wanted SHARED anyway,
but that was dropped recently.

Fixes: 59527a3 ("v3d: Restructure RO allocations using
resource_from_handle.")
Reviewed-by: Kristian H. Kristensen <[email protected]>

(cherry picked from commit fafead7)
strassek and others added 4 commits June 6, 2019 12:00
This change makes following test pass:
	dEQP-VK.api.info.device.extensions

Originally-from: Tapani Pälli <[email protected]>
Test: [CTS 9.0_r8] dEQP-VK.api.info.device.extensions
Signed-off-by: Kevin Strasser <[email protected]>
(cover letter https://patchwork.freedesktop.org/series/51006/)

FROMLIST: i965: SIMD32 heuristics debug flag

Added a new DEBUG_HEUR32 flag to INTEL_DEBUG flags for enabling SIMD32
selection heuristics.

(am from https://patchwork.freedesktop.org/patch/256764/)

FROMLIST: i965: SIMD32 heuristics control data

Added a new structure for holding SIMD32 heuristics control data. The
control data itself will be fetched from drirc.

(am from https://patchwork.freedesktop.org/patch/256806/)

FROMLIST: i965: SIMD32 heuristics control data from drirc

To be able to test the heuristics with different parameters, they can be
controlled via environment variables through drirc.

(am from https://patchwork.freedesktop.org/patch/256788/)

FROMLIST: mesa: Helper functions for counting set bits in a mask

(am from https://patchwork.freedesktop.org/patch/256765/)

FROMLIST: i965/fs: Save the instruction count of each dispatch width

The SIMD32 selection heuristics will use this information for deciding whether
SIMD32 shaders should be used.

(am from https://patchwork.freedesktop.org/patch/256793/)

FROMLIST: i965/fs: SIMD32 selection heuristic based on grouped texture fetches

The function goes through the compiled shader and checks how many grouped
texture fetches there are. This is a simple heuristic which gets rid of most
of the regressions when enabling SIMD32 shaders but still retains some of
the benefits.

(am from https://patchwork.freedesktop.org/patch/256798/)

FROMLIST: i965/fs: Enable all SIMD32 heuristics

There are three simple heuristics for SIMD32 shader enabling:

- How many MRTs does the shader write into?
- How many grouped texture fetches does the shader have?
- How many instructions does the SIMD32 shader have compared to the SIMD16
   shader?

For testing purposes, the heuristics can be controlled via these environment
variables:

simd32_heuristic_mrt_check
- Enables MRT write check
- Default: true

simd32_heuristic_max_mrts
- How many MRT writes the heuristic allows
- Default: 1

simd32_heuristic_grouped_check
- Enables grouped texture fetch check
- Default: true

simd32_heuristic_grouped_sends
- How many grouped texture fetches the heuristic allows
- Default: 6

simd32_heuristic_inst_check
- Enables SIMD32 vs. SIMD16 instruction count check
- Default: true

simd32_heuristic_inst_ratio
- SIMD32 vs. SIMD16 instruction count ratio the heuristic allows
- Default: 2.3

SIMD32 shaders will also not be compiled when SIMD16 compilation fails or
spills.

(am from https://patchwork.freedesktop.org/patch/256766/)
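
One way to read the three SIMD32 checks and their defaults listed above is sketched below in Python. The field names mirror the option names from the commit message with the `simd32_heuristic_` prefix dropped. The function name `use_simd32`, its arguments, and the exact comparison operators are assumptions made for illustration, not the driver's implementation.

```python
from dataclasses import dataclass


@dataclass
class Simd32Heuristics:
    """Control data with the defaults quoted in the commit message."""
    mrt_check: bool = True
    max_mrts: int = 1
    grouped_check: bool = True
    grouped_sends: int = 6
    inst_check: bool = True
    inst_ratio: float = 2.3


def use_simd32(cfg: Simd32Heuristics, *, simd16_ok: bool, simd16_spilled: bool,
               num_mrts: int, grouped_fetches: int,
               simd16_insts: int, simd32_insts: int) -> bool:
    """Return True when none of the enabled checks rejects SIMD32."""
    # SIMD32 is skipped outright when SIMD16 failed to compile or spilled.
    if not simd16_ok or simd16_spilled:
        return False
    # Assumption: "allows N" means values strictly above N are rejected.
    if cfg.mrt_check and num_mrts > cfg.max_mrts:
        return False
    if cfg.grouped_check and grouped_fetches > cfg.grouped_sends:
        return False
    if cfg.inst_check and simd32_insts > cfg.inst_ratio * simd16_insts:
        return False
    return True


cfg = Simd32Heuristics()
print(use_simd32(cfg, simd16_ok=True, simd16_spilled=False, num_mrts=1,
                 grouped_fetches=6, simd16_insts=100, simd32_insts=200))
```

Disabling a check (for example `mrt_check=False`) makes the corresponding limit irrelevant, which is how the options could be used to test each heuristic in isolation.
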
This is needed to be in agreement with spec requirements:
KhronosGroup/OpenGL-API#46

Piers Daniell:
   "We discussed this in the OpenGL/ES working group meeting
    and agreed that eliminating unused elements from the interface
    block array is not desirable. There is no statement in the spec
    that this takes place and it would be highly implementation
    dependent if it happens. If the application has an "interface"
    in the shader they need to match up with the API it would be
    quite confusing to have the binding point get compacted.
    So the answer is no, the binding points aren't affected by
    unused elements in the interface block array."

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109532
Reported-By: Ilia Mirkin <[email protected]>
Signed-off-by: Andrii Simiklit <[email protected]>

TEST=[CTS 9.0r6] dEQP-GLES31.functional.ssbo.layout.random.all_per_block_buffers#18
(am from https://gitlab.freedesktop.org/mesa/mesa/merge_requests/332)
Signed-off-by: Kevin Strasser <[email protected]>
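
The working-group answer quoted above can be pictured with a small Python sketch. It assumes the GLSL rule that a block array declared with an explicit binding gives its first element that binding and each following element the next consecutive one. `block_array_bindings` and `compacted_bindings` are hypothetical helpers; the second one models the compaction that the quote rules out.

```python
def block_array_bindings(base_binding: int, array_size: int) -> list:
    """Binding point of every element, whether or not the shader uses it."""
    return [base_binding + i for i in range(array_size)]


def compacted_bindings(base_binding: int, used_indices: list) -> dict:
    """What compaction WOULD do: renumber only the used elements.

    Shown for contrast only; per the quoted answer this must not happen.
    """
    return {idx: base_binding + n for n, idx in enumerate(sorted(used_indices))}


# Example: a block array of 4 elements starting at binding 2,
# where the shader only references elements 0 and 3.
expected = block_array_bindings(2, 4)
print(expected[0], expected[3])       # element 3 keeps binding 5
print(compacted_bindings(2, [0, 3]))  # element 3 would wrongly move to 3
```
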
…of ssbo/ubo

This is needed to fix these tests:
piglit.spec.arb_shader_storage_buffer_object.compiler.unused-array-element_frag
piglit.spec.arb_shader_storage_buffer_object.compiler.unused-array-element_comp

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109532
Reported-By: Ilia Mirkin <[email protected]>
Signed-off-by: Andrii Simiklit <[email protected]>

TEST=[CTS 9.0r6] dEQP-GLES31.functional.ssbo.layout.random.all_per_block_buffers#18
(am from https://gitlab.freedesktop.org/mesa/mesa/merge_requests/332)
Signed-off-by: Kevin Strasser <[email protected]>
@renchenglei
Contributor Author

Please don't merge this PR; I opened it only for build testing. We have a random vk build failure that reproduces only on the buildbot, so I have to create a PR in order to build with the buildbot.

@sysopenci sysopenci left a comment

Autobuild started from pull-request-changes on this PR.

FAILURE: CheckBug Bad comments/Bugs

For more information, see: /absp/builders/celadon-autobuild/builds/960

@tpalli
Contributor

tpalli commented Jun 12, 2019

It looks strange, libmesa_vulkan_util should come within libmesa_vulkan_common

@renchenglei
Contributor Author

@tpalli, yes, the reproduction rate is very low. I'm not sure whether it is caused by the build rule changes on Q. Maybe some sub-modules are built in parallel.

@renchenglei
Contributor Author

@tpalli, I have updated the patch. With the new changes we have run 15 builds, and the issue seems to be gone. Do we still need those two flags on Android?

@tpalli
Contributor

tpalli commented Jun 13, 2019

> @tpalli, I have updated the patch. With the new changes we have run 15 builds, and the issue seems to be gone. Do we still need those two flags on Android?

Yes, we do need these flags; they are also used with the i965 driver. The problem is that otherwise some other hash algorithm (the toolchain default) might get chosen, and we explicitly want 'sha1'. I don't understand how having these flags could cause the linking issue. Maybe you were just lucky in this case?

@renchenglei
Contributor Author

@tpalli, I tried to reproduce this issue on the buildbot without any changes, and the reproduction rate is about 40%. With these changes, there has been no failure after 15 builds.
If 'sha1' is a must-have, how about 'wl'?
The failure is a linker (ld) error:
ld.lld: error: undefined symbol: vk_Result_to_str

@tpalli
Contributor

tpalli commented Jun 13, 2019

> @tpalli, I tried to reproduce this issue on the buildbot without any changes, and the reproduction rate is about 40%. With these changes, there has been no failure after 15 builds.
> If 'sha1' is a must-have, how about 'wl'?

-Wl is the prefix used to pass parameters to the linker when the linker is run indirectly (via the compiler), so it's not a parameter itself, just the indication that we want to pass the parameter '--build-id'.

> The failure is a linker (ld) error:
> ld.lld: error: undefined symbol: vk_Result_to_str

I do find it very strange if the linker parameter affects this. You could check whether Vulkan apps work without these flags. When enabling Vulkan I had to add this flag, otherwise the build-id check would fail (see anv_physical_device_init_uuids in anv_device.c).
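
To make the role of the -Wl prefix concrete, here is a minimal Python sketch of how a compiler driver forwards such a flag: it strips the prefix and splits the remainder at commas, passing each piece to the linker. `linker_args` is a hypothetical helper, and the spelling `--build-id=sha1` is an assumption combining the '--build-id' parameter and the 'sha1' value mentioned in this thread.

```python
def linker_args(compiler_flag: str) -> list:
    """Return the linker options carried by a -Wl, compiler flag.

    Flags without the -Wl, prefix are meant for the compiler itself,
    so nothing is forwarded for them.
    """
    prefix = "-Wl,"
    if not compiler_flag.startswith(prefix):
        return []
    return compiler_flag[len(prefix):].split(",")


for flag in ("-Wl,--build-id=sha1", "-Wl,--no-threads", "-O2"):
    print(flag, "->", linker_args(flag))
```
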

@renchenglei
Contributor Author

@tpalli, that makes sense! If possible, let's keep the original patch. I updated the patch and added "LOCAL_LDFLAGS += -Wl,--no-threads", which means the link step does not use multiple threads. I triggered about 20 builds and did not reproduce the random failure. Please review the latest patch. Thanks a lot!

@tpalli
Contributor

tpalli commented Jun 13, 2019

All right, adding '--no-threads' is fine with me; it will mean slower linking, but that will affect only the Vulkan driver. It could possibly indicate a bug in the Q toolchain, as this has been working well so far.

@renchenglei
Contributor Author

Thanks @tpalli. Will we keep this as an internal change? If yes, I will update the PR commit info and get it merged asap, as the random failure breaks the Cactus team's automated builds.

@sysopenci sysopenci left a comment

Autobuild started from pull-request-changes on this PR.

FAILURE: CheckBug Bad comments/Bugs

For more information, see: /absp/builders/celadon-autobuild/builds/971

@renchenglei
Contributor Author

@tpalli & @YuanjunHuang FYI.
With our previous change (not using multiple threads to link), we can still reproduce this issue; the reproduction rate is about 5%. Strangely, the issue can't be reproduced on my local host; it reproduces only on the buildbot, which can build with more than 30 processes.
Android Q has some build enhancements, and it is much better not to generate *.h or *.c files during the build. This issue is caused by building and linking in parallel: some files are not generated and compiled before they are built and called.
I have checked the source code, and it seems that only one file in each driver calls the function vk_Result_to_str:

grep -r vk_Result_to_str *
src/intel/vulkan/anv_util.c:   const char *error_str = vk_Result_to_str(error);
src/amd/vulkan/radv_util.c:     const char *error_str = vk_Result_to_str(error);

So I reorganized the generation order, in the hope that the function has been generated before we call it. I will trigger many builds with the buildbot to check whether the issue is still there.

@renchenglei
Contributor Author

I tried to move the code for the generated files to libmesa_vulkan_common, but the issue is still there, :(
https://buildbot.sh.intel.com/abspb/builders/build-q/builds/16187/steps/build_all/logs/stdio
This is a tough one. Could we copy the function into anv_util.c directly, which would work around the build issue? If yes, I suggest we merge it only on 1A, so that it won't block daily/weekly and integration builds; we could still use GitHub and the opensource-gfx buildbot to keep debugging this issue.

@renchenglei
Contributor Author

@tpalli & @YuanjunHuang, what do you think of this change? Could we use it as a hot fix for the Cactus build?
After many potential fixes, the random vk link failure is still there. We may need much more time to investigate this issue, which reproduces only on the buildbot and at a very low rate.

@tpalli
Contributor

tpalli commented Jun 18, 2019

Are you able to access the build machine? It would be interesting to see what symbols 'libmesa_vulkan_util.a' and 'libmesa_vulkan_common.a' have in them. You can list the symbols with, for example: "readelf -s --wide libmesa_vulkan_common.a"

@tpalli
Contributor

tpalli commented Jun 18, 2019

Overall it looks to me that this is likely a bug in the Android Q toolchain, as the dependencies are set correctly in the mk files. Using a workaround is OK with me, but it might become painful to update it whenever there are changes, such as new error messages from new VK extensions.

@renchenglei
Contributor Author

I have synced with the Cactus team: when a build finishes, the whole out folder is deleted, so currently we can't get 'libmesa_vulkan_util.a' and 'libmesa_vulkan_common.a'.
Yes, it will be painful for our rebases. Until we have a real fix, I will take care of this workaround; we keep it to unblock the Cactus build error, :)
We won't land this workaround on our GitHub, so the opensource-gfx branch works as before. I will also keep investigating this issue with the Cactus and compile teams to confirm whether it comes from our side or from the Android Q toolchain.
