
Test flake in lldb concurrent breakpoint tests #111583

Open
ilovepi opened this issue Oct 8, 2024 · 2 comments

ilovepi (Contributor) commented Oct 8, 2024

We're seeing some LLDB tests flake in our CI. Given that these are concurrency tests, I assume there is a data race or missing synchronization.

Flaky tests:
lldb-api :: functionalities/thread/concurrent_events/TestConcurrentSignalWatchBreak.py
lldb-api :: functionalities/thread/concurrent_events/TestConcurrentSignalNWatchNBreak.py

Bots:
https://ci.chromium.org/ui/p/fuchsia/builders/toolchain.ci/lldb-linux-arm64/b8734630228996969777/infra
https://ci.chromium.org/ui/p/fuchsia/builders/toolchain.ci/lldb-linux-arm64/b8734618131611235377/overview

Error output:

******************** TEST 'lldb-api :: functionalities/thread/concurrent_events/TestConcurrentSignalWatchBreak.py' FAILED ********************
Script:
--
/b/s/w/ir/x/w/install-cpython-aarch64-linux-gnu/bin/python3 /b/s/w/ir/x/w/llvm-llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env ARCHIVER=/b/s/w/ir/x/w/cipd/clang/bin/llvm-ar --env OBJCOPY=/b/s/w/ir/x/w/cipd/clang/bin/llvm-objcopy --env LLVM_LIBS_DIR=/b/s/w/ir/x/w/llvm_build/./lib --env LLVM_INCLUDE_DIR=/b/s/w/ir/x/w/llvm_build/include --env LLVM_TOOLS_DIR=/b/s/w/ir/x/w/llvm_build/./bin --arch aarch64 --build-dir /b/s/w/ir/x/w/llvm_build/lldb-test-build.noindex --lldb-module-cache-dir /b/s/w/ir/x/w/llvm_build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /b/s/w/ir/x/w/llvm_build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /b/s/w/ir/x/w/llvm_build/./bin/lldb --compiler /b/s/w/ir/x/w/cipd/clang/bin/clang --dsymutil /b/s/w/ir/x/w/llvm_build/./bin/dsymutil --llvm-tools-dir /b/s/w/ir/x/w/llvm_build/./bin --lldb-obj-root /b/s/w/ir/x/w/llvm_build/tools/lldb --lldb-libs-dir /b/s/w/ir/x/w/llvm_build/./lib --skip-category=pexpect /b/s/w/ir/x/w/llvm-llvm-project/lldb/test/API/functionalities/thread/concurrent_events -p TestConcurrentSignalWatchBreak.py
--
Exit Code: 1

Command Output (stdout):
--
lldb version 20.0.0git (https://llvm.googlesource.com/a/llvm-project revision 8ab77184dde2583950fc6e4886ff526e7e598f7e)
  clang revision 8ab77184dde2583950fc6e4886ff526e7e598f7e
  llvm revision 8ab77184dde2583950fc6e4886ff526e7e598f7e
Skipping the following test categories: ['pexpect', 'dsym', 'gmodules', 'debugserver', 'objc']

Watchpoint 1 hit:
old value: 0
new value: 1

--
Command Output (stderr):
--
FAIL: LLDB (/b/s/w/ir/x/w/cipd/clang/bin/clang-aarch64) :: test (TestConcurrentSignalWatchBreak.ConcurrentSignalWatchBreak.test)
======================================================================
FAIL: test (TestConcurrentSignalWatchBreak.ConcurrentSignalWatchBreak.test)
   Test a signal/watchpoint/breakpoint in multiple threads.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/b/s/w/ir/x/w/llvm-llvm-project/lldb/packages/Python/lldbsuite/test/decorators.py", line 148, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/b/s/w/ir/x/w/llvm-llvm-project/lldb/test/API/functionalities/thread/concurrent_events/TestConcurrentSignalWatchBreak.py", line 15, in test
    self.do_thread_actions(
  File "/b/s/w/ir/x/w/llvm-llvm-project/lldb/packages/Python/lldbsuite/test/concurrent_base.py", line 333, in do_thread_actions
    self.assertEqual(
AssertionError: 1 != 2 : Expected 1 stops due to signal delivery, but got 2
Config=aarch64-/b/s/w/ir/x/w/cipd/clang/bin/clang
----------------------------------------------------------------------
Ran 1 test in 3.160s

FAILED (failures=1)

--

********************
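To reproduce locally, the dotest.py invocation shown under "Script:" above can be re-run in a loop against a local checkout and build (substituting the bot-specific /b/s/w/ir/... paths); since the failure is intermittent, it may take many iterations to trigger.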

#39394 seems to be a similar report. @JDevlieghere, is this a known problem?

ilovepi added the lldb label Oct 8, 2024
llvmbot (Collaborator) commented Oct 8, 2024

@llvm/issue-subscribers-lldb

Author: Paul Kirth (ilovepi)


jimingham (Collaborator) commented

The breakpoint counting in these tests has been flaky, but for a known reason: we weren't distinguishing between "this thread executed the breakpoint trap" and "the process stopped while this thread happened to have its PC on the trap instruction but hadn't executed it yet", which could lead to miscounting breakpoints.

But I'm not sure how you'd get miscounted signals. What the test actually counts is the number of stops in the debugger where some thread had a stop reason of "signal". The test only sends one SIGUSR per signal thread, and it creates only one signal thread. So either that signal is getting resent (which seems unlikely, but signals are weird) and we're legitimately reporting two signal stops, or we are incorrectly preserving the signal stop reason across two stops.
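To be concrete, what gets counted is a stop where some thread reports a signal stop reason. A minimal SB-API sketch of that kind of check (a simplification for illustration, not the actual concurrent_base.py code) would be:

    import lldb

    def is_signal_stop(process):
        # A stop counts as a "signal stop" if any thread in the process
        # reports a signal as its stop reason at this stop.
        for thread in process:
            if thread.GetStopReason() == lldb.eStopReasonSignal:
                return True
        return False

    # Inside the test's continue/stop loop (simplified):
    #   if is_signal_stop(process):
    #       signal_stops += 1
    # The test then asserts that signal_stops equals the number of signal
    # threads it created, which is 1 in this test.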

We clear a thread's stop reason the next time that thread is given a chance to run. We don't know or care whether it actually ran; we clear it when we tell the thread it can run and then resume the process. However, if we don't allow the thread to run when we resume the process, we preserve the stop info, since that really is the last state of that thread...

But in this test, the only time we suspend threads is when stepping over breakpoints: we do that by suspending all the other threads and allowing only the breakpoint thread to run one instruction. Then we put the trap back in place and run all threads without returning control to the user. So I can't see a way that that stop, with its preserved signal stop info, could leak to the user.

If we could see the gdb-remote packet log and the lldb step log for a run that fails this way, we should at least be able to see what the error is.
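For reference, those channels can be captured to files from an interactive lldb session with, e.g., "log enable -f /tmp/gdb-remote.log gdb-remote packets" and "log enable -f /tmp/lldb-step.log lldb step" (the file paths here are just examples).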
