Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scx_bpfland: random stalls #674

Open
SoulHarsh007 opened this issue Sep 23, 2024 · 3 comments
Open

scx_bpfland: random stalls #674

SoulHarsh007 opened this issue Sep 23, 2024 · 3 comments

Comments

@SoulHarsh007
Copy link

πŸ‘‹ I am using scx-scheds-git from CachyOS V3 repository.
The build version is: 1.0.4.r74.ga3cc4c2-1

Description: After boot, in the first 10–12 minutes, scx_bpfland stalls and causes the session to freeze, but I am able to trigger a power off by pressing the power off button (not a hard reset, just pressing the power button triggers the shutdown sequence). The test machine has an AMD Ryzen 5500H (6C/12T).

Relevant logs:

Sep 23 13:02:57 NotXonoticP1 kernel: sched_ext: BPF scheduler "bpfland" disabled (runnable task stall)
Sep 23 13:02:57 NotXonoticP1 kernel: sched_ext: bpfland: kworker/10:1[105] failed to run for 7.040s
Sep 23 13:02:57 NotXonoticP1 kernel:    scx_watchdog_workfn+0x18d/0x220
Sep 23 13:02:57 NotXonoticP1 kernel:    process_one_work+0x192/0x350
Sep 23 13:02:57 NotXonoticP1 kernel:    worker_thread+0x24c/0x380
Sep 23 13:02:57 NotXonoticP1 kernel:    kthread+0xd2/0x100
Sep 23 13:02:57 NotXonoticP1 kernel:    ret_from_fork+0x34/0x50
Sep 23 13:02:57 NotXonoticP1 kernel:    ret_from_fork_asm+0x1a/0x30
Sep 23 13:02:58 NotXonoticP1 bash[1052]: DEBUG DUMP
Sep 23 13:02:58 NotXonoticP1 bash[1052]: ================================================================================
Sep 23 13:02:58 NotXonoticP1 bash[1052]: kworker/u48:11[32974] triggered exit kind 1026:
Sep 23 13:02:58 NotXonoticP1 bash[1052]:   runnable task stall (kworker/10:1[105] failed to run for 7.040s)
Sep 23 13:02:58 NotXonoticP1 bash[1052]: Backtrace:
Sep 23 13:02:58 NotXonoticP1 bash[1052]:   scx_watchdog_workfn+0x18d/0x220
Sep 23 13:02:58 NotXonoticP1 bash[1052]:   process_one_work+0x192/0x350
Sep 23 13:02:58 NotXonoticP1 bash[1052]:   worker_thread+0x24c/0x380
Sep 23 13:02:58 NotXonoticP1 bash[1052]:   kthread+0xd2/0x100
Sep 23 13:02:58 NotXonoticP1 bash[1052]:   ret_from_fork+0x34/0x50
Sep 23 13:02:58 NotXonoticP1 bash[1052]:   ret_from_fork_asm+0x1a/0x30
Sep 23 13:02:58 NotXonoticP1 bash[1052]: CPU states
Sep 23 13:02:58 NotXonoticP1 bash[1052]: ----------
Sep 23 13:02:58 NotXonoticP1 bash[1052]: CPU 8   : nr_run=1 flags=0x1 cpu_rel=0 ops_qseq=19999 pnt_seq=239865
Sep 23 13:02:58 NotXonoticP1 bash[1052]:           curr=kworker/u48:11[32974] class=ext_sched_class
Sep 23 13:02:58 NotXonoticP1 bash[1052]:  *R kworker/u48:11[32974] +0ms
Sep 23 13:02:58 NotXonoticP1 bash[1052]:       scx_state/flags=3/0xd dsq_flags=0x0 ops_state/qseq=0/0
Sep 23 13:02:58 NotXonoticP1 bash[1052]:       sticky/holding_cpu=-1/-1 dsq_id=(n/a) dsq_vtime=368488863894
Sep 23 13:02:58 NotXonoticP1 bash[1052]:       cpus=fff
Sep 23 13:02:58 NotXonoticP1 bash[1052]:     scx_dump_state+0x70d/0x8d0
Sep 23 13:02:58 NotXonoticP1 bash[1052]:     scx_ops_error_irq_workfn+0x40/0x50
Sep 23 13:02:58 NotXonoticP1 bash[1052]:     irq_work_run_list+0x53/0x90
Sep 23 13:02:58 NotXonoticP1 bash[1052]:     irq_work_run+0x18/0x50
Sep 23 13:02:58 NotXonoticP1 bash[1052]:     __sysvec_irq_work+0x1c/0xb0
Sep 23 13:02:58 NotXonoticP1 bash[1052]:     sysvec_irq_work+0x6c/0x90
Sep 23 13:02:58 NotXonoticP1 bash[1052]:     asm_sysvec_irq_work+0x1a/0x20
Sep 23 13:02:58 NotXonoticP1 bash[1052]:     scx_watchdog_workfn+0x1a6/0x220
Sep 23 13:02:58 NotXonoticP1 bash[1052]:     process_one_work+0x192/0x350
Sep 23 13:02:58 NotXonoticP1 bash[1052]:     worker_thread+0x24c/0x380
Sep 23 13:02:58 NotXonoticP1 bash[1052]:     kthread+0xd2/0x100
Sep 23 13:02:58 NotXonoticP1 bash[1052]:     ret_from_fork+0x34/0x50
Sep 23 13:02:58 NotXonoticP1 bash[1052]:     ret_from_fork_asm+0x1a/0x30
Sep 23 13:02:58 NotXonoticP1 bash[1052]: CPU 10  : nr_run=3 flags=0x1 cpu_rel=0 ops_qseq=24696 pnt_seq=296208
Sep 23 13:02:58 NotXonoticP1 bash[1052]:           curr=sway[1796] class=rt_sched_class
Sep 23 13:02:58 NotXonoticP1 bash[1052]:   R kworker/10:1[105] -7040ms
Sep 23 13:02:58 NotXonoticP1 bash[1052]:       scx_state/flags=3/0x9 dsq_flags=0x1 ops_state/qseq=0/0
Sep 23 13:02:58 NotXonoticP1 bash[1052]:       sticky/holding_cpu=-1/-1 dsq_id=0xd dsq_vtime=365139714289
Sep 23 13:02:58 NotXonoticP1 bash[1052]:       cpus=400
Sep 23 13:02:58 NotXonoticP1 bash[1052]:     kthread+0xd2/0x100
Sep 23 13:02:58 NotXonoticP1 bash[1052]:     ret_from_fork+0x34/0x50
Sep 23 13:02:58 NotXonoticP1 bash[1052]:     ret_from_fork_asm+0x1a/0x30
Sep 23 13:02:58 NotXonoticP1 bash[1052]:   R kworker/10:1H[119] -1408ms
Sep 23 13:02:58 NotXonoticP1 bash[1052]:       scx_state/flags=3/0x9 dsq_flags=0x1 ops_state/qseq=0/0
Sep 23 13:02:58 NotXonoticP1 bash[1052]:       sticky/holding_cpu=-1/-1 dsq_id=0xd dsq_vtime=367884478134
Sep 23 13:02:58 NotXonoticP1 bash[1052]:       cpus=400
Sep 23 13:02:58 NotXonoticP1 bash[1052]:     kthread+0xd2/0x100
Sep 23 13:02:58 NotXonoticP1 bash[1052]:     ret_from_fork+0x34/0x50
Sep 23 13:02:58 NotXonoticP1 bash[1052]:     ret_from_fork_asm+0x1a/0x30
Sep 23 13:02:58 NotXonoticP1 bash[1052]:   R Xwayland[2999] -1ms
Sep 23 13:02:58 NotXonoticP1 bash[1052]:       scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=0/0
Sep 23 13:02:58 NotXonoticP1 bash[1052]:       sticky/holding_cpu=-1/-1 dsq_id=0x8000000000000002 dsq_vtime=365090032150
Sep 23 13:02:58 NotXonoticP1 bash[1052]:       cpus=fff
Sep 23 13:02:58 NotXonoticP1 bash[1052]:     sock_def_readable+0x42/0xc0
Sep 23 13:02:58 NotXonoticP1 bash[1052]:     unix_stream_sendmsg+0x26e/0x6f0
Sep 23 13:02:58 NotXonoticP1 bash[1052]:     sock_write_iter+0x199/0x1a0
Sep 23 13:02:58 NotXonoticP1 bash[1052]:     do_iter_readv_writev+0x130/0x240
Sep 23 13:02:58 NotXonoticP1 bash[1052]:     vfs_writev+0x177/0x3c0
Sep 23 13:02:58 NotXonoticP1 bash[1052]:     __x64_sys_writev+0xec/0x140
Sep 23 13:02:58 NotXonoticP1 bash[1052]:     do_syscall_64+0x82/0x190
Sep 23 13:02:58 NotXonoticP1 bash[1052]:     entry_SYSCALL_64_after_hwframe+0x76/0x7e
Sep 23 13:02:58 NotXonoticP1 bash[1052]: ================================================================================
Sep 23 13:02:58 NotXonoticP1 bash[1052]: 07:32:58 [INFO] Unregister scx_bpfland scheduler
Sep 23 13:02:58 NotXonoticP1 bash[1052]: Error: EXIT: runnable task stall (kworker/10:1[105] failed to run for 7.040s)
Sep 23 13:02:58 NotXonoticP1 systemd[1]: scx.service: Main process exited, code=exited, status=1/FAILURE
Sep 23 13:02:58 NotXonoticP1 systemd[1]: scx.service: Failed with result 'exit-code'.
Sep 23 13:02:58 NotXonoticP1 systemd[1]: scx.service: Scheduled restart job, restart counter is at 1.
Sep 23 13:02:58 NotXonoticP1 systemd[1]: Started Start scx_scheduler.
Sep 23 13:02:58 NotXonoticP1 bash[42807]: 07:32:58 [INFO] scx_bpfland 1.0.4-ga3cc4c2-dirty x86_64-unknown-linux-gnu SMT on
Sep 23 13:02:58 NotXonoticP1 bash[42807]: 07:32:58 [INFO] primary CPU domain = 0xfff
Sep 23 13:02:58 NotXonoticP1 bash[42807]: 07:32:58 [INFO] cpufreq performance level: auto
Sep 23 13:02:58 NotXonoticP1 bash[42807]: 07:32:58 [INFO] L2 cache ID 1: sibling CPUs: [2, 3]
Sep 23 13:02:58 NotXonoticP1 bash[42807]: 07:32:58 [INFO] L2 cache ID 2: sibling CPUs: [4, 5]
Sep 23 13:02:58 NotXonoticP1 bash[42807]: 07:32:58 [INFO] L2 cache ID 4: sibling CPUs: [8, 9]
Sep 23 13:02:58 NotXonoticP1 bash[42807]: 07:32:58 [INFO] L2 cache ID 5: sibling CPUs: [10, 11]
Sep 23 13:02:58 NotXonoticP1 bash[42807]: 07:32:58 [INFO] L2 cache ID 0: sibling CPUs: [0, 1]
Sep 23 13:02:58 NotXonoticP1 bash[42807]: 07:32:58 [INFO] L2 cache ID 3: sibling CPUs: [6, 7]
Sep 23 13:02:58 NotXonoticP1 bash[42807]: 07:32:58 [INFO] L3 cache ID 0: sibling CPUs: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
Sep 23 13:02:58 NotXonoticP1 kernel: sched_ext: "bpfland" does not implement cgroup cpu.weight
Sep 23 13:02:58 NotXonoticP1 kernel: sched_ext: BPF scheduler "bpfland" enabled
Sep 23 13:03:01 NotXonoticP1 warp-svc[1057]: 2024-09-23T07:33:01.470Z DEBUG warp::warp_service::network_change: Routes changed:
Sep 23 13:03:01 NotXonoticP1 warp-svc[1057]: NewNeighbour; Destination: 2401:4900:53f8:af97:3892:2fff:fe8e:83ba;
Sep 23 13:03:11 NotXonoticP1 systemd[1]: systemd-hostnamed.service: Deactivated successfully.
Sep 23 13:03:18 NotXonoticP1 systemd-logind[1054]: Power key pressed short.
Sep 23 13:03:18 NotXonoticP1 systemd-logind[1054]: Powering off...
Sep 23 13:03:18 NotXonoticP1 systemd-logind[1054]: System is powering down.
@ptr1337
Copy link
Contributor

ptr1337 commented Sep 23, 2024

Please test with 1b44eca

@arighi
Copy link
Collaborator

arighi commented Sep 23, 2024

Can you post the output of lscpu -e?

@SoulHarsh007
Copy link
Author

Apologies for the late response, here is the lscpu -e output:

CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE    MAXMHZ   MINMHZ       MHZ
  0    0      0    0 0:0:0:0          yes 4280.0000 400.0000 1612.9460
  1    0      0    0 0:0:0:0          yes 4280.0000 400.0000  400.0000
  2    0      0    1 1:1:1:0          yes 4280.0000 400.0000 1676.8750
  3    0      0    1 1:1:1:0          yes 4280.0000 400.0000 2096.2109
  4    0      0    2 2:2:2:0          yes 4280.0000 400.0000 1675.5720
  5    0      0    2 2:2:2:0          yes 4280.0000 400.0000  400.0000
  6    0      0    3 3:3:3:0          yes 4280.0000 400.0000 2093.4351
  7    0      0    3 3:3:3:0          yes 4280.0000 400.0000 2096.2490
  8    0      0    4 4:4:4:0          yes 4280.0000 400.0000 1761.4611
  9    0      0    4 4:4:4:0          yes 4280.0000 400.0000 2089.0779
 10    0      0    5 5:5:5:0          yes 4280.0000 400.0000 1869.4690
 11    0      0    5 5:5:5:0          yes 4280.0000 400.0000 1674.8400

PS: I am yet to test with commit: 1b44eca

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants