loongarch64: xdp_dnsrrl bpf program cause CPU stall #70

Open
vincentmli opened this issue Jan 2, 2025 · 3 comments

vincentmli commented Jan 2, 2025

After rebuilding the kernel with CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y, I am able to load XDP programs with xdp-loader, but I ran into another issue, shown below: the kernel reports a CPU stall and LoongFire reboots automatically. The issue also happens on the Fedora LoongArch release, where Fedora seems to end up in a hard lockup.

Command that triggers the CPU stall:

xdp-loader load red0 -m skb -vvv -P 80 -p /sys/fs/bpf/xdp-dns -n xdp_dns /usr/lib/bpf/xdp_dnsrrl.bpf.o

xdp_dnsrrl.bpf.o uses BPF tail calls; I am not sure whether that is related. Other XDP programs seem to load fine.
Here is the source code of xdp_dnsrrl.bpf.c: https://github.com/vincentmli/xdp-tools/blob/master/xdp-dnsrrl/xdp_dnsrrl.bpf.c

Jan  2 11:13:19 loongfire kernel: [ 6668.795062] rcu: INFO: rcu_preempt self-detected stall on CPU
Jan  2 11:13:19 loongfire kernel: [ 6668.795067] rcu: ^I0-....: (1 GPs behind) idle=220c/0/0x3 softirq=22456/22459 fqs=2376
Jan  2 11:13:19 loongfire kernel: [ 6668.795072] rcu: ^I(t=5250 jiffies g=67605 q=771 ncpus=8)
Jan  2 11:13:19 loongfire kernel: [ 6668.795077] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G           O       6.12.5-ipfire #1
Jan  2 11:13:19 loongfire kernel: [ 6668.795080] Tainted: [O]=OOT_MODULE
Jan  2 11:13:19 loongfire kernel: [ 6668.795081] Hardware name: Loongson Loongson-3A6000-7A2000-NUC/Loongson-3A6000-7A2000-NUC, BIOS Loongson-UDK2018-V4.0.05759-stable202405 07/12/24 15:49:14
Jan  2 11:13:19 loongfire kernel: [ 6668.795082] pc ffff800002bda854 ra ffff800002bda170 tp 9000000005ff4000 sp 9000000100147960
Jan  2 11:13:19 loongfire kernel: [ 6668.795084] a0 0000000000000006 a1 900000014a201124 a2 000000000000000e a3 0000000000000002
Jan  2 11:13:19 loongfire kernel: [ 6668.795086] a4 900000012862d068 a5 0000000000000002 a6 0000000000000021 a7 4fde45112e00d8f9
Jan  2 11:13:19 loongfire kernel: [ 6668.795087] t0 9000000100147a40 t1 0000000000000011 t2 0000000000000006 t3 900000014a201102
Jan  2 11:13:19 loongfire kernel: [ 6668.795089] t4 0000000000000022 t5 fffffffffe000000 t6 ffff800000000000 t7 0000800000000000
Jan  2 11:13:19 loongfire kernel: [ 6668.795090] t8 000000000000002f u0 0000000000000000 s9 90000001001479d0 s0 9000000100147a40
Jan  2 11:13:19 loongfire kernel: [ 6668.795092] s1 900000014a2010fa s2 900000014a201144 s3 900000014a201110 s4 9000000100147990
Jan  2 11:13:19 loongfire kernel: [ 6668.795094] s5 0000000000000021 s6 0000000000000008 s7 bea845a188641f8c s8 9000000106d3ec00
Jan  2 11:13:19 loongfire kernel: [ 6668.795095]    ra: ffff800002bda170 bpf_prog_4747e777e0d62339_xdp_dns+0x58/0x768
Jan  2 11:13:19 loongfire kernel: [ 6668.795102]   ERA: ffff800002bda854 bpf_prog_4747e777e0d62339_xdp_dns+0x73c/0x768
Jan  2 11:13:19 loongfire kernel: [ 6668.795103]  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
Jan  2 11:13:19 loongfire kernel: [ 6668.795108]  PRMD: 00000004 (PPLV0 +PIE -PWE)
Jan  2 11:13:19 loongfire kernel: [ 6668.795111]  EUEN: 00000000 (-FPE -SXE -ASXE -BTE)
Jan  2 11:13:19 loongfire kernel: [ 6668.795114]  ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
Jan  2 11:13:19 loongfire kernel: [ 6668.795117] ESTAT: 00000800 [INT] (IS=11 ECode=0 EsubCode=0)
Jan  2 11:13:19 loongfire kernel: [ 6668.795120]  PRID: 0014d000 (Loongson-64bit, Loongson-3A6000)
Jan  2 11:13:19 loongfire kernel: [ 6668.795121] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G           O       6.12.5-ipfire #1
Jan  2 11:13:19 loongfire kernel: [ 6668.795123] Tainted: [O]=OOT_MODULE
Jan  2 11:13:19 loongfire kernel: [ 6668.795124] Hardware name: Loongson Loongson-3A6000-7A2000-NUC/Loongson-3A6000-7A2000-NUC, BIOS Loongson-UDK2018-V4.0.05759-stable202405 07/12/24 15:49:14
Jan  2 11:13:19 loongfire kernel: [ 6668.795125] Stack : 9000000100147328 0000000000000000 9000000003a942fc 9000000005ff4000
Jan  2 11:13:19 loongfire kernel: [ 6668.795128]         9000000100147250 9000000100147258 0000000000000000 9000000100147398
Jan  2 11:13:19 loongfire kernel: [ 6668.795131]         9000000100147390 9000000100147390 9000000100147110 0000000000000001
Jan  2 11:13:19 loongfire kernel: [ 6668.795134]         0000000000000001 9000000100147258 74f8ce97de2edd92 9000000100293680
Jan  2 11:13:19 loongfire kernel: [ 6668.795136]         80000000ffffe861 00000000ffffe861 0000000000000863 000000000000002d
Jan  2 11:13:19 loongfire kernel: [ 6668.795139]         0000000000000023 0000000000000030 0000000007038000 9000000006243d40
Jan  2 11:13:19 loongfire kernel: [ 6668.795142]         0000000000000000 0000000000000000 90000000057fbeb8 9000000006021000
Jan  2 11:13:19 loongfire kernel: [ 6668.795144]         0000000000000000 9000000006027168 9000000006027000 00000000000000b0
Jan  2 11:13:19 loongfire kernel: [ 6668.795147]         9000000006241000 0000000000000000 9000000003a94314 00007ffff0488020
Jan  2 11:13:19 loongfire kernel: [ 6668.795150]         00000000000000b0 0000000000000004 0000000000000000 0000000000071c1d
Jan  2 11:13:19 loongfire kernel: [ 6668.795152]         ...
Jan  2 11:13:19 loongfire kernel: [ 6668.795153] Call Trace:
Jan  2 11:13:19 loongfire kernel: [ 6668.795155] [<9000000003a94314>] show_stack+0x5c/0x180
Jan  2 11:13:19 loongfire kernel: [ 6668.795160] [<9000000004f153e0>] dump_stack_lvl+0x6c/0xa0
Jan  2 11:13:19 loongfire kernel: [ 6668.795163] [<9000000004efc5c8>] rcu_dump_cpu_stacks+0xf8/0x158
Jan  2 11:13:19 loongfire kernel: [ 6668.795167] [<9000000003b795f0>] rcu_sched_clock_irq+0x728/0x1028
Jan  2 11:13:19 loongfire kernel: [ 6668.795170] [<9000000003b96f68>] update_process_times+0x90/0xf8
Jan  2 11:13:19 loongfire kernel: [ 6668.795174] [<9000000003baa6e8>] tick_nohz_handler+0x90/0x170
Jan  2 11:13:19 loongfire kernel: [ 6668.795178] [<9000000003b97b94>] __hrtimer_run_queues+0x184/0x380
Jan  2 11:13:19 loongfire kernel: [ 6668.795181] [<9000000003b98eb4>] hrtimer_interrupt+0x104/0x2e0
Jan  2 11:13:19 loongfire kernel: [ 6668.795184] [<9000000003a975dc>] constant_timer_interrupt+0x34/0x48
Jan  2 11:13:19 loongfire kernel: [ 6668.795186] [<9000000003b4c320>] __handle_irq_event_percpu+0x60/0x268
Jan  2 11:13:19 loongfire kernel: [ 6668.795189] [<9000000003b4c53c>] handle_irq_event_percpu+0x14/0x80
Jan  2 11:13:19 loongfire kernel: [ 6668.795192] [<9000000003b536c8>] handle_percpu_irq+0x50/0xa0
Jan  2 11:13:19 loongfire kernel: [ 6668.795194] [<9000000003b4b674>] generic_handle_domain_irq+0x24/0x40
Jan  2 11:13:19 loongfire kernel: [ 6668.795196] [<90000000045479c4>] handle_cpu_irq+0x64/0xa0
Jan  2 11:13:19 loongfire kernel: [ 6668.795201] [<9000000004f1582c>] handle_loongarch_irq+0x2c/0x48
Jan  2 11:13:19 loongfire kernel: [ 6668.795203] [<9000000004f15904>] do_vint+0xbc/0xe0
Jan  2 11:13:19 loongfire kernel: [ 6668.795206] [<9000000003a9228c>] handle_vint+0x148/0x1e8
Jan  2 11:13:19 loongfire kernel: [ 6668.795208] [<ffff800002bda854>] bpf_prog_4747e777e0d62339_xdp_dns+0x73c/0x768
Jan  2 11:13:19 loongfire kernel: [ 6668.795210] [<9000000004bf5624>] bpf_prog_run_generic_xdp+0x134/0x438
Jan  2 11:13:19 loongfire kernel: [ 6668.795214] [<9000000004bf5dbc>] do_xdp_generic+0x1e4/0x468
Jan  2 11:13:19 loongfire kernel: [ 6668.795217] [<9000000004bf61f4>] __netif_receive_skb_core.constprop.0+0x1b4/0x10e0
Jan  2 11:13:19 loongfire kernel: [ 6668.795220] [<9000000004bf7224>] __netif_receive_skb_list_core+0x104/0x268
Jan  2 11:13:19 loongfire kernel: [ 6668.795223] [<9000000004bf7a08>] netif_receive_skb_list_internal+0x220/0x368
Jan  2 11:13:19 loongfire kernel: [ 6668.795226] [<9000000004bf84b8>] napi_complete_done+0xa8/0x280
Jan  2 11:13:19 loongfire kernel: [ 6668.795230] [<ffff8000021ed008>] fxgmac_one_poll_rx+0x60/0x98 [yt6801]
Jan  2 11:13:19 loongfire kernel: [ 6668.795278] [<9000000004bf86cc>] __napi_poll+0x3c/0x230
Jan  2 11:13:19 loongfire kernel: [ 6668.795281] [<9000000004bf8d54>] net_rx_action+0x1a4/0x310
Jan  2 11:13:19 loongfire kernel: [ 6668.795284] [<9000000003abe1e8>] handle_softirqs+0x128/0x3b0
Jan  2 11:13:19 loongfire kernel: [ 6668.795287] [<9000000003abe7c8>] irq_exit_rcu+0x98/0xf0
Jan  2 11:13:19 loongfire kernel: [ 6668.795290] [<9000000004f158c4>] do_vint+0x7c/0xe0
Jan  2 11:13:19 loongfire kernel: [ 6668.795292] [<9000000003a92140>] __arch_cpu_idle+0x20/0x24
Jan  2 11:13:19 loongfire kernel: [ 6668.795294] [<9000000004f17c08>] arch_cpu_idle+0x18/0x50
Jan  2 11:13:19 loongfire kernel: [ 6668.795297] [<9000000004f17d9c>] default_idle_call+0x1c/0x110
Jan  2 11:13:19 loongfire kernel: [ 6668.795300] [<9000000003b1dc30>] do_idle+0xb8/0x130
Jan  2 11:13:19 loongfire kernel: [ 6668.795303] [<9000000003b1def4>] cpu_startup_entry+0x2c/0x38
Jan  2 11:13:19 loongfire kernel: [ 6668.795306] [<9000000004f18240>] kernel_entry_end+0xdc/0xe0
Jan  2 11:13:19 loongfire kernel: [ 6668.795309] [<9000000004f30e4c>] start_kernel+0x6f0/0x6f4
Jan  2 11:13:19 loongfire kernel: [ 6668.795312] [<9000000004f180f0>] kernel_entry+0xf0/0xf8
Jan  2 11:13:19 loongfire kernel: [ 6668.795315]
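
As a rough reading aid for the trace above: the ERA line reports the interrupted pc as `symbol+offset/size`, so the distance from the pc to the end of the JITed program can be worked out directly. The sketch below only does that arithmetic on the token copied from the log. The variable names are illustrative, and the instruction count assumes the fixed 4-byte LoongArch instruction width. It shows where the CPU was when the timer interrupt fired, not which instruction is at fault.

```shell
#!/bin/sh
# Token copied from the ERA line of the trace: symbol+offset/size.
loc='bpf_prog_4747e777e0d62339_xdp_dns+0x73c/0x768'

rest=${loc##*+}     # strip the symbol name, leaving "0x73c/0x768"
off=${rest%%/*}     # offset of the pc inside the JITed program
size=${rest##*/}    # total size of the JITed program

# Shell arithmetic accepts the 0x-prefixed values as hexadecimal.
bytes_left=$(( $size - $off ))
insns_left=$(( bytes_left / 4 ))   # assumes 4-byte LoongArch instructions

echo "pc is $bytes_left bytes ($insns_left instructions) before the end of the program"
```

For this trace the pc sits 44 bytes, or 11 instructions, before the end of the program, while ra (+0x58) points near its start.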

vincentmli commented Jan 3, 2025

Tried upstream kernel 6.13.0-rc5 on Fedora: hard lockup, not responsive to the keyboard. Here is the kernel config: https://github.com/vincentmli/BPFire/blob/loongfire/config/kernel/kernel.config.loongarch64-ipfire

@vincentmli

The LoongArch maintainer suggested disabling the BPF JIT with echo 0 > /proc/sys/net/core/bpf_jit_enable, and the issue disappears once the JIT is disabled.
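
To confirm which mode the kernel is in before and after the workaround, a small helper along these lines may be useful. It is a sketch, not part of xdp-tools: `jit_state` and `JIT_KNOB` are hypothetical names, and the mapping of values (0 = interpreter, 1 = JIT, 2 = JIT with debug output to the kernel log) follows the kernel's sysctl documentation for `bpf_jit_enable`.

```shell
#!/bin/sh
# Path of the JIT switch; overridable so the helper can be tried on a scratch file.
JIT_KNOB=${JIT_KNOB:-/proc/sys/net/core/bpf_jit_enable}

# Print a readable name for the current bpf_jit_enable value.
jit_state() {
    if [ ! -r "$JIT_KNOB" ]; then
        echo "unavailable"      # knob missing or unreadable on this system
        return 0
    fi
    case "$(cat "$JIT_KNOB")" in
        0) echo "interpreter" ;;   # JIT off, programs run in the interpreter
        1) echo "jit" ;;           # JIT on
        2) echo "jit+debug" ;;     # JIT on, compiled image dumped to the kernel log
        *) echo "unknown" ;;
    esac
}

jit_state
```

Running it before loading the program and again after the `echo 0` above should show the switch from `jit` to `interpreter`. Writing the knob needs root, and the change does not persist across reboots.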

@vincentmli vincentmli self-assigned this Jan 3, 2025
@vincentmli

I have simplified the XDP program to reproduce the issue: https://github.com/vincentmli/xdp-tools/blob/loongjit/xdp-dnsrrl/xdp_dnsrrl.bpf.c. With the xdp-project/xdp-tools@4d7b3c1 change applied, it does not trigger the Loongson BPF JIT bug.
