Kernel panic #3673

Open
maxpain opened this issue Nov 12, 2024 · 7 comments
Labels: type/bug (Something isn't working)

@maxpain

maxpain commented Nov 12, 2024

I get a kernel panic almost every day on Talos Linux v1.8.2 (Linux 6.6.58).
Talos is deployed on bare-metal nodes (Dell R6615) with NVMe SSDs.
For the network, I use Broadcom 2x25G NICs (50G in an LACP bond) with MTU 9000 (jumbo frames).

I use an image built on factory.talos.dev:

customization:
    extraKernelArgs:
        - console=ttyS0,115200n8r
        - -lockdown
        - lockdown=integrity
        - cpufreq.default_governor=performance
        - amd_pstate=active
        - mitigations=off
        - iommu=off
    systemExtensions:
        officialExtensions:
            - siderolabs/amd-ucode
            - siderolabs/amdgpu-firmware
            - siderolabs/drbd

For CNI I use Cilium in eBPF mode.

[40145.614353] general protection fault, probably for non-canonical address 0x9e759c37ee555c76: 0000 [#1] SMP PTI
[40145.624361] CPU: 18 PID: 234918 Comm: conn48291 Tainted: G           O       6.6.58-talos #1
[40145.632800] Hardware name: Dell Inc. PowerEdge R6615/067N9T, BIOS 1.9.5 09/12/2024
[40145.640376] RIP: 0010:is_uprobe_at_func_entry+0x28/0x80
[40145.645609] Code: 90 90 0f 1f 44 00 00 65 48 8b 04 25 80 e3 02 00 48 83 b8 30 0b 00 00 00 74 60 48 8b 80 30 0b 00 00 48 8b 50 30 48 85 d2 74 50 <80> 3a 55 b8 01 00 00 00 74 1b 48 8b 8f 88 00 00 00 48 83 f9 33 74
[40145.664366] RSP: 0018:ffffc900007c8bc8 EFLAGS: 00010082
[40145.669599] RAX: ffff88813eafb120 RBX: ffffc900007c8c20 RCX: 00007f116e206296
[40145.676740] RDX: 9e759c37ee555c76 RSI: 0000000000000001 RDI: ffffc90111fa3f58
[40145.683880] RBP: ffffc90111fa3f58 R08: 000000000002aee0 R09: 0000000000000008
[40145.691021] R10: ffffc90111fa0000 R11: ffffc900007c8ff8 R12: 0000000000000000
[40145.698162] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[40145.705303] FS:  00007f113e959700(0000) GS:ffff88defb500000(0000) knlGS:0000000000000000
[40145.713398] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[40145.719155] CR2: 000015b40194c804 CR3: 0000000363b74003 CR4: 0000000000f70ee0
[40145.726294] PKRU: 55555554
[40145.729014] Call Trace:
[40145.731468]  <IRQ>
[40145.733502]  ? die_addr+0x36/0x90
[40145.736836]  ? exc_general_protection+0x217/0x420
[40145.741553]  ? asm_exc_general_protection+0x26/0x30
[40145.746450]  ? is_uprobe_at_func_entry+0x28/0x80
[40145.751083]  perf_callchain_user+0x20a/0x360
[40145.755365]  get_perf_callchain+0x147/0x1d0
[40145.759559]  bpf_get_stackid+0x60/0x90
[40145.763319]  bpf_prog_9aac297fb833e2f5_do_perf_event+0x434/0x53b
[40145.769333]  ? __smp_call_single_queue+0xad/0x120
[40145.774049]  bpf_overflow_handler+0x75/0x110
[40145.778330]  __perf_event_overflow+0x114/0x360
[40145.782787]  perf_swevent_hrtimer+0x134/0x150
[40145.787155]  ? __wake_up_common+0x73/0x180
[40145.791258]  ? timerqueue_del+0x2e/0x50
[40145.795107]  ? __pfx_perf_swevent_hrtimer+0x10/0x10
[40145.799996]  __hrtimer_run_queues+0x118/0x240
[40145.804365]  ? ktime_get_update_offsets_now+0x49/0x110
[40145.809511]  hrtimer_interrupt+0xf8/0x240
[40145.813531]  __sysvec_apic_timer_interrupt+0x4a/0xe0
[40145.818508]  sysvec_apic_timer_interrupt+0x6d/0x90
[40145.823310]  </IRQ>
[40145.825426]  <TASK>
[40145.827537]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
[40145.832687] RIP: 0010:__kmem_cache_free+0x1cb/0x350
[40145.837576] Code: 48 85 db 0f 84 00 01 00 00 48 89 c2 48 0f ca 49 33 94 24 b8 00 00 00 48 89 10 49 8b 04 24 65 48 03 05 99 bd 37 61 48 8b 70 08 <4c> 39 68 10 0f 85 0b 01 00 00 48 8b 10 41 8b 44 24 28 48 01 d8 48
[40145.856331] RSP: 0018:ffffc90111fa3b70 EFLAGS: 00000282
[40145.861561] RAX: ffff88defb533910 RBX: ffff88813eafb120 RCX: ffffea0000000000
[40145.868698] RDX: 9e759c37ee555c76 RSI: 0000000000119862 RDI: ffff88810004e200
[40145.875836] RBP: ffffc90111fa3bc0 R08: 0000000000000086 R09: 00007f1153f9f9c0
[40145.882980] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88810004e200
[40145.890120] R13: ffffea0004fabec0 R14: 0000000000000000 R15: 0000000000000000
[40145.897266]  ? uprobe_free_utask+0x62/0x80
[40145.901378]  ? acct_collect+0x4c/0x220
[40145.905141]  uprobe_free_utask+0x62/0x80
[40145.909075]  mm_release+0x12/0xb0
[40145.912401]  do_exit+0x26b/0xaa0
[40145.915643]  __x64_sys_exit+0x1b/0x20
[40145.919317]  do_syscall_64+0x5a/0x80
[40145.922911]  entry_SYSCALL_64_after_hwframe+0x78/0xe2
[40145.927976] RIP: 0033:0x7f116e206296
[40145.931565] Code: 28 06 00 00 0f 84 ec 01 00 00 48 8b 44 24 08 f6 80 08 03 00 00 40 0f 85 7a 01 00 00 ba 3c 00 00 00 0f 1f 00 31 ff 89 d0 0f 05 <eb> f8 48 89 c8 48 c7 00 00 00 00 00 48 8d 48 f8 48 39 d0 75 ed 48
[40145.950321] RSP: 002b:00007f113e958a40 EFLAGS: 00000246 ORIG_RAX: 000000000000003c
[40145.957891] RAX: ffffffffffffffda RBX: 00007f113e859000 RCX: 00007f116e206296
[40145.965033] RDX: 000000000000003c RSI: 00007f1153f9f9c0 RDI: 0000000000000000
[40145.972177] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000056b90006
[40145.979317] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f1149f8925e
[40145.986456] R13: 00007f1149f8925f R14: 00007f113e959700 R15: 00007f113e958b00
[40145.993606]  </TASK>
[40145.995808] Modules linked in: drbd_transport_tcp(O) drbd(O) ahci i40e sp5100_tco bnxt_en amd64_edac megaraid_sas libahci nvme k10temp watchdog
[40146.008673] ---[ end trace 0000000000000000 ]---
[40146.013298] RIP: 0010:is_uprobe_at_func_entry+0x28/0x80
[40146.018531] Code: 90 90 0f 1f 44 00 00 65 48 8b 04 25 80 e3 02 00 48 83 b8 30 0b 00 00 00 74 60 48 8b 80 30 0b 00 00 48 8b 50 30 48 85 d2 74 50 <80> 3a 55 b8 01 00 00 00 74 1b 48 8b 8f 88 00 00 00 48 83 f9 33 74
[40146.037290] RSP: 0018:ffffc900007c8bc8 EFLAGS: 00010082
[40146.042521] RAX: ffff88813eafb120 RBX: ffffc900007c8c20 RCX: 00007f116e206296
[40146.049662] RDX: 9e759c37ee555c76 RSI: 0000000000000001 RDI: ffffc90111fa3f58
[40146.056805] RBP: ffffc90111fa3f58 R08: 000000000002aee0 R09: 0000000000000008
[40146.063946] R10: ffffc90111fa0000 R11: ffffc900007c8ff8 R12: 0000000000000000
[40146.071088] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[40146.078227] FS:  00007f113e959700(0000) GS:ffff88defb500000(0000) knlGS:0000000000000000
[40146.086321] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[40146.092077] CR2: 000015b40194c804 CR3: 0000000363b74003 CR4: 0000000000f70ee0
[40146.099222] PKRU: 55555554
[40146.101943] Kernel panic - not syncing: Fatal exception in interrupt
[40146.108739] Kernel Offset: disabled
[40146.112246] Rebooting in 10 seconds..
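
A note for readers decoding the first line of the trace: on x86-64 with 4-level paging, a virtual address is canonical only when bits 63 through 47 are all equal. The value in RDX, 0x9e759c37ee555c76, fails that test, which is why the CPU raised a general protection fault instead of a page fault. Below is a minimal sketch of the check. `is_canonical48` is a hypothetical helper name, and the 48-bit layout is an assumption; the sketch does not cover 5-level paging.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, assuming 4-level paging (48-bit virtual addresses).
 * An address is canonical when bits 63..47 are all zeros (user half)
 * or all ones (kernel half).
 */
static bool is_canonical48(uint64_t addr)
{
    uint64_t top = addr >> 47;          /* the 17 bits from 47 to 63 */

    return top == 0 || top == 0x1ffff;
}
```

Applied to the registers in the report, `is_canonical48(0x9e759c37ee555c76ULL)` is false, while the kernel pointer in RAX (0xffff88813eafb120) and the user-space address in RCX (0x00007f116e206296) both pass. The same RDX value also shows up in the interrupted `__kmem_cache_free` frame, which hints that the pointer was read from an object the allocator was in the middle of freeing.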
@korniltsev
Collaborator

This looks like a bug in the kernel's eBPF subsystem.
Do you use the pyroscope.ebpf component in Alloy as the profiler? Or is it coroot profiling? Which version? Which configuration?

The symbol bpf_prog_9aac297fb833e2f5_do_perf_event suggests it could be either pyroscope or coroot.

Will you be able to share your kernel+modules so we could try reproducing it?

korniltsev self-assigned this Nov 18, 2024
@maxpain
Author

maxpain commented Nov 19, 2024

> Or is it coroot profiling?

Yes, it's coroot.
We deploy coroot using the official Helm chart with the default configuration.
Helm chart version: 0.15.16

> Will you be able to share your kernel+modules so we could try reproducing it?

I use Talos v1.8.2 (Linux 6.6.58) built on factory.talos.dev with the following configuration:

customization:
    extraKernelArgs:
        - console=ttyS0,115200n8r
        - -lockdown
        - lockdown=integrity
        - cpufreq.default_governor=performance
        - amd_pstate=active
        - mitigations=off
        - iommu=off
    systemExtensions:
        officialExtensions:
            - siderolabs/amd-ucode
            - siderolabs/amdgpu-firmware
            - siderolabs/drbd

https://factory.talos.dev/image/c4402c8cf9c87bcdc3947f2cc6e9486f413ca69716fa3b0a4c0c9863aafe963f/v1.8.2/metal-amd64-secureboot.iso

@borkmann

borkmann commented Jan 9, 2025

@maxpain would you be able to test a kernel patch?

@maxpain
Author

maxpain commented Jan 9, 2025

> @maxpain would you be able to test a kernel patch?

That would not be easy, since I build my Talos Linux images with factory.talos.dev and use Secure Boot.

I think the panic could be caused by the Puppeteer (Chrome for developers) pods in my cluster. I can reproduce it on other hardware.

@borkmann

borkmann commented Jan 9, 2025

Would you be able to test this one?

From e73a85a3fc1753656aba6d365640b16dca432ae1 Mon Sep 17 00:00:00 2001
From: Daniel Borkmann <[email protected]>
Date: Thu, 9 Jan 2025 09:01:59 +0000
Subject: [PATCH] events: Fix GPF due to corrupted utask->auprobe pointer

Fixes: cfa7f3d2c526 ("perf,x86: avoid missing caller address in stack traces captured in uprobe")
Signed-off-by: Daniel Borkmann <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Jiri Olsa <[email protected]>
Link: https://github.com/grafana/pyroscope/issues/3673
---
 arch/x86/events/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index c75c482d4c52..05f9cedf2691 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2835,6 +2835,8 @@ static bool is_uprobe_at_func_entry(struct pt_regs *regs)

        if (!current->utask)
                return false;
+       if (!current->utask->active_uprobe)
+               return false;

        auprobe = current->utask->auprobe;
        if (!auprobe)
-- 
2.43.0

@olsajiri

olsajiri commented Jan 9, 2025

I can reproduce it with the following commands running in separate terminals:

# while :; do bpftrace -e 'uprobe:/bin/ls:_start  { printf("hit\n"); }' -c ls; done
# bpftrace -e 'profile:hz:100000 { @[ustack()] = count(); }'

Looks like we already have a fix; I will send it out shortly.

intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this issue Jan 9, 2025
Max Makarov reported kernel panic [1] in perf user callchain code.

The reason for that is the race between uprobe_free_utask and bpf
profiler code doing the perf user stack unwind and is triggered
within uprobe_free_utask function:
  - after current->utask is freed and
  - before current->utask is set to NULL

 general protection fault, probably for non-canonical address 0x9e759c37ee555c76: 0000 [#1] SMP PTI
 RIP: 0010:is_uprobe_at_func_entry+0x28/0x80
 ...
  ? die_addr+0x36/0x90
  ? exc_general_protection+0x217/0x420
  ? asm_exc_general_protection+0x26/0x30
  ? is_uprobe_at_func_entry+0x28/0x80
  perf_callchain_user+0x20a/0x360
  get_perf_callchain+0x147/0x1d0
  bpf_get_stackid+0x60/0x90
  bpf_prog_9aac297fb833e2f5_do_perf_event+0x434/0x53b
  ? __smp_call_single_queue+0xad/0x120
  bpf_overflow_handler+0x75/0x110
  ...
  asm_sysvec_apic_timer_interrupt+0x1a/0x20
 RIP: 0010:__kmem_cache_free+0x1cb/0x350
 ...
  ? uprobe_free_utask+0x62/0x80
  ? acct_collect+0x4c/0x220
  uprobe_free_utask+0x62/0x80
  mm_release+0x12/0xb0
  do_exit+0x26b/0xaa0
  __x64_sys_exit+0x1b/0x20
  do_syscall_64+0x5a/0x80

It can be easily reproduced by running following commands in
separate terminals:

  # while :; do bpftrace -e 'uprobe:/bin/ls:_start  { printf("hit\n"); }' -c ls; done
  # bpftrace -e 'profile:hz:100000 { @[ustack()] = count(); }'

Fixing this by making sure current->utask pointer is set to NULL
before we start to release the utask object.

[1] grafana/pyroscope#3673
Reported-by: Max Makarov <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
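
The ordering problem described in the commit message can be modeled in user space. The sketch below is not the kernel code: `task_model`, `utask_model`, `model_free`, and the simulated interrupt `irq_reads_freed` are hypothetical stand-ins, and a `freed` flag replaces a real `kfree()` so the example stays well defined. It only illustrates why clearing the published pointer before releasing the object closes the window.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's uprobe_task and task_struct. */
struct utask_model {
    bool     freed;    /* set once the object has been "freed" */
    uint64_t auprobe;  /* models utask->auprobe */
};

struct task_model {
    struct utask_model *utask;
};

/* Models freeing: the allocator may overwrite fields of a freed object.
 * The poison value is the faulting address from the report, used here
 * purely for illustration. */
static void model_free(struct utask_model *u)
{
    u->freed = true;
    u->auprobe = 0x9e759c37ee555c76ULL;
}

/* Models the profiler interrupt: returns true if it would follow
 * current->utask into an object that has already been freed. */
static bool irq_reads_freed(const struct task_model *t)
{
    return t->utask != NULL && t->utask->freed;
}

/* Racy order: free first, clear the pointer afterwards. */
static bool release_racy(struct task_model *t)
{
    bool hit;

    model_free(t->utask);
    hit = irq_reads_freed(t);   /* interrupt lands inside the window */
    t->utask = NULL;
    return hit;
}

/* Fixed order: clear the pointer, then free through a local copy. */
static bool release_fixed(struct task_model *t)
{
    struct utask_model *u = t->utask;
    bool hit;

    t->utask = NULL;
    hit = irq_reads_freed(t);   /* interrupt sees NULL and backs off */
    model_free(u);
    return hit;
}
```

With a fresh `task_model` for each call, `release_racy` reports that the simulated interrupt reached a freed object, while `release_fixed` does not. In the real kernel the interrupt can fire at any instruction, so the fixed ordering matters because it leaves no point at which the pointer is both published and stale.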
@borkmann

borkmann commented Jan 9, 2025

@olsajiri found a better variant that does not incur an additional runtime check. It was submitted here: https://lore.kernel.org/bpf/[email protected]/

kernel-patches-daemon-bpf-rc bot pushed a commit to kernel-patches/bpf-rc that referenced this issue Jan 9, 2025
kernel-patches-daemon-bpf bot pushed a commit to kernel-patches/bpf that referenced this issue Jan 9, 2025
kernel-patches-daemon-bpf-rc bot pushed a commit to kernel-patches/bpf-rc that referenced this issue Jan 9, 2025
kernel-patches-daemon-bpf bot pushed a commit to kernel-patches/bpf that referenced this issue Jan 9, 2025
kernel-patches-daemon-bpf bot pushed a commit to kernel-patches/bpf that referenced this issue Jan 10, 2025
kernel-patches-daemon-bpf-rc bot pushed a commit to kernel-patches/bpf-rc that referenced this issue Jan 10, 2025
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this issue Jan 10, 2025
Max Makarov reported kernel panic [1] in perf user callchain code.

The reason for that is the race between uprobe_free_utask and bpf
profiler code doing the perf user stack unwind and is triggered
within uprobe_free_utask function:
  - after current->utask is freed and
  - before current->utask is set to NULL

 general protection fault, probably for non-canonical address 0x9e759c37ee555c76: 0000 [#1] SMP PTI
 RIP: 0010:is_uprobe_at_func_entry+0x28/0x80
 ...
  ? die_addr+0x36/0x90
  ? exc_general_protection+0x217/0x420
  ? asm_exc_general_protection+0x26/0x30
  ? is_uprobe_at_func_entry+0x28/0x80
  perf_callchain_user+0x20a/0x360
  get_perf_callchain+0x147/0x1d0
  bpf_get_stackid+0x60/0x90
  bpf_prog_9aac297fb833e2f5_do_perf_event+0x434/0x53b
  ? __smp_call_single_queue+0xad/0x120
  bpf_overflow_handler+0x75/0x110
  ...
  asm_sysvec_apic_timer_interrupt+0x1a/0x20
 RIP: 0010:__kmem_cache_free+0x1cb/0x350
 ...
  ? uprobe_free_utask+0x62/0x80
  ? acct_collect+0x4c/0x220
  uprobe_free_utask+0x62/0x80
  mm_release+0x12/0xb0
  do_exit+0x26b/0xaa0
  __x64_sys_exit+0x1b/0x20
  do_syscall_64+0x5a/0x80

It can be easily reproduced by running following commands in
separate terminals:

  # while :; do bpftrace -e 'uprobe:/bin/ls:_start  { printf("hit\n"); }' -c ls; done
  # bpftrace -e 'profile:hz:100000 { @[ustack()] = count(); }'

Fixing this by making sure current->utask pointer is set to NULL
before we start to release the utask object.

[1] grafana/pyroscope#3673

Fixes: cfa7f3d ("perf,x86: avoid missing caller address in stack traces captured in uprobe")
Reported-by: Max Makarov <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Oleg Nesterov <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]