-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpf: Fix the release of inner map #653
Conversation
Upstream branch: b4e59c1 |
902178a
to
bc1b833
Compare
Upstream branch: 3815f89 |
e75f6f6
to
9e91aa7
Compare
0fbe9f3
to
b0d20d2
Compare
Upstream branch: 6f101db |
9e91aa7
to
0e780cc
Compare
1e02037
to
a98cbca
Compare
These three bpf_map_{lookup,update,delete}_elem() helpers are also available for sleepable bpf program, so add the corresponding lock assertion for sleepable bpf program, otherwise the following warning will be reported when a sleepable bpf program manipulates bpf map under interpreter mode (aka bpf_jit_enable=0): WARNING: CPU: 3 PID: 4985 at kernel/bpf/helpers.c:40 ...... CPU: 3 PID: 4985 Comm: test_progs Not tainted 6.6.0+ #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ...... RIP: 0010:bpf_map_lookup_elem+0x54/0x60 ...... Call Trace: <TASK> ? __warn+0xa5/0x240 ? bpf_map_lookup_elem+0x54/0x60 ? report_bug+0x1ba/0x1f0 ? handle_bug+0x40/0x80 ? exc_invalid_op+0x18/0x50 ? asm_exc_invalid_op+0x1b/0x20 ? __pfx_bpf_map_lookup_elem+0x10/0x10 ? rcu_lockdep_current_cpu_online+0x65/0xb0 ? rcu_is_watching+0x23/0x50 ? bpf_map_lookup_elem+0x54/0x60 ? __pfx_bpf_map_lookup_elem+0x10/0x10 ___bpf_prog_run+0x513/0x3b70 __bpf_prog_run32+0x9d/0xd0 ? __bpf_prog_enter_sleepable_recur+0xad/0x120 ? __bpf_prog_enter_sleepable_recur+0x3e/0x120 bpf_trampoline_6442580665+0x4d/0x1000 __x64_sys_getpgid+0x5/0x30 ? do_syscall_64+0x36/0xb0 entry_SYSCALL_64_after_hwframe+0x6e/0x76 </TASK> Signed-off-by: Hou Tao <[email protected]>
There is no rcu-read-lock requirement for ops->map_fd_get_ptr() or ops->map_fd_put_ptr(), so doesn't use rcu-read-lock for these two callbacks. For bpf_fd_array_map_update_elem(), accessing array->ptrs doesn't need rcu-read-lock because array->ptrs will not be freed until the map-in-map is released. For bpf_fd_htab_map_update_elem(), htab_map_update_elem() requires rcu-read-lock to be held, so only use rcu_read_lock() during the invocation of htab_map_update_elem(). Signed-off-by: Hou Tao <[email protected]>
rcu_read_lock() is no longer held when invoking bpf_event_entry_gen() which is called by perf_event_fd_array_get_ptr(), so using GFP_KERNEL instead of GFP_ATOMIC to reduce the possibility of failures due to out-of-memory. Signed-off-by: Hou Tao <[email protected]>
need_defer tells the implementation to defer the reference release of the passed element and ensure that the element is still alive before the bpf program, which may manipulate it, exits. The following three cases will invoke map_fd_put_ptr() and different need_defer values will be passed to these callers: 1) release the reference of the old element in the map during map update or map deletion. The release must be deferred, otherwise the bpf program may incur use-after-free problem, so need_defer needs to be true. 2) release the reference of the to-be-added element in the error path of map update. The to-be-added element is not visible to any bpf program, so it is OK to pass false for need_defer parameter. 3) release the references of all elements in the map during map release. Any bpf program which has access to the map must have been exited and released, so need_defer=false will be OK. The need_defer parameter will be used by the following patches to fix the potential use-after-free problem for map-in-map. Signed-off-by: Hou Tao <[email protected]>
bpf_map_of_map_fd_get_ptr() will convert the map fd to the pointer saved in map-in-map. bpf_map_of_map_fd_put_ptr() will release the pointer saved in map-in-map. These two helpers will be used by the following patches to fix the use-after-free problems for map-in-map. Signed-off-by: Hou Tao <[email protected]>
bpf_map_of_map_fd_sys_lookup_elem() returns the map id as the value for map-in-map. It will be used when userspace applications do lookup operations for map-in-map. Signed-off-by: Hou Tao <[email protected]>
When updating or deleting a map in map array, the map may still be accessed by non-sleepable program or sleepable program. However bpf_fd_array_map_update_elem() decreases the ref-count of the inner map directly through bpf_map_put(), if the ref-count is the last ref-count which is true for most cases, the inner map will be free by ops->map_free() in a kworker. But for now, most .map_free() callbacks don't use synchronize_rcu() or its variants to wait for the elapse of a RCU grace period, so bpf program which is accessing the inner map may incur use-after-free problem. Fix it by deferring the invocation of bpf_map_put() after the elapse of both one RCU grace period and one tasks trace RCU grace period. Fixes: bba1dc0 ("bpf: Remove redundant synchronize_rcu.") Fixes: 638e4b8 ("bpf: Allows per-cpu maps and map-in-map in sleepable programs") Signed-off-by: Hou Tao <[email protected]>
When updating or deleting a map in map htab, the map may still be accessed by non-sleepable program or sleepable program. However bpf_fd_htab_map_update_elem() decreases the ref-count of the inner map directly through bpf_map_put(), if the ref-count is the last ref-count which is true for most cases, the inner map will be free by ops->map_free() in a kworker. But for now, most .map_free() callbacks don't use synchronize_rcu() or its variants to wait for the elapse of a RCU grace period, so bpf program which is accessing the inner map may incur use-after-free problem. Fix it by deferring the invocation of bpf_map_put() after the elapse of both one RCU grace period and one tasks trace RCU grace period. Fixes: bba1dc0 ("bpf: Remove redundant synchronize_rcu.") Fixes: 638e4b8 ("bpf: Allows per-cpu maps and map-in-map in sleepable programs") Signed-off-by: Hou Tao <[email protected]>
bpf_map_fd_put_ptr() and bpf_map_fd_sys_lookup_elem() are no longer used for map-in-map, so just remove these two helpers. bpf_map_fd_get_ptr() is only used internally, so make it be static. Signed-off-by: Hou Tao <[email protected]>
Upstream branch: e80742d |
Even before the introduction of deferred-map-put for inner map, the liveness test for inner map is not correct, because inner_map1 and inner_map2 are released directly in a kworker and there is no need for invoking kern_sync_rcu(). After deferring the invocation of bpf_map_put() for inner map, the time when the inner map is released is more complicated: each overwrite of the inner map will delay the release of the inner map after both one RCU grace period and one tasks trace RCU grace period. To avoid such complexity in selftest, just remove the liveness test for inner map. Signed-off-by: Hou Tao <[email protected]>
Add test cases to test the race between the destroy of inner map due to map-in-map update and the access of inner map in bpf program. The following 4 combination are added: (1) array map in map array + bpf program (2) array map in map array + sleepable bpf program (3) array map in map htab + bpf program (4) array map in map htab + sleepable bpf program Before apply the fixes, when running "./test_prog -a map_in_map" with net.core.bpf_jit_enable=0, the following error was reported: BUG: KASAN: slab-use-after-free in bpf_map_lookup_elem+0x25/0x60 Read of size 8 at addr ffff888162fbe000 by task test_progs/3282 CPU: 4 PID: 3282 Comm: test_progs Not tainted 6.6.0-rc5+ #21 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ...... Call Trace: <TASK> dump_stack_lvl+0x4b/0x80 print_report+0xcf/0x610 kasan_report+0x9d/0xd0 __asan_load8+0x7e/0xb0 bpf_map_lookup_elem+0x25/0x60 ___bpf_prog_run+0x2569/0x3c50 __bpf_prog_run32+0xa1/0xe0 trace_call_bpf+0x1a9/0x5e0 kprobe_perf_func+0xce/0x450 kprobe_dispatcher+0xa1/0xb0 kprobe_ftrace_handler+0x27b/0x370 0xffffffffc02080f7 RIP: 0010:__x64_sys_getpgid+0x1/0x30 ...... </TASK> Allocated by task 3281: kasan_save_stack+0x26/0x50 kasan_set_track+0x25/0x30 kasan_save_alloc_info+0x1b/0x30 __kasan_kmalloc+0x84/0xa0 __kmalloc_node+0x67/0x170 __bpf_map_area_alloc+0x13f/0x160 bpf_map_area_alloc+0x10/0x20 array_map_alloc+0x11d/0x2c0 map_create+0x285/0xc30 __sys_bpf+0xcff/0x3350 __x64_sys_bpf+0x45/0x60 do_syscall_64+0x33/0x60 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Freed by task 1328: kasan_save_stack+0x26/0x50 kasan_set_track+0x25/0x30 kasan_save_free_info+0x2b/0x50 __kasan_slab_free+0x10f/0x1a0 __kmem_cache_free+0x1df/0x460 kfree+0x90/0x140 kvfree+0x2c/0x40 bpf_map_area_free+0xe/0x20 array_map_free+0x11f/0x270 bpf_map_free_deferred+0xda/0x200 process_scheduled_works+0x689/0xa20 worker_thread+0x2fd/0x5a0 kthread+0x1bf/0x200 ret_from_fork+0x39/0x70 ret_from_fork_asm+0x1b/0x30 Last potentially related work creation: kasan_save_stack+0x26/0x50 __kasan_record_aux_stack+0x92/0xa0 kasan_record_aux_stack_noalloc+0xb/0x20 insert_work+0x2a/0xc0 __queue_work+0x2a6/0x8d0 queue_work_on+0x7c/0x80 __bpf_map_put+0x103/0x140 bpf_map_put+0x10/0x20 bpf_map_fd_put_ptr+0x1e/0x30 bpf_fd_array_map_update_elem+0x18a/0x1d0 bpf_map_update_value+0x2ca/0x4b0 __sys_bpf+0x26ba/0x3350 __x64_sys_bpf+0x45/0x60 do_syscall_64+0x33/0x60 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Signed-off-by: Hou Tao <[email protected]>
0e780cc
to
455a1a0
Compare
At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=799448 expired. Closing PR. |
Pull request for series with
subject: bpf: Fix the release of inner map
version: 1
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=799448