Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpf: Fix the release of inner map #653

Closed
wants to merge 11 commits into from

Conversation

danielocfb
Copy link
Owner

Pull request for series with
subject: bpf: Fix the release of inner map
version: 1
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=799448

@danielocfb
Copy link
Owner Author

Upstream branch: b4e59c1
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=799448
version: 1

@danielocfb
Copy link
Owner Author

Upstream branch: 3815f89
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=799448
version: 1

@danielocfb danielocfb force-pushed the series/799448=>bpf-next branch from e75f6f6 to 9e91aa7 Compare November 9, 2023 18:44
@danielocfb danielocfb force-pushed the bpf-next_base branch 2 times, most recently from 0fbe9f3 to b0d20d2 Compare November 9, 2023 19:02
@danielocfb
Copy link
Owner Author

Upstream branch: 6f101db
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=799448
version: 1

@danielocfb danielocfb force-pushed the series/799448=>bpf-next branch from 9e91aa7 to 0e780cc Compare November 9, 2023 19:05
@danielocfb danielocfb force-pushed the bpf-next_base branch 2 times, most recently from 1e02037 to a98cbca Compare November 9, 2023 19:19
Hou Tao added 9 commits November 9, 2023 11:22
These three bpf_map_{lookup,update,delete}_elem() helpers are also
available for sleepable bpf program, so add the corresponding lock
assertion for sleepable bpf program, otherwise the following warning
will be reported when a sleepable bpf program manipulates bpf map under
interpreter mode (aka bpf_jit_enable=0):

  WARNING: CPU: 3 PID: 4985 at kernel/bpf/helpers.c:40 ......
  CPU: 3 PID: 4985 Comm: test_progs Not tainted 6.6.0+ #2
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ......
  RIP: 0010:bpf_map_lookup_elem+0x54/0x60
  ......
  Call Trace:
   <TASK>
   ? __warn+0xa5/0x240
   ? bpf_map_lookup_elem+0x54/0x60
   ? report_bug+0x1ba/0x1f0
   ? handle_bug+0x40/0x80
   ? exc_invalid_op+0x18/0x50
   ? asm_exc_invalid_op+0x1b/0x20
   ? __pfx_bpf_map_lookup_elem+0x10/0x10
   ? rcu_lockdep_current_cpu_online+0x65/0xb0
   ? rcu_is_watching+0x23/0x50
   ? bpf_map_lookup_elem+0x54/0x60
   ? __pfx_bpf_map_lookup_elem+0x10/0x10
   ___bpf_prog_run+0x513/0x3b70
   __bpf_prog_run32+0x9d/0xd0
   ? __bpf_prog_enter_sleepable_recur+0xad/0x120
   ? __bpf_prog_enter_sleepable_recur+0x3e/0x120
   bpf_trampoline_6442580665+0x4d/0x1000
   __x64_sys_getpgid+0x5/0x30
   ? do_syscall_64+0x36/0xb0
   entry_SYSCALL_64_after_hwframe+0x6e/0x76
   </TASK>

Signed-off-by: Hou Tao <[email protected]>
There is no rcu-read-lock requirement for ops->map_fd_get_ptr() or
ops->map_fd_put_ptr(), so doesn't use rcu-read-lock for these two
callbacks.

For bpf_fd_array_map_update_elem(), accessing array->ptrs doesn't need
rcu-read-lock because array->ptrs will not be freed until the map-in-map
is released. For bpf_fd_htab_map_update_elem(), htab_map_update_elem()
requires rcu-read-lock to be held, so only use rcu_read_lock() during
the invocation of htab_map_update_elem().

Signed-off-by: Hou Tao <[email protected]>
rcu_read_lock() is no longer held when invoking bpf_event_entry_gen()
which is called by perf_event_fd_array_get_ptr(), so using GFP_KERNEL
instead of GFP_ATOMIC to reduce the possibility of failures due to
out-of-memory.

Signed-off-by: Hou Tao <[email protected]>
need_defer tells the implementation to defer the reference release of
the passed element and ensure that the element is still alive before
the bpf program, which may manipulate it, exits.

The following three cases will invoke map_fd_put_ptr() and different
need_defer values will be passed to these callers:

1) release the reference of the old element in the map during map update
   or map deletion. The release must be deferred, otherwise the bpf
   program may incur use-after-free problem, so need_defer needs to be
   true.
2) release the reference of the to-be-added element in the error path of
   map update. The to-be-added element is not visible to any bpf
   program, so it is OK to pass false for need_defer parameter.
3) release the references of all elements in the map during map release.
   Any bpf program which has access to the map must have been exited and
   released, so need_defer=false will be OK.

The need_defer parameter will be used by the following patches to fix the
potential use-after-free problem for map-in-map.

Signed-off-by: Hou Tao <[email protected]>
bpf_map_of_map_fd_get_ptr() will convert the map fd to the pointer
saved in map-in-map. bpf_map_of_map_fd_put_ptr() will release the
pointer saved in map-in-map. These two helpers will be used by the
following patches to fix the use-after-free problems for map-in-map.

Signed-off-by: Hou Tao <[email protected]>
bpf_map_of_map_fd_sys_lookup_elem() returns the map id as the value for
map-in-map. It will be used when userspace applications do lookup
operations for map-in-map.

Signed-off-by: Hou Tao <[email protected]>
When updating or deleting a map in map array, the map may still be
accessed by non-sleepable program or sleepable program. However
bpf_fd_array_map_update_elem() decreases the ref-count of the inner map
directly through bpf_map_put(), if the ref-count is the last ref-count
which is true for most cases, the inner map will be free by
ops->map_free() in a kworker. But for now, most .map_free() callbacks
don't use synchronize_rcu() or its variants to wait for the elapse of a
RCU grace period, so bpf program which is accessing the inner map may
incur use-after-free problem.

Fix it by deferring the invocation of bpf_map_put() after the elapse of
both one RCU grace period and one tasks trace RCU grace period.

Fixes: bba1dc0 ("bpf: Remove redundant synchronize_rcu.")
Fixes: 638e4b8 ("bpf: Allows per-cpu maps and map-in-map in sleepable programs")
Signed-off-by: Hou Tao <[email protected]>
When updating or deleting a map in map htab, the map may still be
accessed by non-sleepable program or sleepable program. However
bpf_fd_htab_map_update_elem() decreases the ref-count of the inner map
directly through bpf_map_put(), if the ref-count is the last ref-count
which is true for most cases, the inner map will be free by
ops->map_free() in a kworker. But for now, most .map_free() callbacks
don't use synchronize_rcu() or its variants to wait for the elapse of a
RCU grace period, so bpf program which is accessing the inner map may
incur use-after-free problem.

Fix it by deferring the invocation of bpf_map_put() after the elapse of
both one RCU grace period and one tasks trace RCU grace period.

Fixes: bba1dc0 ("bpf: Remove redundant synchronize_rcu.")
Fixes: 638e4b8 ("bpf: Allows per-cpu maps and map-in-map in sleepable programs")
Signed-off-by: Hou Tao <[email protected]>
bpf_map_fd_put_ptr() and bpf_map_fd_sys_lookup_elem() are no longer used
for map-in-map, so just remove these two helpers. bpf_map_fd_get_ptr()
is only used internally, so make it be static.

Signed-off-by: Hou Tao <[email protected]>
@danielocfb
Copy link
Owner Author

Upstream branch: e80742d
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=799448
version: 1

Hou Tao added 2 commits November 9, 2023 11:22
Even before the introduction of deferred-map-put for inner map, the
liveness test for inner map is not correct, because inner_map1 and
inner_map2 are released directly in a kworker and there is no need for
invoking kern_sync_rcu().

After deferring the invocation of bpf_map_put() for inner map, the time
when the inner map is released is more complicated: each overwrite of
the inner map will delay the release of the inner map after both one RCU
grace period and one tasks trace RCU grace period. To avoid such
complexity in selftest, just remove the liveness test for inner map.

Signed-off-by: Hou Tao <[email protected]>
Add test cases to test the race between the destroy of inner map due to
map-in-map update and the access of inner map in bpf program. The
following 4 combination are added:
(1) array map in map array + bpf program
(2) array map in map array + sleepable bpf program
(3) array map in map htab + bpf program
(4) array map in map htab + sleepable bpf program

Before apply the fixes, when running "./test_prog -a map_in_map" with
net.core.bpf_jit_enable=0, the following error was reported:

  BUG: KASAN: slab-use-after-free in bpf_map_lookup_elem+0x25/0x60
  Read of size 8 at addr ffff888162fbe000 by task test_progs/3282

  CPU: 4 PID: 3282 Comm: test_progs Not tainted 6.6.0-rc5+ #21
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ......
  Call Trace:
   <TASK>
   dump_stack_lvl+0x4b/0x80
   print_report+0xcf/0x610
   kasan_report+0x9d/0xd0
   __asan_load8+0x7e/0xb0
   bpf_map_lookup_elem+0x25/0x60
   ___bpf_prog_run+0x2569/0x3c50
   __bpf_prog_run32+0xa1/0xe0
   trace_call_bpf+0x1a9/0x5e0
   kprobe_perf_func+0xce/0x450
   kprobe_dispatcher+0xa1/0xb0
   kprobe_ftrace_handler+0x27b/0x370
   0xffffffffc02080f7
  RIP: 0010:__x64_sys_getpgid+0x1/0x30
  ......
   </TASK>

  Allocated by task 3281:
   kasan_save_stack+0x26/0x50
   kasan_set_track+0x25/0x30
   kasan_save_alloc_info+0x1b/0x30
   __kasan_kmalloc+0x84/0xa0
   __kmalloc_node+0x67/0x170
   __bpf_map_area_alloc+0x13f/0x160
   bpf_map_area_alloc+0x10/0x20
   array_map_alloc+0x11d/0x2c0
   map_create+0x285/0xc30
   __sys_bpf+0xcff/0x3350
   __x64_sys_bpf+0x45/0x60
   do_syscall_64+0x33/0x60
   entry_SYSCALL_64_after_hwframe+0x6e/0xd8

  Freed by task 1328:
   kasan_save_stack+0x26/0x50
   kasan_set_track+0x25/0x30
   kasan_save_free_info+0x2b/0x50
   __kasan_slab_free+0x10f/0x1a0
   __kmem_cache_free+0x1df/0x460
   kfree+0x90/0x140
   kvfree+0x2c/0x40
   bpf_map_area_free+0xe/0x20
   array_map_free+0x11f/0x270
   bpf_map_free_deferred+0xda/0x200
   process_scheduled_works+0x689/0xa20
   worker_thread+0x2fd/0x5a0
   kthread+0x1bf/0x200
   ret_from_fork+0x39/0x70
   ret_from_fork_asm+0x1b/0x30

  Last potentially related work creation:
   kasan_save_stack+0x26/0x50
   __kasan_record_aux_stack+0x92/0xa0
   kasan_record_aux_stack_noalloc+0xb/0x20
   insert_work+0x2a/0xc0
   __queue_work+0x2a6/0x8d0
   queue_work_on+0x7c/0x80
   __bpf_map_put+0x103/0x140
   bpf_map_put+0x10/0x20
   bpf_map_fd_put_ptr+0x1e/0x30
   bpf_fd_array_map_update_elem+0x18a/0x1d0
   bpf_map_update_value+0x2ca/0x4b0
   __sys_bpf+0x26ba/0x3350
   __x64_sys_bpf+0x45/0x60
   do_syscall_64+0x33/0x60
   entry_SYSCALL_64_after_hwframe+0x6e/0xd8

Signed-off-by: Hou Tao <[email protected]>
@danielocfb danielocfb force-pushed the series/799448=>bpf-next branch from 0e780cc to 455a1a0 Compare November 9, 2023 19:22
@danielocfb
Copy link
Owner Author

At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=799448 expired. Closing PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant