Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v6.9-rc3-scx1 #21

Merged
merged 991 commits into from
Apr 12, 2024
Merged

v6.9-rc3-scx1 #21

merged 991 commits into from
Apr 12, 2024

Conversation

Byte-Lab
Copy link
Contributor

Some nice features were released this week, so let's do a release for rc3.

All selftests pass

Michael-zy2000 and others added 30 commits April 2, 2024 15:54
… and resume

We got a headphone detection issue after suspend and resume.
And we fixed it by modifying the configuration at es8326_suspend
and invoke es8326_irq at es8326_resume.

Signed-off-by: Zhang Yi <[email protected]>
Link: https://msgid.link/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
We removed the configuration of ES8326_ADC_SCALE
in es8326_jack_detect_handler because user changed
the configuration by snd_controls

Signed-off-by: Zhang Yi <[email protected]>
Link: https://msgid.link/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
Modpost warns about missing module description, add it.

Reviewed-by: Cezary Rojewski <[email protected]>
Signed-off-by: Amadeusz Sławiński <[email protected]>
Link: https://msgid.link/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
Linux 6.9 made the nvme multipath nodes not properly pick up changes when
the LBA size goes smaller after an nvme format.  This is because we now
try to inherit the queue settings for the multipath node entirely from
the individual paths.  That is the right thing to do for I/O size
limitations, which make up most of the queue limits, but it is wrong for
changes to the namespace configuration, where we do want to pick up the
new format, which will eventually show up on all paths once they are
re-queried.

Fix this by not inheriting the block size and related fields and always
for updating them.

Fixes: 8f03cfa ("nvme: don't use nvme_update_disk_info for the multipath disk")
Reported-by: Nilay Shroff <[email protected]>
Tested-by: Nilay Shroff <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Keith Busch <[email protected]>
Avoid potential use-after-free bugs when walking DFS referrals,
mounting and performing DFS failover by ensuring that all children
from parent @tcon->ses are also refcounted.  They're all needed across
the entire DFS mount.  Get rid of @tcon->dfs_ses_list while we're at
it, too.

Cc: [email protected] # 6.4+
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Signed-off-by: Paulo Alcantara (Red Hat) <[email protected]>
Signed-off-by: Steve French <[email protected]>
Avoid refreshing DFS referral with refpath_lock acquired as the I/O
could block for a while due to a potentially disconnected or slow DFS
root server and then making other threads - that use same @server and
don't require a DFS root server - unable to make any progress.

Cc: [email protected] # 6.4+
Signed-off-by: Paulo Alcantara (Red Hat) <[email protected]>
Signed-off-by: Steve French <[email protected]>
The tcons created by cifs_construct_tcon() on multiuser mounts must
also be able to failover and refresh DFS referrals, so set the
appropriate fields in order to get a full DFS tcon.  They could be
shared among different superblocks later, too.

Cc: [email protected] # 6.4+
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Signed-off-by: Paulo Alcantara (Red Hat) <[email protected]>
Signed-off-by: Steve French <[email protected]>
Serialise cifs_construct_tcon() with cifs_mount_mutex to handle
parallel mounts that may end up reusing the session and tcon created
by it.

Cc: [email protected] # 6.4+
Signed-off-by: Paulo Alcantara (Red Hat) <[email protected]>
Signed-off-by: Steve French <[email protected]>
nvme_update_zone_info does (admin queue) I/O to the device and can fail.
We fail to abort the queue limits update if that happen, but really
should avoid with the frozen I/O queue as much as possible anyway.

Split the logic into a helper to query the information that can be
called on an unfrozen queue and one to apply it to the queue limits.

Fixes: 9b130d681443 ("nvme: use the atomic queue limits update API")
Reported-by: Kanchan Joshi <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Kanchan Joshi <[email protected]>
Signed-off-by: Keith Busch <[email protected]>
…kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.9, part #1

 - Ensure perf events programmed to count during guest execution
   are actually enabled before entering the guest in the nVHE
   configuration.

 - Restore out-of-range handler for stage-2 translation faults.

 - Several fixes to stage-2 TLB invalidations to avoid stale
   translations, possibly including partial walk caches.

 - Fix early handling of architectural VHE-only systems to ensure E2H is
   appropriately set.

 - Correct a format specifier warning in the arch_timer selftest.

 - Make the KVM banner message correctly handle all of the possible
   configurations.
… into HEAD

KVM/riscv fixes for 6.9, take #1

- Fix spelling mistake in arch_timer selftest
- Remove redundant semicolon in num_isa_ext_regs()
- Fix APLIC setipnum_le/be write emulation
- Fix APLIC in_clrip[x] read emulation
mean_and_variance_test_2 and mean_and_variance_test_4 always fail.
The input parameters to those tests are identical to the input parameters
to tests 1 and 3, yet the expected result for tests 2 and 4 is different
for the mean and stddev tests. That will always fail.

     Expected mean_and_variance_get_mean(mv) == mean[i], but
        mean_and_variance_get_mean(mv) == 22 (0x16)
        mean[i] == 10 (0xa)

Drop the bad tests.

Fixes: 65bc410 ("mean and variance: More tests")
Closes: https://lore.kernel.org/lkml/[email protected]/
Cc: Kent Overstreet <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
…hefs

Pull bcachefs fixes from Kent Overstreet:
 "Lots of fixes for situations with extreme filesystem damage.

  One fix ("Fix journal pins in btree write buffer") applicable to
  normal usage; also a dio performance fix.

  New repair/construction code is in the final stages, should be ready
  in about a week. Anyone that lost btree interior nodes (or a variety
  of other damage) as a result of the splitbrain bug will be able to
  repair then"

* tag 'bcachefs-2024-04-01' of https://evilpiepirate.org/git/bcachefs: (32 commits)
  bcachefs: On emergency shutdown, print out current journal sequence number
  bcachefs: Fix overlapping extent repair
  bcachefs: Fix remove_dirent()
  bcachefs: Logged op errors should be ignored
  bcachefs: Improve -o norecovery; opts.recovery_pass_limit
  bcachefs: bch2_run_explicit_recovery_pass_persistent()
  bcachefs: Ensure bch_sb_field_ext always exists
  bcachefs: Flush journal immediately after replay if we did early repair
  bcachefs: Resume logged ops after fsck
  bcachefs: Add error messages to logged ops fns
  bcachefs: Split out recovery_passes.c
  bcachefs: fix backpointer for missing alloc key msg
  bcachefs: Fix bch2_btree_increase_depth()
  bcachefs: Kill bch2_bkey_ptr_data_type()
  bcachefs: Fix use after free in check_root_trans()
  bcachefs: Fix repair path for missing indirect extents
  bcachefs: Fix use after free in bch2_check_fix_ptrs()
  bcachefs: Fix btree node keys accounting in topology repair path
  bcachefs: Check btree ptr min_key in .invalid
  bcachefs: add REQ_SYNC and REQ_IDLE in write dio
  ...
SCX_OPS_SWITCH_PARTIAL was added recently but assigned the first bit
shifting all the other values. This breaks backward compatibility as flag
values are hard coded into e.g. sched_ext_ops definitions. Let's reorder the
enums so that the existing values aren't shifted around.
instead of btf__find_by_name_kind() w/ BTF_KIND_ENUM as it also needs to
read BTF_KIND_ENUM64's.
Some laptops where the thermal control is handled by the EC may
provide trip points that fail the kernels new validation, but still have
working temperature sensors. An example of this is the Framework 13 AMD.

This patch allows the thermal zone to still be registered without trip
points if the trip points fail validation, allowing the temperature
sensor to be viewed and used by the user.

Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218586
Fixes: 9c86472 ("ACPI: thermal: Use library functions to obtain trip point temperature values")
Signed-off-by: Stephen Horvath <[email protected]>
[ rjw: Subject edits, remove redundant braces ]
Signed-off-by: Rafael J. Wysocki <[email protected]>
Pull documentation fixes from Jonathan Corbet:
 "Four small documentation fixes"

* tag 'docs-6.9-fixes' of git://git.lwn.net/linux:
  docs: zswap: fix shell command format
  tracing: Fix documentation on tp_printk cmdline option
  docs: Fix bitfield handling in kernel-doc
  Documentation: dev-tools: Add link to RV docs
Merge series from Zhang Yi <[email protected]>:

We solved some issues related to headphone detection.And for using
the same configuration in different power conditions,we modified the
clock table
Print out the mode as a string, and also print out the btree and
watermark.

Signed-off-by: Kent Overstreet <[email protected]>
In the discard worker, we were failing to validate the bucket state -
meaning a corrupt needs_discard btree could cause us to discard a bucket
that we shouldn't.

If check_alloc_info hasn't run yet we just want to bail out, otherwise
it's a filesystem inconsistent error.

Signed-off-by: Kent Overstreet <[email protected]>
When the ax25 device is detaching, the ax25_dev_device_down()
calls ax25_ds_del_timer() to cleanup the slave_timer. When
the timer handler is running, the ax25_ds_del_timer() that
calls del_timer() in it will return directly. As a result,
the use-after-free bugs could happen, one of the scenarios
is shown below:

      (Thread 1)          |      (Thread 2)
                          | ax25_ds_timeout()
ax25_dev_device_down()    |
  ax25_ds_del_timer()     |
    del_timer()           |
  ax25_dev_put() //FREE   |
                          |  ax25_dev-> //USE

In order to mitigate bugs, when the device is detaching, use
timer_shutdown_sync() to stop the timer.

Fixes: 1da177e ("Linux-2.6.12-rc2")
Signed-off-by: Duoming Zhou <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Commit 82dfb54 ("VSOCK: Add virtio vsock vsockmon hooks") added
virtio_transport_deliver_tap_pkt() for handing packets to the
vsockmon device. However, in virtio_transport_send_pkt_work(),
the function is called before actually sending the packet (i.e.
before placing it in the virtqueue with virtqueue_add_sgs() and checking
whether it returned successfully).
Queuing the packet in the virtqueue can fail even multiple times.
However, in virtio_transport_deliver_tap_pkt() we deliver the packet
to the monitoring tap interface only the first time we call it.
This certainly avoids seeing the same packet replicated multiple times
in the monitoring interface, but it can show the packet sent with the
wrong timestamp or even before we succeed to queue it in the virtqueue.

Move virtio_transport_deliver_tap_pkt() after calling virtqueue_add_sgs()
and making sure it returned successfully.

Fixes: 82dfb54 ("VSOCK: Add virtio vsock vsockmon hooks")
Cc: [email protected]
Signed-off-by: Marco Pinna <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Just rely on the xarray for any kind of bgid. This simplifies things, and
it really doesn't bring us much, if anything.

Cc: [email protected] # v6.4+
Signed-off-by: Jens Axboe <[email protected]>
Now that xarray is being exclusively used for the buffer_list lookup,
this check is no longer needed. Get rid of it and the is_ready member.

Cc: [email protected] # v6.4+
Signed-off-by: Jens Axboe <[email protected]>
No functional changes in this patch, just in preparation for being able
to keep the buffer list alive outside of the ctx->uring_lock.

Cc: [email protected] # v6.4+
Signed-off-by: Jens Axboe <[email protected]>
If we look up the kbuf, ensure that it doesn't get unregistered until
after we're done with it. Since we're inside mmap, we cannot safely use
the io_uring lock. Rely on the fact that we can lookup the buffer list
under RCU now and grab a reference to it, preventing it from being
unregistered until we're done with it. The lookup returns the
io_buffer_list directly with it referenced.

Cc: [email protected] # v6.4+
Fixes: 5cf4f52 ("io_uring: free io_buffer_list entries via RCU")
Signed-off-by: Jens Axboe <[email protected]>
On some boards with this chip version the BIOS is buggy and misses
to reset the PHY page selector. This results in the PHY ID read
accessing registers on a different page, returning a more or
less random value. Fix this by resetting the page selector first.

Fixes: f1e911d ("r8169: add basic phylib support")
Cc: [email protected]
Signed-off-by: Heiner Kallweit <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
syzkaller reported infinite recursive calls of fib6_dump_done() during
netlink socket destruction.  [1]

From the log, syzkaller sent an AF_UNSPEC RTM_GETROUTE message, and then
the response was generated.  The following recvmmsg() resumed the dump
for IPv6, but the first call of inet6_dump_fib() failed at kzalloc() due
to the fault injection.  [0]

  12:01:34 executing program 3:
  r0 = socket$nl_route(0x10, 0x3, 0x0)
  sendmsg$nl_route(r0, ... snip ...)
  recvmmsg(r0, ... snip ...) (fail_nth: 8)

Here, fib6_dump_done() was set to nlk_sk(sk)->cb.done, and the next call
of inet6_dump_fib() set it to nlk_sk(sk)->cb.args[3].  syzkaller stopped
receiving the response halfway through, and finally netlink_sock_destruct()
called nlk_sk(sk)->cb.done().

fib6_dump_done() calls fib6_dump_end() and nlk_sk(sk)->cb.done() if it
is still not NULL.  fib6_dump_end() rewrites nlk_sk(sk)->cb.done() by
nlk_sk(sk)->cb.args[3], but it has the same function, not NULL, calling
itself recursively and hitting the stack guard page.

To avoid the issue, let's set the destructor after kzalloc().

[0]:
FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
CPU: 1 PID: 432110 Comm: syz-executor.3 Not tainted 6.8.0-12821-g537c2e91d354-dirty #11
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl (lib/dump_stack.c:117)
 should_fail_ex (lib/fault-inject.c:52 lib/fault-inject.c:153)
 should_failslab (mm/slub.c:3733)
 kmalloc_trace (mm/slub.c:3748 mm/slub.c:3827 mm/slub.c:3992)
 inet6_dump_fib (./include/linux/slab.h:628 ./include/linux/slab.h:749 net/ipv6/ip6_fib.c:662)
 rtnl_dump_all (net/core/rtnetlink.c:4029)
 netlink_dump (net/netlink/af_netlink.c:2269)
 netlink_recvmsg (net/netlink/af_netlink.c:1988)
 ____sys_recvmsg (net/socket.c:1046 net/socket.c:2801)
 ___sys_recvmsg (net/socket.c:2846)
 do_recvmmsg (net/socket.c:2943)
 __x64_sys_recvmmsg (net/socket.c:3041 net/socket.c:3034 net/socket.c:3034)

[1]:
BUG: TASK stack guard page was hit at 00000000f2fa9af1 (stack is 00000000b7912430..000000009a436beb)
stack guard page: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 223719 Comm: kworker/1:3 Not tainted 6.8.0-12821-g537c2e91d354-dirty #11
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Workqueue: events netlink_sock_destruct_work
RIP: 0010:fib6_dump_done (net/ipv6/ip6_fib.c:570)
Code: 3c 24 e8 f3 e9 51 fd e9 28 fd ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 41 57 41 56 41 55 41 54 55 48 89 fd <53> 48 8d 5d 60 e8 b6 4d 07 fd 48 89 da 48 b8 00 00 00 00 00 fc ff
RSP: 0018:ffffc9000d980000 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffffffff84405990 RCX: ffffffff844059d3
RDX: ffff8881028e0000 RSI: ffffffff84405ac2 RDI: ffff88810c02f358
RBP: ffff88810c02f358 R08: 0000000000000007 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000224 R12: 0000000000000000
R13: ffff888007c82c78 R14: ffff888007c82c68 R15: ffff888007c82c68
FS:  0000000000000000(0000) GS:ffff88811b100000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffc9000d97fff8 CR3: 0000000102309002 CR4: 0000000000770ef0
PKRU: 55555554
Call Trace:
 <#DF>
 </#DF>
 <TASK>
 fib6_dump_done (net/ipv6/ip6_fib.c:572 (discriminator 1))
 fib6_dump_done (net/ipv6/ip6_fib.c:572 (discriminator 1))
 ...
 fib6_dump_done (net/ipv6/ip6_fib.c:572 (discriminator 1))
 fib6_dump_done (net/ipv6/ip6_fib.c:572 (discriminator 1))
 netlink_sock_destruct (net/netlink/af_netlink.c:401)
 __sk_destruct (net/core/sock.c:2177 (discriminator 2))
 sk_destruct (net/core/sock.c:2224)
 __sk_free (net/core/sock.c:2235)
 sk_free (net/core/sock.c:2246)
 process_one_work (kernel/workqueue.c:3259)
 worker_thread (kernel/workqueue.c:3329 kernel/workqueue.c:3416)
 kthread (kernel/kthread.c:388)
 ret_from_fork (arch/x86/kernel/process.c:153)
 ret_from_fork_asm (arch/x86/entry/entry_64.S:256)
Modules linked in:

Fixes: 1da177e ("Linux-2.6.12-rc2")
Reported-by: syzkaller <[email protected]>
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
htejun and others added 25 commits April 10, 2024 08:30
So that they are in the same order as the definitions in ext.c. No
functional changes.
So that the callers don't have to do it explicitly.
There's no reason to keep the defs of sched_ext_ops and friends in
include/linux/sched/ext.h. They're exposed only through vmlinux.h which
doesn't care where the ops is defined. Let's move all the things which don't
need to be shared in the kernel tree into ext.c. No functional change.

So that the callers don't have to do it explicitly.
…and scx_put_cpumask()

These are useful to have in general. scx_put_cpumask() is a bit sad but
hopefully BPF will develop a way to annotate static lifetime at some point
and we won't need these.
scx: Reorg code a bit and add possible/online cpumask helpers
Update the copyright in a selftest, and make a comment for an exit_code
field a bit more generic to reflect that exit_code can be defined when
gracefully exiting from the main kernel, not just BPF. Finally, update a
pr_err message to print the correct path to the sched_ext sysfs node.

Signed-off-by: David Vernet <[email protected]>
We currently have a possibly tricky race w.r.t. hotplug that schedulers
don't have a good way to account for. Once a scheduler has inspected a
host topology, if a hotplug event occurs before a scheduler is attached
and loaded, then the scheduler will have no way of knowing that its view
of the host topology is incorrect. Hotplug events _after_ this are fine,
as we'll either pass the events to the scheduler, or evict the scheduler
directly. But if a hotplug event happens between inspecting the host
topology and attaching the scheduler, we have a problem.

To address this, we can use a monotonically increasing hotplug sequence
number that is incremented any time a hotplug event occurs, and expose
it through a sysfs node in /sys/kernel/sched_ext/. Using this, a user
space scheduler can look at the sequence number before loading, and then
compare it to the sequence number during attach to see if a hotplug
event occurred. If so, we can fail to attach, and return to user space.

This patch adds the aforementioned sysfs node. A subsequent patch will
update the struct sched_ext_ops and the attach path to check this value
to ensure that a hotplug event hasn't occurred.

Signed-off-by: David Vernet <[email protected]>
We'll need to have a hotplug sequence number in struct sched_ext_ops if
we want to enable user space to deterministically detect a hotplug event
between reading a host's topology, and attaching its scheduler.

A prior change added a global hotplug sequence number and exported it
through a sysfs file. This one connects the two by also adding logic to
fail to attach if there is a mismatch between the two. A subsequent
patch will add tests.

Signed-off-by: David Vernet <[email protected]>
Now that we have the hotplug sequence number, schedulers can set the
sequence number when opening the skeleton to detect hotplug events. In
order to provide backwards compatibility and avoid excess boilerplate,
let's add a new SCX_OPS_OPEN() macro that encapsulates this for the
caller.

In addition, we add an SCX_HOTPLUG_SQN() macro that can be used to read
the current global sequence number from
/sys/kernel/sched_ext/hotplug_sqn. This is called by SCX_OPS_OPEN() when
running on a kernel with hotplug sqn support.

Signed-off-by: David Vernet <[email protected]>
Now that we have full hotplug sequence number support, as well as the
necessary macros in compat.h, let's extend the hotplug selftest to also
validate that the sequence number can be used to detect hotplug events.

Signed-off-by: David Vernet <[email protected]>
Use a struct instead of u64[2]. This will ease future changes. No functional
changes.
These two functions being inlined ends up bringing out a bunch of other
stuff into kernel/sched/ext.h. Let's uninline them.

- Uninline both and rename scx_notify_sched_tick() to scx_tick() and
  scx_notify_pick_next_task() to scx_next_task_picked(). The notify term is
  a bit unusual and more often used with notifier chains which isn't the
  case here.

- Call scx_tick() while holding rq lock. This doesn't make difference now
  but will ease future changes.

- Move the stuff which was in kernel/sched/ext.h to support the inline
  functions into kernel/sched/ext.c. After this, both ext header files are
  really lean containing only what's needed to integrate with the rest of
  the kernel.

- Some other cosmetic changes.

Other than scx_tick() being called under rq lock. No functional changes.
…rs()

RT, DL, thermal and irq load and utilization metrics need to be decayed and
updated periodically and before consumption to keep the numbers reasonable.
This is currently done from __update_blocked_others() as a part of the fair
class load balance path. Let's factor it out to update_other_load_avgs().
Pure refactor. No functional changes.

This will be used by the new BPF extensible scheduling class to ensure that
the above metrics are properly maintained.

Signed-off-by: Tejun Heo <[email protected]>
…chanisms

Without this, e.g., RT util metric gets stuck high which can mislead the
schedutil cpufreq governor.
sugov_cpu_is_busy() is used to avoid decreasing performance level while the
CPU is busy and called by sugov_update_single_freq() and
sugov_update_single_perf(). Both callers repeat the same pattern to first
test for uclamp and then the business. Let's refactor so that the tests
aren't repeated.

The new helper is named sugov_hold_freq() and tests both the uclamp
exception and CPU business. No functional changes. This will make adding
more exception conditions easier.

Signed-off-by: Tejun Heo <[email protected]>
This gets called on every tick if the CPU is executing an SCX task. e.g. It
can be used to terminate the slice of the current task early.
To monitor the current performance state of each CPU.
This allows the BPF scheduler to request a specific performance level for
each CPU. SCX defaults to max perf if scx_bpf_cpuperf_set() is not called.
scx: Mark sched_ext uclamp enabled
Signed-off-by: David Vernet <[email protected]>
@Byte-Lab Byte-Lab requested a review from htejun April 12, 2024 17:26
@htejun htejun merged commit 760de14 into scx-6.9rc.y Apr 12, 2024
1 check passed
@htejun htejun deleted the scx-6.9-rc3 branch April 12, 2024 17:26
htejun pushed a commit that referenced this pull request May 22, 2024
Byte-Lab pushed a commit that referenced this pull request Jun 21, 2024
[ Upstream commit 769e6a1 ]

ui_browser__show() is capturing the input title that is stack allocated
memory in hist_browser__run().

Avoid a use after return by strdup-ing the string.

Committer notes:

Further explanation from Ian Rogers:

My command line using tui is:
$ sudo bash -c 'rm /tmp/asan.log*; export
ASAN_OPTIONS="log_path=/tmp/asan.log"; /tmp/perf/perf mem record -a
sleep 1; /tmp/perf/perf mem report'
I then go to the perf annotate view and quit. This triggers the asan
error (from the log file):
```
==1254591==ERROR: AddressSanitizer: stack-use-after-return on address
0x7f2813331920 at pc 0x7f28180
65991 bp 0x7fff0a21c750 sp 0x7fff0a21bf10
READ of size 80 at 0x7f2813331920 thread T0
    #0 0x7f2818065990 in __interceptor_strlen
../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:461
    #1 0x7f2817698251 in SLsmg_write_wrapped_string
(/lib/x86_64-linux-gnu/libslang.so.2+0x98251)
    #2 0x7f28176984b9 in SLsmg_write_nstring
(/lib/x86_64-linux-gnu/libslang.so.2+0x984b9)
    #3 0x55c94045b365 in ui_browser__write_nstring ui/browser.c:60
    #4 0x55c94045c558 in __ui_browser__show_title ui/browser.c:266
    #5 0x55c94045c776 in ui_browser__show ui/browser.c:288
    #6 0x55c94045c06d in ui_browser__handle_resize ui/browser.c:206
    #7 0x55c94047979b in do_annotate ui/browsers/hists.c:2458
    #8 0x55c94047fb17 in evsel__hists_browse ui/browsers/hists.c:3412
    #9 0x55c940480a0c in perf_evsel_menu__run ui/browsers/hists.c:3527
    #10 0x55c940481108 in __evlist__tui_browse_hists ui/browsers/hists.c:3613
    #11 0x55c9404813f7 in evlist__tui_browse_hists ui/browsers/hists.c:3661
    #12 0x55c93ffa253f in report__browse_hists tools/perf/builtin-report.c:671
    #13 0x55c93ffa58ca in __cmd_report tools/perf/builtin-report.c:1141
    #14 0x55c93ffaf159 in cmd_report tools/perf/builtin-report.c:1805
    #15 0x55c94000c05c in report_events tools/perf/builtin-mem.c:374
    #16 0x55c94000d96d in cmd_mem tools/perf/builtin-mem.c:516
    #17 0x55c9400e44ee in run_builtin tools/perf/perf.c:350
    #18 0x55c9400e4a5a in handle_internal_command tools/perf/perf.c:403
    #19 0x55c9400e4e22 in run_argv tools/perf/perf.c:447
    #20 0x55c9400e53ad in main tools/perf/perf.c:561
    #21 0x7f28170456c9 in __libc_start_call_main
../sysdeps/nptl/libc_start_call_main.h:58
    #22 0x7f2817045784 in __libc_start_main_impl ../csu/libc-start.c:360
    #23 0x55c93ff544c0 in _start (/tmp/perf/perf+0x19a4c0) (BuildId:
84899b0e8c7d3a3eaa67b2eb35e3d8b2f8cd4c93)

Address 0x7f2813331920 is located in stack of thread T0 at offset 32 in frame
    #0 0x55c94046e85e in hist_browser__run ui/browsers/hists.c:746

  This frame has 1 object(s):
    [32, 192) 'title' (line 747) <== Memory access at offset 32 is
inside this variable
HINT: this may be a false positive if your program uses some custom
stack unwind mechanism, swapcontext or vfork
```
hist_browser__run isn't on the stack so the asan error looks legit.
There's no clean init/exit on struct ui_browser so I may be trading a
use-after-return for a memory leak, but that seems look a good trade
anyway.

Fixes: 05e8b08 ("perf ui browser: Stop using 'self'")
Signed-off-by: Ian Rogers <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Athira Rajeev <[email protected]>
Cc: Ben Gainey <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kajol Jain <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: K Prateek Nayak <[email protected]>
Cc: Li Dong <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Oliver Upton <[email protected]>
Cc: Paran Lee <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ravi Bangoria <[email protected]>
Cc: Sun Haiyong <[email protected]>
Cc: Tim Chen <[email protected]>
Cc: Yanteng Si <[email protected]>
Cc: Yicong Yang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.