
Cut an scx-6.8rc.y branch for 6.8-rc releases #1

Merged 374 commits on Jan 23, 2024
This pull request is big! We’re only showing the most recent 250 commits.

Commits on Aug 1, 2023

  1. scx: Adjust a couple of small things in rusty.bpf.c

    rusty.bpf.c has a few small places where we can improve either the
    formatting of the code or the logic. In rusty_select_cpu(), we declare
    the idle_smtmask as struct cpumask * when it could be const. Also, when
    initializing the pcpu_ctx, we're using an open-coded for-loop instead of
    bpf_for. Let's fix up these small issues.
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Aug 1, 2023 (2d87e47)
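
    As a rough sketch (not the actual diff; pcpu_ctx, nr_cpus and the field
    name are assumptions for illustration), the two changes look like:

        /* declare the idle mask pointer const -- it is never written */
        const struct cpumask *idle_smtmask = scx_bpf_get_idle_smtmask();

        /* use the verifier-friendly bpf_for macro rather than an
         * open-coded for-loop for per-CPU context initialization */
        u32 cpu;
        bpf_for(cpu, 0, nr_cpus) {
                pcpu_ctx[cpu].dom_rr_cur = cpu;
        }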
  2. Merge pull request #39 from sched-ext/rusty_bpf_adjustments

    scx: Adjust a couple of small things in rusty.bpf.c
    htejun authored Aug 1, 2023 (a0943ea)

Commits on Aug 2, 2023

  1. scx: Rename "type" -> "exit_type"

    When used from a BPF scheduler that is launched via libbpf-rs, this
    naming runs into issues because "type" is a reserved keyword in Rust.
    
    Signed-off-by: Dan Schatzberg <[email protected]>
    dschatzberg committed Aug 2, 2023 (4c52836)
  2. Merge pull request #41 from dschatzberg/type_rename

    scx: Rename "type" -> "exit_type"
    Byte-Lab authored Aug 2, 2023 (94a5c60)

Commits on Aug 3, 2023

  1. scx: Make cpumask arg to ops.set_cpumask() const

    The struct cpumask * argument to the ops.set_cpumask() op isn't const.
    It doesn't really matter in terms of mutability in a BPF program, but
    let's make it const because the mask really is read-only for the callback.
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Aug 3, 2023 (8b8596e)
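
    The signature change itself, shown in isolation from the ops table:

        /* before */
        void (*set_cpumask)(struct task_struct *p, struct cpumask *cpumask);

        /* after: the callback only ever reads the mask, so say so */
        void (*set_cpumask)(struct task_struct *p,
                            const struct cpumask *cpumask);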
  2. Merge pull request #42 from sched-ext/struct_cpumask

    scx: Make cpumask arg to ops.set_cpumask() const
    htejun authored Aug 3, 2023 (aeaacb3)

Commits on Aug 8, 2023

  1. scx: Use unsigned long for rq->scx.pnt_seq instead of u64

    Andrea Righi reports that smp_load_acquire() can't be used on u64s on some
    32bit architectures. pnt_seq is used to close a very short race window and
    32bit should be more than enough. Use unsigned long instead of u64.
    htejun committed Aug 8, 2023 (f0fd99d)
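
    A minimal sketch of the pattern, assuming the field lives in rq->scx:

        /* unsigned long is natively sized, so the acquire/release pair
         * also works on 32bit architectures, unlike a u64 */
        unsigned long pnt_seq;

        /* writer side */
        smp_store_release(&rq->scx.pnt_seq, rq->scx.pnt_seq + 1);

        /* reader side: pairs with the store_release above */
        unsigned long seq = smp_load_acquire(&rq->scx.pnt_seq);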
  2. scx: Allow calling some kfuncs from tracepoints

    Some of the sched_ext kfuncs are fine to call from tracepoints. For
    example, we may want to call scx_bpf_error_bstr() if some error
    condition is detected in a tracepoint rather than a sched_ext ops
    callback. This patch therefore separates the scx_kfunc_ids_any kfunc BTF
    set into two sets: one of which includes kfuncs that can only be called
    from struct_ops, and the other which can be called from both struct_ops
    and tracepoint progs.
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Aug 8, 2023 (f2625bf)
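
    A sketch of the split using the kernel's BTF kfunc set machinery (the
    set name is from the message; registration details are illustrative):

        BTF_SET8_START(scx_kfunc_ids_any)
        BTF_ID_FLAGS(func, scx_bpf_error_bstr)  /* safe from tracepoints */
        BTF_SET8_END(scx_kfunc_ids_any)

        static const struct btf_kfunc_id_set scx_kfunc_set_any = {
                .owner = THIS_MODULE,
                .set   = &scx_kfunc_ids_any,
        };

        /* register the shared set for both struct_ops and tracing progs */
        register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS, &scx_kfunc_set_any);
        register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &scx_kfunc_set_any);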
  3. Merge pull request #43 from sched-ext/kfunc_tracepoints

    scx: Allow calling some kfuncs from tracepoints
    htejun authored Aug 8, 2023 (36d4880)
  4. scx: Use atomic_long_t for scx_nr_rejected instead of atomic64_t

    atomic64_t can be pretty inefficient on 32bit archs, and the counter
    being 32bit on 32bit archs is fine. Let's use atomic_long_t instead.
    htejun committed Aug 8, 2023 (1d00785)
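
    The change is mechanical; before and after:

        /* before: forces a 64-bit atomic even on 32bit archs */
        static atomic64_t scx_nr_rejected = ATOMIC64_INIT(0);
        atomic64_inc(&scx_nr_rejected);

        /* after: native word size, which is all this counter needs */
        static atomic_long_t scx_nr_rejected = ATOMIC_LONG_INIT(0);
        atomic_long_inc(&scx_nr_rejected);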
  5. scx: Make p->scx.ops_state atomic_long_t instead of atomic64_t

    Some 32bit archs can't do 64bit store_release/load_acquire. Use
    atomic_long_t instead.
    htejun committed Aug 8, 2023 (e453cbb)
  6. Merge pull request #44 from sched-ext/scx-misc-updates

    Use unsigned longs for atomics
    htejun authored Aug 8, 2023 (35aef07)

Commits on Aug 14, 2023

  1. Commit 845aec9
  2. Merge pull request #20 from inwardvessel/resize_percpu_arrays_in_examples

    use resizing of datasec maps in examples
    Byte-Lab authored Aug 14, 2023 (8ade500)

Commits on Aug 30, 2023

  1. scx: bpf_scx_btf_struct_access() should return -EACCES for unknown accesses
    
    The function is currently returning 0 for unknown accesses, which means
    allowing writes to anything. Fix the default return value.
    htejun committed Aug 30, 2023 (cb04f56)
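
    A sketch of the fixed shape (the allowed-field check shown here is
    illustrative):

        static int bpf_scx_btf_struct_access(struct bpf_verifier_log *log,
                                             const struct bpf_reg_state *reg,
                                             int off, int size)
        {
                const struct btf_type *t = btf_type_by_id(reg->btf, reg->btf_id);

                if (t == task_struct_type &&
                    off >= offsetof(struct task_struct, scx.slice) &&
                    off + size <= offsetofend(struct task_struct, scx.slice))
                        return SCALAR_VALUE;

                /* was "return 0", which allowed writes to anything */
                return -EACCES;
        }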
  2. Merge pull request #46 from sched-ext/scx-fix-write-all

    scx: bpf_scx_btf_struct_access() should return -EACCES for unknown accesses
    Byte-Lab authored Aug 30, 2023 (2c5e6d3)

Commits on Sep 19, 2023

  1. debug patches and fix

    htejun committed Sep 19, 2023 (d377f5e)

Commits on Sep 20, 2023

  1. scx: Fix p->scx.flags corruption due to unsynchronized writes of SCX_TASK_ON_DSQ_PRIQ
    
    p->scx.flags is protected by the task's rq lock but one of the flags,
    SCX_TASK_ON_DSQ_PRIQ, is protected by p->dsq->lock, not the rq lock. This
    could lead to corruption of p->scx.flags through RMW races triggering the
    watchdog and other sanity checks. Fix it by moving the flag to its own
    field, p->scx.dsq_flags, which is protected by the dsq lock.
    htejun committed Sep 20, 2023 (21f4c19)
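
    Conceptually, flags written under different locks must live in
    different words so that their read-modify-write updates can't race; a
    sketch:

        struct sched_ext_entity {
                u32 flags;      /* protected by the task's rq lock */
                u32 dsq_flags;  /* protected by p->dsq->lock */
                /* ... other fields omitted ... */
        };

        /* under dsq->lock: no longer shares a word with rq-locked flags */
        p->scx.dsq_flags |= SCX_TASK_ON_DSQ_PRIQ;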
  2. Merge pull request #47 from sched-ext/scx-fix-flags-corruption

    scx: Fix p->scx.flags corruption due to unsynchronized writes of SCX_TASK_ON_DSQ_PRIQ
    Byte-Lab authored Sep 20, 2023 (ee9077a)
  3. xxx

    htejun committed Sep 20, 2023 (8424909)

Commits on Sep 21, 2023

  1. Commit be81498
  2. Merge pull request #48 from sched-ext/rusty-keep-bpf-o

    scx_rusty: keep .bpf.o files for debugging
    htejun authored Sep 21, 2023 (664d650)

Commits on Sep 22, 2023

  1. Revert "Merge pull request #48 from sched-ext/rusty-keep-bpf-o"

    This reverts commit 664d650, reversing
    changes made to ee9077a.
    htejun committed Sep 22, 2023 (997c450)
  2. scx_rusty: Keep .bpf.o files for debugging

    (cherry picked from commit be81498)
    htejun committed Sep 22, 2023 (258510e)
  3. Merge pull request #49 from sched-ext/rusty-keep-bpf-o

    Fix incorrect merge of #48
    htejun authored Sep 22, 2023 (2a3532d)

Commits on Sep 26, 2023

  1. rusty: Don't use bpf_cpumask_full() to set task_ctx->all_cpus

    Instead, collect all per-dom cpumasks into all_cpumask and test whether
    that's a subset of a task's cpumask. bpf_cpumask_full() can incorrectly
    indicate that a task's affinity is restricted when it's not, depending on
    the machine configuration.
    htejun committed Sep 26, 2023 (c70e7d3)
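
    A sketch of the subset test using the bpf_cpumask kfuncs, with
    all_cpumask holding the collected union of per-dom cpumasks:

        /* @p's affinity is unrestricted iff every CPU the scheduler
         * manages is also allowed for @p */
        static bool task_affinity_unrestricted(struct task_struct *p)
        {
                return all_cpumask &&
                       bpf_cpumask_subset((const struct cpumask *)all_cpumask,
                                          p->cpus_ptr);
        }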
  2. Commit b448bbd

Commits on Oct 2, 2023

  1. central: Allow specifying the slice length in the central scheduler

    Researchers at Inria-Paris are experimenting with the central
    scheduler, and want to try setting different slice lengths to see how
    they affect performance for VMs running the NAS benchmarks. Let's make
    this convenient by allowing it to be passed as a parameter from user
    space.
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Oct 2, 2023 (f4fd473)
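
    The usual pattern for such a knob is a const volatile rodata variable
    that user space sets before load; a sketch (slice_ns and opt_slice_us
    are assumed names):

        /* BPF side */
        const volatile u64 slice_ns = SCX_SLICE_DFL;

        /* ... at dispatch time ... */
        scx_bpf_dispatch(p, FALLBACK_DSQ_ID, slice_ns, enq_flags);

        /* user-space side, before loading the skeleton */
        skel->rodata->slice_ns = opt_slice_us * 1000;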

Commits on Oct 3, 2023

  1. Commit 8ba6ffd
  2. central: Pin timer callbacks to central CPU

    The scx_central scheduler specifies an infinite slice for all cores
    other than a "central" core where scheduling decisions are made. This
    scheduler currently suffers from the fact that the BPF timer may be
    invoked on a different core than the central scheduler, due to BPF
    timers not supporting being pinned to specific CPUs.
    
    That capability was proposed upstream for BPF in [0]. If and when it
    lands, we would need to invoke bpf_timer_start() from the core that we
    want the timer pinned to, because the API does not support specifying a
    core to have the timer invoked from. To accommodate this, we can
    affinitize the loading thread to the central CPU before loading the
    scheduler, and then pin from there.
    
    [0]: https://lore.kernel.org/bpf/[email protected]/T/
    
    Though the BPF timer pinning feature has not yet landed, we can still
    set the stage for leveraging it by adding the logic to affinitize the
    loading thread to the central CPU. While we won't yet have a guarantee
    that the timer will be pinned to the same core throughout the runtime
    of the scheduler, in practice, it seems that affinitizing in this manner
    does make it very likely regardless. In addition, the user space
    component of the central scheduler doesn't benefit from running on a
    tickless core, so keeping it affinitized to the central CPU keeps it
    from preempting a task on a tickless core that would otherwise benefit
    from less preemption.
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Oct 3, 2023 (3e74dbd)
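
    Affinitizing the loading thread is plain sched_setaffinity(); a
    self-contained sketch:

        #define _GNU_SOURCE
        #include <sched.h>

        /* pin the calling thread to @cpu; returns 0 on success */
        static int affinitize_to_cpu(int cpu)
        {
                cpu_set_t set;

                CPU_ZERO(&set);
                CPU_SET(cpu, &set);
                return sched_setaffinity(0, sizeof(set), &set);
        }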
  3. scx: Fix typo in tickless comment

    There's a comment that says can_stop_tick_scx(). The function is
    scx_can_stop_tick().
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Oct 3, 2023 (7b1ca19)
  4. Merge pull request #53 from sched-ext/fix_typo

    scx: Fix typo in tickless comment
    htejun authored Oct 3, 2023 (cdbc1a1)

Commits on Oct 4, 2023

  1. Merge pull request #52 from sched-ext/central_cpu_pin

    central: Pin timer callbacks to central CPU
    htejun authored Oct 4, 2023 (b41ddc3)

Commits on Oct 7, 2023

  1. Commit dec4c12
  2. Merge pull request #54 from sched-ext/bpf-next-merge

    Bpf next merge
    htejun authored Oct 7, 2023 (ff56aee)
  3. scx: Add missing piece

    Forgot to git add a small conflict resolution.
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Oct 7, 2023 (101c601)

Commits on Oct 9, 2023

  1. Merge branch 'bpf-master' into bpf-next-merge

    - Includes the latest timer pinning feature
    Byte-Lab committed Oct 9, 2023 (4f595b5)
  2. central: Pin timer to the central CPU

    In commit d6247ec ("bpf: Add ability to pin bpf timer to calling
    CPU"), BPF added the ability to be able to pin a BPF timer to the
    calling CPU. Let's use this capability from the central scheduler.
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Oct 9, 2023 (88a818f)
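
    With the pinning flag available, the timer setup looks roughly like
    this (the map and callback names are illustrative):

        u32 key = 0;
        struct bpf_timer *timer = bpf_map_lookup_elem(&central_timer, &key);

        if (timer) {
                bpf_timer_set_callback(timer, central_timerfn);
                /* BPF_F_TIMER_CPU_PIN keeps the callback on this CPU */
                bpf_timer_start(timer, TIMER_INTERVAL_NS,
                                BPF_F_TIMER_CPU_PIN);
        }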
  3. Merge pull request #55 from sched-ext/bpf-next-merge

    Bpf next merge
    Byte-Lab authored Oct 9, 2023 (fbac810)

Commits on Oct 11, 2023

  1. scx: Refactor and clean up build system

    The current scx build system is a bit hacky. We put some build artifacts
    in a tools/ directory, and others (skel files and .bpf.o files) we leave
    in the current directory. This isn't conducive to environments that want
    to package sched_ext schedulers. This patch therefore updates the
    Makefile to have the build put all build artifacts (including the
    compiled scheduler binaries) into a build/ directory (previously
    tools/). All artifacts will be deployed as follows:
    
    build/bin: Compiled binaries (e.g. scx_simple, scx_central, etc)
    build/sbin: Compiled binaries that are used as part of the build
                process, e.g. bpftool
    build/include: Headers that are visible from .c files
    build/obj: Contains object files and libraries that are used as part of
               the build process
    build/obj/bpftool: Build artifacts from compiling bpftool from source
    build/obj/libbpf: Build artifacts from compiling libbpf from source
    build/obj/sched_ext: Build artifacts from compiling and linking BPF
                         programs and their user space counterparts.
    build/release: Build output from Cargo for Rust schedulers
    
    This patch also adds the following enhancement:
    
    - Support for changing the build directory output by specifying the O
      environment variable, as in:
    
    $ O=/tmp/sched_ext make CC=clang LLVM=1 -j
    
    to output all artifacts for that build job to /tmp/sched_ext/build
    
    - Removing code duplication by defining a ccsched make function for
      compiling schedulers, and an $(SCX_COMMON_DEPS) variable for common
      dependencies.
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Oct 11, 2023 (3b24175)
  2. scx: Add make install target for installing schedulers

    Another requirement of packaging systems is to be able to install
    compiled schedulers in some reachable PATH location so they can be
    accessed easily. This patch adds a new install target in Make for this,
    which installs the schedulers on the system at /usr/bin. The user also
    has the option of specifying DESTDIR to prepend a prefix to /usr/bin.
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Oct 11, 2023 (ac229e5)
  3. scx: Add Make help target for explaining build options

    It's mostly self-evident, but now that we support environment variables
    to dictate build behavior, we should document them in a clean and
    easy-to-consume way.
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Oct 11, 2023 (7fc3184)

Commits on Oct 12, 2023

  1. rusty: Support downloading rusty deps in separate build step

    Cargo supports the cargo fetch command to fetch dependencies via the
    network before compiling with cargo build. Let's put it into a separate
    Makefile target so that packaging systems can separate steps that
    require network access from just building.
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Oct 12, 2023 (8180b1b)
  2. scx: Don't specify nightly rustup as dependency

    We were previously under the impression that the rustup nightly
    toolchain was required to build the schedulers. Daan pointed out in [0]
    that he was able to build with stable, and I similarly was able to build
    with rust stable 1.70.0. Let's update the README accordingly.
    
    [0]: sched-ext/sched_ext#57
    
    We also update the README to not explicitly require compiling the
    schedulers with
    
    $ make LLVM=1 CC=clang
    
    The BPF schedulers are automatically compiled with clang. If you compile
    without those flags, the user space portions will be compiled with gcc,
    which is fine.
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Oct 12, 2023 (8ce9d1e)
  3. Merge pull request #60 from sched-ext/rust_nightly

    scx: Don't specify nightly rustup as dependency
    htejun authored Oct 12, 2023 (e23cb83)
  4. Merge pull request #59 from sched-ext/mkosi

    Update and refactor scheduler build system
    htejun authored Oct 12, 2023 (bac7dab)

Commits on Oct 16, 2023

  1. rusty: Further tweak build system

    We previously separated the scx_rusty build into two steps -- a step to
    download dependencies, and another to build. That mostly works, except
    that the download-dependencies step is always run before the build step
    as it's a dependency. Even when there are no new cargo dependencies to
    download, it still accesses the network.
    
    Let's add a way for builders to pass --offline to cargo via a
    CARGO_OFFLINE make variable so that we don't need scx_rusty_deps to be a
    dependency of scx_rusty.
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Oct 16, 2023 (38ad0e8)
  2. Merge pull request #62 from sched-ext/rusty_offline

    rusty: Further tweak build system
    DaanDeMeyer authored Oct 16, 2023 (52911e1)

Commits on Oct 30, 2023

  1. scx: Improve example schedulers README file

    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Oct 30, 2023 (bd8d7d2)
  2. scx: Add missing build/ entry to .gitignore

    We're missing an entry in .gitignore for the build-generated files when
    building the example schedulers.
    Byte-Lab committed Oct 30, 2023 (e77257c)
  3. scx: clean sched_ext example schedulers on root mrproper target

    We've gotten some feedback that it's confusing and/or inconvenient to
    know what needs to be clean built in order to be able to correctly
    compile and run the example schedulers. Let's update the build targets
    to make this simpler by:
    
    1. Always cleaning sched_ext schedulers on make mrproper in the tree
       root
    2. Adding a make fullclean target to the sched_ext tools directory which
       also invokes the root make clean target.
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Oct 30, 2023 (3f4b885)
  4. Merge pull request #64 from sched-ext/README

    Update README, and improve build usability
    htejun authored Oct 30, 2023 (9b7423e)

Commits on Oct 31, 2023

  1. sched_ext: Add scx_layered

    htejun committed Oct 31, 2023 (2a5eb98)

Commits on Nov 1, 2023

  1. scx_examples: Address the interaction between yield and slice based runtime calculation
    
    Calculating runtime from the amount consumed from the slice punishes
    yield(2)ers. There's nothing fundamentally wrong with it but it doesn't
    align well with how cfs does it and can have unexpected effects on
    applications.
    
    Note the caveat in the example schedulers and switch scx_rusty to use a
    timestamp-based one.
    htejun committed Nov 1, 2023 (c2f53c8)
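
    The timestamp-based variant charges wall-clock time between running()
    and stopping(), so a task that yields early isn't billed for its whole
    slice; a sketch:

        void BPF_STRUCT_OPS(rusty_running, struct task_struct *p)
        {
                struct task_ctx *taskc = lookup_task_ctx(p);

                if (taskc)
                        taskc->running_at = bpf_ktime_get_ns();
        }

        void BPF_STRUCT_OPS(rusty_stopping, struct task_struct *p, bool runnable)
        {
                struct task_ctx *taskc = lookup_task_ctx(p);

                if (taskc)
                        taskc->runtime += bpf_ktime_get_ns() - taskc->running_at;
        }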
  2. scx_rusty: Introduce lookup_task_ctx() and consistently use @TASKC as task_ctx var name
    htejun committed Nov 1, 2023 (e199c47)
  3. Commit 1b268b0

Commits on Nov 2, 2023

  1. Commit 53f76a9
  2. selftests/bpf: Convert CHECK macros to ASSERT_* macros in bpf_iter

    As pointed out by Yonghong Song [1], in the bpf selftests the use
    of the ASSERT_* series of macros is preferred over the CHECK macro.
    This patch replaces all CHECK calls in bpf_iter with the appropriate
    ASSERT_* macros.
    
    [1] https://lore.kernel.org/lkml/[email protected]
    
    Suggested-by: Yonghong Song <[email protected]>
    Signed-off-by: Yuran Pereira <[email protected]>
    Acked-by: Yonghong Song <[email protected]>
    Acked-by: Kui-Feng Lee <[email protected]>
    Link: https://lore.kernel.org/r/DB3PR10MB6835E9C8DFCA226DD6FEF914E8A3A@DB3PR10MB6835.EURPRD10.PROD.OUTLOOK.COM
    Signed-off-by: Alexei Starovoitov <[email protected]>
    yuranpereira authored and Alexei Starovoitov committed Nov 2, 2023 (ed47cb2)
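
    The conversion is mechanical; for example (names illustrative):

        /* before: CHECK() returns true on failure and needs a message */
        if (CHECK(err, "bpf_iter_attach", "attach failed: %d\n", err))
                goto cleanup;

        /* after: ASSERT_OK() returns true on success */
        if (!ASSERT_OK(err, "bpf_iter_attach"))
                goto cleanup;
        ASSERT_EQ(read_cnt, expected_cnt, "read_cnt");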
  3. selftests/bpf: Add malloc failure checks in bpf_iter

    Since some malloc calls in bpf_iter may at times fail,
    this patch adds the appropriate fail checks, and ensures that
    any previously allocated resource is appropriately destroyed
    before returning from the function.
    
    Signed-off-by: Yuran Pereira <[email protected]>
    Acked-by: Yonghong Song <[email protected]>
    Acked-by: Kui-Feng Lee <[email protected]>
    Link: https://lore.kernel.org/r/DB3PR10MB6835F0ECA792265FA41FC39BE8A3A@DB3PR10MB6835.EURPRD10.PROD.OUTLOOK.COM
    Signed-off-by: Alexei Starovoitov <[email protected]>
    yuranpereira authored and Alexei Starovoitov committed Nov 2, 2023 (cb3c6a5)
  4. selftests/bpf: fix RELEASE=1 build for tc_opts

    The compiler complains about malloc(). We also don't need to dynamically
    allocate anything, so make life easier by using a statically sized
    buffer.
    
    Signed-off-by: Andrii Nakryiko <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    anakryiko authored and Alexei Starovoitov committed Nov 2, 2023 (3cda077)
  5. selftests/bpf: satisfy compiler by having explicit return in btf test

    Some compilers complain about get_pprint_mapv_size() not returning a
    value in some code paths. Fix with an explicit return.
    
    Signed-off-by: Andrii Nakryiko <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    anakryiko authored and Alexei Starovoitov committed Nov 2, 2023 (7bcc07d)
  6. bpf: derive smin/smax from umin/umax bounds

    Add smin/smax derivation from appropriate umin/umax values. Previously the
    logic was surprisingly asymmetric, trying to derive umin/umax from smin/smax
    (if possible), but not trying to do the same in the other direction. A simple
    addition to __reg64_deduce_bounds() fixes this.
    
    Also added a generic comment about u64/s64 ranges and their relationship.
    Hopefully that helps readers understand all the bounds deductions
    a bit better.
    
    Acked-by: Eduard Zingerman <[email protected]>
    Acked-by: Shung-Hsi Yu <[email protected]>
    Signed-off-by: Andrii Nakryiko <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    anakryiko authored and Alexei Starovoitov committed Nov 2, 2023 (2e74aef)
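
    The core of the deduction: if [umin; umax] doesn't cross the s64 sign
    boundary, the same interval is also a valid signed range. A sketch of
    the kind of addition made to __reg64_deduce_bounds():

        /* when both u64 bounds reinterpret to s64 values in the same
         * order, the unsigned range is a valid signed range too and can
         * tighten smin/smax */
        if ((s64)reg->umin_value <= (s64)reg->umax_value) {
                reg->smin_value = max_t(s64, reg->smin_value, reg->umin_value);
                reg->smax_value = min_t(s64, reg->smax_value, reg->umax_value);
        }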
  7. bpf: derive smin32/smax32 from umin32/umax32 bounds

    All the logic that applies to u64 vs s64, equally applies for u32 vs s32
    relationships (just taken in a smaller 32-bit numeric space). So do the
    same deduction of smin32/smax32 from umin32/umax32, if we can.
    
    Acked-by: Eduard Zingerman <[email protected]>
    Acked-by: Shung-Hsi Yu <[email protected]>
    Signed-off-by: Andrii Nakryiko <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    anakryiko authored and Alexei Starovoitov committed Nov 2, 2023 (f188765)
  8. bpf: derive subreg bounds from full bounds when upper 32 bits are constant
    
    Comments in code try to explain the idea behind why this is correct.
    Please check the code and comments.
    
    Acked-by: Eduard Zingerman <[email protected]>
    Acked-by: Shung-Hsi Yu <[email protected]>
    Signed-off-by: Andrii Nakryiko <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    anakryiko authored and Alexei Starovoitov committed Nov 2, 2023 (f404ef3)
  9. bpf: add special smin32/smax32 derivation from 64-bit bounds

    Add a special case where we can derive valid s32 bounds from umin/umax
    or smin/smax by stitching together negative s32 subrange and
    non-negative s32 subrange. That requires upper 32 bits to form a [N, N+1]
    range in u32 domain (taking into account wrap around, so 0xffffffff
    to 0x00000000 is a valid [N, N+1] range in this sense). See code comment
    for concrete examples.
    
    Eduard Zingerman also provided an alternative explanation ([0]) for more
    mathematically inclined readers:
    
    Suppose:
    . there are numbers a, b, c
    . 2**31 <= b < 2**32
    . 0 <= c < 2**31
    . umin = 2**32 * a + b
    . umax = 2**32 * (a + 1) + c
    
    The number of values in the range represented by [umin; umax] is:
    . N = umax - umin + 1 = 2**32 + c - b + 1
    . min(N) = 2**32 + 0 - (2**32-1) + 1 = 2, with b = 2**32-1, c = 0
    . max(N) = 2**32 + (2**31 - 1) - 2**31 + 1 = 2**32, with b = 2**31, c = 2**31-1
    
    Hence [(s32)b; (s32)c] forms a valid range.
    
      [0] https://lore.kernel.org/bpf/[email protected]/
    
    Acked-by: Eduard Zingerman <[email protected]>
    Acked-by: Shung-Hsi Yu <[email protected]>
    Signed-off-by: Andrii Nakryiko <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    anakryiko authored and Alexei Starovoitov committed Nov 2, 2023 (6533e0a)
  10. bpf: improve deduction of 64-bit bounds from 32-bit bounds

    Add a few interesting cases in which we can tighten 64-bit bounds based
    on newly learnt information about 32-bit bounds. E.g., when full u64/s64
    registers are used in a BPF program and then eventually compared as
    u32/s32. The latter comparison doesn't change the value of the full
    register, but it does impose new restrictions on the possible lower 32 bits
    of such full registers. And we can use that to derive additional full
    register bounds information.
    
    Acked-by: Eduard Zingerman <[email protected]>
    Signed-off-by: Andrii Nakryiko <[email protected]>
    Acked-by: Shung-Hsi Yu <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    anakryiko authored and Alexei Starovoitov committed Nov 2, 2023 (3d6940d)
  11. bpf: try harder to deduce register bounds from different numeric domains

    There are cases (caught by subsequent reg_bounds tests in selftests/bpf)
    where performing one round of __reg_deduce_bounds() doesn't propagate
    all the information from, say, s32 to u32 bounds and then from newly
    learned u32 bounds back to u64 and s64. So perform __reg_deduce_bounds()
    twice to make sure such derivations are propagated fully after
    reg_bounds_sync().
    
    One such example is test `(s64)[0xffffffff00000001; 0] (u64)<
    0xffffffff00000000` from the selftest patch in this patch set. It demonstrates an
    intricate dance of u64 -> s64 -> u64 -> u32 bounds adjustments, which requires
    two rounds of __reg_deduce_bounds(). Here are corresponding refinement log from
    selftest, showing evolution of knowledge.
    
    REFINING (FALSE R1) (u64)SRC=[0xffffffff00000000; U64_MAX] (u64)DST_OLD=[0; U64_MAX] (u64)DST_NEW=[0xffffffff00000000; U64_MAX]
    REFINING (FALSE R1) (u64)SRC=[0xffffffff00000000; U64_MAX] (s64)DST_OLD=[0xffffffff00000001; 0] (s64)DST_NEW=[0xffffffff00000001; -1]
    REFINING (FALSE R1) (s64)SRC=[0xffffffff00000001; -1] (u64)DST_OLD=[0xffffffff00000000; U64_MAX] (u64)DST_NEW=[0xffffffff00000001; U64_MAX]
    REFINING (FALSE R1) (u64)SRC=[0xffffffff00000001; U64_MAX] (u32)DST_OLD=[0; U32_MAX] (u32)DST_NEW=[1; U32_MAX]
    
    R1 initially has smin/smax set to [0xffffffff00000001; -1], while umin/umax is
    unknown. After (u64)< comparison, in FALSE branch we gain knowledge that
    umin/umax is [0xffffffff00000000; U64_MAX]. That causes smin/smax to learn that
    zero can't happen and upper bound is -1. Then smin/smax is adjusted from
    umin/umax improving lower bound from 0xffffffff00000000 to 0xffffffff00000001.
    And then eventually umin32/umax32 bounds are derived from umin/umax and become
    [1; U32_MAX].
    
    The selftest in the last patch actually implements multi-round fixed-point
    convergence logic, but so far all the tests are handled by two rounds of
    reg_bounds_sync() on the verifier state, so we keep it simple for now.
    
    Signed-off-by: Andrii Nakryiko <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    anakryiko authored and Alexei Starovoitov committed Nov 2, 2023 (558c06e)
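
    The fix amounts to a second deduction pass in reg_bounds_sync();
    roughly:

        static void reg_bounds_sync(struct bpf_reg_state *reg)
        {
                __update_reg_bounds(reg);
                /* two rounds so chains like u64 -> s64 -> u64 -> u32
                 * settle into a fixed point */
                __reg_deduce_bounds(reg);
                __reg_deduce_bounds(reg);
                __reg_bound_offset(reg);
                __update_reg_bounds(reg);
        }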
  12. bpf: drop knowledge-losing __reg_combine_{32,64}_into_{64,32} logic

    When a 32-bit conditional operation operates on the lower 32 bits of a
    full 64-bit register, the register's full value isn't changed. We just
    potentially gain new knowledge about that register's lower 32 bits.
    
    Unfortunately, __reg_combine_{32,64}_into_{64,32} logic that
    reg_set_min_max() performs as a last step can lose information in some
    cases due to __mark_reg64_unbounded() and __reg_assign_32_into_64().
    That's bad and completely unnecessary. Especially __reg_assign_32_into_64()
    looks completely out of place here, because we are not performing
    zero-extending subregister assignment during conditional jump.
    
    So this patch replaces __reg_combine_* with just a normal
    reg_bounds_sync() which will do a proper job of deriving u64/s64 bounds
    from u32/s32, and vice versa (among all other combinations).
    
    __reg_combine_64_into_32() is also used in one more place,
    coerce_reg_to_size(), while handling 1- and 2-byte register loads.
    Looking into this, it seems like besides marking subregister as
    unbounded before performing reg_bounds_sync(), we were also performing
    deduction of smin32/smax32 and umin32/umax32 bounds from respective
    smin/smax and umin/umax bounds. It's now redundant as reg_bounds_sync()
    performs all the same logic more generically (e.g., without unnecessary
    assumption that upper 32 bits of full register should be zero).
    
    Long story short, we remove __reg_combine_64_into_32() completely, and
    coerce_reg_to_size() now only does resetting subreg to unbounded and then
    performing reg_bounds_sync() to recover as much information as possible
    from 64-bit umin/umax and smin/smax bounds, set explicitly in
    coerce_reg_to_size() earlier.
    
    Acked-by: Eduard Zingerman <[email protected]>
    Signed-off-by: Andrii Nakryiko <[email protected]>
    Acked-by: Shung-Hsi Yu <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    anakryiko authored and Alexei Starovoitov committed Nov 2, 2023 (b929d49)
  13. bpf: rename is_branch_taken reg arguments to prepare for the second one

    Just taking mundane refactoring bits out into a separate patch. No
    functional changes.
    
    Signed-off-by: Andrii Nakryiko <[email protected]>
    Acked-by: Shung-Hsi Yu <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    anakryiko authored and Alexei Starovoitov committed Nov 2, 2023 (cdeb5da)
  14. bpf: generalize is_branch_taken() to work with two registers

    While still assuming that the second register is a constant, generalize
    the is_branch_taken-related code to accept two registers instead of a
    register plus an explicit constant value. As a side effect, this also
    allows us to simplify check_cond_jmp_op() by unifying the BPF_K case
    with the BPF_X case, for which we use a fake register to represent
    BPF_K's imm constant as a register.
    
    Acked-by: Eduard Zingerman <[email protected]>
    Signed-off-by: Andrii Nakryiko <[email protected]>
    Acked-by: Shung-Hsi Yu <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    anakryiko authored and Alexei Starovoitov committed Nov 2, 2023 (fc3615d)
  15. bpf: move is_branch_taken() down

    Move is_branch_taken() slightly down. In subsequent patches we'll need
    both flip_opcode() and is_pkt_ptr_branch_taken() for is_branch_taken(),
    but instead of sprinkling forward declarations around, it makes more
    sense to move is_branch_taken() lower below is_pkt_ptr_branch_taken(),
    and also keep it closer to very tightly related reg_set_min_max(), as
    they are two critical parts of the same SCALAR range tracking logic.
    
    Signed-off-by: Andrii Nakryiko <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    anakryiko authored and Alexei Starovoitov committed Nov 2, 2023 (dd2a2cc)
  16. bpf: generalize is_branch_taken to handle all conditional jumps in one place
    
    Make is_branch_taken() a single entry point for branch pruning decision
    making, handling both pointer vs pointer, pointer vs scalar, and scalar
    vs scalar cases in one place. This also nicely cleans up check_cond_jmp_op().
    
    Acked-by: Eduard Zingerman <[email protected]>
    Signed-off-by: Andrii Nakryiko <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    anakryiko authored and Alexei Starovoitov committed Nov 2, 2023 (171de12)
  17. bpf: unify 32-bit and 64-bit is_branch_taken logic

    Combine 32-bit and 64-bit is_branch_taken logic for SCALAR_VALUE
    registers. It makes it easier to see parallels between two domains
    (32-bit and 64-bit), and makes subsequent refactoring more
    straightforward.
    
    No functional changes.
    
    Acked-by: Eduard Zingerman <[email protected]>
    Signed-off-by: Andrii Nakryiko <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    anakryiko authored and Alexei Starovoitov committed Nov 2, 2023 (761a9e5)
  18. bpf: prepare reg_set_min_max for second set of registers

    Similarly to is_branch_taken()-related refactorings, start preparing
    reg_set_min_max() to handle the more generic case of two non-const
    registers. Start with renaming arguments to accommodate later addition
    of second register as an input argument.
    
    Signed-off-by: Andrii Nakryiko <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    anakryiko authored and Alexei Starovoitov committed Nov 2, 2023 (4c61728)
  19. bpf: generalize reg_set_min_max() to handle two sets of two registers

    Change reg_set_min_max() to take FALSE/TRUE sets of two registers each,
    instead of assuming that we are always comparing to a constant. For now
    we still assume that right-hand side registers are constants (and make
    sure that's the case by swapping src/dst regs, if necessary), but
    subsequent patches will remove this limitation.
    
    reg_set_min_max() is now called unconditionally for any register
    comparison, so that might include pointer vs pointer. This makes it
    consistent with is_branch_taken() generality. But we currently only
    support adjustments based on SCALAR vs SCALAR comparisons, so
    reg_set_min_max() has to guard itself against pointers.
    
    Taking registers two by two allows us to further unify and simplify
    check_cond_jmp_op() logic. We utilize a fake register for the BPF_K
    conditional jump case, just like in the is_branch_taken() part.
    
    Acked-by: Eduard Zingerman <[email protected]>
    Signed-off-by: Andrii Nakryiko <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    anakryiko authored and Alexei Starovoitov committed Nov 2, 2023 (9a14d62)
  20. Merge branch 'bpf-register-bounds-logic-and-testing-improvements'

    Andrii Nakryiko says:
    
    ====================
    BPF register bounds logic and testing improvements
    
    This patch set adds a big set of manual and auto-generated test cases
    validating BPF verifier's register bounds tracking and deduction logic. See
    details in the last patch.
    
    We start with building a tester that validates existing <range> vs <scalar>
    verifier logic for range bounds. To make all this work, BPF verifier's logic
    needed a bunch of improvements to handle some cases that previously were not
    covered. This had no implications as to correctness of verifier logic, but it
    was incomplete enough to cause significant disagreements with the
    alternative implementation of register bounds logic that the tests in this
    patch set
    implement. So we need BPF verifier logic improvements to make all the tests
    pass. This is what we do in patches #3 through #9.
    
    The end goal of this work, though, is to extend BPF verifier range state
    tracking so as to allow deriving new range bounds when comparing non-const
    registers. Some more investigation is required to find and fix existing
    potential issues with range tracking as part of ALU/ALU64 operations, so
    the <range> x <range> part of the v5 patch set ([0]) is dropped until
    these issues are sorted out.
    
    For now, we include preparatory refactorings and clean-ups that set up the
    BPF verifier code base for extending the logic to <range> vs <range> in a
    subsequent patch set. Patches #10-#16 perform preliminary refactorings without
    functionally changing anything. But they do clean up check_cond_jmp_op() logic
    and generalize a bunch of other pieces in is_branch_taken() logic.
    
      [0] https://patchwork.kernel.org/project/netdevbpf/list/?series=797178&state=*
    
    v5->v6:
      - dropped <range> vs <range> patches (original patches #18 through #23) to
        add more register range sanity checks and fix preexisting issues;
      - comments improvements, addressing other feedback on first 17 patches
        (Eduard, Alexei);
    v4->v5:
      - added entirety of verifier reg bounds tracking changes, now handling
        <range> vs <range> cases (Alexei);
      - added way more comments trying to explain why deductions added are
        correct, hopefully they are useful and clarify things a bit (Daniel,
        Shung-Hsi);
      - added two preliminary selftests fixes necessary for RELEASE=1 build to
        work again, it keeps breaking.
    v3->v4:
      - improvements to reg_bounds tester (progress report, split 32-bit and
        64-bit ranges, fix various verbosity output issues, etc);
    v2->v3:
      - fix a subtle little-endianness assumption inside parge_reg_state() (CI);
    v1->v2:
      - fix compilation when building selftests with llvm-16 toolchain (CI).
    ====================
    
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Alexei Starovoitov <[email protected]>
    Alexei Starovoitov committed Nov 2, 2023 (e68ed64)
  21. selftests/bpf: Use value with enough-size when updating per-cpu map

    When updating a per-cpu map in the map_percpu_stats test,
    patch_map_thread() only passes a 4-byte value to bpf_map_update_elem().
    The expected size of the value is 8 * num_possible_cpus(), so fix it by
    passing a value that is large enough for the per-cpu map update.
    
    Signed-off-by: Hou Tao <[email protected]>
    Signed-off-by: Andrii Nakryiko <[email protected]>
    Link: https://lore.kernel.org/bpf/[email protected]
    Hou Tao authored and anakryiko committed Nov 2, 2023 (3f1f234)
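
    The fix amounts to sizing the buffer by the possible-CPU count; a
    sketch using libbpf's helper (map_fd is assumed to be set up elsewhere):

        __u32 key = 0;
        int nr_cpus = libbpf_num_possible_cpus();
        /* one 8-byte value per possible CPU, not a single 4-byte value */
        __u64 *vals = calloc(nr_cpus, sizeof(*vals));
        int err = -1;

        if (vals) {
                err = bpf_map_update_elem(map_fd, &key, vals, 0);
                free(vals);
        }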
  22. selftests/bpf: Export map_update_retriable()

    Export map_update_retriable() to make it usable for other map_test
    cases. These cases may only need to retry for a specific errno, so add
    a new callback parameter to let map_update_retriable() decide whether or
    not the errno is retriable.
    
    Signed-off-by: Hou Tao <[email protected]>
    Signed-off-by: Andrii Nakryiko <[email protected]>
    Link: https://lore.kernel.org/bpf/[email protected]
    Hou Tao authored and anakryiko committed Nov 2, 2023 (ff38534)
  3. selftests/bpf: Retry map update for non-preallocated per-cpu map

    BPF CI failed due to map_percpu_stats_percpu_hash from time to time [1].
    It seems that the failure is because the per-cpu bpf memory allocator may
    fail to allocate a per-cpu pointer and cannot refill the free llist in
    time, so bpf_map_update_elem() will return -ENOMEM.
    
    So mitigate the problem by retrying the update operation for
    non-preallocated per-cpu map.
    
    [1]: https://github.com/kernel-patches/bpf/actions/runs/6713177520/job/18244865326?pr=5909
    
    Signed-off-by: Hou Tao <[email protected]>
    Signed-off-by: Andrii Nakryiko <[email protected]>
    Link: https://lore.kernel.org/bpf/[email protected]
    Hou Tao authored and anakryiko committed Nov 2, 2023 (57688b2)
  24. Merge branch 'selftests/bpf: Fixes for map_percpu_stats test'

    Hou Tao says:
    
    ====================
    
    From: Hou Tao <[email protected]>
    
    Hi,
    
    BPF CI failed due to map_percpu_stats_percpu_hash from time to time [1].
    It seems that the failure is because the per-cpu bpf memory allocator may
    fail to allocate a per-cpu pointer and cannot refill the free llist in
    time, so bpf_map_update_elem() will return -ENOMEM.
    
    Patch #1 fixes the size of value passed to per-cpu map update API. The
    problem was found when fixing the ENOMEM problem, so also post it in
    this patchset. Patch #2 & #3 mitigates the ENOMEM problem by retrying
    the update operation for non-preallocated per-cpu map.
    
    Please see individual patches for more details. And comments are always
    welcome.
    
    Regards,
    Tao
    
    [1]: https://github.com/kernel-patches/bpf/actions/runs/6713177520/job/18244865326?pr=5909
    ====================
    
    Signed-off-by: Andrii Nakryiko <[email protected]>
    anakryiko committed Nov 2, 2023 (e869ffc)
  25. selftests/bpf: Consolidate VIRTIO/9P configs in config.vm file

    Those configs are needed to be able to run VM somewhat consistently.
    For instance, ATM, s390x is missing the `CONFIG_VIRTIO_CONSOLE` which
    prevents s390x kernels built in CI from leveraging qemu-guest-agent.
    
    By moving them to `config.vm`, we should have selftest kernels which are
    equal in terms of VM functionality when they include this file.
    
    The set of configs enabled was picked using
    
        grep -h -E '(_9P|_VIRTIO)' config.x86_64 config | sort | uniq
    
    added to `config.vm` and then
        grep -vE '(_9P|_VIRTIO)' config.{x86_64,aarch64,s390x}
    
    As a side effect, some configs may have disappeared from the aarch64 and
    s390x kernels, but they should not be needed. CI will tell.
    
    Signed-off-by: Manu Bretelle <[email protected]>
    Signed-off-by: Andrii Nakryiko <[email protected]>
    Link: https://lore.kernel.org/bpf/[email protected]
    chantra authored and anakryiko committed Nov 2, 2023 (1a119e2)
  26. Commit cf37d0a
  27. Commit 7dc6a8b

Commits on Nov 3, 2023

  1. scx: Fix skel and .bpf.o Make deps

    With the recent Makefile refactor that puts all build artifacts into a
    build/ directory output, there was a regression in that Make would now
    always rebuild schedulers even if they were unchanged. This is happening
    because when Make looks at a target, it looks to see if that file
    exists. If it doesn't, it executes the target. There are a few targets
    that are improperly tracked:
    
    1. We were taking a dependency on the sched.skel.h target (e.g.
       scx_simple.skel.h). In the old build system this was an actual file,
       but now it's just a target as the target name was never updated to
       point to the full path to the include file output.
    
    2. The same goes for sched.bpf.o, which is a dependency of the skel
       file.
    
    3. The scheduler itself, which now resides in build/bin.
    
    The first two we can fix by updating the targets to include the build
    directories. The latter we'll have to fix with some more complex Make
    magic, which we'll do in the subsequent commit.
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Nov 3, 2023 (298bec1)
  2. scx: Don't rebuild schedulers unnecessarily

    Now that the scheduler binaries are written to the build/bin/ directory,
    Make gets confused because it doesn't see the binary file in the same
    directory anymore and tries to rebuild it. This makes things kind of
    tricky, because make will always execute the recipe for the target,
    which is to compile it.
    
    We could add a layer of indirection by instead having the base scheduler
    target be empty, and just take a dependency on the actual binary that's
    created by the compiler. This patch does that, and also cleans up the build
    to avoid copy-pasted scheduler recipes.
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Nov 3, 2023 (62e2315)
  3. scx: Aggregate build logic for rust schedulers

    scx_rusty currently defines several build targets and recipes that would
    have to be duplicated by any other rust scheduler we may add. Let's add
    some build scaffolding to avoid people having to copy paste.
    
    Note that we can't fully avoid running any make logic if we take the
    same approach as with the C schedulers. The C schedulers add a layer of
    indirection where the "base" target (e.g. scx_simple) does nothing but
    take a dependency on the binary output file. This doesn't work with rust
    schedulers, because we're relying on Cargo to tell us when it needs to
    be rebuilt.
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Nov 3, 2023 (2c76843)
  4. bpftool: Fix prog object type in manpage

    bpftool's man page lists "program" as one of possible values for OBJECT,
    while in fact bpftool accepts "prog" instead.
    
    Reported-by: Jerry Snitselaar <[email protected]>
    Signed-off-by: Artem Savkov <[email protected]>
    Signed-off-by: Andrii Nakryiko <[email protected]>
    Acked-by: Yonghong Song <[email protected]>
    Acked-by: Quentin Monnet <[email protected]>
    Link: https://lore.kernel.org/bpf/[email protected]
    sm00th authored and anakryiko committed Nov 3, 2023 (b94df28)
  5. Merge pull request #67 from sched-ext/make_deps

    Fix Makefile dependency tracking
    htejun authored Nov 3, 2023 (41728bb)
  6. Commit 58e2a66
  7. scx_rusty: ravg WIP

    htejun committed Nov 3, 2023
    commit a4fbd6f
  8. commit b24bc9b
  9. commit d401cf1
  10. commit fbf0ccf

Commits on Nov 4, 2023

  1. rusty: Fully switch to ravg

    htejun committed Nov 4, 2023
    commit 8895ddd
  2. ravg: Fix ravg_transfer()

    htejun committed Nov 4, 2023
    commit ca211c6
  3. commit 8111b6e
  4. scx_rusty: Minor cleanup

    htejun committed Nov 4, 2023
    commit 46f07fa
  5. commit f244d5e
  6. Merge pull request #68 from sched-ext/scx-cleanups

    Misc example scheduler cleanups
    Byte-Lab authored Nov 4, 2023
    commit d6a788a

Commits on Nov 5, 2023

  1. sched_ext: Test sched_class directly in scx_task_iter_next_filtered()

    scx_task_iter_next_filtered() is used to iterate all non-idle tasks in the
    init and exit paths. Idle tasks are determined using is_idle_task().
    Unfortunately, cff9b23 ("kernel/sched: Modify initial boot task idle
    setup") changed idle task initialization so that %PF_IDLE is set during CPU
    startup. So, CPUs that are not brought up during boot (such as CPUs which
    can never be online in some AMD processors) don't have the flag set and
    thus fail the is_idle_task() test.
    
    This makes sched_ext incorrectly try to operate on idle tasks in init/exit
    paths leading to oopses. Fix it by directly testing p->sched_class against
    idle_sched_class.
    htejun committed Nov 5, 2023
    commit a60668f

Commits on Nov 6, 2023

  1. Merge pull request #73 from sched-ext/scx-fix-crash

    Fix sched_ext crashes on v6.6
    Byte-Lab authored Nov 6, 2023
    commit 21777bc
  2. selftests/bpf: Disable CONFIG_DEBUG_INFO_REDUCED in config.aarch64

    When building an arm64 kernel and selftests/bpf with defconfig +
    selftests/bpf/config and selftests/bpf/config.aarch64, the fragment
    CONFIG_DEBUG_INFO_REDUCED is enabled by arm64's defconfig. It should be
    disabled in selftests/bpf/config.aarch64, since if it's not disabled,
    CONFIG_DEBUG_INFO_BTF won't be enabled.
    
    Signed-off-by: Anders Roxell <[email protected]>
    Signed-off-by: Andrii Nakryiko <[email protected]>
    Link: https://lore.kernel.org/bpf/[email protected]
    roxell authored and anakryiko committed Nov 6, 2023
    commit dfee93e
  3. bpf, lpm: Fix check prefixlen before walking trie

    When looking up an element in LPM trie, the condition 'matchlen ==
    trie->max_prefixlen' will never return true, if key->prefixlen is larger
    than trie->max_prefixlen. Consequently all elements in the LPM trie will
    be visited and no element is returned in the end.
    
    To resolve this, check key->prefixlen first before walking the LPM trie.
    
    Fixes: b95a5c4 ("bpf: add a longest prefix match trie map implementation")
    Signed-off-by: Florian Lehner <[email protected]>
    Signed-off-by: Andrii Nakryiko <[email protected]>
    Link: https://lore.kernel.org/bpf/[email protected]
    florianl authored and anakryiko committed Nov 6, 2023
    commit 856624f
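    A minimal sketch of the added guard (key->prefixlen and
    trie->max_prefixlen are named in the commit message; the surrounding
    lookup function and types follow kernel/bpf/lpm_trie.c but are
    heavily abbreviated here):

    /* Reject over-long prefixes up front: no stored node can match them. */
    static void *trie_lookup_elem(struct bpf_map *map, void *_key)
    {
            struct lpm_trie *trie = container_of(map, struct lpm_trie, map);
            struct bpf_lpm_trie_key *key = _key;

            if (key->prefixlen > trie->max_prefixlen)
                    return NULL;

            /* ... walk the trie as before ... */
            return NULL;
    }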
  4. commit 53cb301
  5. Merge pull request #70 from sched-ext/scx_rusty-ravg

    scx_rusty: Usage ravg for dom and task loads
    htejun authored Nov 6, 2023
    commit 90e1ad1
  6. commit f599483
  7. Merge pull request #75 from sched-ext/scx-fix-rust_sched_deps

    tools/sched_ext/Makefile: Don't hard code scx_rusty in rust-sched _deps target
    Byte-Lab authored Nov 6, 2023
    commit eff9487
  8. scx_common: Improve MEMBER_VPTR()

    So that it can be used on deref'd pointers to structs.
    htejun committed Nov 6, 2023
    commit 1dff6ea
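    For illustration, a hedged usage sketch (struct task_ctx and its field
    are hypothetical; the deref'd-base form is the point here):

    struct task_ctx {
            u64 runtime;
    };

    void example(struct task_ctx *tctx)
    {
            /* With the improvement, @base can be a deref'd struct pointer. */
            u64 *rt = MEMBER_VPTR(*tctx, .runtime);

            if (rt)
                    *rt += 1;
    }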
  9. Merge pull request #76 from sched-ext/scx-update-MEMBER_VPTR

    scx_common: Improve MEMBER_VPTR()
    Byte-Lab authored Nov 6, 2023
    commit a281119

Commits on Nov 7, 2023

  1. scx: Fix !CONFIG_SCHED_CLASS_EXT builds

    cpu_local_stat_show() expects CONFIG_SCHED_CLASS_EXT or
    CONFIG_RT_GROUP_SCHED.
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Nov 7, 2023
    commit 1d773bd
  2. Merge pull request #77 from sched-ext/fix_notext_build

    scx: Fix !CONFIG_SCHED_CLASS_EXT builds
    htejun authored Nov 7, 2023
    commit fdee025
  3. scx: Print scx info when dumping stack

    It would be useful to see what the sched_ext scheduler state is, and
    what scheduler is running, when we're dumping a task's stack. This patch
    therefore adds a new print_scx_info() function that's called in the same
    context as print_worker_info() and print_stop_info().
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Nov 7, 2023
    commit 54d303d
  4. Merge pull request #66 from sched-ext/panic_msg

    scx: Print scheduler state in panic message
    htejun authored Nov 7, 2023
    commit ee4efa7
  5. scx_common: Add message to _Static_assert in MEMBER_VPTR

    _Static_assert() without a message is a later extension and can fail
    compilation depending on compile flags.
    htejun committed Nov 7, 2023
    commit 5c2f39c
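    The portable two-argument form, as a tiny example (the assertion and
    message here are illustrative):

    /* C11 requires the message argument; the message-less form is a later
     * extension and may not compile under stricter flags. */
    _Static_assert(sizeof(char) == 1, "char must be exactly one byte");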
  6. tools/sched_ext/ravg: Separate out ravg_read.rs.h and update build deps

    We want to use the rust ravg_read() in other implementations too. Separate
    it out into a .h file and include it. Note that it also needs to take the
    inputs in scalar types, as the ravg_data types aren't considered the same
    across different skels. This can also be a module, but for now let's keep it an
    include file so that it can be copied elsewhere together with the BPF header
    files.
    
    While at it, make BPF builds depend on ravg[_impl].bpf.h. cargo does the
    right thing without further instructions.
    htejun committed Nov 7, 2023
    commit a32fa87
  7. scx_rusty: Misc update

    htejun committed Nov 7, 2023
    commit e322e56
  8. commit 8619d7f
  9. commit e00a136
  10. commit d30e64d
  11. commit 687fe29
  12. commit ecbff41
  13. commit 1ad52c7
  14. commit 42a1f1f
  15. commit d70e209
  16. scx_layered: Cleanups

    htejun committed Nov 7, 2023
    commit 9695b05
  17. Merge pull request #78 from sched-ext/scx-misc-updates

    Misc updates to let scx_layered share ravg with rusty
    htejun authored Nov 7, 2023
    commit 5a685d4
  18. Merge pull request #65 from sched-ext/add-scx_layered

    sched_ext: Add scx_layered
    htejun authored Nov 7, 2023
    commit 924c005
  19. commit 665658c
  20. commit d0be8b2
  21. Merge pull request #79 from sched-ext/scx-pull-bpf

    Pull bpf/for-next.
    Byte-Lab authored Nov 7, 2023
    commit 0bd6f76

Commits on Nov 8, 2023

  1. scx: CGROUP_WEIGHT_* should be outside CONFIG_CGROUPS

    sched_ext needs these consts even when !CGROUPS. They got accidentally moved
    back inside CONFIG_CGROUPS through merge resolution.
    htejun committed Nov 8, 2023
    commit 9dae233
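    For reference, the constants in question, with their values as defined
    in include/linux/cgroup.h:

    /* must stay visible to sched_ext even when !CONFIG_CGROUPS */
    #define CGROUP_WEIGHT_MIN       1
    #define CGROUP_WEIGHT_DFL       100
    #define CGROUP_WEIGHT_MAX       10000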
  2. scx: cpu_local_stat_show() doesn't have dependency on RT_GROUP_SCHED or EXT_GROUP_SCHED
    
    This was incorrectly fixed after an errant merge resolution. Fix it back.
    htejun committed Nov 8, 2023
    commit 7410ecc
  3. scx: Kill stray check_preempt_cur() prototype

    Merge artifact.
    htejun committed Nov 8, 2023
    commit 08e09f3
  4. scx: s/scx_exit_type/scx_exit_kind/ s/scx_exit_info\.type/scx_exit_info\.kind/
    
    These are accessed from userspace and "type" is a reserved token in many
    modern languages. Let's use "kind" instead.
    htejun committed Nov 8, 2023
    commit 7a001f5
  5. scx: tools/sched_ext/Makefile updates

    * Remove duplicate target lists. c-sched-targets and rust-sched-targets are
      the source of truth now.
    
    * Drop fullclean target. It's unexpected and unnecessary to have a target
      which steps up and cleans.
    
    * Minor formatting updates.
    htejun committed Nov 8, 2023
    commit 2c21348
  6. scx: Reorder tools/sched_ext/README.md

    To match patch / Makefile order.
    htejun committed Nov 8, 2023
    commit dde311c
  7. commit 2e58977
  8. commit 0b2403f
  9. scx: whitespace update

    htejun committed Nov 8, 2023
    commit 39b906e
  10. Merge pull request #80 from sched-ext/scx-cleanups-from-split

    Scx cleanups from split
    htejun authored Nov 8, 2023
    commit 607afb6
  11. scx_rusty: doc comment update

    htejun committed Nov 8, 2023
    commit 725cfa3
  12. Merge pull request #81 from sched-ext/scx-cleanups-from-split

    scx_rusty: doc comment update
    htejun authored Nov 8, 2023
    commit c818dc5
  13. commit ea98edf
  14. Merge pull request #82 from sched-ext/scx-cleanups-from-split

    scx: Update print_scx_info() comment
    htejun authored Nov 8, 2023
    commit 9a64d87

Commits on Nov 10, 2023

  1. scx: Update print_scx_info()

    - p->scx.runnable_at is in jiffies and rq->clock is in ktime ns. Subtracting
      the two doesn't yield anything useful. Also, it's more intuitive for
      negative delta to represent past. Fix delta calculation.
    
    - ops_state is always 0 for running tasks. Let's skip it for now.
    
    - Use return value from copy_from_kernel_nofault() to determine whether the
      read was successful and clearly report read failures.
    
    - scx_enabled() is always nested inside scx_ops_enable_state() != DISABLED.
      Let's just test the latter.
    htejun committed Nov 10, 2023
    commit f23fbab
  2. Merge pull request #83 from sched-ext/scx_print_info-updates

    scx: Update print_scx_info()
    htejun authored Nov 10, 2023
    commit b0d2ae0

Commits on Nov 14, 2023

  1. commit b7e1419
  2. Merge pull request #84 from sched-ext/rusty-doc-update

    rusty: Improve overview documentation as suggested by Josh Don
    htejun authored Nov 14, 2023
    commit 1d88c4a
  3. scx: Move scx_ops_enable_state_str[] outside CONFIG_SCHED_DEBUG

    The new print_scx_info() uses scx_ops_enable_state_str[] outside
    CONFIG_SCHED_DEBUG. Let's relocate it outside of CONFIG_SCHED_DEBUG and to
    the top.
    
    Reported-by: Changwoo Min <[email protected]>
    Reported-by: Andrea Righi <[email protected]>
    Signed-off-by: Tejun Heo <[email protected]>
    htejun committed Nov 14, 2023
    commit ca712f8
  4. Merge pull request #85 from sched-ext/misc-fixes

    scx: Move scx_ops_enable_state_str[] outside CONFIG_SCHED_DEBUG
    htejun authored Nov 14, 2023
    commit e69323c

Commits on Nov 25, 2023

  1. commit 6b245e8
  2. Merge pull request #87 from sched-ext/atomic_long-fix

    scx: Fix a straggling atomic64_set
    htejun authored Nov 25, 2023
    commit df9ef4e

Commits on Nov 28, 2023

  1. scx: Use .bpf.[sub]skel.h suffix instead of .[sub]skel.h when building schedulers
    
    This is to make life easier for the user sched/tools repo which uses meson
    to build.
    htejun committed Nov 28, 2023
    commit 70331a6
  2. scx: Add s/uSIZE typedefs in scx_common.h

    The availability of s/uSIZE types is hit and miss. Let's always define them
    in terms of stdint types. This makes life easier for the scx user repo.
    htejun committed Nov 28, 2023
    commit 7a1c90f
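    A sketch of what such typedefs look like (the exact set in scx_common.h
    may differ):

    #include <stdint.h>

    typedef uint8_t  u8;
    typedef uint16_t u16;
    typedef uint32_t u32;
    typedef uint64_t u64;
    typedef int8_t   s8;
    typedef int16_t  s16;
    typedef int32_t  s32;
    typedef int64_t  s64;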
  3. Merge pull request #88 from sched-ext/misc-updates

    Misc updates for example schedulers to make life easier for user sched repo
    Byte-Lab authored Nov 28, 2023
    commit 48b4554
  4. scx_{rusty|layered}: Generate skel file in $OUT_DIR

    Currently, skel files are put in src/bpf/.output. Place them inside $OUT_DIR
    where build artifacts belong.
    htejun committed Nov 28, 2023
    commit bc7c2af
  5. commit 1d9acf6

Commits on Nov 29, 2023

  1. scx_{rusty|layered}: Make naming and build consistent between the two rust userland schedulers
    
    - NAME_sys and NAME were used to refer to the rust wrapper of the
      bindgen-generated header file and the bpf skeleton, respectively. The NAME
      part is self-referential and thus doesn't really signify anything, and the
      _sys suffix is arbitrary too. Let's use bpf_intf and bpf_skel instead.
    
    - The env vars that are used during build are a bit unusual and the
      SCX_RUST_CLANG name is a bit confusing as it doesn't indicate it's for
      compiling BPF. Let's use the names BPF_CLANG and BPF_CFLAGS instead.
    
    - build.rs is now identical between the two schedulers.
    htejun committed Nov 29, 2023
    commit 2e2daa7
  2. scx_{rusty|layered}: Run bindgen's clang with CLANG_CFLAGS and remove explicit paths from includes
    
    So that build env can decide where to put these headers.
    htejun committed Nov 29, 2023
    commit 2d46bf9
  3. scx_{rusty|layered}: Factor out build.rs's into scx_utils::build_helpers

    This greatly simplifies build.rs and allows building more common logic into
    build_helpers such as discovering BPF_CFLAGS on its own without depending on
    upper level Makefile. Some caveats:
    
    - Dropped static libbpf-sys dep. scx_utils is out of kernel tree and pulls
      in libbpf-sys through libbpf-cargo which conflicts with the explicit
      libbpf-sys dependency. This means that we use packaged version of
      libbpf-cargo for skel generation. Should be fine.
    
    - Path dependency for scx_utils is temporary during development. Should be
      dropped later.
    htejun committed Nov 29, 2023
    commit 65d1b96

Commits on Nov 30, 2023

  1. commit df7ea88

Commits on Dec 3, 2023

  1. commit 5f200bb
  2. commit 47c9356

Commits on Dec 4, 2023

  1. commit d6bd20a
  2. Merge pull request #89 from sched-ext/misc-updates

    scx: Common include files relocated and more build updates
    Byte-Lab authored Dec 4, 2023
    commit f0566ba
  3. commit 234eb2c
  4. Merge pull request #91 from sched-ext/scx-sync

    scx_sync: Sync scheduler changes from https://github.com/sched-ext/scx
    htejun authored Dec 4, 2023
    commit 61ce4fe
  5. scx: Disable vtime ordering for internal DSQs

    Internal DSQs, i.e. SCX_DSQ_LOCAL and SCX_DSQ_GLOBAL, have somewhat
    special behavior in that they're automatically consumed by the internal
    ext.c logic. A user could therefore accidentally starve tasks on either
    of the DSQs if they dispatch to both the vtime and FIFO queues, as
    they're consumed in a specific order by the internal logic. It likely
    doesn't make sense to ever use both FIFO and PRIQ ordering in the same
    DSQ, so let's explicitly disable it for the internal DSQs. In a
    follow-on change, we'll error out a scheduler if a user dispatches to
    both FIFO and vtime for any DSQ.
    
    Reported-by: Changwoo Min <[email protected]>
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Dec 4, 2023
    commit 25a5d10
  6. scx: Enforce either/or usage of DSQ FIFO/PRIQ dispatching

    Currently, a user can do both FIFO and PRIQ dispatching to a DSQ. This
    can result in non-intuitive behavior. For example, if a user
    PRIQ-dispatches to a DSQ, and then subsequently FIFO dispatches, an
    scx_bpf_consume() operation will always favor the FIFO-dispatched task.
    While we could add something like an scx_bpf_consume_vtime() kfunc,
    given that there's not a clear use-case for doing both types of
    dispatching in a single DSQ, for now we'll elect to just enforce that
    only a single type is being used at any given time.
    
    Reported-by: Changwoo Min <[email protected]>
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Dec 4, 2023
    commit 346fd9d
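    In scheduler code, the either/or rule means a given DSQ should only ever
    see one of the two dispatch flavors. A sketch (MY_DSQ and the function
    names are illustrative; the kfuncs are real):

    void example_enqueue(struct task_struct *p, u64 enq_flags)
    {
            /* FIFO dispatch: tasks on MY_DSQ are consumed in dispatch order. */
            scx_bpf_dispatch(p, MY_DSQ, SCX_SLICE_DFL, enq_flags);
    }

    void example_enqueue_vtime(struct task_struct *p, u64 enq_flags)
    {
            /* PRIQ dispatch: tasks on MY_DSQ are consumed in vtime order.
             * Mixing this with the FIFO flavor on the same DSQ now errors
             * out the scheduler. */
            scx_bpf_dispatch_vtime(p, MY_DSQ, SCX_SLICE_DFL,
                                   p->scx.dsq_vtime, enq_flags);
    }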

Commits on Dec 5, 2023

  1. Merge pull request #92 from sched-ext/internal_priq

    Change semantics of FIFO/PRIQ dispatching
    Byte-Lab authored Dec 5, 2023
    commit 4d61801
  2. commit 03b9a1f
  3. Merge pull request #93 from sched-ext/scx-sync

    scx_sync: Sync scheduler changes from https://github.com/sched-ext/scx
    htejun authored Dec 5, 2023
    commit e5078c1

Commits on Dec 6, 2023

  1. commit 782f273
  2. commit 2f6ba98

Commits on Dec 8, 2023

  1. commit 9c18e3d
  2. Merge pull request #95 from sched-ext/sync

    scx_sync: Sync scheduler changes from https://github.com/sched-ext/scx
    Byte-Lab authored Dec 8, 2023
    commit 5bb3614
  3. scx: Add missing ) to $(error) invocation in Makefile

    We're missing a closing ) on a branch that we never take. Let's close it
    just for correctness.
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Dec 8, 2023
    commit 36d3838
  4. Merge pull request #96 from sched-ext/makefile_fix

    scx: Add missing ) to $(error) invocation in Makefile
    Byte-Lab authored Dec 8, 2023
    commit 963fc30
  5. scx: Add skeleton for scx testing framework

    We should build a selftest suite to do some basic sanity testing of scx.
    Some elements are going to be borrowed from tools/testing/selftests/bpf,
    as we're going to be building and loading BPF progs, and sometimes
    verifying that BPF progs fail to load.
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Dec 8, 2023
    commit d3f9558
  6. Merge pull request #97 from sched-ext/scx_selftests

    scx: Add skeleton for scx testing framework
    htejun authored Dec 8, 2023
    commit 177edd6

Commits on Dec 28, 2023

  1. kernfs: convert kernfs_idr_lock to an irq safe raw spinlock

    bpf_cgroup_from_id() (provided by sched-ext) needs to acquire
    kernfs_idr_lock and it can be used in the scheduler dispatch path with
    rq->__lock held.
    
    But any kernfs function that is acquiring kernfs_idr_lock can be
    interrupted by a scheduler tick, which would try to acquire rq->__lock,
    triggering the following deadlock scenario:
    
            CPU0                    CPU1
            ----                    ----
       lock(kernfs_idr_lock);
                                    lock(rq->__lock);
                                    lock(kernfs_idr_lock);
       <Interrupt>
        lock(rq->__lock);
    
    More generally, considering that bpf_cgroup_from_id() is provided as a
    kfunc, similar deadlock conditions can potentially be triggered from any
    kprobe/tracepoint/fentry.
    
    For this reason, in order to prevent any potential deadlock scenario,
    convert kernfs_idr_lock to a raw irq safe spinlock.
    
    Signed-off-by: Andrea Righi <[email protected]>
    Andrea Righi committed Dec 28, 2023
    commit dad3fb6
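    The shape of the conversion, as a hedged sketch (the real lock sites
    live in fs/kernfs/; the function below is illustrative):

    /* Raw and irq-safe: with interrupts disabled across the critical
     * section, a scheduler tick can't nest rq->__lock inside it. */
    static DEFINE_RAW_SPINLOCK(kernfs_idr_lock);

    static void kernfs_idr_example(void)
    {
            unsigned long flags;

            raw_spin_lock_irqsave(&kernfs_idr_lock, flags);
            /* ... idr lookup/insert under the lock ... */
            raw_spin_unlock_irqrestore(&kernfs_idr_lock, flags);
    }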
  2. sched_ext: fix race in scx_move_task() with exiting tasks

    There is a race with exiting tasks in scx_move_task() where we may fail
    to check for autogroup tasks, leading to the following oops:
    
     WARNING: CPU: 2 PID: 100 at kernel/sched/ext.c:2571 scx_move_task+0x9f/0xb0
     ...
     Sched_ext: flatcg (enabled+all), task: runnable_at=-5ms
     RIP: 0010:scx_move_task+0x9f/0xb0
     Call Trace:
      <TASK>
      ? scx_move_task+0x9f/0xb0
      ? __warn+0x85/0x170
      ? scx_move_task+0x9f/0xb0
      ? report_bug+0x171/0x1a0
      ? handle_bug+0x3b/0x70
      ? exc_invalid_op+0x17/0x70
      ? asm_exc_invalid_op+0x1a/0x20
      ? scx_move_task+0x9f/0xb0
      sched_move_task+0x104/0x300
      do_exit+0x37d/0xb70
      ? lock_release+0xbe/0x270
      do_group_exit+0x37/0xa0
      __x64_sys_exit_group+0x18/0x20
      do_syscall_64+0x44/0xf0
      entry_SYSCALL_64_after_hwframe+0x6f/0x77
    
    And a related NULL pointer dereference afterwards:
    
     BUG: kernel NULL pointer dereference, address: 0000000000000148
    
    Prevent this by skipping scx_move_task() actions for exiting tasks.
    
    Moreover, make scx_move_task() more reliable by triggering only the
    WARN_ON_ONCE() and returning, instead of also triggering the bug
    afterwards.
    
    Signed-off-by: Andrea Righi <[email protected]>
    Andrea Righi committed Dec 28, 2023
    commit 6b747e0
  3. Merge pull request #101 from arighi/fix-move-task-race

    sched_ext: fix race in scx_move_task() with exiting tasks
    htejun authored Dec 28, 2023
    commit 79d694e

Commits on Jan 3, 2024

  1. scx: Support direct dispatching from ops.select_cpu()

    A common pattern in schedulers is to find and reserve an idle core in
    ops.select_cpu(), and to then use a task local storage map to specify
    that the task should be enqueued in SCX_DSQ_LOCAL on the ops.enqueue()
    path. At the same time, we also have a special SCX_TASK_ENQ_LOCAL
    enqueue flag which is used by scx_select_cpu_dfl() to notify
    ops.enqueue() that it may want to do a local enqueue.
    
    Taking a step back, direct dispatch is something that should be
    supported from the ops.select_cpu() path as well. The contract is that
    doing a direct dispatch to SCX_DSQ_LOCAL will dispatch the task to the
    local CPU of whatever is returned by ops.select_cpu(). With that in
    mind, let's just extend the API a bit to support direct dispatch from
    ops.select_cpu().
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Jan 3, 2024
    commit 07acdca
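    A hedged sketch of the new pattern (scx_bpf_test_and_clear_cpu_idle() is
    an existing kfunc; the ops name is illustrative):

    s32 BPF_STRUCT_OPS(example_select_cpu, struct task_struct *p,
                       s32 prev_cpu, u64 wake_flags)
    {
            /* If prev_cpu is idle, claim it and dispatch directly; no
             * task-local-storage round trip through ops.enqueue() needed. */
            if (scx_bpf_test_and_clear_cpu_idle(prev_cpu))
                    scx_bpf_dispatch(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL, 0);

            return prev_cpu;
    }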

Commits on Jan 4, 2024

  1. scx: Remove SCX_ENQ_LOCAL flag

    Now that we support dispatching directly from ops.select_cpu(), the
    SCX_ENQ_LOCAL flag isn't needed. The last place it was used was on the
    SCX_ENQ_LAST path to control whether a task would be dispatched locally
    if ops.enqueue() wasn't defined. It doesn't really make sense to define
    SCX_OPS_ENQ_LAST but not ops.enqueue(), so let's remove SCX_ENQ_LOCAL
    and validate that SCX_OPS_ENQ_LAST is never passed if ops.enqueue()
    isn't defined.
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Jan 4, 2024
    commit 08fc865
  2. scx: Add scx_bpf_select_cpu_dfl() kfunc

    Some scheduler implementations may want to have ops.enqueue() invoked
    even if scx_select_cpu_dfl() finds an idle core for the enqueuing task
    to run on. In order to enable this, we can add a new
    scx_bpf_select_cpu_dfl() kfunc which allows a BPF scheduler to get the
    same behavior as the default ops.select_cpu() implementation, and then
    decide whether they want to dispatch directly from ops.select_cpu().
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Jan 4, 2024
    commit fadfa2f
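    Usage might look like the following sketch (the out-parameter reports
    whether an idle CPU was found and reserved; the ops name is illustrative):

    s32 BPF_STRUCT_OPS(example_select_cpu, struct task_struct *p,
                       s32 prev_cpu, u64 wake_flags)
    {
            bool is_idle = false;
            s32 cpu = scx_bpf_select_cpu_dfl(p, prev_cpu, wake_flags, &is_idle);

            /* The scheduler decides: dispatch now, or defer to ops.enqueue(). */
            if (is_idle)
                    scx_bpf_dispatch(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL, 0);

            return cpu;
    }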
  3. scx: Add selftests for new select_cpu dispatch semantics

    Let's test the new semantics for being able to do direct dispatch from
    ops.select_cpu(), including testing when SCX_OPS_ENQ_DFL_NO_DISPATCH is
    specified. Also adds a testcase validating that we can't load a
    scheduler with SCX_OPS_ENQ_LAST if ops.enqueue() is not defined.
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Jan 4, 2024
    commit 9fd2c3b
  4. Merge pull request #104 from sched-ext/select_cpu_dfl

    Allow dispatching from ops.select_cpu()
    Byte-Lab authored Jan 4, 2024
    commit d788214
  5. scx: Error for a priq builtin DSQ in dispatch_enqueue()

    We're currently checking whether a builtin DSQ is being used with priq
    in scx_bpf_dispatch_vtime(). This neglects the fact that we could end up
    falling back to scx_dsq_global if there's an error. If we error out with
    SCX_ENQ_DSQ_PRIQ set in enqueue flags, we would trigger a warning in
    dispatch_enqueue(). Let's instead just move the check to inside of
    dispatch_enqueue().
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Jan 4, 2024
    commit 2638aff
  6. scx: Add testcases for vtime-dispatching to builtin DSQs

    Let's verify that we're disallowing builtin DSQs from being dispatched
    to.
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Jan 4, 2024
    commit d5b84a4
  7. Merge pull request #105 from sched-ext/fix_fallback

    Fix fallback
    Byte-Lab authored Jan 4, 2024
    commit 902d364

Commits on Jan 5, 2024

  1. scx: Always set task scx weight before enable

    We were previously only calling it on the fork path, but we need to be
    calling it on the enable path as well.
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Jan 5, 2024
    commit 56b2ec9
  2. scx: Call enable / disable on entry / exit to scx

    Currently, the ops.enable() and ops.disable() callbacks are invoked a
    single time for every task on the system. ops.enable() is invoked
    shortly after a task succeeds in ops.prep_enable(), and ops.disable() is
    invoked when a task exits, or when the BPF scheduler is unloaded.
    
    This API is a bit odd because ops.enable() can be invoked well before a
    task actually starts running in the BPF scheduler, so it's not
    necessarily useful as a way to bootstrap a process. For example,
    scx_simple does the following:
    
    void BPF_STRUCT_OPS(simple_enable, struct task_struct *p,
                        struct scx_enable_args *args)
    {
            p->scx.dsq_vtime = vtime_now;
    }
    
    If the task later switches to sched_ext, the value will of course be
    stale. While it ends up balancing out due to logic elsewhere in the
    scheduler, it's indicative of a somewhat awkward component of the API
    that can be improved.
    
    Instead, this patch has ops.enable() be invoked when a task is entering
    the scheduler for the first time, and ops.disable() be invoked
    whenever a task is leaving the scheduler; be it because of exiting, the
    scheduler being unloaded, or the task manually switching sched policies.
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Jan 5, 2024
    commit 9604441
  3. scx: Rename prep_enable() and cancel_enable(), add exit_task()

    ops.prep_enable() and ops.cancel_enable() have arguably become misnomers
    in that ops.enable() and ops.disable() may be called multiple times while
    a BPF prog is loaded, but ops.prep_enable() and ops.cancel_enable() will
    be called at most once. ops.prep_enable() is really more akin to
    initializing the task rather than preparing for ops.enable(), so let's
    rename them to ops.init_task() and ops.cancel_init() to reflect this.
    
    In addition, some schedulers are currently using ops.disable() to clean
    up whatever was initialized in (what was previously) ops.prep_enable().
    This doesn't work now that ops.disable() can be called multiple times,
    so we also need to add a new callback called exit_task() which is called
    exactly once when a task is exiting (if it was previously successfully
    initialized).
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Jan 5, 2024
    commit 81e1051
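    A hedged sketch of the renamed callback pair (the args struct name and
    exact signatures are assumptions, not taken from the commit):

    s32 BPF_STRUCT_OPS(example_init_task, struct task_struct *p,
                       struct scx_init_task_args *args /* assumed name */)
    {
            /* one-time per-task setup, formerly ops.prep_enable() */
            return 0;
    }

    void BPF_STRUCT_OPS(example_exit_task, struct task_struct *p)
    {
            /* runs exactly once per task that was successfully initialized */
    }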
  4. scx: Add init_enable_count testcase

    We expect to have some sched_ext_ops callbacks be called differently
    depending on the scheduler, and the tasks running on the system.  Let's
    add a testcase that verifies that the init_task(), exit_task(),
    enable(), and disable() callbacks are all invoked correctly and as
    expected.
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Jan 5, 2024
    commit aa60d9e
  5. scx: Move sched_ext_entity.ddsq_id out of modifiable fields

    When we added support for dispatching from ops.select_cpu(), I
    accidentally put the sched_ext_entity.ddsq_id field into the "modifiable
    fields" part of struct sched_ext_entity. It should be harmless, but
    there shouldn't be any reason for a scheduler to muck with it, so let's
    move it up.
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Jan 5, 2024
    commit 6b8ccfd
  6. scx: Add missing DSQ fallback test files

    I forgot to include these in the patch set that fixes, and adds tests for,
    gracefully falling back to the global DSQ.
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Jan 5, 2024
    commit 367eab2
  7. Merge pull request #100 from sched-ext/fix_enable

    Fix and update semantics for ops.enable() and ops.disable()
    Byte-Lab authored Jan 5, 2024
    commit 88568ae
  8. scx: Claim idle core in scx_select_cpu_dfl for nr_cpus_allowed == 1

    In scx_select_cpu_dfl(), we're currently returning prev_cpu if
    p->nr_cpus_allowed == 1. It makes sense to return prev_cpu if the task
    can't run on any other cores, but we might as well also try to claim the
    core as idle so that:
    
    1. scx_select_cpu_dfl() will directly dispatch it
    2. To prevent another core from incorrectly assuming that core will be
       idle when in reality that task will be enqueued to it. The mask will
       eventually be updated in __scx_update_idle(), but this seems more
       efficient.
    3. To have the idle cpumask bit be unset when the task is enqueued in
       ops.enqueue() (if the core scheduler is using
       scx_bpf_select_cpu_dfl()).
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Jan 5, 2024
    commit 2cf297c
  9. scx: Make select_cpu_dfl test a bit less brittle

    select_cpu_dfl checks whether a task that's successfully dispatched from
    the default select_cpu implementation isn't subsequently enqueued. It's
    only doing the check for non-pcpu threads, but that's not really the
    condition we want to look for. We don't want to do the check for any
    task that's being enqueued on the enable path, because it won't have
    gone through the select_cpu path. Instead, let's just check the task
    name to verify it's the test task.
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Jan 5, 2024
    commit e6cb892

Commits on Jan 6, 2024

  1. Merge pull request #106 from sched-ext/prev_cpu_idle_reserve

    Claim idle core in scx_select_cpu_dfl for nr_cpus_allowed == 1
    htejun authored Jan 6, 2024
    commit 8ccf9d7

Commits on Jan 8, 2024

  1. scx: Avoid possible deadlock with cpus_read_lock()

    Han Xing Yi reported a syzbot lockdep error over the weekend:
    
    ======================================================
    WARNING: possible circular locking dependency detected
    6.6.0-g2f6ba98e2d3d #4 Not tainted
    ------------------------------------------------------
    syz-executor.0/2181 is trying to acquire lock:
    ffffffff84772410 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x216/0x590 net/core/net_namespace.c:487
    but task is already holding lock:
    ffffffff8449dc50 (scx_fork_rwsem){++++}-{0:0}, at: sched_fork+0x3b/0x190 kernel/sched/core.c:4810
    which lock already depends on the new lock.
    the existing dependency chain (in reverse order) is:
    -> #3 (scx_fork_rwsem){++++}-{0:0}:
           percpu_down_write+0x51/0x210 kernel/locking/percpu-rwsem.c:227
           scx_ops_enable+0x230/0xf90 kernel/sched/ext.c:3271
           bpf_struct_ops_link_create+0x1b9/0x220 kernel/bpf/bpf_struct_ops.c:914
           link_create kernel/bpf/syscall.c:4938 [inline]
           __sys_bpf+0x35af/0x4ac0 kernel/bpf/syscall.c:5453
           __do_sys_bpf kernel/bpf/syscall.c:5487 [inline]
           __se_sys_bpf kernel/bpf/syscall.c:5485 [inline]
           __x64_sys_bpf+0x48/0x60 kernel/bpf/syscall.c:5485
           do_syscall_x64 arch/x86/entry/common.c:51 [inline]
           do_syscall_64+0x46/0x100 arch/x86/entry/common.c:82
           entry_SYSCALL_64_after_hwframe+0x6e/0x76
    -> #2 (cpu_hotplug_lock){++++}-{0:0}:
           percpu_down_read include/linux/percpu-rwsem.h:51 [inline]
           cpus_read_lock+0x42/0x1b0 kernel/cpu.c:489
           flush_all_backlogs net/core/dev.c:5885 [inline]
           unregister_netdevice_many_notify+0x30a/0x1070 net/core/dev.c:10965
           unregister_netdevice_many+0x19/0x20 net/core/dev.c:11039
           sit_exit_batch_net+0x433/0x460 net/ipv6/sit.c:1887
           ops_exit_list+0xc5/0xe0 net/core/net_namespace.c:175
           cleanup_net+0x3e2/0x750 net/core/net_namespace.c:614
           process_one_work+0x50d/0xc20 kernel/workqueue.c:2630
           process_scheduled_works kernel/workqueue.c:2703 [inline]
           worker_thread+0x50b/0x950 kernel/workqueue.c:2784
           kthread+0x1fa/0x250 kernel/kthread.c:388
           ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147
           ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:242
    -> #1 (rtnl_mutex){+.+.}-{3:3}:
           __mutex_lock_common kernel/locking/mutex.c:603 [inline]
           __mutex_lock+0xc1/0xea0 kernel/locking/mutex.c:747
           mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:799
           rtnl_lock+0x17/0x20 net/core/rtnetlink.c:79
           register_netdevice_notifier+0x25/0x1c0 net/core/dev.c:1741
           rtnetlink_init+0x3a/0x6e0 net/core/rtnetlink.c:6657
           netlink_proto_init+0x23d/0x2f0 net/netlink/af_netlink.c:2946
           do_one_initcall+0xb3/0x5f0 init/main.c:1232
           do_initcall_level init/main.c:1294 [inline]
           do_initcalls init/main.c:1310 [inline]
           do_basic_setup init/main.c:1329 [inline]
           kernel_init_freeable+0x40c/0x5d0 init/main.c:1547
           kernel_init+0x1d/0x350 init/main.c:1437
           ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147
           ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:242
    -> #0 (pernet_ops_rwsem){++++}-{3:3}:
           check_prev_add kernel/locking/lockdep.c:3134 [inline]
           check_prevs_add kernel/locking/lockdep.c:3253 [inline]
           validate_chain kernel/locking/lockdep.c:3868 [inline]
           __lock_acquire+0x16b4/0x2b30 kernel/locking/lockdep.c:5136
           lock_acquire kernel/locking/lockdep.c:5753 [inline]
           lock_acquire+0xc1/0x2b0 kernel/locking/lockdep.c:5718
           down_read_killable+0x5d/0x280 kernel/locking/rwsem.c:1549
           copy_net_ns+0x216/0x590 net/core/net_namespace.c:487
           create_new_namespaces+0x2ed/0x770 kernel/nsproxy.c:110
           copy_namespaces+0x488/0x540 kernel/nsproxy.c:179
           copy_process+0x1b52/0x4680 kernel/fork.c:2504
           kernel_clone+0x116/0x660 kernel/fork.c:2914
           __do_sys_clone3+0x192/0x220 kernel/fork.c:3215
           __se_sys_clone3 kernel/fork.c:3199 [inline]
           __x64_sys_clone3+0x30/0x40 kernel/fork.c:3199
           do_syscall_x64 arch/x86/entry/common.c:51 [inline]
           do_syscall_64+0x46/0x100 arch/x86/entry/common.c:82
           entry_SYSCALL_64_after_hwframe+0x6e/0x76
    other info that might help us debug this:
    Chain exists of:
      pernet_ops_rwsem --> cpu_hotplug_lock --> scx_fork_rwsem
     Possible unsafe locking scenario:
           CPU0                    CPU1
           ----                    ----
      rlock(scx_fork_rwsem);
                                   lock(cpu_hotplug_lock);
                                   lock(scx_fork_rwsem);
      rlock(pernet_ops_rwsem);
     *** DEADLOCK ***
    1 lock held by syz-executor.0/2181:
     #0: ffffffff8449dc50 (scx_fork_rwsem){++++}-{0:0}, at: sched_fork+0x3b/0x190 kernel/sched/core.c:4810
    stack backtrace:
    CPU: 0 PID: 2181 Comm: syz-executor.0 Not tainted 6.6.0-g2f6ba98e2d3d #4
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
    Sched_ext: serialise (enabled), task: runnable_at=-6ms
    Call Trace:
     <TASK>
     __dump_stack lib/dump_stack.c:89 [inline]
     dump_stack_lvl+0x91/0xf0 lib/dump_stack.c:107
     dump_stack+0x15/0x20 lib/dump_stack.c:114
     check_noncircular+0x134/0x150 kernel/locking/lockdep.c:2187
     check_prev_add kernel/locking/lockdep.c:3134 [inline]
     check_prevs_add kernel/locking/lockdep.c:3253 [inline]
     validate_chain kernel/locking/lockdep.c:3868 [inline]
     __lock_acquire+0x16b4/0x2b30 kernel/locking/lockdep.c:5136
     lock_acquire kernel/locking/lockdep.c:5753 [inline]
     lock_acquire+0xc1/0x2b0 kernel/locking/lockdep.c:5718
     down_read_killable+0x5d/0x280 kernel/locking/rwsem.c:1549
     copy_net_ns+0x216/0x590 net/core/net_namespace.c:487
     create_new_namespaces+0x2ed/0x770 kernel/nsproxy.c:110
     copy_namespaces+0x488/0x540 kernel/nsproxy.c:179
     copy_process+0x1b52/0x4680 kernel/fork.c:2504
     kernel_clone+0x116/0x660 kernel/fork.c:2914
     __do_sys_clone3+0x192/0x220 kernel/fork.c:3215
     __se_sys_clone3 kernel/fork.c:3199 [inline]
     __x64_sys_clone3+0x30/0x40 kernel/fork.c:3199
     do_syscall_x64 arch/x86/entry/common.c:51 [inline]
     do_syscall_64+0x46/0x100 arch/x86/entry/common.c:82
     entry_SYSCALL_64_after_hwframe+0x6e/0x76
    RIP: 0033:0x7f9f764e240d
    Code: c3 e8 97 2b 00 00 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
    RSP: 002b:00007f9f75851ee8 EFLAGS: 00000246 ORIG_RAX: 00000000000001b3
    RAX: ffffffffffffffda RBX: 00007f9f7661ef80 RCX: 00007f9f764e240d
    RDX: 0000000000000100 RSI: 0000000000000058 RDI: 00007f9f75851f00
    RBP: 00007f9f765434a6 R08: 0000000000000000 R09: 0000000000000058
    R10: 00007f9f75851f00 R11: 0000000000000246 R12: 0000000000000058
    R13: 0000000000000006 R14: 00007f9f7661ef80 R15: 00007f9f75832000
     </TASK>
    
    The issue is that we're acquiring the cpus_read_lock() _before_ we
    acquire scx_fork_rwsem in scx_ops_enable() and scx_ops_disable(), but we
    acquire and hold scx_fork_rwsem around basically the whole fork() path.
    I don't see how a deadlock could actually occur in practice, but it
    should be safe to acquire the scx_fork_rwsem and scx_cgroup_rwsem
    semaphores before the hotplug lock, so let's do that.
    
    Reported-by: Han Xing Yi <[email protected]>
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Jan 8, 2024
    commit c3c7041
  2. scx: Set default slice for default select_cpu dispatch

    If ops.select_cpu() isn't defined, scx_select_cpu_dfl() will be called,
    and a task will be dispatched directly to a core if one is found. I
    neglected to also set the task slice, so we see the following warning if
    we use the direct dispatch:
    
    [root@arch scx]# ./select_cpu_dfl
    [   23.184426] sched_ext: select_cpu_dfl[356] has zero slice in pick_next_task_scx()
    
    I'm not sure why this wasn't being printed when I tested this before,
    but let's fix it.
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Jan 8, 2024
    commit 8bbe0db
  3. Merge pull request #109 from sched-ext/dfl_slice

    scx: Set default slice for default select_cpu dispatch
    htejun authored Jan 8, 2024
    commit 15f2f4f
  4. Merge pull request #108 from sched-ext/avoid_deadlock

    scx: Avoid possible deadlock with cpus_read_lock()
    htejun authored Jan 8, 2024
    commit dd92f1a
  5. scx: Use READ/WRITE_ONCE() for scx_watchdog_timeout/timestamp

    They're accessed without any locking and check_rq_for_timeouts() seems to
    assume that last_runnable doesn't get fetched multiple times, which isn't
    true without READ_ONCE().
    
    Signed-off-by: Tejun Heo <[email protected]>
    htejun committed Jan 8, 2024
    commit 4164e16
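    The pattern being applied, as a sketch (names from the commit title and
    the surrounding listing; the check's exact shape is assumed):

    static void check_task_timeout(struct task_struct *p)
    {
            /* Fetch shared values once; without READ_ONCE() the compiler may
             * reload them, letting the check see two different snapshots. */
            unsigned long timeout = READ_ONCE(scx_watchdog_timeout);
            unsigned long last_runnable = READ_ONCE(p->scx.runnable_at);

            if (time_after(jiffies, last_runnable + timeout)) {
                    /* ... report the stall ... */
            }
    }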
  6. scx: Rename rq->scx.watchdog_list and friends to runnable_list and counterparts
    
    The list will be used for another purpose too. Rename to indicate the
    generic nature.
    
    Signed-off-by: Tejun Heo <[email protected]>
    htejun committed Jan 8, 2024
    commit 9c0a799
  7. scx: Factor out scx_ops_bypass() and s/scx_ops_disabling()/scx_ops_bypassing()/g
    
    Guaranteeing forward progress by forcing global FIFO behavior is currently
    used only in the disabling path. This will be used for something else too.
    Let's factor it out and rename accordingly.
    
    No functional change intended.
    
    Signed-off-by: Tejun Heo <[email protected]>
    htejun committed Jan 8, 2024
    commit 215f0ff
  8. scx: Implement bypass depth and always bypass while disabling

    Implement bypass depth so that multiple users can request bypassing without
    conflicts. This decouples bypass on/off from ops state so that bypassing can
    be used in combination with any ops state. The unbypassing path isn't used
    yet and is to be implemented.
    
    Note that task_should_scx() needs to test whether we're DISABLING rather
    than bypassing, and is thus updated to test scx_ops_enable_state() explicitly.
    
    The disable path now always uses bypassing to balance bypass depth. This
    also leads to simpler code.
    
    Signed-off-by: Tejun Heo <[email protected]>
    htejun committed Jan 8, 2024
    commit f4c4ef2
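    The depth-counter idea, in a hedged sketch (the helper shape and flow
    are illustrative, not the kernel code):

    static atomic_t scx_ops_bypass_depth = ATOMIC_INIT(0);

    static void scx_ops_bypass(bool bypass)
    {
            if (bypass) {
                    /* only the 0 -> 1 transition flips behavior */
                    if (atomic_inc_return(&scx_ops_bypass_depth) != 1)
                            return;
                    /* ... force global FIFO behavior ... */
            } else {
                    /* symmetrically, only the 1 -> 0 transition restores */
                    if (atomic_dec_return(&scx_ops_bypass_depth) != 0)
                            return;
                    /* ... restore normal ops behavior ... */
            }
    }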
  9. scx: Implement turning off bypassing

    Bypassing overrides ops.enqueue() and .dispatch() to force global FIFO
    behavior. However, this was an irreversible action making it impossible to
    turn off bypassing. Instead, add behaviors conditional on
    scx_ops_bypassing() to implement global FIFO behavior while bypassing. This
    adds two condition checks to hot paths but they're easily predictable and
    shouldn't add noticeable overhead.
    
    Signed-off-by: Tejun Heo <[email protected]>
    htejun committed Jan 8, 2024
    commit a00ac85
  10. scx: Optimize scx_ops_bypass()

    scx_ops_bypass() involves scanning all tasks in the system and can thus
    become pretty expensive, which limits its utility. scx_ops_bypass() isn't
    making any persistent changes to tasks. It just wants to dequeue and
    re-enqueue runnable tasks so that they're queued according to the current
    bypass state. As such, it can iterate the runnable tasks rather than all.
    
    This patch makes scx_ops_bypass() iterate each CPU's rq->scx.runnable_list.
    There are subtle complications due to the inability to trust the scheduler
    and each task going off and getting back on the runnable_list as they get
    cycled. See the comments for details.
    
    After this optimization, [un]bypassing should be pretty cheap in most
    circumstances.
    
    Signed-off-by: Tejun Heo <[email protected]>
    htejun committed Jan 8, 2024
    commit 8583a03
  11. scx: Expose bypassing state to userland

    Signed-off-by: Tejun Heo <[email protected]>
    htejun committed Jan 8, 2024
    commit 303c346
  12. scx: s/register_ext_kfuncs()/scx_init()/

    We need more stuff to do in the init function. Give it a more generic name.
    
    Signed-off-by: Tejun Heo <[email protected]>
    htejun committed Jan 8, 2024
    commit a37ef8e
  13. scx: Bypass while PM operations are in progress

    SCX schedulers often have userspace components which are sometimes involved
    in critical scheduling paths. PM operations involve freezing userspace which
    can lead to scheduling misbehaviors including stalls. Let's bypass while PM
    operations are in progress.
    
    Signed-off-by: Tejun Heo <[email protected]>
    Reported-by: Andrea Righi <[email protected]>
    htejun committed Jan 8, 2024
    commit df28190
  14. scx: Disabling scx_bpf_kick_cpu() while bypassing

    scx_bpf_kick_cpu() uses irq_work. However, if called while e.g. suspending,
    IRQ handling may already be offline and scheduling irq_work can hang
    indefinitely. There's no need for kicking while bypassing anyway, so let's
    suppress scx_bpf_kick_cpu() while bypassing.
    
    Signed-off-by: Tejun Heo <[email protected]>
    htejun committed Jan 8, 2024
    a62d59c
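
    The suppression amounts to an early exit in the kfunc; a sketch:

        void scx_bpf_kick_cpu(s32 cpu, u64 flags)
        {
                /*
                 * While bypassing, IRQ handling may be offline (e.g. during
                 * suspend) and queueing irq_work could hang indefinitely.
                 * Kicks aren't needed in bypass mode, so drop them.
                 */
                if (scx_ops_bypassing())
                        return;

                /* normal path: validate @cpu and schedule the kick irq_work */
        }
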
  15. Merge pull request #103 from sched-ext/htejun

    Implement generic bypass mode and use it while PM operations are in progress
    htejun authored Jan 8, 2024
    a7150a9
  16. Revert "scx: Avoid possible deadlock with cpus_read_lock()"

    This reverts commit c3c7041.
    
    We hit a locking ordering issue in the other direction. Let's revert for
    now.
    
    [    9.378773] ======================================================
    [    9.379476] WARNING: possible circular locking dependency detected
    [    9.379532] 6.6.0-work-10442-ga7150a9168f8-dirty #134 Not tainted
    [    9.379532] ------------------------------------------------------
    [    9.379532] scx_rustland/1622 is trying to acquire lock:
    [    9.379532] ffffffff8325f828 (cpu_hotplug_lock){++++}-{0:0}, at: bpf_scx_reg+0xe4/0xcf0
    [    9.379532]
    [    9.379532] but task is already holding lock:
    [    9.379532] ffffffff83271be8 (scx_cgroup_rwsem){++++}-{0:0}, at: bpf_scx_reg+0xdf/0xcf0
    [    9.379532]
    [    9.379532] which lock already depends on the new lock.
    [    9.379532]
    [    9.379532]
    [    9.379532] the existing dependency chain (in reverse order) is:
    [    9.379532]
    [    9.379532] -> #2 (scx_cgroup_rwsem){++++}-{0:0}:
    [    9.379532]        percpu_down_read+0x2e/0xb0
    [    9.379532]        scx_cgroup_can_attach+0x25/0x200
    [    9.379532]        cpu_cgroup_can_attach+0xe/0x10
    [    9.379532]        cgroup_migrate_execute+0xaf/0x450
    [    9.379532]        cgroup_apply_control+0x227/0x2a0
    [    9.379532]        cgroup_subtree_control_write+0x425/0x4b0
    [    9.379532]        cgroup_file_write+0x82/0x260
    [    9.379532]        kernfs_fop_write_iter+0x131/0x1c0
    [    9.379532]        vfs_write+0x1f9/0x270
    [    9.379532]        ksys_write+0x62/0xc0
    [    9.379532]        __x64_sys_write+0x1b/0x20
    [    9.379532]        do_syscall_64+0x40/0xe0
    [    9.379532]        entry_SYSCALL_64_after_hwframe+0x46/0x4e
    [    9.379532]
    [    9.379532] -> #1 (cgroup_threadgroup_rwsem){++++}-{0:0}:
    [    9.379532]        percpu_down_write+0x35/0x1e0
    [    9.379532]        cgroup_procs_write_start+0x8a/0x210
    [    9.379532]        __cgroup_procs_write+0x4c/0x160
    [    9.379532]        cgroup_procs_write+0x17/0x30
    [    9.379532]        cgroup_file_write+0x82/0x260
    [    9.379532]        kernfs_fop_write_iter+0x131/0x1c0
    [    9.379532]        vfs_write+0x1f9/0x270
    [    9.379532]        ksys_write+0x62/0xc0
    [    9.379532]        __x64_sys_write+0x1b/0x20
    [    9.379532]        do_syscall_64+0x40/0xe0
    [    9.379532]        entry_SYSCALL_64_after_hwframe+0x46/0x4e
    [    9.379532]
    [    9.379532] -> #0 (cpu_hotplug_lock){++++}-{0:0}:
    [    9.379532]        __lock_acquire+0x142d/0x2a30
    [    9.379532]        lock_acquire+0xbf/0x1f0
    [    9.379532]        cpus_read_lock+0x2f/0xc0
    [    9.379532]        bpf_scx_reg+0xe4/0xcf0
    [    9.379532]        bpf_struct_ops_link_create+0xb6/0x100
    [    9.379532]        link_create+0x49/0x200
    [    9.379532]        __sys_bpf+0x351/0x3e0
    [    9.379532]        __x64_sys_bpf+0x1c/0x20
    [    9.379532]        do_syscall_64+0x40/0xe0
    [    9.379532]        entry_SYSCALL_64_after_hwframe+0x46/0x4e
    [    9.379532]
    [    9.379532] other info that might help us debug this:
    [    9.379532]
    [    9.379532] Chain exists of:
    [    9.379532]   cpu_hotplug_lock --> cgroup_threadgroup_rwsem --> scx_cgroup_rwsem
    [    9.379532]
    [    9.379532]  Possible unsafe locking scenario:
    [    9.379532]
    [    9.379532]        CPU0                    CPU1
    [    9.379532]        ----                    ----
    [    9.379532]   lock(scx_cgroup_rwsem);
    [    9.379532]                                lock(cgroup_threadgroup_rwsem);
    [    9.379532]                                lock(scx_cgroup_rwsem);
    [    9.379532]   rlock(cpu_hotplug_lock);
    [    9.379532]
    [    9.379532]  *** DEADLOCK ***
    [    9.379532]
    [    9.379532] 3 locks held by scx_rustland/1622:
    [    9.379532]  #0: ffffffff83272708 (scx_ops_enable_mutex){+.+.}-{3:3}, at: bpf_scx_reg+0x2a/0xcf0
    [    9.379532]  #1: ffffffff83271aa0 (scx_fork_rwsem){++++}-{0:0}, at: bpf_scx_reg+0xd3/0xcf0
    [    9.379532]  #2: ffffffff83271be8 (scx_cgroup_rwsem){++++}-{0:0}, at: bpf_scx_reg+0xdf/0xcf0
    [    9.379532]
    [    9.379532] stack backtrace:
    [    9.379532] CPU: 7 PID: 1622 Comm: scx_rustland Not tainted 6.6.0-work-10442-ga7150a9168f8-dirty #134
    [    9.379532] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS unknown 2/2/2022
    [    9.379532] Sched_ext: rustland (prepping)
    [    9.379532] Call Trace:
    [    9.379532]  <TASK>
    [    9.379532]  dump_stack_lvl+0x55/0x70
    [    9.379532]  dump_stack+0x10/0x20
    [    9.379532]  print_circular_bug+0x2ea/0x2f0
    [    9.379532]  check_noncircular+0xe2/0x100
    [    9.379532]  __lock_acquire+0x142d/0x2a30
    [    9.379532]  ? lock_acquire+0xbf/0x1f0
    [    9.379532]  ? rcu_sync_func+0x2c/0xa0
    [    9.379532]  lock_acquire+0xbf/0x1f0
    [    9.379532]  ? bpf_scx_reg+0xe4/0xcf0
    [    9.379532]  cpus_read_lock+0x2f/0xc0
    [    9.379532]  ? bpf_scx_reg+0xe4/0xcf0
    [    9.379532]  bpf_scx_reg+0xe4/0xcf0
    [    9.379532]  ? alloc_file+0xa4/0x160
    [    9.379532]  ? alloc_file_pseudo+0x99/0xd0
    [    9.379532]  ? anon_inode_getfile+0x79/0xc0
    [    9.379532]  ? bpf_link_prime+0xe2/0x1a0
    [    9.379532]  bpf_struct_ops_link_create+0xb6/0x100
    [    9.379532]  link_create+0x49/0x200
    [    9.379532]  __sys_bpf+0x351/0x3e0
    [    9.379532]  __x64_sys_bpf+0x1c/0x20
    [    9.379532]  do_syscall_64+0x40/0xe0
    [    9.379532]  ? sysvec_apic_timer_interrupt+0x44/0x80
    [    9.379532]  entry_SYSCALL_64_after_hwframe+0x46/0x4e
    [    9.379532] RIP: 0033:0x7fc391f7473d
    [    9.379532] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c3 95 0c 00 f7 d8 64 89 01 48
    [    9.379532] RSP: 002b:00007ffeb4fe4108 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
    [    9.379532] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc391f7473d
    [    9.379532] RDX: 0000000000000030 RSI: 00007ffeb4fe4120 RDI: 000000000000001c
    [    9.379532] RBP: 000000000000000c R08: 000000000000000c R09: 000055d0a75b1a10
    [    9.379532] R10: 0000000000000050 R11: 0000000000000246 R12: 000000000000002c
    [    9.379532] R13: 00007ffeb4fe4628 R14: 0000000000000000 R15: 00007ffeb4fe4328
    [    9.379532]  </TASK>
    htejun committed Jan 8, 2024
    8588d4f
  17. Merge pull request #110 from sched-ext/lockdep-revert

    Revert "scx: Avoid possible deadlock with cpus_read_lock()"
    htejun authored Jan 8, 2024
    ca86e0d
  18. scx: Make scx_task_state handling more idiomatic

    Functionally equivalent. Just a bit more idiomatic.
    
    Signed-off-by: Tejun Heo <[email protected]>
    htejun committed Jan 8, 2024
    22c3627
  19. Merge tag 'v6.7' into scx-sync-upstream

    Linux 6.7
    htejun committed Jan 8, 2024
    b7858a0
  20. 5445296
  21. 8c7f9b2

Commits on Jan 9, 2024

  1. 88e7560
  2. Merge pull request #113 from sched-ext/htejun

    scx: Sync schedulers from SCX v0.1.5 (74923c6cdbc3)
    htejun authored Jan 9, 2024
    f4dc571
  3. scx: Fix direct dispatch for non-builtin DSQs

    If we've done a direct dispatch from ops.select_cpu(), we can currently
    hang the host if we dispatch to a non-local DSQ. This is because we
    circumvent some important checks, such as whether we should be bypassing
    ops.enqueue() and dispatching directly to the local or global DSQ.
    
    A local dispatch doesn't hang the host today only because we happen to be
    dispatching to a safe, builtin DSQ. Let's instead update the logic to
    perform the direct dispatch only after these critical checks.
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Jan 9, 2024
    9ad5535
  4. scx: Keep track of enq flags in direct dispatch

    We're currently not remembering enq flags during direct dispatch. Let's
    record them in case someone wants to pass e.g. SCX_ENQ_PREEMPT from
    ops.select_cpu().
    
    Let's also reset ddsq_id and ddsq_enq_flags before calling
    dispatch_enqueue() to ensure there are no races with the task being
    consumed from another core (see the sketch below).
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Jan 9, 2024
    4b56f6e
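
    Conceptually, the mark/consume pair looks like the following sketch.
    ddsq_id and ddsq_enq_flags are the fields named above; find_dsq() stands
    in for the actual DSQ lookup:

        /* in the ops.select_cpu() direct-dispatch path */
        static void mark_direct_dispatch(struct task_struct *p,
                                         u64 dsq_id, u64 enq_flags)
        {
                p->scx.ddsq_id = dsq_id;
                p->scx.ddsq_enq_flags = enq_flags;
        }

        /* later, on the enqueue path */
        static void direct_dispatch(struct task_struct *p, u64 enq_flags)
        {
                u64 dsq_id = p->scx.ddsq_id;

                enq_flags |= p->scx.ddsq_enq_flags;

                /*
                 * Reset the stashed state before dispatch_enqueue() so that
                 * a CPU consuming the task right away can't observe stale
                 * direct-dispatch markings.
                 */
                p->scx.ddsq_id = SCX_DSQ_INVALID;
                p->scx.ddsq_enq_flags = 0;

                dispatch_enqueue(find_dsq(p, dsq_id), p, enq_flags);
        }
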
  5. scx: Test vtime dispatching from ops.select_cpu()

    Let's test that we properly stash enq flags by doing vtime dispatching
    from ops.select_cpu().
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Jan 9, 2024
    59ad5bd
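
    On the BPF side, such a testcase boils down to something like this
    sketch, where VTIME_DSQ is an assumed user-created DSQ and
    scx_bpf_dispatch_vtime() is the vtime-ordered dispatch kfunc:

        s32 BPF_STRUCT_OPS(vtime_select_cpu, struct task_struct *p,
                           s32 prev_cpu, u64 wake_flags)
        {
                /*
                 * Dispatch directly from ops.select_cpu() with vtime
                 * ordering; the recorded enq_flags must survive until the
                 * task is actually enqueued on the DSQ.
                 */
                scx_bpf_dispatch_vtime(p, VTIME_DSQ, SCX_SLICE_DFL,
                                       p->scx.dsq_vtime, 0);
                return prev_cpu;
        }
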
  6. Merge pull request #115 from sched-ext/enq_flags

    Stash enq_flags when marking for direct dispatch
    Byte-Lab authored Jan 9, 2024
    7909b33

Commits on Jan 10, 2024

  1. scx: Implement scx selftests framework

    We want to make it as easy as possible both to run tests and to
    implement them. This means we ideally want a single test-runner binary
    that can run the testcases, while also making it trivial to add a
    testcase without having to update the runner itself.
    
    To accomplish this, this patch adds a new declarative mechanism for
    defining scx tests by implementing a struct scx_test object. Tests
    simply define such a struct and then register it with the test runner
    using the REGISTER_SCX_TEST macro (see the sketch below). The build
    system will automatically compile the testcase and add machinery to have
    it auto-registered into the runner binary. The runner binary then
    outputs test results in ktap [0] format so they can be consumed by CI
    systems.
    
    [0]: https://docs.kernel.org/dev-tools/ktap.html
    
    This patch simply implements the framework, adds a test_example.c file
    that illustrates how to add a testcase, and converts a few existing
    testcases to use the framework. If the framework is acceptable, we can
    convert the rest.
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Jan 10, 2024
    c64a804
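
    With the framework in place, a testcase reduces to roughly the following
    sketch, in the spirit of the test_example.c mentioned above (the exact
    struct scx_test fields may differ):

        #include "scx_test.h"

        static enum scx_test_status run(void *ctx)
        {
                /* exercise the scheduler here and report the outcome */
                return SCX_TEST_PASS;
        }

        static struct scx_test example = {
                .name = "example",
                .description = "Sanity check that the runner invokes tests",
                .run = run,
        };
        REGISTER_SCX_TEST(&example)
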
  2. Merge pull request #117 from sched-ext/refactor_tests

    scx: Implement scx selftests framework
    htejun authored Jan 10, 2024
    228db9d
  3. scx: Convert remaining testcases to use new framework

    Now that the framework has been merged, let's update the remaining
    testcases to use it.
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Jan 10, 2024
    d5061f9
  4. scx: Update ddsp testcases to check for error exits

    We're checking that we don't crash when we encounter these error
    conditions, but let's also test that we exit with the expected error
    code. The next patch will build this check into the test framework.
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Jan 10, 2024
    1fa672f
  5. scx: Copy scx_exit_kind to scx_test.h

    Rather than define the error value in each test, let's just define it in
    scx_test.h.
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Jan 10, 2024
    8d7a79e
  6. Merge pull request #118 from sched-ext/refactor_tests

    scx: Convert remaining testcases to use new framework
    htejun authored Jan 10, 2024
    7592388
  7. scx: Narrow cpus_read_lock() critical section in scx_ops_enable()

    cpus_read_lock() is needed for two purposes in scx_ops_enable(). First, to
    keep CPUs stable between ops.init() and enabling of ops.cpu_on/offline().
    Second, to work around the locking order issue between scx_cgroup_rwsem and
    cpu_hotplug_lock caused by static_branch_*().
    
    Currently, scx_ops_enable() acquires cpus_read_lock() and holds it through
    most of ops enabling, covering both use cases. This makes it difficult to
    understand which lock is held where, and to resolve locking order issues
    among these system-wide locks.
    
    Let's separate out the two sections so that ops.init() and
    ops.cpu_on/offline() enabling are contained in their own critical section,
    and cpus_read_lock() is dropped and then reacquired for the second use
    case, as sketched below.
    
    Signed-off-by: Tejun Heo <[email protected]>
    htejun committed Jan 10, 2024
    4bbb07c
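
    Schematically, the enable path goes from one long critical section to
    two narrow ones. A sketch; the helper and static key names here are
    approximations, not the exact kernel symbols:

        /* section 1: CPUs stable from ops.init() until hotplug ops are live */
        cpus_read_lock();
        ret = scx_ops_init_and_enable_hotplug_ops();
        cpus_read_unlock();

        /* ... other enabling work proceeds with CPUs unlocked ... */

        /* section 2: static_branch_*() needs cpu_hotplug_lock held */
        cpus_read_lock();
        static_branch_enable_cpuslocked(&__scx_ops_enabled);
        cpus_read_unlock();
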
  8. scx: Reorder scx_fork_rwsem, cpu_hotplug_lock and scx_cgroup_rwsem

    scx_cgroup_rwsem and scx_fork_rwsem, respectively, are in the following
    locking dependency chain.
    
      cpu_hotplug_lock --> cgroup_threadgroup_rwsem --> scx_cgroup_rwsem
      scx_fork_rwsem --> pernet_ops_rwsem --> cpu_hotplug_lock
    
    And we need to flip a static_key, which requires CPUs to be stable. The
    only locking order which satisfies all three requirements is
    
      scx_fork_rwsem --> cpu_hotplug_lock --> scx_cgroup_rwsem
    
    Reorder locking in scx_ops_enable() and scx_ops_disable_workfn().
    
    Signed-off-by: Tejun Heo <[email protected]>
    htejun committed Jan 10, 2024
    1225a90
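
    In other words, both the enable and disable paths now take the three
    system-wide locks in this order (sketch):

        percpu_down_write(&scx_fork_rwsem);     /* scx_fork_rwsem first */
        cpus_read_lock();                       /* then cpu_hotplug_lock */
        percpu_down_write(&scx_cgroup_rwsem);   /* scx_cgroup_rwsem last */

        /* ... flip static keys, walk tasks and cgroups ... */

        percpu_up_write(&scx_cgroup_rwsem);
        cpus_read_unlock();
        percpu_up_write(&scx_fork_rwsem);
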

Commits on Jan 11, 2024

  1. Merge pull request #119 from sched-ext/htejun

    scx: Fix locking order
    Byte-Lab authored Jan 11, 2024
    4361d23
  2. scx: Sync from scx repo

    b32d73ae4e19 ("Merge pull request #82 from sched-ext/htejun")
    htejun committed Jan 11, 2024
    dfb1210
  3. Merge pull request #120 from sched-ext/htejun

    scx: Sync from scx repo
    htejun authored Jan 11, 2024
    6eb6c92

Commits on Jan 18, 2024

  1. ci: add github workflow to test the sched-ext kernel

    Add a GitHub action to test the sched-ext kernel with all the shipped
    schedulers.
    
    The test uses a similar approach to the scx workflow [1], using
    virtme-ng to run each scheduler inside a sched-ext enabled kernel for a
    certain amount of time (30 seconds), checking for potential stall, oops,
    or bug conditions.
    
    In this case we can use `virtme-ng --build` to build a kernel with the
    bare minimum support needed to run inside virtme-ng itself, instead of
    generating a fully featured kernel, which expedites the testing process.
    
    The mandatory .config options required by sched-ext are stored in
    `.github/workflows/sched-ext.config` and they are passed to virtme-ng
    via the `--config` option.
    
    The test itself is defined in `.github/workflows/run-schedulers`: the
    script looks for all the binaries in `tools/sched_ext/build/bin` and
    runs each one in a separate virtme-ng instance, to ensure that each run
    does not impact the others.
    
    [1] https://github.com/sched-ext/scx/blob/main/.github/workflows/build-scheds.yml
    
    Signed-off-by: Andrea Righi <[email protected]>
    Andrea Righi committed Jan 18, 2024
    74cdbb0
  2. Merge pull request #116 from arighi/github-ci

    ci: add github workflow to test the sched-ext kernel
    Andrea Righi authored Jan 18, 2024
    4e59c90
  3. scx: Make the pointer passed to .dispatch MAYBE_NULL

    The struct task_struct pointer passed to .dispatch can be NULL. However,
    we assume that pointers passed to struct_ops programs are always trusted
    (PTR_TRUSTED), meaning they are always valid (non-NULL). This means the
    verifier fails to properly validate such programs, which may cause a
    kernel crash when they run.
    
    This patch marks the second argument of .dispatch with
    PTR_MAYBE_NULL | PTR_TO_BTF_ID | PTR_TRUSTED in
    bpf_scx_is_valid_access(). The verifier will then ensure that programs
    always check whether the argument is NULL before reading the pointed-to
    memory (see the sketch below).
    ThinkerYzu1 committed Jan 18, 2024
    7d420b5
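
    On the BPF side, this means a .dispatch implementation must NULL-check
    its second argument before dereferencing it. A sketch; the
    SCX_TASK_QUEUED test mirrors the pattern used by the example schedulers:

        void BPF_STRUCT_OPS(sample_dispatch, s32 cpu, struct task_struct *prev)
        {
                /*
                 * prev is now PTR_MAYBE_NULL: the verifier rejects loads
                 * through it unless they're guarded by a NULL check.
                 */
                if (prev && (prev->scx.flags & SCX_TASK_QUEUED))
                        scx_bpf_dispatch(prev, SCX_DSQ_LOCAL,
                                         SCX_SLICE_DFL, 0);
        }
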
  4. selftests/scx: Check if MAYBE_NULL works for the 2nd argument of .dispatch
    
    Check that the verifier catches the invalid access when a .dispatch
    program doesn't check the 2nd argument before accessing the pointed-to
    memory. Also check that the verifier allows a program which checks the
    2nd argument before accessing the pointed-to memory.
    ThinkerYzu1 committed Jan 18, 2024
    b21b258
  5. scx: Add /sys/kernel/sched_ext interface

    /sys/kernel/debug/sched/ext is the current interface file which can be used
    to determine the current state of scx. This is problematic in that it's
    dependent on CONFIG_SCHED_DEBUG. On kernels which don't have the option
    enabled, there is no easy way to tell whether scx is currently in use.
    
    Let's add a new kobject-based interface which is created under
    /sys/kernel/sched_ext. The directory contains:
    
    - System-level interface files. As it's now a non-debug interface, confine
      the exposed files to "state", "switch_all" and "nr_rejected".
    
    - Per-scheduler directory which currently only contains "ops". The
      directory is always named "root" for now, in preparation for a future
      where multiple schedulers can be loaded in a system. Loading and
      unloading a scheduler also generates a uevent with an SCXOPS attribute.
    
    Signed-off-by: Tejun Heo <[email protected]>
    htejun committed Jan 18, 2024
    e7a7781
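
    The kernel-side plumbing follows the usual kobject pattern. A generic
    sketch, not the exact patch; scx_ops_enable_state_str() is an
    illustrative stand-in for however the state string is produced:

        static ssize_t state_show(struct kobject *kobj,
                                  struct kobj_attribute *ka, char *buf)
        {
                return sysfs_emit(buf, "%s\n", scx_ops_enable_state_str());
        }
        static struct kobj_attribute state_attr = __ATTR_RO(state);

        /* during init */
        struct kobject *scx_kobj = kobject_create_and_add("sched_ext",
                                                          kernel_kobj);
        if (!scx_kobj || sysfs_create_file(scx_kobj, &state_attr.attr))
                pr_warn("sched_ext: failed to create sysfs interface\n");
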
  6. scx: Replace /sys/kernel/debug/sched/ext with tools/sched_ext/scx_show_state.py
    
    Now that the state is visible through /sys/kernel/sched_ext,
    /sys/kernel/debug/sched/ext isn't needed to determine the current state of
    scx. However, /sys/kernel/sched_ext shows only a subset of the information
    that was available in the debug interface, and it can be useful to have
    access to the rest for debugging. Remove /sys/kernel/debug/sched/ext and
    add the drgn script, tools/sched_ext/scx_show_state.py, which shows the
    same information.
    
    Signed-off-by: Tejun Heo <[email protected]>
    htejun committed Jan 18, 2024
    a1392ed
  7. Merge pull request #122 from sched-ext/htejun

    scx: Replace /sys/kernel/debug/sched/ext with /sys/kernel/sched_ext
    Byte-Lab authored Jan 18, 2024
    cdcdf18

Commits on Jan 19, 2024

  1. Merge pull request #121 from ThinkerYzu/maybe_null

    Make the pointer passed to .dispatch MAYBE_NULL
    htejun authored Jan 19, 2024
    b1a0f3e

Commits on Jan 20, 2024

  1. scx: Fix a couple of follow-ups to recent struct_ops changes

    - Fix a few typos and some comment formatting in ext.c
    - Generalize the rule for compiling a "fail" testcase variant in
      selftests
    - Update copyrights to 2024
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Jan 20, 2024
    a141212
  2. Merge pull request #123 from sched-ext/structops_follow_ups

    scx: Fix a couple of follow-ups to recent struct_ops changes
    Byte-Lab authored Jan 20, 2024
    30b6fa8

Commits on Jan 23, 2024

  1. Merge remote-tracking branch 'sched-ext/sched_ext' into scx_merge

    Conflicts:
    	include/linux/sched.h
    	kernel/bpf/verifier.c
    	kernel/cgroup/cgroup.c
    	kernel/sched/core.c
    
    Also had to add CFI stubs and kfunc annotations to ext.c, as well as
    remove the use of strlcpy().
    
    Signed-off-by: David Vernet <[email protected]>
    Byte-Lab committed Jan 23, 2024
    50f8db4