scx_layered: Consume from local LLCs for dispatch #919

Merged
merged 3 commits into from
Nov 15, 2024

Conversation

hodgesds
Contributor

When dispatching, consume from DSQs in the local LLC first before trying remote DSQs. This should still be fair, as the layer iteration order is maintained.

Contributor

@likewhatevs likewhatevs left a comment


LGTM

@hodgesds
Contributor Author

hodgesds commented Nov 11, 2024

Some stress-ng results; I'm not 100% sure the llc-affinity test is the best benchmark.

main branch:

$ stress-ng -c 175 -t 32 -M 
stress-ng: info:  [707913] setting to a 32 secs run per stressor
stress-ng: info:  [707913] dispatching hogs: 175 cpu
stress-ng: metrc: [707913] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s CPU used per       RSS Max
stress-ng: metrc: [707913]                           (secs)    (secs)    (secs)   (real time) (usr+sys time) instance (%)          (KB)
stress-ng: metrc: [707913] cpu             5669917     32.00   5582.11      0.25    177174.78        1015.68        99.68          4224
stress-ng: info:  [707913] skipped: 0
stress-ng: info:  [707913] passed: 175: cpu (175)
stress-ng: info:  [707913] failed: 0
stress-ng: info:  [707913] metrics untrustworthy: 0
stress-ng: info:  [707913] successful run completed in 32.03 secs
$ stress-ng --llc-affinity 175 -t 32 -M
stress-ng: info:  [766348] setting to a 32 secs run per stressor
stress-ng: info:  [766348] dispatching hogs: 175 llc-affinity
stress-ng: info:  [766352] llc-affinity: using LLC cache size of 16384K
stress-ng: metrc: [766348] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s CPU used per       RSS Max
stress-ng: metrc: [766348]                           (secs)    (secs)    (secs)   (real time) (usr+sys time) instance (%)          (KB)
stress-ng: metrc: [766348] llc-affinity       1050     32.04   3894.67    146.29        32.77           0.26        72.07         18304
stress-ng: metrc: [766348] miscellaneous metrics:
stress-ng: metrc: [766348] llc-affinity        1405.43 MB pec sec memory write rate (harmonic mean of 175 instances)
stress-ng: metrc: [766348] llc-affinity        1357.90 MB per sec memory read rate (harmonic mean of 175 instances)
stress-ng: metrc: [766348] llc-affinity          31.10 CPU affinity changes per sec (harmonic mean of 175 instances)
stress-ng: info:  [766348] skipped: 0
stress-ng: info:  [766348] passed: 175: llc-affinity (175)
stress-ng: info:  [766348] failed: 0
stress-ng: info:  [766348] metrics untrustworthy: 0
stress-ng: info:  [766348] successful run completed in 32.10 secs
$ stress-ng --cache 175 --cache-enable-all  -M -t 32
stress-ng: info:  [872189] setting to a 32 secs run per stressor
stress-ng: info:  [872189] dispatching hogs: 175 cache
stress-ng: info:  [872190] cache: cldemote is not available, ignoring this option
stress-ng: info:  [872190] cache: cache flags used: prefetch clflush fence sfence clflushopt clwb
stress-ng: metrc: [872189] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s CPU used per       RSS Max
stress-ng: metrc: [872189]                           (secs)    (secs)    (secs)   (real time) (usr+sys time) instance (%)          (KB)
stress-ng: metrc: [872189] cache           8625597     32.03   2423.43      0.70    269286.86        3558.22        43.25         16896
stress-ng: metrc: [872189] miscellaneous metrics:
stress-ng: metrc: [872189] cache            1263481.16 cache ops per second (harmonic mean of 175 instances)
stress-ng: metrc: [872189] cache         1057726110.56 shared cache reads per second (harmonic mean of 175 instances)
stress-ng: metrc: [872189] cache         1407460135.42 shared cache writes per second (harmonic mean of 175 instances)
stress-ng: info:  [872189] skipped: 0
stress-ng: info:  [872189] passed: 175: cache (175)
stress-ng: info:  [872189] failed: 0
stress-ng: info:  [872189] metrics untrustworthy: 0
stress-ng: info:  [872189] successful run completed in 32.06 secs

PR:

$ stress-ng -c 175 -t 32 -M 
stress-ng: info:  [725451] setting to a 32 secs run per stressor
stress-ng: info:  [725451] dispatching hogs: 175 cpu
stress-ng: metrc: [725451] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s CPU used per       RSS Max
stress-ng: metrc: [725451]                           (secs)    (secs)    (secs)   (real time) (usr+sys time) instance (%)          (KB)
stress-ng: metrc: [725451] cpu             5670612     32.00   5582.23      0.22    177196.07        1015.79        99.68          4224
stress-ng: info:  [725451] skipped: 0
stress-ng: info:  [725451] passed: 175: cpu (175)
stress-ng: info:  [725451] failed: 0
stress-ng: info:  [725451] metrics untrustworthy: 0
stress-ng: info:  [725451] successful run completed in 32.03 secs
$ stress-ng --llc-affinity 175 -t 32 -M
stress-ng: info:  [756667] setting to a 32 secs run per stressor
stress-ng: info:  [756667] dispatching hogs: 175 llc-affinity
stress-ng: info:  [756668] llc-affinity: using LLC cache size of 16384K
stress-ng: metrc: [756667] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s CPU used per       RSS Max
stress-ng: metrc: [756667]                           (secs)    (secs)    (secs)   (real time) (usr+sys time) instance (%)          (KB)
stress-ng: metrc: [756667] llc-affinity       1059     32.04   3811.80    146.25        33.05           0.27        70.59         18304
stress-ng: metrc: [756667] miscellaneous metrics:
stress-ng: metrc: [756667] llc-affinity        1462.30 MB pec sec memory write rate (harmonic mean of 175 instances)
stress-ng: metrc: [756667] llc-affinity        1406.11 MB per sec memory read rate (harmonic mean of 175 instances)
stress-ng: metrc: [756667] llc-affinity          31.51 CPU affinity changes per sec (harmonic mean of 175 instances)
stress-ng: info:  [756667] skipped: 0
stress-ng: info:  [756667] passed: 175: llc-affinity (175)
stress-ng: info:  [756667] failed: 0
stress-ng: info:  [756667] metrics untrustworthy: 0
stress-ng: info:  [756667] successful run completed in 32.12 secs
$ sudo stress-ng --cache 175 --cache-enable-all  -M -t 32
stress-ng: info:  [880556] setting to a 32 secs run per stressor
stress-ng: info:  [880556] dispatching hogs: 175 cache
stress-ng: info:  [880563] cache: cldemote is not available, ignoring this option
stress-ng: info:  [880563] cache: cache flags used: prefetch clflush fence sfence clflushopt clwb
stress-ng: metrc: [880556] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s CPU used per       RSS Max
stress-ng: metrc: [880556]                           (secs)    (secs)    (secs)   (real time) (usr+sys time) instance (%)          (KB)
stress-ng: metrc: [880556] cache           8625576     32.03   2420.20      0.73    269304.05        3562.93        43.19         16896
stress-ng: metrc: [880556] miscellaneous metrics:
stress-ng: metrc: [880556] cache            1301351.51 cache ops per second (harmonic mean of 175 instances)
stress-ng: metrc: [880556] cache         1351852226.88 shared cache reads per second (harmonic mean of 175 instances)
stress-ng: metrc: [880556] cache         1390292086.63 shared cache writes per second (harmonic mean of 175 instances)
stress-ng: info:  [880556] skipped: 0
stress-ng: info:  [880556] passed: 175: cache (175)
stress-ng: info:  [880556] failed: 0
stress-ng: info:  [880556] metrics untrustworthy: 0
stress-ng: info:  [880556] successful run completed in 32.06 secs

When dispatching consume from DSQs in the local LLC first before trying
remote DSQs. This should still be fair as the layer iteration order will
be maintained.

Signed-off-by: Daniel Hodges <[email protected]>
@hodgesds
Contributor Author

hodgesds commented Nov 12, 2024

More tests on a partially saturated machine (80 instances, 176 CPUs).

TL;DR (best of three runs each):
- cache ops/sec (max): 55496899.98 (main) vs 109916623.28 (PR), a 1.98x improvement
- cache reads/sec (max): 477780191.75 (main) vs 356949190.61 (PR), a 0.75x reduction
- cache writes/sec (max): 24614546.65 (main) vs 43032553.41 (PR), a 1.75x improvement

main branch:

$ sudo stress-ng --cache-no-affinity --cache-level 3 --cache 80 -t 35 -M 
stress-ng: info:  [535188] setting to a 35 secs run per stressor
stress-ng: info:  [535188] dispatching hogs: 80 cache
stress-ng: info:  [535189] cache: cache flags used: none
stress-ng: info:  [535189] cache: use --cache-enable-all to enable all cache flags for heavier cache stressing
stress-ng: metrc: [535188] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s CPU used per       RSS Max
stress-ng: metrc: [535188]                           (secs)    (secs)    (secs)   (real time) (usr+sys time) instance (%)          (KB)
stress-ng: metrc: [535188] cache          90342995     35.01   1999.98      0.36   2580706.03       45164.00        71.43         16896
stress-ng: metrc: [535188] miscellaneous metrics:
stress-ng: metrc: [535188] cache           55496899.98 cache ops per second (harmonic mean of 80 instances)
stress-ng: metrc: [535188] cache          477780191.75 shared cache reads per second (harmonic mean of 80 instances)
stress-ng: metrc: [535188] cache           24388481.50 shared cache writes per second (harmonic mean of 80 instances)
stress-ng: info:  [535188] skipped: 0
stress-ng: info:  [535188] passed: 80: cache (80)
stress-ng: info:  [535188] failed: 0
stress-ng: info:  [535188] metrics untrustworthy: 0
stress-ng: info:  [535188] successful run completed in 37.31 secs
$ sudo stress-ng --cache-no-affinity --cache-level 3 --cache 80 -t 35 -M 
stress-ng: info:  [537384] setting to a 35 secs run per stressor
stress-ng: info:  [537384] dispatching hogs: 80 cache
stress-ng: info:  [537408] cache: cache flags used: none
stress-ng: info:  [537408] cache: use --cache-enable-all to enable all cache flags for heavier cache stressing
stress-ng: metrc: [537384] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s CPU used per       RSS Max
stress-ng: metrc: [537384]                           (secs)    (secs)    (secs)   (real time) (usr+sys time) instance (%)          (KB)
stress-ng: metrc: [537384] cache          82070345     35.01   1717.47      0.27   2343979.90       47778.05        61.32         16896
stress-ng: metrc: [537384] miscellaneous metrics:
stress-ng: metrc: [537384] cache           43714815.35 cache ops per second (harmonic mean of 80 instances)
stress-ng: metrc: [537384] cache          423490707.01 shared cache reads per second (harmonic mean of 80 instances)
stress-ng: metrc: [537384] cache           23517242.81 shared cache writes per second (harmonic mean of 80 instances)
stress-ng: info:  [537384] skipped: 0
stress-ng: info:  [537384] passed: 80: cache (80)
stress-ng: info:  [537384] failed: 0
stress-ng: info:  [537384] metrics untrustworthy: 0
stress-ng: info:  [537384] successful run completed in 35.78 secs
$ sudo stress-ng --cache-no-affinity --cache-level 3 --cache 80 -t 35 -M 
stress-ng: info:  [541929] setting to a 35 secs run per stressor
stress-ng: info:  [541929] dispatching hogs: 80 cache
stress-ng: info:  [541930] cache: cache flags used: none
stress-ng: info:  [541930] cache: use --cache-enable-all to enable all cache flags for heavier cache stressing
stress-ng: metrc: [541929] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s CPU used per       RSS Max
stress-ng: metrc: [541929]                           (secs)    (secs)    (secs)   (real time) (usr+sys time) instance (%)          (KB)
stress-ng: metrc: [541929] cache          85706319     35.01   1950.65      0.33   2448015.37       43929.85        69.66         16896
stress-ng: metrc: [541929] miscellaneous metrics:
stress-ng: metrc: [541929] cache           52360653.33 cache ops per second (harmonic mean of 80 instances)
stress-ng: metrc: [541929] cache          448703556.35 shared cache reads per second (harmonic mean of 80 instances)
stress-ng: metrc: [541929] cache           24614546.65 shared cache writes per second (harmonic mean of 80 instances)
stress-ng: info:  [541929] skipped: 0
stress-ng: info:  [541929] passed: 80: cache (80)
stress-ng: info:  [541929] failed: 0
stress-ng: info:  [541929] metrics untrustworthy: 0
stress-ng: info:  [541929] successful run completed in 35.37 secs

PR:

$ sudo stress-ng --cache-no-affinity --cache-level 3 --cache 80 -t 35 -M 
stress-ng: info:  [553389] setting to a 35 secs run per stressor
stress-ng: info:  [553389] dispatching hogs: 80 cache
stress-ng: info:  [553390] cache: cache flags used: none
stress-ng: info:  [553390] cache: use --cache-enable-all to enable all cache flags for heavier cache stressing
stress-ng: metrc: [553389] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s CPU used per       RSS Max
stress-ng: metrc: [553389]                           (secs)    (secs)    (secs)   (real time) (usr+sys time) instance (%)          (KB)
stress-ng: metrc: [553389] cache         131989009     35.00   2793.68      0.68   3771112.66       47234.16        99.80         16896
stress-ng: metrc: [553389] miscellaneous metrics:
stress-ng: metrc: [553389] cache          108346730.27 cache ops per second (harmonic mean of 80 instances)
stress-ng: metrc: [553389] cache          356949190.61 shared cache reads per second (harmonic mean of 80 instances)
stress-ng: metrc: [553389] cache           40803440.87 shared cache writes per second (harmonic mean of 80 instances)
stress-ng: info:  [553389] skipped: 0
stress-ng: info:  [553389] passed: 80: cache (80)
stress-ng: info:  [553389] failed: 0
stress-ng: info:  [553389] metrics untrustworthy: 0
stress-ng: info:  [553389] successful run completed in 35.02 secs
$ sudo stress-ng --cache-no-affinity --cache-level 3 --cache 80 -t 35 -M 
stress-ng: info:  [559376] setting to a 35 secs run per stressor
stress-ng: info:  [559376] dispatching hogs: 80 cache
stress-ng: info:  [559380] cache: cache flags used: none
stress-ng: info:  [559380] cache: use --cache-enable-all to enable all cache flags for heavier cache stressing
stress-ng: metrc: [559376] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s CPU used per       RSS Max
stress-ng: metrc: [559376]                           (secs)    (secs)    (secs)   (real time) (usr+sys time) instance (%)          (KB)
stress-ng: metrc: [559376] cache         136236026     35.00   2794.04      0.63   3892456.97       48748.61        99.81         16896
stress-ng: metrc: [559376] miscellaneous metrics:
stress-ng: metrc: [559376] cache          109916623.28 cache ops per second (harmonic mean of 80 instances)
stress-ng: metrc: [559376] cache          352940483.83 shared cache reads per second (harmonic mean of 80 instances)
stress-ng: metrc: [559376] cache           43032553.41 shared cache writes per second (harmonic mean of 80 instances)
stress-ng: info:  [559376] skipped: 0
stress-ng: info:  [559376] passed: 80: cache (80)
stress-ng: info:  [559376] failed: 0
stress-ng: info:  [559376] metrics untrustworthy: 0
stress-ng: info:  [559376] successful run completed in 35.02 secs
$ sudo stress-ng --cache-no-affinity --cache-level 3 --cache 80 -t 35 -M 
stress-ng: info:  [566322] setting to a 35 secs run per stressor
stress-ng: info:  [566322] dispatching hogs: 80 cache
stress-ng: info:  [566323] cache: cache flags used: none
stress-ng: info:  [566323] cache: use --cache-enable-all to enable all cache flags for heavier cache stressing
stress-ng: metrc: [566322] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s CPU used per       RSS Max
stress-ng: metrc: [566322]                           (secs)    (secs)    (secs)   (real time) (usr+sys time) instance (%)          (KB)
stress-ng: metrc: [566322] cache         133128101     35.00   2792.18      0.70   3803657.29       47666.93        99.75         16896
stress-ng: metrc: [566322] miscellaneous metrics:
stress-ng: metrc: [566322] cache          107637140.48 cache ops per second (harmonic mean of 80 instances)
stress-ng: metrc: [566322] cache          342616850.44 shared cache reads per second (harmonic mean of 80 instances)
stress-ng: metrc: [566322] cache           41565459.58 shared cache writes per second (harmonic mean of 80 instances)
stress-ng: info:  [566322] skipped: 0
stress-ng: info:  [566322] passed: 80: cache (80)
stress-ng: info:  [566322] failed: 0
stress-ng: info:  [566322] metrics untrustworthy: 0
stress-ng: info:  [566322] successful run completed in 35.01 secs

These tests were run with the following config:

[
  {
    "name": "hodgesd",
    "comment": "hodgesd user",
    "matches": [
      [
        {
          "UIDEquals": 224791
        }
      ]
    ],
    "kind": {
      "Confined": {
        "slice_us": 800,
        "util_range": [
          0.25,
          0.6
        ],
        "growth_algo": "Sticky",
        "preempt": false,
        "preempt_first": false,
        "exclusive": false
      }
    }
  },
  {
    "name": "stress-ng",
    "comment": "stress-ng slice",
    "matches": [
      [
        {
          "CommPrefix": "stress-ng"
        }
      ],
      [
        {
          "PcommPrefix": "stress-ng"
        }
      ]
    ],
    "kind": {
      "Confined": {
        "util_range": [
          0.60,
          0.80
        ],
        "slice_us": 1000,
        "preempt": false,
        "idle_smt": true,
        "preempt_first": true,
        "growth_algo": "Topo",
        "exclusive": false
      }
    }
  },
  {
    "name": "normal",
    "comment": "the rest",
    "matches": [
      []
    ],
    "kind": {
      "Grouped": {
        "util_range": [
          0.25,
          0.6
        ],
        "preempt": false,
        "weight": 500,
        "preempt_first": false,
        "slice_us": 800,
        "exclusive": false,
        "growth_algo": "Sticky"
      }
    }
  }
]

Contributor

@JakeHillion JakeHillion left a comment


As discussed offline, I'm not sure these test results are consistent, and I had some issues running them for longer on main. However, the change makes sense!

Contributor

@JakeHillion JakeHillion left a comment


NACK to the flag without good reason. Normally I want flags for most changes, but in this case the change makes logical sense and doesn't significantly change our dispatching. The current implementation, which duplicates the entire loops, is not good for code readability/maintenance at all.

If the flag is necessary (we can't settle on a single behaviour), can we implement it by doing divmod in the loop instead? This gives the same optionality without duplicating the bodies.

@hodgesds
Contributor Author

> …for code readability/maintenance at all.

> If the flag is necessary (we can't settle on a single behaviour), can we implement it by doing divmod in the loop instead? This gives the same optionality without duplicating the bodies.

I mostly want the flag for doing more testing on actual workloads. It's bothersome to have to rebuild different versions of the scheduler to test the performance differences, and synthetic benchmarks only go so far.

@JakeHillion
Contributor

> If the flag is necessary (we can't settle on a single behaviour), can we implement it by doing divmod in the loop instead? This gives the same optionality without duplicating the bodies.

> I mostly want the flag for doing more testing on actual workloads. It's bothersome to have to rebuild different versions of the scheduler to test the differences in performance and synthetic benchmarks only go so far in testing.

Alternatively, could you pull out the body from both loops into a static inline function and then have the if around the two loops? It'll mean doing the if checks multiple times unnecessarily, but should make it way easier to maintain the code (and we were doing those unnecessary checks until this week anyway).

@likewhatevs
Contributor

> I mostly want the flag for doing more testing on actual workloads. It's bothersome to have to rebuild different versions of the scheduler to test the differences in performance and synthetic benchmarks only go so far in testing.

Agreed, I'm a fan of more flags. It's that or more bisecting/patching to debug issues.

@hodgesds
Contributor Author

Some results from doing 75 rounds of tests:

[two result charts attached]

@hodgesds hodgesds added this pull request to the merge queue Nov 15, 2024
Merged via the queue into sched-ext:main with commit 79125ef Nov 15, 2024
23 checks passed
@hodgesds hodgesds deleted the layered-dispatch-local branch November 15, 2024 14:37