
Introduce SchedulingStateMachine for unified scheduler [NO-MERGE&REVIEW-ONLY-MODE] #35286

Closed
ryoqun wants to merge 31 commits into master from scheduling-state-machine

Conversation

ryoqun
Member

@ryoqun ryoqun commented Feb 22, 2024

Problem

The unified scheduler is still single-threaded, because the most important piece of code is still missing: the scheduling code itself.

Summary of Changes

Implement it in the cleanest and most documented way possible. FINALLY, TIME HAS COME.

Note that there's more work to be done for general availability of the unified scheduler after this pr, but it is arguably the most important PR.

numbers

steps
  1. Download replaying-bench-mainnet-beta-2024-02.tar.zst: https://drive.google.com/file/d/1Jc1cd3pHKkaaT1yJrGeBAjBMJpkKS0hc/view?usp=sharing
  2. untar
  3. run solana-ledger-tool --ledger ./path/to/replaying-bench-mainnet-beta-2024-02 verify ...
(A) --block-verification-method blockstore-processor with this pr (reference):
[2024-02-22T15:24:06.957483120Z INFO  solana_ledger::blockstore_processor] ledger processed in 28 seconds, 198 ms, 761 µs and 24 ns. root slot is 249712363, 36 banks: 249712363, 249712364, 249712365, 249712366, 249712367, 249712368, 249712369, 249712370, 249712371, 249712372, 249712373, 249712374, 249712375, 249712378, 249712379, 249712380, 249712381, 249712382, 249712383, 249712384, 249712385, 249712386, 249712387, 249712388, 249712389, 249712390, 249712391, 249712392, 249712393, 249712394, 249712395, 249712396, 249712397, 249712398, 249712399, 249712400
(B) --block-verification-method unified-scheduler with this pr:

~1.3x faster than (A):

[2024-02-22T15:22:46.686416459Z INFO  solana_ledger::blockstore_processor] ledger processed in 20 seconds, 621 ms, 124 µs and 512 ns. root slot is 249712363, 36 banks: 249712363, 249712364, 249712365, 249712366, 249712367, 249712368, 249712369, 249712370, 249712371, 249712372, 249712373, 249712374, 249712375, 249712378, 249712379, 249712380, 249712381, 249712382, 249712383, 249712384, 249712385, 249712386, 249712387, 249712388, 249712389, 249712390, 249712391, 249712392, 249712393, 249712394, 249712395, 249712396, 249712397, 249712398, 249712399, 249712400
(C) (fyi) #33070 (all extra opts applied):

~1.8x faster than (A), ~1.3x faster than (B):

[2024-02-22T15:16:51.304579110Z INFO  solana_ledger::blockstore_processor] ledger processed in 15 seconds, 399 ms, 459 µs and 572 ns. root slot is 249712363, 36 banks: 249712363, 249712364, 249712365, 249712366, 249712367, 249712368, 249712369, 249712370, 249712371, 249712372, 249712373, 249712374, 249712375, 249712378, 249712379, 249712380, 249712381, 249712382, 249712383, 249712384, 249712385, 249712386, 249712387, 249712388, 249712389, 249712390, 249712391, 249712392, 249712393, 249712394, 249712395, 249712396, 249712397, 249712398, 249712399, 249712400

context

extracted from #33070

@ryoqun ryoqun force-pushed the scheduling-state-machine branch 3 times, most recently from 3a9a5b4 to 51d4dc8 on February 22, 2024 13:16
@ryoqun ryoqun changed the title Introduce SchedulingStateMachine Introduce SchedulingStateMachine for functional unified scheduler Feb 22, 2024
@ryoqun ryoqun changed the title Introduce SchedulingStateMachine for functional unified scheduler Introduce SchedulingStateMachine for unified scheduler Feb 22, 2024
@ryoqun ryoqun force-pushed the scheduling-state-machine branch 2 times, most recently from 2f92ee2 to 730afef on February 22, 2024 14:36
@ryoqun ryoqun requested a review from apfitzge February 22, 2024 15:28
@ryoqun ryoqun marked this pull request as ready for review February 22, 2024 15:28
@@ -1023,7 +1062,7 @@ mod tests {
.result,
Ok(_)
);
scheduler.schedule_execution(&(good_tx_after_bad_tx, 0));
scheduler.schedule_execution(&(good_tx_after_bad_tx, 1));
Member Author

this is because task_index is now assert!()-ed....


codecov bot commented Feb 22, 2024

Codecov Report

Attention: Patch coverage is 98.77384%, with 9 lines in your changes missing coverage. Please review.

Project coverage is 81.6%. Comparing base (a780ffb) to head (c9d1648).
Report is 18 commits behind head on master.

❗ Current head c9d1648 differs from the pull request's most recent head d072efd. Consider uploading reports for the commit d072efd to get more accurate results.

Additional details and impacted files
@@            Coverage Diff            @@
##           master   #35286     +/-   ##
=========================================
- Coverage    81.8%    81.6%   -0.2%     
=========================================
  Files         837      834      -3     
  Lines      225922   225589    -333     
=========================================
- Hits       184897   184206    -691     
- Misses      41025    41383    +358     

Contributor

@apfitzge apfitzge left a comment


Did an initial scan through the code, appreciate the well thought out comments! I will have to give this another pass, as well as getting through the tests.

fn default() -> Self {
Self {
usage: PageUsage::default(),
blocked_tasks: VecDeque::with_capacity(1024),
Contributor

Could you explain how you chose this number? At first glance, it seems quite high as the default allocation for number of blocked-tasks; at least for current blocks on MNB we have maybe several hundred non-vote transactions. So this seems like a huge overkill to me since for replay at max we probably have a couple hundred blocked tasks.

Member Author

good point. i temporarily reduced it and documented why: 5887a91

@ryoqun
Member Author

ryoqun commented Feb 23, 2024

@apfitzge thanks for reviewing. I'll later address them.

I've added benchmark steps in the pr description.

also, I'll share some on-cpu flame graphs by perf to make sure I'm not crazy. The charts were from different solana-ledger-tool invocations timed manually, so only the proportion of wall time spent is comparable.

the scheduler thread of rebase pr

[on-cpu flame graph image]

the scheduler thread of this pr

[on-cpu flame graph image]

some random remarks

  • SchedulingStateMachine::{de,}schedule_task() and the crossbeam select are what's absolutely worth burning cycles on.
  • the rebase pr doesn't have any extra overhead; it's very close to the ideal runtime profile of the scheduler thread
  • this pr has a bunch of overhead from dropping the task in-thread and accumulating the timings (this is why i want to introduce the accumulator thread)

@ryoqun
Member Author

ryoqun commented Feb 23, 2024

also, the off-cpu flame graph shows there's no idling other than the expected crossbeam select:

[off-cpu flame graph image]

as a quick refresher on this unusual off-cpu flame graph, you can see some lock contention below (there's a known one in solana-runtime, which is fixed in the rebase pr, but not yet in this pr):

[off-cpu flame graph image]

@apfitzge
Contributor

apfitzge commented Feb 23, 2024

I think I finally have a reasonably good understanding of how your scheduling works (much easier with trimmed down PR! ❤️ ).
As I'm thinking on this, I think this state-machine and the prio-graph I've used in the new scheduler for block-production are similar in many ways, although there are a few differences. Just gonna jot some thoughts here.

Similarities

  • Use FIFO fed order (in replay case this is record order, for production this is priority-order from a queue)
  • Keep track of access-kind per account in "use"
  • Pop unblocked txs

Differences

  • PG essentially creates a linked-list of transaction dependencies to only next transaction.
  • US keeps track of current access, and all blocked transactions/access type as an ordered list per transaction
  • Blocked tx resolve time: PG resolves at insert time, US resolves at deschedule time.
  • "use" differs since the US is used in a stateful manner, while PG is created on-demand
  • PG has a mechanism for allowing user-defined ordering for "unblocked" txs

Example

Think visual examples always serve better than words here. Let's take a really simple example with 3 conflicting write transactions.

Prio-graph will effectively store a linked list, edges from 1 to 2 to 3.

graph LR;
Tx1 --> Tx2 --> Tx3

US will initially store the access of Tx1, and an ordered list (Tx2, Tx3). Only when Tx1 is descheduled will the conflict between 2 and 3 be "realized".

graph LR;
Tx1 --> Tx2["Tx2(0, w)"] & Tx3["Tx3(1, w)"];
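To make the lazy resolution concrete, here's a toy model in code (all names and types assumed for illustration; not the PR's actual structures). A page tracks the task currently holding the write lock plus the blocked writers in arrival order, and the Tx2/Tx3 conflict only surfaces when Tx1 is descheduled:

use std::collections::VecDeque;

struct Page {
    current_writer: Option<u32>,   // task index currently holding the write lock (Tx1)
    blocked_writes: VecDeque<u32>, // blocked writers in arrival order: [Tx2, Tx3]
}

impl Page {
    // Called when the current writer is descheduled: the lock is handed to
    // the next blocked writer, which only now becomes runnable.
    fn deschedule_writer(&mut self) -> Option<u32> {
        self.current_writer = self.blocked_writes.pop_front();
        self.current_writer
    }
}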

Closing

Not sure anything I commented here has any actual value, just wanted to share my thoughts that these 2 approaches are remarkably similar but differ in their implementation details due to different uses (stateful vs lazy).

edit: I actually made a branch in prio-graph to use this AccessList (Page?) pattern instead of directly tracking edges. Interestingly I see cases where the old approach does significantly better, but also cases where an approach similar to yours does significantly better - interesting to see criterion regressions of 500% and also improvements of 50% in the same bench run 😆

I am somewhat curious to try plugging in prio-graph to your benches to see how it compares for exactly the same benches you have. Do you have any benches on just the scheduler logic?

@ryoqun
Member Author

ryoqun commented Feb 26, 2024

I think I finally have a reasonably good understanding of how your scheduling works (much easier with trimmed down PR! ❤️ ).
As I'm thinking on this, I think this state-machine and the prio-graph I've used in the new scheduler for block-production are similar in many ways, although there are a few differences. Just gonna jot some thoughts here.
....

thanks for the great comparison write-up. I'll check out the prio-graph impl later with those good guides in mind. :) (EDIT: I wrote my thoughts here: #35286 (comment))

I am somewhat curious to try plugging in prio-graph to your benches to see how it compares for exactly the same benches you have. Do you have any benches on just the scheduler logic?

yeah: #33070 (comment)

my dev box:

bench_tx_throughput     time:   [57.525 ms 58.245 ms 58.958 ms]
                        change: [-1.3868% -0.1358% +1.2895%] (p = 0.84 > 0.05)
                        No change in performance detected.

this is processing 60000 txes with 100 accounts each (and 50% conflicts): 58.245 ms / 60000 ≈ 970ns per this arb-like (= heavy) tx.

and this is the basis for the 100ns figure in the doc comment (assuming a normal tx touches ~10 accounts, i.e. 1/10 of the benched tx: 970ns / 10 ≈ 100ns).

that said, I have very dirty changes (ryoqun#17 (comment)) with this results:

bench_tx_throughput     time:   [33.742 ms 34.205 ms 34.665 ms]
                        change: [+0.4371% +2.6652% +4.9342%] (p = 0.02 < 0.05)
                        Change within noise threshold.

That's ~1.7x faster, at 570ns per tx.

Comment on lines 651 to 654
/// Closure (`page_loader`) is used to delegate the (possibly multi-threaded)
/// implementation of [`Page`] look-up by [`pubkey`](Pubkey) to callers. It's the caller's
/// responsibility to ensure the same instance is returned from the closure, given a particular
/// pubkey.
Contributor

I am probably missing some context, but I am not following the purpose of this closure, or why the AddressBook is not owned by the internal scheduler.
When would this ever do something other than just loading from AddressBook, which can already be done in a multi-threaded way?

Member Author

hehe, nice point. hope this clarifies why i don't add dashmap to solana-unified-scheduler-logic's Cargo.toml...: 821dee2
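As a rough sketch of the delegation being discussed (assumed shape, not the actual commit's code): the logic crate stays dependency-free while the pool side supplies a dashmap-backed loader, which must hand back the same Page for the same pubkey:

use dashmap::DashMap;
use solana_sdk::pubkey::Pubkey;
use solana_unified_scheduler_logic::Page; // assumes Page: Clone + Default, as in the PR

// The entry API makes the look-up atomic, so concurrent callers asking for
// the same pubkey always observe the same Page instance.
fn make_page_loader(pages: DashMap<Pubkey, Page>) -> impl FnMut(Pubkey) -> Page {
    move |pubkey| pages.entry(pubkey).or_default().value().clone()
}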

@ryoqun
Member Author

ryoqun commented Feb 28, 2024

sans the big rename Page => UsageQueue and my take on the comparison with prio-graph, i think i should have addressed all review comments so far. so, I'm requesting another review round. not sure whether you went in depth on its algo code and my unsafe-related bla bla talks.. lol

Comment on lines 469 to 474
/// Scheduler's internal data for each address ([`Pubkey`](`solana_sdk::pubkey::Pubkey`)). Very
/// opaque wrapper type; it has no methods other than [`::clone()`](Clone::clone) and
/// [`::default()`](Default::default).
#[derive(Debug, Clone, Default)]
pub struct Page(Arc<TokenCell<PageInner>>);
const_assert_eq!(mem::size_of::<Page>(), 8);
Contributor

Arc has a runtime cost for cloning so I'm actually a bit surprised to see it used here for such a critical item - though perhaps it allows a simpler implementation. Do you know how much of your insertion time is spent doing Arc-clone? I imagine it's small relative to the Pubkey hashing?

Still, I wonder if you have thought about or have maybe even tested using an indexable storage for the inner pages, and then using index-references as the stored kind on the tasks, to avoid Arc-cloning.

To be clear I'm saying some sort of set up like:

struct PageStore {
    // not sure if dashmap or locked hashmap? Just some translation that maps Pubkey -> index.
    pubkey_to_index: DashMap<Pubkey, usize>,
    // use a fixed-capacity vector (or slice) so we never add more pages past some pre-determined limit
    inner_pages: Vec<Option<InnerPage>>,
}

Contributor

I don't want to block on this, just sharing my thought on perf here. It's almost certainly a micro-optimization; but in this sensitive code, any small bit can help!

Member Author

@ryoqun ryoqun Feb 29, 2024


Arc has a runtime cost for cloning so I'm actually a bit surprised to see it used here for such a critical item - though perhaps it allows a simpler implementation.

Yeah, I used Arc here for simplicity and to ship this pr into master asap (just surpassed the 1.5k loc... lol).

Do you know how much of your insertion time is spent doing Arc-clone?

not directly. but, from an unrelated bench, I can indirectly guess it'll take 2-3ns per Arc::clone() when the instance isn't contended.

I imagine its' small relative to the Pubkey hashing?

I haven't tested. but I presume yes. Even more, I'm planning to switch to Rc from Arc. in that case, it's definitely smaller than the hashing.

As I indirectly indicated earlier (#35286 (comment) and #35286 (comment)), that optimization will allow us to trim this down to merely a single inc (without the lock prefix) in x86-64 asm terms, i.e. possibly an auto-vectorizable plain old 1 ALU instruction. And mem access is 1 pop with no branching. And there's no work that needs to be done for insertion into the scheduling state machine, as it's not relevant there. If you're referring to the insertions happening with the batched cloning in the case of being blocked in try_lock_for_task(), index-references or Rc will need 1 word-size mem copy for each address no matter what.

Still, I wonder if you have thought about or have maybe even tested using an indexable storage for the inner pages, and then using index-references as the stored kind on the tasks, to avoid Arc-cloning.

To be clear I'm saying some sort of set up like:

...

Yeah, i've thought about it. But my gut sense concluded this pr's approach should be faster: the index approach introduces another level of pointer-dereference indirection, which i'd like to avoid. However, I haven't benchmarked it.

I don't want to block on this, just sharing my thought on perf here. It's almost certainly a micro-optimization; but in this sensitive code, any small bit can help!

Thanks for sharing your thoughts. happy to share mine. I hope I've squeezed out as much as possible.

@apfitzge
Contributor

sans the big rename Page => UsageQueue and my take on the comparison with prio-graph, i think i should have addressed all review comments so far. so, I'm requesting another review round. not sure whether you went in depth on its algo code and my unsafe-related bla bla talks.. lol

I was still going to do another round with a deep dive. With these big ones I always try to read comments and get a high-level overview, then come back for a deeper dive.
I did look into the actual scheduling logic, which I didn't have too many comments on as it was quite familiar to my work, although done differently internally.

Contributor

@apfitzge apfitzge left a comment


Another short round of comments - I still need to take another round to go through the tests.

let result_with_timings = result_with_timings.as_mut().unwrap();
Self::accumulate_result_with_timings(result_with_timings, executed_task);
},
recv(dummy_receiver(state_machine.has_unblocked_task())) -> dummy => {
Contributor

The dummy receiver pattern seems really odd to me, though I see how it is working (mostly? need to cargo expand since not 100% sure WHEN state_machine.has_unblocked_task() is resolved).

edit:
So the has_unblocked_task is resolved before receiver selection, which is what I was guessing. Even if we stick with this dummy-receiver pattern, I'd actually much prefer we make this very clear since I don't think it's a very obvious thing:

let dummy_receiver = dummy_receiver(state_machine.has_unblocked_task());
select!{
    // ...
}

Contributor

All this in mind, I'm trying to figure out when we could have an unblocked task. We can only have an unblocked task if we just received a finished task, right?
Since on receiving new tasks, we'll immediately attempt to schedule and will schedule it if it is unblocked, that seems to be the case to me.

Why even have this recv for the never case that there is no unblocked task? That's just a wasted select, right? Let's say both our real channels have actual important data - a finished task and a new task. This macro could randomly (because current impl is always random) select this dummy receiver, which will just do nothing useful. At least my initial intuition is that this is wasted time, lmk what you think.

I know you had made a branch for a biased select - is the order here i.e. finished -> dummy -> new the biased order you want?
It could be overkill for the current MNB load we have, but I wonder if you considered just running a hot loop here if we are actively scheduling; not sure if using try_recvs in hot loop would cause more contention on the channels, you're certainly more familiar with crossbeam internals.
But basically I am asking about something along the lines of:

// Blocking recv for an OpenChannel to begin our scheduling session - no other messages can come in until then.
let message = new_task_receiver.recv();
{
    // Code for initialization, checking it's actually OpenSubChannel.
}

while !is_finished {
    if let Ok(executed_task) = finished_task_receiver.try_recv() {
        // Handle the executed task.
        continue; // Move to next iteration without checking other channels.
    }
    if state_machine.has_unblocked_task() {
        // Handle unblocked task.
        continue; // Move to next iteration without checking for new task.
    }
    if let Ok(message) = new_task_receiver.try_recv() {
        // Handle new task message.
    }
}

EDIT: I am an idiot - for some reason I thought never was always just disconnected...not that it could never be selected. Anyway, my initial thought on the oddity of the pattern still stands. So not going to delete my comment, and still request your thoughts.

Member Author

@ryoqun ryoqun Feb 29, 2024


The dummy receiver pattern seems really odd to me, though I see how it is working

.........

Anyway, my initial thought on the oddity of the pattern still stands. So not going to delete my comment, and still request your thoughts.

Hope I explained the justification clearly: 89b7773. I think you can get used to it. this pattern is the simplest, has no redundant code, and is the most performant. :)

I'd actually much prefer we make this very clear since I don't think it's a very obvious thing:

let dummy_receiver = dummy_receiver(state_machine.has_unblocked_task());
select!{
    // ...
}

This is done in the commit along with a comment. As I ranted in the comment, ideally, I want to move the dummy_receiver() invocation back into the select!...
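For reference, a minimal sketch of how such a dummy receiver can be built on crossbeam's never() channel (assumed shape; the real helper lives in the commit):

use crossbeam_channel::{bounded, never, Receiver};

// If there is an unblocked task, return a channel that is already ready so
// the corresponding select! arm can fire; otherwise return never(), an arm
// that can never be selected.
fn dummy_receiver(has_unblocked_task: bool) -> Receiver<()> {
    if has_unblocked_task {
        let (sender, receiver) = bounded(1);
        sender.send(()).unwrap(); // make this arm immediately selectable
        receiver
    } else {
        never()
    }
}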

I know you had made a branch for a biased select - is the order here i.e. finished -> dummy -> new the biased order you want?

Yes, this is correct.

It could be overkill for the current MNB load we have, but I wonder if you considered just running a hot loop here if we are actively scheduling; not sure if using try_recvs in hot loop would cause more contention on the channels, you're certainly more familiar with crossbeam internals.

As I explicitly commented in commit 89b7773, it's too early to take the busy-loop path and i haven't fully grokked the perf implications of busy looping. Casual tests showed an improved latency indeed, though.

But basically I am asking about something along the lines of:

...

By the way, busy looping can be done with select{,_biased}! pretty easily like this:

loop {
    select{_biased}! {
        recv(...) => ....,
        ...,
        default => continue, // the `default` selector is special-cased in `select{,_biased}!`
    }
    ...
    if is_finished {
        break;
    }
}

I'm hesitant about manual busy (or non-busy) select construction, considering I'm planning to add a new channel specially for blocked tasks, as a fast lane to process them.

))
.unwrap();
session_ending = false;
loop {
Contributor

We never exit this loop except for panic? I think we may have already talked about this in a previous PR, and you're just planning on error-handling in a follow-up. Please correct me if I'm wrong.

Member Author

We never exit this loop except for panic? I think we may have already talked about this in a previous PR, and you're just planning on error-handling in a follow-up.

This understanding completely aligns with what i said previously. The proper scheduler management code is complicated and flavored to my own taste (i'm expecting a back-and-forth review session); i.e. it's yet another beast by itself. so, i wanted to focus on the logic with this pr.

/// behavior when used with [`Token`].
#[must_use]
pub(super) unsafe fn assume_exclusive_mutating_thread() -> Self {
thread_local! {
Contributor

As a summary, I think Token and TokenCell were introduced because we have data within an Arc that we want to mutate, and these were added to give some additional safety constraints.

As far as I can tell, right now the mutations occur only in the scheduling thread so this seems fine. These token cells do not protect us against multiple threads accessing the same data if it were called from multiple threads - which requires some care from developers here. I am hopeful the naming of this function and unsafeness is probably sufficient warning!

WRT the ShortCounter Token-gating; the reason we can't or do not want to use an AtomicU32 is simply because we want to have checked operations to handle overflow explicitly?

Member Author

WRT the ShortCounter Token-gating; the reason we can't or do not want to use an AtomicU32 is simply because we want to have checked operations to handle overflow explicitly?

AtomicU32 is slow, because it always emits atomic operations, necessitating the given cpu core to go to the global bus ;) by the way, std::cell::Cell<ShortCounter> could be used without runtime cost here, thanks to ShortCounter: Copy. But i wanted to dog-food TokenCell and to use its more concise api, which is made possible by its additional constraints compared to Cell.

Also, note that blocked_page_count will be removed for an upcoming opt.
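For readers, a minimal sketch of the checked-arithmetic idea behind ShortCounter (method names assumed for illustration):

// Copy-able counter with explicit overflow handling: checked_add/checked_sub
// turn overflow into a deliberate panic instead of a silent wrap, and no
// atomic instructions are emitted since it's only touched from one thread.
#[derive(Debug, Clone, Copy, Default)]
struct ShortCounter(u32);

impl ShortCounter {
    fn increment(self) -> Self {
        Self(self.0.checked_add(1).unwrap())
    }
    fn decrement(self) -> Self {
        Self(self.0.checked_sub(1).unwrap())
    }
}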

Member Author

As a summary, ...

As far as I can tell, ...

also, this is correct. thanks for wrapping up your understanding.

@apfitzge apfitzge self-requested a review February 28, 2024 21:00
@@ -7462,7 +7462,9 @@ dependencies = [
name = "solana-unified-scheduler-logic"
version = "1.19.0"
dependencies = [
"assert_matches",
Member Author

btw, this will soon not be needed, as per the progress at rust-lang/rust#82775

@ryoqun
Member Author

ryoqun commented Feb 29, 2024

sans the big rename Page => UsageQueue and my take on the comparison with prio-graph, i think i should have addressed all review comments so far. so, I'm requesting another review round. not sure whether you went in depth on its algo code and my unsafe-related bla bla talks.. lol

I just finished the big renaming. still, i have yet to do the quoted homework on prio-graph. but, i think I've addressed all new review comments so far again. Thanks as always for the quite timely reviews!

@willhickey
Contributor

This repository is no longer in use. Please re-open this pull request in the agave repo: https://github.com/anza-xyz/agave

@willhickey willhickey closed this Mar 3, 2024
Comment on lines 216 to 235
/// Returns a mutable reference with its lifetime bound to the mutable reference of the
/// given token.
///
/// In this way, any additional reborrow can never happen at the same time across all
/// instances of [`TokenCell<V>`] conceptually owned by the instance of [`Token<V>`] (a
/// particular thread), unless previous borrow is released. After the release, the used
/// singleton token should be free to be reused for reborrows.
pub(super) fn borrow_mut<'t>(&self, _token: &'t mut Token<V>) -> &'t mut V {
unsafe { &mut *self.0.get() }
}
}

// Safety: Once after a (`Send`-able) `TokenCell` is transferred to a thread from other
// threads, access to `TokenCell` is assumed to be only from the single thread by proper use of
// Token. Thereby, implementing `Sync` can be thought as safe and doing so is needed for the
// particular implementation pattern in the unified scheduler (multi-threaded off-loading).
//
// In other words, TokenCell is technically still `!Sync`. But there should be no
// legalized usage which depends on real `Sync` to avoid undefined behaviors.
unsafe impl<V> Sync for TokenCell<V> {}
Member Author

@ryoqun ryoqun Mar 7, 2024


as per #35286 (review):

However, I think we should get a 2nd opinion on all the unsafe code.

@alessandrod This mod contains a bunch of unsafes, and these two highlighted ones are the most important. I think I've extensively doc-ed them. Could you review whether my justification makes sense at all? I appointed you assuming you know Rust better than me, from all the memory mangling in the vm. :) Happy to add more in-source comments filling in the inner workings of the unified scheduler itself for general context. That said, feel free to defer to someone else. Thanks in advance.

Contributor

🫡

@ryoqun ryoqun requested a review from alessandrod March 7, 2024 05:43
@ryoqun ryoqun requested a review from apfitzge March 7, 2024 07:31
@ryoqun
Member Author

ryoqun commented Mar 7, 2024

think i'm happy with the test cases, minus indexing ones (see comment).

thanks for the in-depth look into the tests. I've addressed them all, and am requesting another review round. btw, please r+ anza-xyz#129...

However, I think we should get a 2nd opinion on all the unsafe code.

kk. I've done this as well: #35286 (comment)

Contributor

@alessandrod alessandrod left a comment


Finished my first pass at this. I really like the general structure and direction! Being completely new to this new scheduler code I've struggled with some of the terminology and I've left some nits about that. Also see the TokenCell issue.

//! [`::schedule_task()`](SchedulingStateMachine::schedule_task) while maintaining the account
//! readonly/writable lock rules. Those returned runnable tasks are guaranteed to be safe to
//! execute in parallel. Lastly, `SchedulingStateMachine` should be notified about the completion
//! of the execution via [`::deschedule_task()`](SchedulingStateMachine::deschedule_task), so that
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens if the caller doesn't call deschedule_task()?

Member Author

kitten dies. 😿 well, it's a really bad condition that i haven't thought about in depth. it will easily lead to deadlocks.

well, i wonder whether this edge case should be called out in this summary doc.. fyi, I explicitly mentioned this case in the deschedule_task() doc comment: 250edde

/// `solana-unified-scheduler-pool`.
pub struct SchedulingStateMachine {
unblocked_task_queue: VecDeque<Task>,
active_task_count: ShortCounter,
Contributor

comments on what these counters count would be helpful

Contributor

I find it a bit confusing that this is called active_task_count, because it also
includes blocked tasks, which are arguably inactive? Maybe current_task_count?

Member Author

hehe, i agree about the tastelessness of active here. hmm, current_task_count is a bit ambiguous. how about in_progress_task_count? (or not_(yet)_handled_task_count; i don't like this much because it implies a counter of tasks which have been resolved not to be processed).

Contributor

I find in_progress confusing too, since it implies that something is progressing while blocked tasks technically aren't progressing. @apfitzge, as the native speaker, wdyt would be the best term here?

Member Author

how about pending_task_count, or managed_task_count, then?

Member Author

... or runnable_task_count?


while let Some((requested_usage, task_with_unblocked_queue)) = unblocked_task_from_queue
{
if let Some(task) = task_with_unblocked_queue.try_unblock(&mut self.count_token) {
Contributor

what happens to the task when try_unblock returns None?

Member Author

another nice question. hope this helps your understanding: ee5c8b6

Contributor

Can you explain how this works? usage_queue.unlock() pops the task from the usage queue. task.try_unblock() consumes self and returns None. How does the task reappear?

Member Author

@ryoqun ryoqun Mar 22, 2024


it's cloned here (#35286 (comment)) into all of the blocked usage_queues.

Contributor

I see. This is why I dislike hiding Arcs with type aliases :P
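To spell out the mechanism in code, a toy model (names assumed; the real Task is a type alias hiding an Arc):

use std::{cell::Cell, rc::Rc};

// The task is cloned into every usage queue that blocks it; the clones share
// one blocked-usage counter. try_unblock consumes a clone and only yields a
// runnable task once the last blocking usage has been released.
#[derive(Clone)]
struct Task {
    blocked_usage_count: Rc<Cell<u32>>,
}

impl Task {
    fn try_unblock(self) -> Option<Task> {
        let remaining = self.blocked_usage_count.get() - 1;
        self.blocked_usage_count.set(remaining);
        (remaining == 0).then_some(self)
    }
}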

/// instances of [`TokenCell<V>`] conceptually owned by the instance of [`Token<V>`] (a
/// particular thread), unless previous borrow is released. After the release, the used
/// singleton token should be free to be reused for reborrows.
pub(super) fn borrow_mut<'t>(&self, _token: &'t mut Token<V>) -> &'t mut V {
Contributor

@alessandrod alessandrod Mar 16, 2024


I understand the intent, but this is akin to a transmute and can be made to segfault trivially:

struct T {
    v: Vec<u8>,
}

let mut token = unsafe { Token::<T>::assume_exclusive_mutating_thread() };
let c = {
    let cell = TokenCell::new(T { v: vec![42] });
    // the returned &mut is tied to `token`'s lifetime, not `cell`'s...
    cell.borrow_mut(&mut token)
};
// ...so `cell` is already dropped here, and this write is a use-after-free
c.v.push(43);

Member Author

@ryoqun ryoqun Mar 18, 2024


@alessandrod oh, quite a nice catch.. hope this fixes the segfault with no more ub?: 001b10e (EDIT: force-pushed with an updated fix: 02567b1)

seems i was too short-sighted, focusing only on aliasing-based ub.. ;)

Member Author

uggg, actually, i need with_borrow_mut().. I'll soon push this.

Member Author

uggg, actually, i need with_borrow_mut().. I'll soon push this.

done: 02567b1
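A sketch of the shape such a closure-scoped accessor could take (assumed, building on the TokenCell from the diff above; see 02567b1 for the real change). Confining the mutable borrow to a closure means the reference can no longer outlive the cell as in the snippet above:

impl<V> TokenCell<V> {
    pub(super) fn with_borrow_mut<R>(
        &self,
        _token: &mut Token<V>,
        f: impl FnOnce(&mut V) -> R,
    ) -> R {
        // The &mut V only lives for the duration of the closure call, so it
        // cannot escape and dangle after the TokenCell is dropped.
        f(unsafe { &mut *self.0.get() })
    }
}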

Collaborator

@anza-team anza-team left a comment


Thank you for the contribution! This repository is no longer in use. Please re-open this pull request in the agave repo: https://github.com/anza-xyz/agave

@anza-team anza-team closed this Mar 18, 2024
@ryoqun
Member Author

ryoqun commented Mar 18, 2024

I really like the general structure and direction!

❤️

Being completely new to this new scheduler code I've struggled with some of the terminology and I've left some nits about that.

this feedback is quite appreciated. I'll address each point in turn to fill in the missing context.

Also see the TokenCell issue.

as this is the most important review comment, I've addressed it first, and requested a review for it...

@ryoqun
Member Author

ryoqun commented Mar 18, 2024

gg

@ryoqun ryoqun reopened this Mar 18, 2024
Collaborator

@anza-team anza-team left a comment


Thank you for the contribution! This repository is no longer in use. Please re-open this pull request in the agave repo: https://github.com/anza-xyz/agave

Labels
noCI Suppress CI on this Pull Request

5 participants