Remove thread pool, switch to tokio #10
base: main
Conversation
```rust
            error!("report alive failed: '{}'", e);
        }
        select! {
            biased;
            recv(signal) -> signal => {
```
The `select!` loop sets `biased` to preserve the original behavior where each entry is attempted in order: https://docs.rs/tokio/latest/tokio/macro.select.html#fairness
```diff
 use rand::{Rng, distributions};
-use threadpool::ThreadPool;
+use async_channel::{bounded, Receiver, Sender};
```
Tokio provides an mpsc channel, but the code previously relied on an mpmc channel (to allow cloning the receiver), so I added this crate: https://docs.rs/async-channel/latest/async_channel/
```rust
Err(ErrorKind::JobHandlerError(Box::new("a".parse::<i8>().unwrap_err())).into())
```

```rust
#[async_trait]
pub trait JobHandler: Send + Sync {
    async fn perform(&self, job: &Job) -> JobHandlerResult;
```
@msepga has a different approach using a `BoxFuture`. That fixed the lifetime issues I had as of the first commit on this branch, but when integrating it into our own code, the futures generated by other crates caused compile errors because they aren't `Sync`.
In principle this should be equivalent to `Box<dyn Sync + ...>` given that `JobHandler` requires `Sync` here, though we can merge as-is regardless 👍
I'm not sure what you mean by `JobFuture` -- there isn't a type like that around.

My understanding is that the `BoxFuture` version and my attempt in the first commit were running into issues because they required that the returned futures be `Sync`. This implementation only requires that the `JobHandler` is `Sync`.
Typo, should be `JobHandler` in my comment 🙂 So, `BoxFuture` doesn't require the future to be `Sync`; the linked implementation only required the handler closure to be `Sync`.

With the definition of `BoxFuture`:

```rust
pub type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a, Global>>;
```

Looking at the type now:

```rust
Box<dyn Send + Sync + FnMut(Job) -> BoxFuture<'static, Result<JobSuccessType>>>
```

The handler itself must:
- impl `Send`
- impl `Sync`
- Take a `Job` argument
- Return a `BoxFuture<..>`

The `BoxFuture` must:
- Be `Send`
- Be `'static`
- Have output `Result<JobSuccessType>`

The `BoxFuture` *does not* have to be `Sync`.

Here's a playground link that illustrates how the handler can be sync without the returned future being sync. You'll notice that uncommenting line 25 will fail to compile, because the returned future doesn't implement `Sync`, even if the handler is required to do so.
We don't have to let this block the merge BTW, I just figured I'd leave a footnote in case it could help clear up how `Box<dyn ... + Fn>` is equivalent to the `JobHandler` trait here.
src/worker.rs (outdated)
```rust
let mut result: Result<Option<(String, String)>> = Ok(None);
task::block_in_place(|| {
    Handle::current().block_on(async {
        result = self.redis.brpop(queues, 10).await.map_err(From::from);
```
As far as I can tell, this is the correct way to run blocking operations such that they don't block other async tasks from running. Note that we intentionally want to preserve this blocking `brpop` because it allows workers to start working on new jobs as soon as they're available.
Also note that the server and each individual worker has its own Redis connection. If they shared a single connection, this wouldn't work properly.
I'll defer to @msepga on a detailed review, but conceptually this makes sense to me.
It became overly complicated since tokio cancels the futures for branches that aren't reached, and for the blocking Redis request that could mean jobs are lost in-transit over the network. Instead, simplify the loop to rely on the blocking operation to prevent the loop from executing too quickly. Also, reduce the blocking duration to 1 second so workers are able to respond to shutdown requests faster.
```rust
loop {
    select! {
```
I decided not to use tokio's `select!` here since it cancels the future of any branch that wasn't executed. For the Redis `brpop` that could mean the future is canceled after Redis has sent us data over the network but before it's been received (meaning that job would be lost). I did have a `select!` implementation in this PR originally, but I removed it in favor of a simple loop because it's much more understandable this way.
LGTM 👍 Left a nit regarding `br_pop`.
@seanlinsley I happened to run across this by coincidence - should we merge this into
I intentionally didn't merge this because this version is several times slower for our workload than the thread pool, and this PR doesn't maintain backwards compatibility. There are some people who have starred / are watching this repo, so it's possible other people are relying on this code.
Internally we use both tokio and async-std because of differing crate dependencies, and then additionally this crate runs its own thread pool. We've seen that removing async-std resolves an unbounded memory growth issue (related to TLS caching), but that introduced a significant performance regression. Hopefully moving sidekiq_server to tokio as well will allow tokio's cooperative task scheduling to work as intended. If not, we may still have to use `unconstrained` 😅

This PR removes the middleware layer as it significantly increased the number of compiler errors. It could certainly be reintroduced in the future, though we aren't using it internally at the moment.
In order to work around the many lifetime and thread safety errors that come from passing around async closures, I settled on a trait API that users of this crate can implement for every job they need to process. This happens to be very similar to the original Sidekiq library in Ruby, as each job is a class. In the future the job handlers could be given a `&mut self` so they can manage temporary state in their owned struct.

When reviewing this, it's important to consider whether the async tasks are being handled correctly in an environment where many jobs run in parallel. Especially `run_queue_once`, which performs a blocking call to Redis.