-
What I have done in the past is implement a future that wraps the spawned task and tracks the data. You can use that to instrument waking and track the lag between a wake notification and poll. If needed, I can sketch it up.
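For reference, one way that idea can look is the sketch below: a wrapper future that hands the inner future an instrumented waker and, on the next poll, reports how long ago the wake happened. This is only an illustration with my own naming (`WakeLag`, `InstrumentedWake`), not the commenter's actual code, and it relies on `std::task::Wake` (Rust 1.51+):

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::time::Instant;

// Records when the wrapped task was last woken, then forwards the wake
// to the runtime's real waker.
struct InstrumentedWake {
    inner: Waker,
    woken_at: Mutex<Option<Instant>>,
}

impl Wake for InstrumentedWake {
    fn wake(self: Arc<Self>) {
        self.wake_by_ref();
    }
    fn wake_by_ref(self: &Arc<Self>) {
        *self.woken_at.lock().unwrap() = Some(Instant::now());
        self.inner.wake_by_ref();
    }
}

pub struct WakeLag<F> {
    fut: F,
    label: &'static str,
    state: Option<Arc<InstrumentedWake>>,
}

impl<F> WakeLag<F> {
    pub fn wrap(fut: F, label: &'static str) -> Self {
        Self { fut, label, state: None }
    }
}

impl<F: Future> Future for WakeLag<F> {
    type Output = F::Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<F::Output> {
        // Safety: `fut` is never moved out of the pinned struct.
        let this = unsafe { self.get_unchecked_mut() };

        // If the waker handed out on the previous poll recorded a wake,
        // the time elapsed since then is the wake-to-poll lag.
        if let Some(state) = &this.state {
            if let Some(woken_at) = state.woken_at.lock().unwrap().take() {
                eprintln!("{}: wake-to-poll lag {:?}", this.label, woken_at.elapsed());
            }
        }

        // Hand the inner future an instrumented waker for the next wake.
        let state = Arc::new(InstrumentedWake {
            inner: cx.waker().clone(),
            woken_at: Mutex::new(None),
        });
        this.state = Some(state.clone());
        let waker = Waker::from(state);
        let mut cx = Context::from_waker(&waker);
        unsafe { Pin::new_unchecked(&mut this.fut) }.poll(&mut cx)
    }
}

// Usage: tokio::spawn(WakeLag::wrap(some_task(), "some_task"));
```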
-
With help from @Darksonn, I started measuring slow poll times in our app with a wrapper like this:

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::task::{Context, Poll};
use std::time::Instant;
use tracing::warn; // assuming the tracing crate, given the structured fields below

// 5 ms, in nanoseconds.
const THRESHOLD: u128 = 5000 * 1000;

pub struct Measured<F> {
    fut: F,
    label: &'static str,
    threshold: u128,
    count: AtomicUsize,
}

impl<F> Measured<F> {
    pub fn wrap(fut: F, label: &'static str) -> Self {
        Self {
            fut,
            label,
            threshold: THRESHOLD,
            count: AtomicUsize::new(0),
        }
    }

    pub fn into_inner(self) -> F {
        self.fut
    }

    pub fn wrap_t(fut: F, label: &'static str, threshold: u128) -> Self {
        Self {
            fut,
            label,
            threshold,
            count: AtomicUsize::new(0),
        }
    }
}

impl<F: Future> Future for Measured<F> {
    type Output = F::Output;

    fn poll(self: Pin<&mut Self>, ctx: &mut Context<'_>) -> Poll<F::Output> {
        let label = self.label;
        let t = self.threshold;
        let count = self.count.fetch_add(1, Ordering::SeqCst);
        // Safety: we never move `fut` out of the pinned struct.
        let fut = unsafe { Pin::map_unchecked_mut(self, |me| &mut me.fut) };
        let start = Instant::now();
        let res = fut.poll(ctx);
        let measured = start.elapsed();
        if measured.as_nanos() > t {
            warn!({ nanos = measured.as_nanos() as u64, label, count = count },
                "future took a long time!"
            );
        }
        res
    }
}
```

This helped us figure out the slowest ones. It also helped to put a few of them (like the TLS handshake and some fs IO) in `spawn_blocking`.

However, I'm still getting many "slow polls" that take over 5-10ms to complete, even locally in an environment where there is virtually no load. I compiled our app in release mode and I've been able to trigger some slow polls. The biggest slowness I've identified is when accepting connections. If I run a load testing tool with keepalives (and connection reuse), the slow polls only log when the initial connections are created. If I disable keepalives, they log throughout the load test. Of course, it's mostly the polls on futures happening while we accept a connection, TLS-handshake the connection, etc. that are slow. I don't understand why, though. For example, here's a rough structure (with too much useless detail) of what's happening during the TLS handshake; it consistently shows up in the slow-poll logs:
This is fairly simple and all "measured" for slow polls; I don't think any of this should be slow. The top-level future triggers often (with 10+ ms polls), but none of its children trigger much, which leaves me unsure where to look next. Putting the TLS handshake in a `spawn_blocking` really helped, but we're still seeing 80+ ms of event-loop "lag" (99th percentile) from our measurement.

These are the logs I get when sending ~1200 requests in 10s (with a concurrency of 60). Each request creates a new connection (keepalives disabled) which our app needs to accept.
(the many …) Any ideas what I should be measuring next? What to look for? Would using the … ?
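As a concrete illustration of the `spawn_blocking` point above, something along these lines keeps CPU-heavy work off the worker threads; `expensive_step` here is a hypothetical stand-in, not the app's actual code:

```rust
use tokio::task::spawn_blocking;

// Hypothetical stand-in for a CPU-heavy synchronous step, e.g. the
// compute-bound part of a TLS handshake or a large serde parse.
fn expensive_step(input: Vec<u8>) -> Vec<u8> {
    input
}

async fn run_off_the_event_loop(input: Vec<u8>) -> Vec<u8> {
    // spawn_blocking moves the closure onto tokio's dedicated blocking
    // thread pool, so a multi-millisecond computation never stalls the
    // worker threads that poll all the other futures.
    spawn_blocking(move || expensive_step(input))
        .await
        .expect("blocking task panicked")
}
```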
-
I tried minimally reproducing this and I was able to with the following code:

```rust
// main.rs
use hyper::service::{make_service_fn, service_fn};
use hyper::{Body, Request, Response, Server};
use std::{convert::Infallible, net::SocketAddr};

mod future;

async fn handle(_: Request<Body>) -> Result<Response<Body>, Infallible> {
    future::Measured::wrap_t(
        async move { Ok(Response::new("Hello, World!".into())) },
        "handle",
        1000 * 1000, // 1 ms threshold
    )
    .await
}

// `core_threads` is the tokio 0.2 name; on tokio 1.x this is `worker_threads`.
#[tokio::main(core_threads = 2)]
async fn main() {
    let addr = SocketAddr::from(([127, 0, 0, 1], 3214));
    let make_svc = make_service_fn(|_conn| async { Ok::<_, Infallible>(service_fn(handle)) });
    let server = Server::bind(&addr).serve(make_svc);
    if let Err(e) = future::Measured::wrap(server, "server").await {
        eprintln!("server error: {}", e);
    }
}
```

```rust
// future.rs
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::task::{Context, Poll};
use std::time::Instant;

// 5 ms, in nanoseconds.
const THRESHOLD: u128 = 5000 * 1000;

pub struct Measured<F> {
    fut: F,
    label: &'static str,
    threshold: u128,
    count: AtomicUsize,
}

impl<F> Measured<F> {
    pub fn wrap(fut: F, label: &'static str) -> Self {
        Self {
            fut,
            label,
            threshold: THRESHOLD,
            count: AtomicUsize::new(0),
        }
    }

    pub fn into_inner(self) -> F {
        self.fut
    }

    pub fn wrap_t(fut: F, label: &'static str, threshold: u128) -> Self {
        Self {
            fut,
            label,
            threshold,
            count: AtomicUsize::new(0),
        }
    }
}

impl<F: Future> Future for Measured<F> {
    type Output = F::Output;

    fn poll(self: Pin<&mut Self>, ctx: &mut Context<'_>) -> Poll<F::Output> {
        let label = self.label;
        let t = self.threshold;
        let count = self.count.fetch_add(1, Ordering::SeqCst);
        // Safety: we never move `fut` out of the pinned struct.
        let fut = unsafe { Pin::map_unchecked_mut(self, |me| &mut me.fut) };
        let start = Instant::now();
        let res = fut.poll(ctx);
        let measured = start.elapsed();
        if measured.as_nanos() > t {
            eprintln!(
                "future took a long time! nanos = {}, label = {}, count = {}",
                measured.as_nanos(),
                label,
                count
            );
        }
        res
    }
}
```

I then blasted the hyper server with this:
I started seeing the following logs:
I was mostly trying to get an idea of a baseline for polling latency.
-
I was wondering if there's a way to measure the event loop lag. I've been discussing it in the tokio Discord, but I figured this might be useful here.
Our current implementation for (roughly) measuring lag on the event loop is this:
We had considered (and it has also been suggested) running this every 1-2 seconds instead of every 5, and recording the lag as a histogram. We're storing this in a Prometheus cluster.
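A minimal sketch of this kind of probe (the shape is an assumption, not the implementation referenced above): spawn a task that sleeps for a fixed interval and reports how much longer than the interval it actually took to run again.

```rust
use std::time::{Duration, Instant};

// Any extra time beyond `interval` before the task runs again is (roughly)
// how long the runtime was too busy to get back to this task.
pub fn spawn_lag_probe(interval: Duration) {
    tokio::spawn(async move {
        loop {
            let start = Instant::now();
            tokio::time::sleep(interval).await; // `tokio::time::delay_for` on tokio 0.2
            let lag = start.elapsed().saturating_sub(interval);
            // In the real setup this would feed a Prometheus histogram instead.
            eprintln!("event loop lag: {:?}", lag);
        }
    });
}

// Usage: spawn_lag_probe(Duration::from_secs(5));
```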
It's also been suggested that this is inaccurate because of the work-stealing nature of the multi-threaded event loop. That is to say: one thread might have lag in processing tasks while others might be fast.
Even with these inaccuracies, I've sometimes been able to measure a lag of up to 190ms on our event loop. That seems enormous. For context: we're running a reverse proxy, based on hyper. We handshake TLS a few times per second and we cache some things on disk (using tokio's async fs API). We also reload a configuration (at most once per second), which reads a file and parses 5MB of JSON with serde.
We're interested in measuring the lag precisely because we're seeing some weird behaviour from time to time (hyper bodies returning very slowly). That's a story for another discussion.
It would also be interesting to measure the number of tasks currently running / pending / etc. on the event loop, to fully troubleshoot this slowness.