Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RwLock.write() and Mutex.lock() hang forever #91

Open
nazar-pc opened this issue Sep 8, 2024 · 14 comments
Open

RwLock.write() and Mutex.lock() hang forever #91

nazar-pc opened this issue Sep 8, 2024 · 14 comments

Comments

@nazar-pc
Copy link

nazar-pc commented Sep 8, 2024

I have a piece of code where rwlock.write() hangs in one of the threads forever.

There are other threads that are calling both rwlock.write() and rwlock.read() on the same instance (behind Arc) and they work fine, it is just this single thread that is stuck.

I do not have a clean or minimal reproduction, but so far I am able to reproduce it in a larger app, even though it sometimes takes hours to do so.

Might be related to #80 (comment)

Using async-lock 3.4.0 and tokio 1.39.2.

Open for debugging suggestions, really not sure where to go next from here.

@nazar-pc
Copy link
Author

nazar-pc commented Sep 8, 2024

Reduced reproduction based on application behavior:

use async_lock::RwLock;
use std::future::Future;
use std::pin::pin;
use std::task::Poll;
use tokio::macros::support::poll_fn;

#[tokio::main]
async fn main() {
    let rw = RwLock::new(());

    let read_guard = rw.read().await;

    let mut write_a_fut = pin!(rw.write());
    let mut write_b_fut = pin!(rw.write());

    poll_fn(|ctx| {
        if !matches!(write_b_fut.as_mut().poll(ctx), Poll::Pending) {
            unreachable!("");
        }
        if !matches!(write_a_fut.as_mut().poll(ctx), Poll::Pending) {
            unreachable!("");
        }

        Poll::Ready(())
    })
    .await;

    drop(read_guard);

    write_a_fut.await;
}

Essentially I have two write lock attempts which I try to poll once and then even after read lock being dropped, write can't be acquired.

Note the order of write_a_fut and write_b_fut polling, when reversed it succeeds.

@nazar-pc
Copy link
Author

nazar-pc commented Sep 9, 2024

cc @taiki-e and @notgull since you were active on #80 and smol-rs/event-listener#123

@nazar-pc
Copy link
Author

Checked versions down to 2.3.0 and they are all also problematic in the same way

@nazar-pc nazar-pc changed the title RwLock.write() hangs forever RwLock.write() and Mutex.lock() hang forever Sep 14, 2024
@nazar-pc
Copy link
Author

The same happens with Mutex actually

@notgull
Copy link
Member

notgull commented Sep 14, 2024

Interesting! Let me investigate this.

@nazar-pc
Copy link
Author

Probably related to smol-rs/event-listener#143 as well, I have event-listener in the dependencies and even without async-lock I still have issues, so the root cause is likely in event-listener itself.

@nazar-pc
Copy link
Author

@notgull I can imagine you're probably busy as is, but in case you have an ETA when you might have time to look into it, it'd help me plan things as well. If not I'll understand that too.

@notgull
Copy link
Member

notgull commented Sep 19, 2024

@nazar-pc Unfortunately no, smol is a free-time project for me and I haven't had a lot of free time as of late.

@nazar-pc
Copy link
Author

I can do more debugging myself as well given some pointers. I tried RwLock first with step by step debugger and it had quite a few layers, I might give it another try with Mutex that should be more straightforward implementation-wise.

@nazar-pc
Copy link
Author

Here is a further reduced example with just event-listener:

use event_listener::Event;
use std::future::Future;
use std::pin::pin;
use std::task::Poll;
use tokio::macros::support::poll_fn;

#[tokio::main]
async fn main() {
    let event = Event::new();

    let mut listen_a = pin!(event.listen());
    let mut listen_b = pin!(event.listen());

    poll_fn(|ctx| {
        assert_eq!(listen_b.as_mut().poll(ctx), Poll::Pending);
        assert_eq!(listen_a.as_mut().poll(ctx), Poll::Pending);

        Poll::Ready(())
    })
    .await;

    drop(listen_a);

    event.notify(1);
    listen_b.await;
}

And clippy is happy to tell why:

warning: call to `std::mem::drop` with a value that does not implement `Drop`. Dropping such a type only extends its contained lifetimes
  --> main.rs:22:5
   |
22 |     drop(listen_a);
   |     ^^^^^^^^^^^^^^
   |
note: argument has type `std::pin::Pin<&mut event_listener::EventListener>`
  --> main.rs:22:10
   |
22 |     drop(listen_a);
   |          ^^^^^^^^
   = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#drop_non_drop
   = note: `#[warn(clippy::drop_non_drop)]` on by default

There should be impl Drop for EventListener that will unregister listener for things to get un-stuck.

@nazar-pc
Copy link
Author

nazar-pc commented Sep 20, 2024

Even shorter:

use event_listener::Event;
use std::pin::pin;

#[tokio::main]
async fn main() {
    let event = Event::new();

    let listen_a = pin!(event.listen());
    let listen_b = pin!(event.listen());

    drop(listen_a);

    event.notify(1);
    listen_b.await;
}

Interestingly, removing pin!() allows it to make progress (might be compiler optimization or something).

@nazar-pc
Copy link
Author

Closing in favor of smol-rs/event-listener#143

@nazar-pc nazar-pc closed this as not planned Won't fix, can't repro, duplicate, stale Sep 20, 2024
@nazar-pc
Copy link
Author

nazar-pc commented Sep 20, 2024

Here is fixed example without problematic pin!() that prevents Drop from working:

use async_lock::RwLock;
use std::future::Future;
use std::task::Poll;
use tokio::macros::support::poll_fn;

#[tokio::main]
async fn main() {
    let rw = RwLock::new(());

    let read_guard = rw.read().await;

    let mut write_a_fut = Box::pin(rw.write());
    let mut write_b_fut = Box::pin(rw.write());

    poll_fn(|ctx| {
        assert!(matches!(write_b_fut.as_mut().poll(ctx), Poll::Pending));
        assert!(matches!(write_a_fut.as_mut().poll(ctx), Poll::Pending));

        Poll::Ready(())
    })
    .await;

    drop(read_guard);

    write_a_fut.await;
}

And similar with Mutex:

use async_lock::Mutex;
use std::future::Future;
use std::task::Poll;
use tokio::macros::support::poll_fn;

#[tokio::main]
async fn main() {
    let m = Mutex::new(());

    let guard_a = m.lock().await;

    let mut guard_b = Box::pin(m.lock());
    let mut guard_c = Box::pin(m.lock());

    poll_fn(|ctx| {
        assert!(matches!(guard_c.as_mut().poll(ctx), Poll::Pending));
        assert!(matches!(guard_b.as_mut().poll(ctx), Poll::Pending));

        Poll::Ready(())
    })
    .await;

    drop(guard_a);

    guard_b.await;
}

This basically means there is an internal queue that defines an order in which things are supposed to be unlocked determined by the first .poll() call. If after that order changes, it may deadlock.

I'm not 100% sure whether it is expected or not. The solution would be to notify all instead of one, but that will likely have a performance impact, which may or may not be acceptable.

UPD: Though I suspect there is still another issue somewhere that I can't reproduce just yet.

@nazar-pc nazar-pc reopened this Sep 20, 2024
@nazar-pc
Copy link
Author

Feel free to close this issue if above behavior is expected

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants