cluster_test.py::test_migration_one_after_another fails when using epoll #4498

Open
adiholden opened this issue Jan 22, 2025 · 3 comments
Labels: bug, failing-test

Comments

@adiholden
Collaborator

We get this check failure:
epoll_socket.cc:381] Check failed: async_write_req_ == nullptr

StackTrace:
30002➜ @ 0x55b96235e961 google::LogMessage::SendToLog()
30002➜ @ 0x55b96235e136 google::LogMessage::Flush()
30002➜ @ 0x55b962361fae google::LogMessageFatal::~LogMessageFatal()
30002➜ @ 0x55b96231b36e util::fb2::EpollSocket::AsyncWriteSome()
30002➜ @ 0x55b9623246e6 io::AsyncSink::AsyncWrite()
30002➜ @ 0x55b961bd0eb1 dfly::JournalStreamer::AsyncWrite()
30002➜ @ 0x55b961bd1508 dfly::JournalStreamer::OnCompletion()
30002➜ @ 0x55b961bd0bc6 _ZZN4dfly15JournalStreamer10AsyncWriteEvENKUlSt10error_codeE_clES1_
30002➜ @ 0x55b961bd54f3 _ZSt13__invoke_implIvRZN4dfly15JournalStreamer10AsyncWriteEvEUlSt10error_codeE_JS2_EET_St14__invoke_otherOT0_DpOT1_
30002➜ @ 0x55b961bd4fac _ZSt10__invoke_rIvRZN4dfly15JournalStreamer10AsyncWriteEvEUlSt10error_codeE_JS2_EENSt9enable_ifIX16is_invocable_r_vIT_T0_DpT1_EES6_E4typeEOS7_DpOS8_
30002➜ @ 0x55b961bd4989 _ZNSt17_Function_handlerIFvSt10error_codeEZN4dfly15JournalStreamer10AsyncWriteEvEUlS0_E_E9_M_invokeERKSt9_Any_dataOS0_
30002➜ @ 0x55b962326263 std::function<>::operator()()
30002➜ @ 0x55b9623235c5 io::(anonymous namespace)::AsyncWriteState::OnCb()
30002➜ @ 0x55b962323379 _ZZN2io12_GLOBAL__N_115AsyncWriteState4OnCbEN6nonstd13expected_lite8expectedImSt10error_codeEEENKUlS6_E_clES6_
30002➜ @ 0x55b9623258be _ZSt13__invoke_implIvRZN2io12_GLOBAL__N_115AsyncWriteState4OnCbEN6nonstd13expected_lite8expectedImSt10error_codeEEEUlS7_E_JS7_EET_St14__invoke_otherOT0_DpOT1_
30002➜ @ 0x55b96232531b _ZSt10__invoke_rIvRZN2io12_GLOBAL__N_115AsyncWriteState4OnCbEN6nonstd13expected_lite8expectedImSt10error_codeEEEUlS7_E_JS7_EENSt9enable_ifIX16is_invocable_r_vIT_T0_DpT1_EESB_E4typeEOSC_DpOSD_
30002➜ @ 0x55b962324ec1 _ZNSt17_Function_handlerIFvN6nonstd13expected_lite8expectedImSt10error_codeEEEZN2io12_GLOBAL__N_115AsyncWriteState4OnCbES4_EUlS4_E_E9_M_invokeERKSt9_Any_dataOS4_
30002➜ @ 0x55b96225ef99 std::function<>::operator()()
30002➜ @ 0x55b962318f97 util::fb2::EpollSocket::AsyncReq::Run()
30002➜ @ 0x55b96231d609 util::fb2::EpollSocket::Wakey()
30002➜ @ 0x55b962319915 _ZZN4util3fb211EpollSocket13OnSetProactorEvENKUljiPNS0_13EpollProactorEE_clEjiS3_
30002➜ @ 0x55b96231e0c2 _ZSt13__invoke_implIvRZN4util3fb211EpollSocket13OnSetProactorEvEUljiPNS1_13EpollProactorEE_JjiS4_EET_St14__invoke_otherOT0_DpOT1_
30002➜ @ 0x55b96231dedc _ZSt10__invoke_rIvRZN4util3fb211EpollSocket13OnSetProactorEvEUljiPNS1_13EpollProactorEE_JjiS4_EENSt9enable_ifIX16is_invocable_r_vIT_T0_DpT1_EES8_E4typeEOS9_DpOSA_
30002➜ @ 0x55b96231dc29 _ZNSt17_Function_handlerIFvjiPN4util3fb213EpollProactorEEZNS1_11EpollSocket13OnSetProactorEvEUljiS3_E_E9_M_invokeERKSt9_Any_dataOjOiOS3_
30002➜ @ 0x55b962317d48 std::function<>::operator()()
30002➜ @ 0x55b962316aec util::fb2::EpollProactor::DispatchCompletions()
30002➜ @ 0x55b9623148ec util::fb2::EpollProactor::MainLoop()
30002➜ @ 0x55b962262871 util::fb2::ProactorDispatcher::Run()
30002➜ @ 0x55b96227868f util::fb2::detail::(anonymous namespace)::DispatcherImpl::Run()

It looks like the check failure comes from JournalStreamer::OnCompletion, which calls AsyncWrite again.

@adiholden added the bug and failing-test labels Jan 22, 2025
@kostasrim
Contributor

Yes, I think we don't reset async_write_req_ within the epoll socket. I am familiar with the code and will take a look once I get a chance.

@kostasrim
Contributor

The problem is in:

auto finalize = [this] {
  delete async_write_req_;
  async_write_req_ = nullptr;
  async_write_pending_ = 0;
};

if (ec) {
  async_write_req_->cb(make_unexpected(ec));  // user callback runs while async_write_req_ is still set
  finalize();                                 // cleanup happens only after the callback returns
  // ...
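The ordering in this excerpt (callback first, cleanup second) can be reproduced with a small toy model. BuggyToySocket, CompletePendingWrite and ReentrantWriteAccepted are hypothetical names, not helio code. To make the model testable, AsyncWriteSome returns false where the real socket fails the check `async_write_req_ == nullptr`. The completion callback starts another write, as JournalStreamer::OnCompletion does; because finalize() has not run yet, that second write finds the request slot still occupied.

```cpp
#include <cassert>
#include <functional>
#include <system_error>
#include <utility>

// Hypothetical toy socket with a single outstanding-write slot (not helio code).
// It copies the ordering of the excerpt: callback first, finalize() second.
class BuggyToySocket {
 public:
  using Callback = std::function<void(std::error_code)>;

  BuggyToySocket() = default;
  BuggyToySocket(const BuggyToySocket&) = delete;
  BuggyToySocket& operator=(const BuggyToySocket&) = delete;
  ~BuggyToySocket() { delete async_write_req_; }

  // Returns false where the real socket fails the check async_write_req_ == nullptr.
  bool AsyncWriteSome(Callback cb) {
    if (async_write_req_ != nullptr) return false;
    async_write_req_ = new Request{std::move(cb)};
    return true;
  }

  void CompletePendingWrite(std::error_code ec) {
    auto finalize = [this] {
      delete async_write_req_;
      async_write_req_ = nullptr;
    };
    async_write_req_->cb(ec);  // a re-entrant AsyncWriteSome() still sees a non-null request
    finalize();
  }

  bool HasPendingWrite() const { return async_write_req_ != nullptr; }

 private:
  struct Request {
    Callback cb;
  };
  Request* async_write_req_ = nullptr;
};

// The completion handler immediately issues another write, like
// JournalStreamer::OnCompletion -> AsyncWrite. Returns whether it was accepted.
bool ReentrantWriteAccepted() {
  BuggyToySocket sock;
  bool accepted = true;
  sock.AsyncWriteSome([&sock, &accepted](std::error_code) {
    accepted = sock.AsyncWriteSome([](std::error_code) {});
  });
  sock.CompletePendingWrite(std::error_code{});
  return accepted;
}
```

In the real code there is no `return false`: the check aborts the process, which is the crash in the stack trace at the top of this issue.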

Long story short, we register a file descriptor (the socket) in epoll's interest list (via epoll_ctl). Later, the epoll proactor (technically it is a reactor, but that is a different conversation) processes events from epoll, which in turn executes a callback: async_write_req_->cb(make_unexpected(ec));
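To make the register-then-dispatch flow concrete, here is a minimal Linux-only sketch: epoll_ctl adds an fd to the interest list, epoll_wait reports readiness, and the loop calls the handler registered for that fd. MiniReactor, Register, RunOnce and HandlerCallsAfterOneEvent are hypothetical names; the sketch uses an eventfd instead of a socket so it runs without a network peer, and it is not helio's EpollProactor. The property that matters for this issue is that handlers run synchronously inside the dispatch call, so anything a handler does (such as issuing another write) happens before the code that follows the handler invocation.

```cpp
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical minimal reactor (Linux only); not helio's EpollProactor.
class MiniReactor {
 public:
  using Handler = std::function<void(uint32_t events)>;

  MiniReactor() : epfd_(epoll_create1(EPOLL_CLOEXEC)) {}
  ~MiniReactor() {
    if (epfd_ >= 0) close(epfd_);
  }
  MiniReactor(const MiniReactor&) = delete;
  MiniReactor& operator=(const MiniReactor&) = delete;

  bool ok() const { return epfd_ >= 0; }

  // Adds fd to epoll's interest list and remembers which handler to run for it.
  bool Register(int fd, uint32_t events, Handler handler) {
    epoll_event ev{};
    ev.events = events;
    ev.data.fd = fd;
    if (epoll_ctl(epfd_, EPOLL_CTL_ADD, fd, &ev) != 0) return false;
    handlers_[fd] = std::move(handler);
    return true;
  }

  // Waits for ready fds and runs their handlers synchronously, inside this call.
  // Returns the number of ready fds, or -1 on error.
  int RunOnce(int timeout_ms) {
    epoll_event events[16];
    int n = epoll_wait(epfd_, events, 16, timeout_ms);
    for (int i = 0; i < n; ++i) {
      auto it = handlers_.find(events[i].data.fd);
      if (it != handlers_.end()) it->second(events[i].events);
    }
    return n;
  }

 private:
  int epfd_;
  std::unordered_map<int, Handler> handlers_;
};

// Registers an eventfd, makes it readable, runs one dispatch round and
// returns how many times the handler fired (-1 on setup failure).
int HandlerCallsAfterOneEvent() {
  MiniReactor reactor;
  int efd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
  if (efd < 0) return -1;
  if (!reactor.ok()) {
    close(efd);
    return -1;
  }

  int calls = 0;
  bool registered = reactor.Register(efd, EPOLLIN, [&calls, efd](uint32_t events) {
    uint64_t value = 0;
    // Drain the counter so the fd stops being readable (level-triggered mode).
    if ((events & EPOLLIN) &&
        read(efd, &value, sizeof(value)) == static_cast<ssize_t>(sizeof(value))) {
      ++calls;
    }
  });

  uint64_t one = 1;
  bool signaled = registered &&
                  write(efd, &one, sizeof(one)) == static_cast<ssize_t>(sizeof(one));
  if (signaled) reactor.RunOnce(1000);
  close(efd);
  return signaled ? calls : -1;
}
```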

The problem is that, in our case, cb is the completion handler that JournalStreamer::AsyncWrite passes to dest_->AsyncWrite:

dest_->AsyncWrite(v.data(), v.size(), [this, len = in_flight_bytes_](std::error_code ec) {
  OnCompletion(std::move(ec), len);
});

and:

void JournalStreamer::OnCompletion(std::error_code ec, size_t len) {
  DCHECK_EQ(in_flight_bytes_, len);

  DVLOG(3) << "Completing " << in_flight_bytes_;
  in_flight_bytes_ = 0;
  pending_buf_.Pop();
  if (ec && !IsStopped()) {
    cntx_->ReportError(ec);
  } else if (!pending_buf_.Empty() && !IsStopped()) {
    AsyncWrite();
  }
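The write loop in these two snippets can be modeled in isolation. ManualSink and ToyStreamer are hypothetical stand-ins: per the stack trace the real dest_ is an io::AsyncSink, and the real streamer uses a different buffer type plus cntx_->ReportError for errors. The sketch only shows the invariant the streamer relies on: one write in flight, and OnCompletion starting the next write from inside the completion callback. Because that next write is issued before the sink's callback returns, the sink must already consider itself idle at that point, which ManualSink::Complete ensures by detaching the callback before invoking it.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <system_error>
#include <utility>
#include <vector>

// Hypothetical sink: a write "completes" only when Complete() is called.
// Stands in for an async socket; this is not the io::AsyncSink API.
class ManualSink {
 public:
  using Callback = std::function<void(std::error_code)>;

  void AsyncWrite(std::string data, Callback cb) {
    assert(!pending_cb_ && "only one write may be in flight");
    written_.push_back(std::move(data));
    pending_cb_ = std::move(cb);
  }

  // Detaches the callback before running it, so the callback may start the next write.
  void Complete(std::error_code ec = {}) {
    Callback cb = std::move(pending_cb_);
    pending_cb_ = nullptr;
    if (cb) cb(ec);
  }

  bool busy() const { return static_cast<bool>(pending_cb_); }
  const std::vector<std::string>& written() const { return written_; }

 private:
  Callback pending_cb_;
  std::vector<std::string> written_;
};

// Simplified model of the streamer's write loop: buffers are queued, exactly
// one is in flight, and each completion pops it and starts the next one.
class ToyStreamer {
 public:
  explicit ToyStreamer(ManualSink* sink) : sink_(sink) {}

  void Write(std::string data) {
    pending_buf_.push_back(std::move(data));
    if (pending_buf_.size() == 1) AsyncWrite();  // queue was empty, so nothing is in flight
  }

 private:
  void AsyncWrite() {
    in_flight_bytes_ = pending_buf_.front().size();
    sink_->AsyncWrite(pending_buf_.front(), [this, len = in_flight_bytes_](std::error_code ec) {
      OnCompletion(ec, len);
    });
  }

  void OnCompletion(std::error_code ec, std::size_t len) {
    assert(in_flight_bytes_ == len);
    in_flight_bytes_ = 0;
    pending_buf_.pop_front();
    // The real code reports errors via cntx_->ReportError; this toy just stops on error.
    if (!ec && !pending_buf_.empty()) AsyncWrite();  // re-entrant: runs inside the sink's callback
  }

  ManualSink* sink_;
  std::deque<std::string> pending_buf_;
  std::size_t in_flight_bytes_ = 0;
};
```

For example, after three Write calls only the first buffer has reached the sink; each Complete() then pushes the next one from inside the callback.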

So OnCompletion calls AsyncWrite again. This is a problem because finalize, which cleans up the socket's internal pointer to the async request (see the first code snippet), has not run yet; async_write_req_ is therefore still non-null, and the CHECK in AsyncWriteSome fires.
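One way to remove this ordering problem is to reverse it: detach the finished request and clear the slot before invoking the user callback, so a write started from inside the callback finds a free slot. The sketch below applies that to a hypothetical FixedToySocket; it illustrates the general pattern only and is not necessarily how romange/helio#374 fixes the issue.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <system_error>
#include <utility>

// Hypothetical toy socket with a single outstanding-write slot (not helio code).
class FixedToySocket {
 public:
  using Callback = std::function<void(std::error_code)>;

  // Returns false where the real socket would fail the check async_write_req_ == nullptr.
  bool AsyncWriteSome(Callback cb) {
    if (async_write_req_) return false;
    async_write_req_ = std::make_unique<Request>(Request{std::move(cb)});
    return true;
  }

  // Cleanup first, callback second: moving the finished request into a local
  // leaves async_write_req_ empty before any user code runs.
  void CompletePendingWrite(std::error_code ec) {
    std::unique_ptr<Request> done = std::move(async_write_req_);
    if (done) done->cb(ec);  // a re-entrant AsyncWriteSome() finds a free slot
  }  // `done` is destroyed here; a newly issued request stays in async_write_req_

  bool HasPendingWrite() const { return async_write_req_ != nullptr; }

 private:
  struct Request {
    Callback cb;
  };
  std::unique_ptr<Request> async_write_req_;
};

// Scenario: the completion handler immediately starts another write, as
// JournalStreamer::OnCompletion does. Returns true if that write was accepted
// and is still pending after the first completion finished.
bool FixedReentrantWriteAccepted() {
  FixedToySocket sock;
  bool accepted = false;
  sock.AsyncWriteSome([&sock, &accepted](std::error_code) {
    accepted = sock.AsyncWriteSome([](std::error_code) {});
  });
  sock.CompletePendingWrite(std::error_code{});
  return accepted && sock.HasPendingWrite();
}
```

Moving the request into a local also means the finished request is freed by RAII after the callback returns, instead of by a separate finalize step that could accidentally delete a newer request.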

P.S. The epoll socket also does not yet have a proper AsyncRead implementation, but we are lucky because we don't really use it in DF (we only do AsyncWrites within the streamer).

I will fix both at the first opportunity.

@kostasrim
Contributor

romange/helio#374
