I've noticed a few times now that CI jobs sporadically fail in the pytest stage due to socket issues. The failures are hard to reproduce since there is no discernible pattern to when they occur, but this needs investigating.
What seems to happen is that, occasionally, when the for loop reaches the `await adapter.send_message([])` line for the second time, the socket from the previous iteration of the loop hasn't been shut down properly.
The fix
I've found threads like aio-libs/aiozmq#72 (comment) and https://stackoverflow.com/questions/45805714/how-to-close-properly-with-aiozmq-zmq, which led me to zmq's LINGER property, documented [here](http://api.zeromq.org/2-1%3azmq-setsockopt#toc15). This suggests that by default, when a zmq connection is made and a message is sent (but hasn't been received by the peer), the message will just linger. To me this suggests that the connection will not fully close.
But that hasn't seemed to help. So I found another fix: aborting the stream transport, i.e. force-killing it. Maybe in the future we can investigate exactly why this is happening, but the aiozmq library is really quite annoying to debug as it's not consistently typed.
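To illustrate what aborting a transport means, here is a minimal sketch using only the standard library's asyncio streams. It is not the aiozmq code from the failing test: a plain TCP connection stands in for the zmq stream, and the names `handle_client`, `send_then_abort` and `main` are hypothetical. `close()` on a transport flushes buffered data before closing, whereas `abort()` discards anything still buffered and tears the connection down right away.

```python
import asyncio


async def handle_client(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    # Hypothetical peer: reads until the connection goes away, then cleans up.
    try:
        await reader.read()
    except ConnectionError:
        pass  # an aborted connection may show up as a reset on the peer side
    finally:
        writer.close()


async def send_then_abort(host: str, port: int, payload: bytes) -> bool:
    _reader, writer = await asyncio.open_connection(host, port)
    writer.write(payload)
    # abort() closes immediately and drops any data still buffered,
    # instead of waiting for it to be flushed as close() would.
    writer.transport.abort()
    # Wait until the connection is fully torn down, so that nothing
    # from this iteration is left open when the next one starts.
    await writer.wait_closed()
    return writer.transport.is_closing()


async def main() -> list[bool]:
    # Port 0 asks the OS for any free port (an assumption for this sketch).
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        # Two iterations, mirroring the loop that fails on its second pass.
        return [await send_then_abort("127.0.0.1", port, b"ping") for _ in range(2)]
```

Running `asyncio.run(main())` performs two send-and-abort rounds against a local server. The trade-off is that `abort()` can lose unsent data, which is acceptable in a test teardown but not necessarily elsewhere.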
After much deliberation, I've decided the fix is to move from aiozmq to pyzmq, as the latter supports later versions of Python and has had successful CI jobs for recent releases (https://github.com/zeromq/pyzmq/tree/v25.0.2).