Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

timed out waiting for keepalive pong error never closes the connection #1527

Open
go-- opened this issue Oct 22, 2024 · 5 comments
Open

timed out waiting for keepalive pong error never closes the connection #1527

go-- opened this issue Oct 22, 2024 · 5 comments
Labels

Comments

@go--
Copy link

go-- commented Oct 22, 2024

if client is connected to server, sending and receiving pings.
if you close the wifi or the connection of the client:
server never raises a connectionclosederror or a similar error. even though it seems it is gonna raise one.

async with self.send_context():                                                                                                                                                          
  self.protocol.fail(
    CloseCode.INTERNAL_ERROR,
    "keepalive ping timeout",
  )

until client reconnects again which is not expected.

if I enable to debug logs then it shows that connection is closed but no connection_closed raise.

Oct 22 07:19:55 DEBUG = connection is CLOSING 
Oct 22 07:19:55 DEBUG > EOF 
Oct 22 07:19:55 DEBUG x closing TCP connection 
Oct 22 07:26:55 DEBUG < EOF 
Oct 22 07:26:55 DEBUG = connection is CLOSED
@go-- go-- changed the title timed out waiting for keepalive pong error never closes the connection timed out waiting for keepalive pong error never closes the connection #bug Oct 22, 2024
@go-- go-- changed the title timed out waiting for keepalive pong error never closes the connection #bug timed out waiting for keepalive pong error never closes the connection Oct 22, 2024
@aaugustin
Copy link
Member

It's possible that there's a bug but your report is a bit short of information for me to reproduce easily.

Can you give me a bit more context?

  • Are you using websockets on the client, the server, or both?
  • Where do you expect a ConnectionClosedError? Are you awaiting recv() when this happens?

@go--
Copy link
Author

go-- commented Oct 26, 2024

I think the problem is in the keepalive ping internal procedure. It never raises ConnectionClosed when the client closes the connection.

  • Using the server with the new asyncio API. (Server) (Enable debug logs)
  • Connecting it this from any client which has a different WiFi connection to test ping timeout. (Client)
  1. To reproduce it; close the clients WiFi.
  2. Watch the server logs you should see DEBUG = connection is CLOSED but the client is still alive in the loop. no ConnectionClosed. (If the client closed normally like with ctrl+c it would throw ConnectionClosed)
  3. After awhile like 2 mins later, turn the WiFi on and check the logs again the server now throw the connectionclosed error for the intended client.

hope I am more clear this time.

@aaugustin
Copy link
Member

aaugustin commented Oct 26, 2024

Yes, that's more clear, thank you.

The scenario that you're testing typically results in a stuck TCP connection because the TCP closing handshake cannot complete.

websockets waits for the TCP connection to terminate to raise the ConnectionClosed exception because the details of that exception cannot be determined with certainty until then.

My best guess is that you're seeing in the debug logs of the server:

  • sending a ping frame ---> waiting for a pong frame
  • detecting a ping timeout ping_timeout seconds later
  • sending a close frame ---> waiting for a close frame
  • closing the TCP connection close_timeout seconds later
    • this happens at 07:19:55 in the logs you provided
  • ... nothing here because you've lost the network link so the TCP closing handshake cannot complete and asyncio never gets the connection_lost signal...
  • until you recover the network link, then the TCP closing handshake completes and you get the ConnectionClosed exception
    • this happens at 07:26:55 in the logs you provided

This behavior is documented in close and wait_closed.

Can you confirm that's what you're seeing? You need to go a bit further back in logs vs. what you provided.

Once we agree on the facts, we can review whether that behavior makes sense. For context, it's the consequence of paying attention not to leak TCP connections.

@go--
Copy link
Author

go-- commented Oct 26, 2024

yes it was almost exactly like this.

what I cannot recall right now is I thought I saw the log lines I sent was a "whole at once". until you pointed out there is a 5 mins gap. I wont be able to reproduce this right now. I will check again after the holiday.
What I would have expected was:
After timeout out seconds (30 in my case) a ConnectionClosed or TimedOut error.
Because if you disable the logs. you wont be seeing anything. and no guarantee that the intended client is turning back online. then we would treat the lost connection client as connected.

@rgarus
Copy link

rgarus commented Dec 1, 2024

In my application I observe the same error on the client side, but only for Python 3.8, 3.9 and 3.10.

This is a part of the used code:

websocket = await websockets.asyncio.client.connect(...)

try:
    while True:
        try:
            await asyncio.wait_for(websocket.recv(), 30)
        except asyncio.TimeoutError:
            print("state:", websocket.state)
except Exception as e:
    print("Exception", type(e), e)

After some time the wifi is switched off and the server is no longer reachable.

Results with Python 3.10.15 on linux, websockets 14.1

Output:

state: State.OPEN
state: State.CLOSING
state: State.CLOSING
state: State.CLOSING
state: State.CLOSING
state: State.CLOSING
state: State.CLOSING
state: State.CLOSING
state: State.CLOSING
state: State.CLOSING
state: State.CLOSING
state: State.CLOSING
state: State.CLOSING
state: State.CLOSING
state: State.CLOSING
state: State.CLOSING

Logging:

2024-12-01 22:11:40,449 % sending keepalive ping
2024-12-01 22:11:40,449 > PING b8 df 8b e9 [binary, 4 bytes]
2024-12-01 22:11:40,454 < PONG b8 df 8b e9 [binary, 4 bytes]
2024-12-01 22:11:40,455 % received keepalive pong
2024-12-01 22:11:40,497 < PING 10 f3 58 07 [binary, 4 bytes]
2024-12-01 22:11:40,498 > PONG 10 f3 58 07 [binary, 4 bytes]
2024-12-01 22:12:00,469 % sending keepalive ping
2024-12-01 22:12:00,469 > PING 02 ab c1 43 [binary, 4 bytes]
2024-12-01 22:12:20,480 - timed out waiting for keepalive pong
2024-12-01 22:12:20,481 > CLOSE 1011 (internal error) keepalive ping timeout [24 bytes]
2024-12-01 22:12:20,481 = connection is CLOSING

Results with Python 3.11.5 on linux, websockets 14.1

Output:

state: 1
state: 2
state: 2
Exception <class 'websockets.exceptions.ConnectionClosedError'> sent 1011 (internal error) keepalive ping timeout; no close frame received

Logging:

2024-12-01 21:34:27,523 < PING ea 30 e6 ab [binary, 4 bytes]
2024-12-01 21:34:27,524 > PONG ea 30 e6 ab [binary, 4 bytes]
2024-12-01 21:34:27,525 % sending keepalive ping
2024-12-01 21:34:27,525 > PING 40 41 60 82 [binary, 4 bytes]
2024-12-01 21:34:27,531 < PONG 40 41 60 82 [binary, 4 bytes]
2024-12-01 21:34:27,531 % received keepalive pong
2024-12-01 21:34:47,533 % sending keepalive ping
2024-12-01 21:34:47,533 > PING 06 30 76 ad [binary, 4 bytes]
2024-12-01 21:35:07,543 - timed out waiting for keepalive pong
2024-12-01 21:35:07,544 > CLOSE 1011 (internal error) keepalive ping timeout [24 bytes]
2024-12-01 21:35:07,544 = connection is CLOSING
2024-12-01 21:35:47,564 < EOF
2024-12-01 21:35:47,564 > EOF
2024-12-01 21:35:47,565 = connection is CLOSED

With a close_timeout of 10 seconds I expect the "connection is CLOSED" message 10 seconds after "connection is CLOSING". Why does the CLOSE message only appear after 40 seconds?

@aaugustin aaugustin added the bug label Jan 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants