Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Connection to backend lost after network interruption when using websockets #20213

Closed
lennartfricke opened this issue Oct 10, 2024 · 4 comments · Fixed by #20283
Closed

Connection to backend lost after network interruption when using websockets #20213

lennartfricke opened this issue Oct 10, 2024 · 4 comments · Fixed by #20283

Comments

@lennartfricke
Copy link

lennartfricke commented Oct 10, 2024

Description of the bug

I used the vaadin gradle starter skeleton to have a minimal reproducing example and changed AppShell.java to add a push annotation

/**
 * Use the @PWA annotation make the application installable on phones, tablets
 * and some desktop browsers.
 */
@PWA(name = "Project Base for Vaadin", shortName = "Project Base")
@Theme("my-theme")
@Push(transport = Transport.WEBSOCKET, value = PushMode.AUTOMATIC)
public class AppShell implements AppShellConfigurator {
}

After starting using jettyRun. I open the app in a browser in a virtual machine (I used VirtualBox and libvirt).

I type in something and verify I get a result. I change the textbox content.

Then I virtually pull the cable.

In libvirt this can be done as follows, start virsh. Find out machine name (domain <dom>) using list, find out interface () using domiflist <dom>. 'Pull cable' using domif-setlink <dom> <iface> down.

It is not sufficient to use the developer tools. That does not interrupt the websocket connection.

Press 'Say hello' again to send something and wait some time (best reproduced when browser print something like
'Websocket closed, reason: Connection was closed abnormally (that is, with no close frame being sent). - wasClean: false')

Then reconnect the network (domif-setlink <dom> <iface> up).

And observe that nothing is send any more.

Expected behavior

The connection is reestablished and interaction is possible again.

I do not know if the last update should still be pushed or if resynchronization is better.

Minimal reproducible example

I downloaded https://github.com/vaadin/base-starter-gradle and applied the change above. I copied the code when vaadin version was 2.4.10

Versions

  • Vaadin / Flow version: 24.4.10 (probably also most current)
  • Java version: 21
  • OS version: Client Windows 10, Server Linux Ubuntu Jammy
  • Browser version: reproduced with Tatu Lund on Microsoft Edge Version 129.0.2792.79, Firefox 131
  • Application Server: jetty as of starter
@tepi tepi added investigation push BFP Bugfix priority, also known as Warranty labels Oct 11, 2024
@tepi tepi moved this to 📥Inbox - needs triage in Vaadin Flow ongoing work (Vaadin 10+) Oct 11, 2024
@tepi tepi moved this from 🆕 Needs triage to 🔖 High Priority (P1) in Vaadin Flow bugs & maintenance (Vaadin 10+) Oct 11, 2024
@mshabarov mshabarov moved this from 📥Inbox - needs triage to 🪵Product backlog in Vaadin Flow ongoing work (Vaadin 10+) Oct 14, 2024
@mshabarov mshabarov moved this from 🪵Product backlog to 🟢Ready to Go in Vaadin Flow ongoing work (Vaadin 10+) Oct 16, 2024
@mcollovati
Copy link
Collaborator

By default, Atmosphere does not call onOpen or onReopen when the web-socket connection is re-established.
As a result, if also fallback transport is WEBSOCKET, the application will not recover from the disconnected state.
If Atmosphere fallbacks to LONG_POLLING (default), then onReopen is called and the app resumes to connected state.

However, in both cases, the RequestResponseTracker.hasActiveRequest() is never reset to false, thus preventing new UIDL messages from being sent to the server.

@mcollovati
Copy link
Collaborator

Just reset RequestResponseTracker.hasActiveRequest flag is not enough for web socket transport.
The problem is that the message that is not delivered because of network loss is completely lost, and after reconnection a resynchronization is triggered, causing a page reload. This seems restricted to web socket transport because with WEBSOCKET_XHR or LONG_POLLING, the payload of the undelivered message is sent as body of the reconnection request.

Another interesting thing is that, by default, Atmosphere will downgrade to the fallback transport if the first web socket reconnection attempt fails. Luckily, there's an Atmosphere configuration property (maxWebsocketErrorRetries, default 1) that can be tuned to try to reconnect with web socket for several times, before downgrading.
Most likely, Flow should set a sensible value for it, for example 12, given that reconnection attempts happen every 5 seconds.

In addition, atmosphere-javascript has a couple of unreleased fixes related to web socket reconnection.

So a potential fixes in Flow client could be:

  • MessageSender
    • send: for web socket, internally store a copy of the message that will be sent to the server.
    • setClientToServerMessageId: called after the server response. When invoked, remove the stored the message if the server has seen it.
    • doSendInvocationsToServer: if there's a stored message, resend it, and postpone processing the rest of the queued messages.
  • DefaultConnectionStateHandler
    • pushOk: on reconnect, reset the RequestResponseTracker active request flag and only for web socket immediately send pending message (for other transports, the pending message is sent as part of the reconnection request).

mcollovati added a commit that referenced this issue Oct 18, 2024
When a websocket PUSH connection is closed and re-established because
of a network failure, the RequestResponseTracker.hasActiveRequest is
not reset, prenvint the Flow client to send additional messages to
the server.
This change will reset the flag on reconnection. It also will track
unsent PUSH message over websocket, to retry the delivery once the
connection is re-established, preventing client resynchronization.
In addition, it sets a default value of 12 for the Atmospehere
maxWebsocketErrorRetries setting, to ensure that the Flow client will
attempt to reconnect with web socket transport several times, instead
of immediately downgrade to long-polling after first failed connection.

Fixes #20213
@mshabarov
Copy link
Contributor

The above listed fixes look slightly related to a bigger mechanism described in #20348.

@github-project-automation github-project-automation bot moved this from 🔎Iteration reviews to Done in Vaadin Flow ongoing work (Vaadin 10+) Nov 7, 2024
vaadin-bot pushed a commit that referenced this issue Nov 7, 2024
…ion (#20283)

* fix: resume client to server communication after web socket reconnection

When a websocket PUSH connection is closed and re-established because
of a network failure, the RequestResponseTracker.hasActiveRequest is
not reset, prenvint the Flow client to send additional messages to
the server.
This change will reset the flag on reconnection. It also will track
unsent PUSH message over websocket, to retry the delivery once the
connection is re-established, preventing client resynchronization.
In addition, it sets a default value of 12 for the Atmospehere
maxWebsocketErrorRetries setting, to ensure that the Flow client will
attempt to reconnect with web socket transport several times, instead
of immediately downgrade to long-polling after first failed connection.

Fixes #20213

* upgrade to atmosphere javascript 4.0.1 with reconnection fixes

---------

Co-authored-by: Mikhail Shabarov <[email protected]>
vaadin-bot pushed a commit that referenced this issue Nov 7, 2024
…ion (#20283)

* fix: resume client to server communication after web socket reconnection

When a websocket PUSH connection is closed and re-established because
of a network failure, the RequestResponseTracker.hasActiveRequest is
not reset, prenvint the Flow client to send additional messages to
the server.
This change will reset the flag on reconnection. It also will track
unsent PUSH message over websocket, to retry the delivery once the
connection is re-established, preventing client resynchronization.
In addition, it sets a default value of 12 for the Atmospehere
maxWebsocketErrorRetries setting, to ensure that the Flow client will
attempt to reconnect with web socket transport several times, instead
of immediately downgrade to long-polling after first failed connection.

Fixes #20213

* upgrade to atmosphere javascript 4.0.1 with reconnection fixes

---------

Co-authored-by: Mikhail Shabarov <[email protected]>
vaadin-bot added a commit that referenced this issue Nov 7, 2024
…ion (#20283) (#20423)

* fix: resume client to server communication after web socket reconnection

When a websocket PUSH connection is closed and re-established because
of a network failure, the RequestResponseTracker.hasActiveRequest is
not reset, prenvint the Flow client to send additional messages to
the server.
This change will reset the flag on reconnection. It also will track
unsent PUSH message over websocket, to retry the delivery once the
connection is re-established, preventing client resynchronization.
In addition, it sets a default value of 12 for the Atmospehere
maxWebsocketErrorRetries setting, to ensure that the Flow client will
attempt to reconnect with web socket transport several times, instead
of immediately downgrade to long-polling after first failed connection.

Fixes #20213

* upgrade to atmosphere javascript 4.0.1 with reconnection fixes

---------

Co-authored-by: Marco Collovati <[email protected]>
Co-authored-by: Mikhail Shabarov <[email protected]>
vaadin-bot added a commit that referenced this issue Nov 7, 2024
…ion (#20283) (#20424)

* fix: resume client to server communication after web socket reconnection

When a websocket PUSH connection is closed and re-established because
of a network failure, the RequestResponseTracker.hasActiveRequest is
not reset, prenvint the Flow client to send additional messages to
the server.
This change will reset the flag on reconnection. It also will track
unsent PUSH message over websocket, to retry the delivery once the
connection is re-established, preventing client resynchronization.
In addition, it sets a default value of 12 for the Atmospehere
maxWebsocketErrorRetries setting, to ensure that the Flow client will
attempt to reconnect with web socket transport several times, instead
of immediately downgrade to long-polling after first failed connection.

Fixes #20213

* upgrade to atmosphere javascript 4.0.1 with reconnection fixes

---------

Co-authored-by: Marco Collovati <[email protected]>
Co-authored-by: Mikhail Shabarov <[email protected]>
@vaadin-bot
Copy link
Collaborator

This ticket/PR has been released with Vaadin 24.5.4.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

5 participants