Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: enable mozilla-central http3server to use neqo-bin #1878

Merged
merged 8 commits into from
May 8, 2024

Conversation

mxinden
Copy link
Collaborator

@mxinden mxinden commented May 4, 2024

There are two server implementations based on neqo:

  1. https://github.com/mozilla/neqo/tree/main/neqo-bin/src/server

    • http3 and http09 implementation
    • used for manual testing and QUIC Interop
  2. https://searchfox.org/mozilla-central/source/netwerk/test/http3server/src/main.rs

    • used to test Firefox

I assume one was once an exact copy of the other. Both implement their own I/O, timers, event loop, ... Since then, the two implementations diverged significantly. Especially (1) saw a lot of improvements in recent months:

At this point, bugs in (2) are hard to fix, see e.g. #1801.

This commit merges (2) into (1), thus removing all duplicate logic and having (2) benefit from all the recent improvements to (1).


Opening as a draft for now. Blocked on #1877.

What do folks think? Are you in favor of merging the two? We can then discuss details, e.g. having firefox.rs with its various HttpServer implementations live in mozilla-central instead of GitHub.

The QUIC Interop Runner requires an http3 and http09 implementation for both
client and server. The client code is already structured into an http3 and an
http09 implementation since mozilla#1727.

This commit does the same for the server side, i.e. splits the http3 and http09
implementation into separate Rust modules.
There are two server implementations based on neqo:

1. https://github.com/mozilla/neqo/tree/main/neqo-bin/src/server
  - http3 and http09 implementation
  - used for manual testing and QUIC Interop

2. https://searchfox.org/mozilla-central/source/netwerk/test/http3server/src/main.rs
  - used to test Firefox

I assume one was once an exact copy of the other. Both implement their own I/O,
event loop, ... Since then, the two implementations diverged significantly.
Especially (1) saw a lot of improvements in recent months:

- mozilla#1564
- mozilla#1569
- mozilla#1578
- mozilla#1581
- mozilla#1604
- mozilla#1612
- mozilla#1676
- mozilla#1692
- mozilla#1707
- mozilla#1708
- mozilla#1727
- mozilla#1753
- mozilla#1756
- mozilla#1766
- mozilla#1772
- mozilla#1786
- mozilla#1787
- mozilla#1788
- mozilla#1794
- mozilla#1806
- mozilla#1808
- mozilla#1848
- mozilla#1866

At this point, bugs in (2) are hard to fix, see e.g.
mozilla#1801.

This commit merges (2) into (1), thus removing all duplicate logic and
having (2) benefit from all the recent improvements to (1).
Copy link

github-actions bot commented May 4, 2024

Failed Interop Tests

QUIC Interop Runner, client vs. server

All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

@KershawChang
Copy link
Collaborator

After reviewing the code in firefox.rs, I prefer to keep Firefox-specific test code (e.g., NonRespondingServer, Http3TestServer) in mozilla-central rather than in neqo. The main reason is that this test-specific code is closely tied to the corresponding test files in mozilla-central (e.g., test_http3.js). Splitting this test code between two repositories seems not a good idea.

However, I do think that extracting common elements such as I/O, timers, and the event loop to simplify the code in http3server would be nice.

@mxinden
Copy link
Collaborator Author

mxinden commented May 6, 2024

NonRespondingServer, Http3TestServer and Http3ProxyServer each implement the HttpServer trait. We can export (i.e. make pub) the HttpServer trait from neqo-bin. We can then move the NonRespondingServer, Http3TestServer and Http3ProxyServer back into mozilla-central, each implementing neqo_bin::HttpServer. One can then run each of the servers via the neqo_bin::ServersRunner from within mozilla-central/source/netwerk/test/http3server/src/main.rs. Sounds good @KershawChang?

@KershawChang
Copy link
Collaborator

NonRespondingServer, Http3TestServer and Http3ProxyServer each implement the HttpServer trait. We can export (i.e. make pub) the HttpServer trait from neqo-bin. We can then move the NonRespondingServer, Http3TestServer and Http3ProxyServer back into mozilla-central, each implementing neqo_bin::HttpServer. One can then run each of the servers via the neqo_bin::ServersRunner from within mozilla-central/source/netwerk/test/http3server/src/main.rs. Sounds good @KershawChang?

Yeah, I think this sounds good to me.
However, we also need to make sure that some hacks that we've done in http3server are also migrated to neqo_bin::ServersRunner.
For example, we have another get_timeout function that's used to return a short timeout.

neqo-bin/src/server/mod.rs Show resolved Hide resolved
neqo-bin/src/server/mod.rs Show resolved Hide resolved
neqo-bin/src/server/mod.rs Outdated Show resolved Hide resolved
@mxinden
Copy link
Collaborator Author

mxinden commented May 6, 2024

Status update:

devtools/client/inspector/test/browser_inspector_reload_iframe.js test is now succeeding on my local machine, using neqo-bin for I/O etc., with #1784 back in place.

Still needs a bit of clean-up. Will create my first mozilla-central patch soon.

@mxinden
Copy link
Collaborator Author

mxinden commented May 6, 2024

I have created a accompanying Bugzilla Bug and added a Phabricator patch.

I suggest we keep high level discussions here on this GitHub pull request.

@mxinden
Copy link
Collaborator Author

mxinden commented May 6, 2024

This pull request builds on top of #1877.

Once #1877 is merged, I will move this pull request to "Ready for review".

Copy link

github-actions bot commented May 6, 2024

Firefox builds for this PR

The following builds are available for testing. Crossed-out builds did not succeed.

Copy link

github-actions bot commented May 6, 2024

Failed Interop Tests

QUIC Interop Runner, client vs. server

All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

@KershawChang
Copy link
Collaborator

The merged http3server seems to work pretty well, based on the try push below.
https://treeherder.mozilla.org/jobs?repo=try&revision=bb0c12caf3fdf7b82b348b63419376ce289d8382
https://treeherder.mozilla.org/jobs?repo=try&revision=44dd0f923e5a010b9260262f8952ed1367446d8f

One potential problem is that we need to vendor some third-party rust crates into mozilla-central, as shown in this link
I think it'd be good if we can reduce these dependencies if possible.

neqo-bin/src/server/http09.rs Show resolved Hide resolved
neqo-bin/src/server/mod.rs Show resolved Hide resolved
@mxinden mxinden marked this pull request as ready for review May 7, 2024 15:35
@mxinden mxinden changed the title refactor: merge mozilla-central http3 server into neqo-bin refactor: enable mozilla-central http3server to use neqo-bin May 7, 2024
@mxinden
Copy link
Collaborator Author

mxinden commented May 7, 2024

One potential problem is that we need to vendor some third-party rust crates into mozilla-central, as shown in this link
I think it'd be good if we can reduce these dependencies if possible.

I will take a look. Thanks @KershawChang!

Nice to have. Adds multiple dependencies. Hard to justify for mozilla-central.
@mxinden
Copy link
Collaborator Author

mxinden commented May 7, 2024

955aaa9 should drop the need for the following dependencies:

+[[audits.anstream]]
+who = "Kershaw Chang <[email protected]>"
+criteria = "safe-to-deploy"
+version = "0.5.0"
+
+[[audits.anstyle-parse]]
+who = "Kershaw Chang <[email protected]>"
+criteria = "safe-to-deploy"
+version = "0.2.4"
+
+[[audits.anstyle-query]]
+who = "Kershaw Chang <[email protected]>"
+criteria = "safe-to-deploy"
+version = "1.0.3"
+
+[[audits.anstyle-wincon]]
+who = "Kershaw Chang <[email protected]>"
+criteria = "safe-to-deploy"
+version = "2.1.0"
+[[audits.colorchoice]]
+who = "Kershaw Chang <[email protected]>"
+criteria = "safe-to-deploy"
+delta = "1.0.0 -> 1.0.1"

Which leaves us with the following additional dependencies for mozilla central:

+[[audits.quinn-udp]]
+who = "Kershaw Chang <[email protected]>"
+criteria = "safe-to-run"
+version = "0.5.0"
+
+[[audits.quinn-udp]]
+who = "Kershaw Chang <[email protected]>"
+criteria = "safe-to-run"
+delta = "0.5.0 -> 0.5.0@git:a947962131aba8a6521253d03cc948b20098a2d6"
+importable = false

I don't think we can nor do we want to do without quinn-udp.

+[[audits.socket2]]
+who = "Kershaw Chang <[email protected]>"
+criteria = "safe-to-run"
+delta = "0.5.5 -> 0.5.7"

Must have for quinn-udp.

+[[audits.clap-verbosity-flag]]
+who = "Kershaw Chang <[email protected]>"
+criteria = "safe-to-run"
+version = "2.2.0"

Could be removed through some semi-intrusive feature flags in neqo-bin. That said, clap-verbosity-flag is part of the clap-rs GitHub organization. Given that mozilla-central already trusts clap, I would assume it can trust clap-verbosity-flags as well. Thoughts?

https://github.com/clap-rs/clap-verbosity-flag

Copy link

github-actions bot commented May 7, 2024

Benchmark results

Performance differences relative to beffcc6.

  • drain a timer quickly time: [310.04 ns 317.56 ns 324.53 ns]
    change: [-2.3091% -0.0927% +2.1501%] (p = 0.93 > 0.05)
    No change in performance detected.

  • coalesce_acked_from_zero 1+1 entries
    time: [195.65 ns 196.08 ns 196.56 ns]
    change: [-1.4067% -0.2249% +0.5087%] (p = 0.76 > 0.05)
    No change in performance detected.

  • coalesce_acked_from_zero 3+1 entries
    time: [237.21 ns 237.77 ns 238.38 ns]
    change: [-0.3859% +0.0091% +0.4445%] (p = 0.97 > 0.05)
    No change in performance detected.

  • coalesce_acked_from_zero 10+1 entries
    time: [236.54 ns 237.19 ns 238.00 ns]
    change: [-0.5263% +0.0822% +0.6749%] (p = 0.80 > 0.05)
    No change in performance detected.

  • coalesce_acked_from_zero 1000+1 entries
    time: [217.72 ns 217.87 ns 218.07 ns]
    change: [-0.4886% +0.0962% +0.6935%] (p = 0.76 > 0.05)
    No change in performance detected.

  • RxStreamOrderer::inbound_frame()
    time: [118.50 ms 118.59 ms 118.68 ms]
    change: [-0.4867% -0.3973% -0.2988%] (p = 0.00 < 0.05)
    Change within noise threshold.

  • transfer/Run multiple transfers with varying seeds
    time: [118.56 ms 118.84 ms 119.11 ms]
    thrpt: [33.582 MiB/s 33.658 MiB/s 33.737 MiB/s]
    change:
    time: [-1.5179% -1.2052% -0.8946%] (p = 0.00 < 0.05)
    thrpt: [+0.9026% +1.2199% +1.5412%]
    Change within noise threshold.

  • transfer/Run multiple transfers with the same seed
    time: [118.69 ms 118.90 ms 119.12 ms]
    thrpt: [33.579 MiB/s 33.641 MiB/s 33.703 MiB/s]
    change:
    time: [-1.4204% -1.1508% -0.8884%] (p = 0.00 < 0.05)
    thrpt: [+0.8964% +1.1642% +1.4409%]
    Change within noise threshold.

  • 1-conn/1-100mb-resp (aka. Download)/client
    time: [1.1402 s 1.1444 s 1.1492 s]
    thrpt: [87.019 MiB/s 87.384 MiB/s 87.702 MiB/s]
    change:
    time: [+0.9699% +2.2479% +3.3239%] (p = 0.00 < 0.05)
    thrpt: [-3.2170% -2.1984% -0.9606%]
    Change within noise threshold.

  • 1-conn/10_000-parallel-1b-resp (aka. RPS)/client
    time: [428.67 ms 430.69 ms 432.71 ms]
    thrpt: [23.110 Kelem/s 23.219 Kelem/s 23.328 Kelem/s]
    change:
    time: [+0.1086% +0.7894% +1.4942%] (p = 0.02 < 0.05)
    thrpt: [-1.4722% -0.7832% -0.1085%]
    Change within noise threshold.

  • 1-conn/1-1b-resp (aka. HPS)/client
    time: [50.381 ms 50.654 ms 50.976 ms]
    thrpt: [19.617 elem/s 19.742 elem/s 19.849 elem/s]
    change:
    time: [+0.0560% +1.1789% +2.1177%] (p = 0.02 < 0.05)
    thrpt: [-2.0738% -1.1651% -0.0559%]
    Change within noise threshold.

Client/server transfer results

Transfer of 134217728 bytes over loopback.

Client Server CC Pacing Mean [ms] Min [ms] Max [ms] Relative
msquic msquic 658.6 ± 196.1 447.2 1025.9 1.00
neqo msquic reno on 958.7 ± 278.6 779.9 1545.3 1.00
neqo msquic reno 987.1 ± 199.2 786.0 1399.6 1.00
neqo msquic cubic on 1037.2 ± 290.9 778.6 1494.7 1.00
neqo msquic cubic 998.6 ± 229.7 792.8 1407.5 1.00
msquic neqo reno on 5497.3 ± 293.6 5085.9 6101.2 1.00
msquic neqo reno 5414.9 ± 232.3 5131.1 5837.7 1.00
msquic neqo cubic on 5205.6 ± 244.8 4877.0 5613.5 1.00
msquic neqo cubic 5167.4 ± 228.6 4822.9 5532.2 1.00
neqo neqo reno on 4518.2 ± 593.0 3459.7 5326.8 1.00
neqo neqo reno 4282.6 ± 570.5 3084.7 5045.4 1.00
neqo neqo cubic on 4707.0 ± 700.0 3005.3 5585.8 1.00
neqo neqo cubic 4743.8 ± 587.4 3319.7 5397.9 1.00

⬇️ Download logs

@larseggert
Copy link
Collaborator

I don't think we can nor do we want to do without quinn-udp.

Aside: It looks like we will need to consider alternative options at some point, because quinn-udp just removed support for sendmmsg/recvmmsg (quinn-rs/quinn@ee08826), which I think we'd like as a fallback on those systems where GSO isn't available (such as macOS, iOS, etc.)

@mxinden
Copy link
Collaborator Author

mxinden commented May 8, 2024

@larseggert yes, unfortunately. Commit was part of quinn-rs/quinn#1729. I reviewed the pull request before that commit only (quinn-rs/quinn#1729 (review)). (The pull request has another change which is beneficial for neqo, see review link before.)

I am not sure this was a deliberate choice against sendmmsg, or just a clean-up, given that quinn currently doesn't use it. In other words, they might be willing to revert, once we raise that it is needed for neqo.

which I think we'd like as a fallback on those systems where GSO isn't available (such as macOS, iOS, etc.)

Agreed.

#1693 is currently paused in favor of #1801 and this pull request. Also I believe #1868 is a lower hanging fruit for now. Long story short, I don't think we need to make a decision today.

@larseggert
Copy link
Collaborator

I don't think we need to make a decision today.

We definitely don't.

@mxinden
Copy link
Collaborator Author

mxinden commented May 8, 2024

The two "Firefox / Build Firefox" failures seem unrelated and seem to just have timed out. @larseggert fine to ignore for now?

Edit: Probably related #1893.

@larseggert
Copy link
Collaborator

If @KershawChang is happy, I am good to land this.

@KershawChang
Copy link
Collaborator

If @KershawChang is happy, I am good to land this.

Looks good to me. I think we can land this.
I'll release a new neqo version and land it with the h3 server in mozilla-central next week.

@larseggert larseggert enabled auto-merge May 8, 2024 13:51
@larseggert larseggert added this pull request to the merge queue May 8, 2024
Merged via the queue into mozilla:main with commit b8ae981 May 8, 2024
52 of 54 checks passed
mxinden added a commit to mxinden/neqo that referenced this pull request May 13, 2024
…lla#1800)

This reverts commit 342e4e7.

With mozilla#1878 merged and
https://bugzilla.mozilla.org/show_bug.cgi?id=1895319 available, one can now
reapply the patch removing `Server::timers`.

More specifically, the actual bug fix on mozilla-central side:

``` rust
let output = if self.response_to_send.is_empty() {
    output
} else {
    // In case there are pending responses to send, make sure a reasonable
    // callback is returned.
    const MIN_INTERVAL: Duration = Duration::from_millis(100);

    match output {
        Output::None => Output::Callback(MIN_INTERVAL),
        o @ Output::Datagram(_) => o,
        Output::Callback(d) => Output::Callback(min(d, MIN_INTERVAL)),
    }
};
```

See https://phabricator.services.mozilla.com/D209574.
github-merge-queue bot pushed a commit that referenced this pull request May 15, 2024
This reverts commit 342e4e7.

With #1878 merged and
https://bugzilla.mozilla.org/show_bug.cgi?id=1895319 available, one can now
reapply the patch removing `Server::timers`.

More specifically, the actual bug fix on mozilla-central side:

``` rust
let output = if self.response_to_send.is_empty() {
    output
} else {
    // In case there are pending responses to send, make sure a reasonable
    // callback is returned.
    const MIN_INTERVAL: Duration = Duration::from_millis(100);

    match output {
        Output::None => Output::Callback(MIN_INTERVAL),
        o @ Output::Datagram(_) => o,
        Output::Callback(d) => Output::Callback(min(d, MIN_INTERVAL)),
    }
};
```

See https://phabricator.services.mozilla.com/D209574.
@mxinden mxinden mentioned this pull request May 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants