Dropping the number of clients in the ingress benchmark #452

Closed
wants to merge 1 commit into from

Conversation

rsevilla87
Member

@rsevilla87 rsevilla87 commented Aug 9, 2022

Fixes: #451

Signed-off-by: Raul Sevilla [email protected]

Description

We shouldn't set maxConnection=-1, because the benchmarks always run with out-of-the-box (OOTB) parameters: their goal is to detect performance regressions, not to reach the best possible results. The maxConnection=-1 setting is already covered in the ingress-controller tuning guide.

Instead, this PR reduces the number of clients to avoid running out of connections, which should improve test stability. The connection limit is configured OOTB as maxconn=20000.
It also reduces the test scope by removing one of the client iterations. cc: @qiliRedHat


With this change:

Large-scale:
20 clients: 20 x 500 routes x 2 (edge and re-encrypt terminations) => up to 20K connections
mix termination @ 10 clients: 10 x 2000 routes x 1.5 (edge and re-encrypt terminations) => up to 30K connections
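
The upper bounds above are the product of clients, routes, and a per-route connection factor. A minimal Python sketch of that arithmetic, assuming the factor (2 or 1.5) is the average number of connections each client opens per route; the function name is hypothetical and not part of the benchmark code:

```python
def max_connections(clients: int, routes: int, conn_factor: float) -> int:
    """Upper bound on concurrent connections opened by all clients."""
    return int(clients * routes * conn_factor)


# Large-scale scenarios as described in this PR.
large_scale = max_connections(clients=20, routes=500, conn_factor=2)
large_scale_mix = max_connections(clients=10, routes=2000, conn_factor=1.5)

print(large_scale, large_scale_mix)
```

These are worst-case figures: they assume every client holds all of its connections open at the same time.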

Preliminary results after a couple of runs on different 4.11 clusters based on OpenShiftSDN are fairly stable (the maximum observed deviation was 9.53%):

+-----------+--------+----------------------+-----------+--------------------------+--------+-----------+--------------------------+--------------------------+
| test_type | routes | conn_per_targetroute | keepalive |          metric          | result | deviation | 7112fc71-router-20220809 | 2ba26d9d-router-20220810 |
+-----------+--------+----------------------+-----------+--------------------------+--------+-----------+--------------------------+--------------------------+
|   http    |  500   |          1           |     0     | avg(requests_per_second) |  Pass  |  -6.87%   |         49891.0          |         46461.5          |
|   http    |  500   |          1           |     1     | avg(requests_per_second) |  Pass  |   9.53%   |         11783.5          |         12906.5          |
|   http    |  500   |          1           |    50     | avg(requests_per_second) |  Pass  |   6.39%   |         54539.0          |         58022.5          |
|   http    |  500   |          20          |     0     | avg(requests_per_second) |  Pass  |  -3.09%   |         86976.0          |         84292.5          |
|   http    |  500   |          20          |     1     | avg(requests_per_second) |  Pass  |  -6.14%   |         22313.5          |         20943.0          |
|   http    |  500   |          20          |    50     | avg(requests_per_second) |  Pass  |  -4.44%   |         83683.5          |         79965.0          |
+-----------+--------+----------------------+-----------+--------------------------+--------+-----------+--------------------------+--------------------------+
+-----------+--------+----------------------+-----------+--------------------------+--------+-----------+--------------------------+--------------------------+
| test_type | routes | conn_per_targetroute | keepalive |          metric          | result | deviation | 7112fc71-router-20220809 | 2ba26d9d-router-20220810 |
+-----------+--------+----------------------+-----------+--------------------------+--------+-----------+--------------------------+--------------------------+
|   edge    |  500   |          1           |     0     | avg(requests_per_second) |  Pass  |  -0.55%   |         119625.0         |         118971.0         |
|   edge    |  500   |          1           |     1     | avg(requests_per_second) |  Pass  |   0.28%   |          3711.5          |          3722.0          |
|   edge    |  500   |          1           |    50     | avg(requests_per_second) |  Pass  |   0.98%   |         98786.5          |         99759.5          |
|   edge    |  500   |          20          |     0     | avg(requests_per_second) |  Pass  |   3.29%   |         58950.5          |         60889.0          |
|   edge    |  500   |          20          |     1     | avg(requests_per_second) |  Pass  |   0.01%   |          3599.5          |          3600.0          |
|   edge    |  500   |          20          |    50     | avg(requests_per_second) |  Pass  |   3.29%   |         53534.5          |         55297.0          |
+-----------+--------+----------------------+-----------+--------------------------+--------+-----------+--------------------------+--------------------------+
+-------------+--------+----------------------+-----------+--------------------------+--------+-----------+--------------------------+--------------------------+
|  test_type  | routes | conn_per_targetroute | keepalive |          metric          | result | deviation | 7112fc71-router-20220809 | 2ba26d9d-router-20220810 |
+-------------+--------+----------------------+-----------+--------------------------+--------+-----------+--------------------------+--------------------------+
| passthrough |  500   |          1           |     0     | avg(requests_per_second) |  Pass  |   0.08%   |         136808.0         |         136920.5         |
| passthrough |  500   |          1           |     1     | avg(requests_per_second) |  Pass  |   0.34%   |          3714.0          |          3726.5          |
| passthrough |  500   |          1           |    50     | avg(requests_per_second) |  Pass  |   0.83%   |         109858.0         |         110773.5         |
| passthrough |  500   |          20          |     0     | avg(requests_per_second) |  Pass  |   1.27%   |         142737.5         |         144544.0         |
| passthrough |  500   |          20          |     1     | avg(requests_per_second) |  Pass  |   0.54%   |          3587.0          |          3606.5          |
| passthrough |  500   |          20          |    50     | avg(requests_per_second) |  Pass  |   1.54%   |         102508.0         |         104091.5         |
+-------------+--------+----------------------+-----------+--------------------------+--------+-----------+--------------------------+--------------------------+
+-----------+--------+----------------------+-----------+--------------------------+--------+-----------+--------------------------+--------------------------+
| test_type | routes | conn_per_targetroute | keepalive |          metric          | result | deviation | 7112fc71-router-20220809 | 2ba26d9d-router-20220810 |
+-----------+--------+----------------------+-----------+--------------------------+--------+-----------+--------------------------+--------------------------+
| reencrypt |  500   |          1           |     0     | avg(requests_per_second) |  Pass  |   4.23%   |         122159.0         |         127332.0         |
| reencrypt |  500   |          1           |     1     | avg(requests_per_second) |  Pass  |   0.39%   |          3738.5          |          3753.0          |
| reencrypt |  500   |          1           |    50     | avg(requests_per_second) |  Pass  |   3.06%   |         92213.0          |         95030.5          |
| reencrypt |  500   |          20          |     0     | avg(requests_per_second) |  Pass  |   2.08%   |         16137.5          |         16473.5          |
| reencrypt |  500   |          20          |     1     | avg(requests_per_second) |  Pass  |   0.66%   |          3618.0          |          3642.0          |
| reencrypt |  500   |          20          |    50     | avg(requests_per_second) |  Pass  |   2.49%   |         15873.0          |         16269.0          |
+-----------+--------+----------------------+-----------+--------------------------+--------+-----------+--------------------------+--------------------------+
+-----------+--------+----------------------+-----------+--------------------------+--------+-----------+--------------------------+--------------------------+
| test_type | routes | conn_per_targetroute | keepalive |          metric          | result | deviation | 7112fc71-router-20220809 | 2ba26d9d-router-20220810 |
+-----------+--------+----------------------+-----------+--------------------------+--------+-----------+--------------------------+--------------------------+
|    mix    |  2000  |          1           |     0     | avg(requests_per_second) |  Pass  |   6.29%   |         164314.5         |         174642.0         |
|    mix    |  2000  |          1           |     1     | avg(requests_per_second) |  Pass  |   2.52%   |         26342.0          |         27006.5          |
|    mix    |  2000  |          1           |    50     | avg(requests_per_second) |  Pass  |   5.82%   |         141775.0         |         150033.0         |
|    mix    |  2000  |          5           |     0     | avg(requests_per_second) |  Pass  |   3.74%   |         65049.5          |         67484.5          |
|    mix    |  2000  |          5           |     1     | avg(requests_per_second) |  Pass  |   3.56%   |         25241.0          |         26139.0          |
|    mix    |  2000  |          5           |    50     | avg(requests_per_second) |  Pass  |   3.90%   |         63860.0          |         66353.5          |
+-----------+--------+----------------------+-----------+--------------------------+--------+-----------+--------------------------+--------------------------+
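
The deviation column in the tables above appears to be the relative change of the second run against the first one. A small Python sketch that reproduces a few rows under that assumption; the function name is hypothetical, and the Pass/Fail threshold used by the comparison tool is not stated here, so it is left out:

```python
def deviation_pct(baseline: float, candidate: float) -> float:
    """Relative change of candidate vs. baseline, as a percentage."""
    return round((candidate - baseline) / baseline * 100, 2)


# (baseline, candidate) pairs copied from the tables above.
http_keepalive_0 = deviation_pct(49891.0, 46461.5)    # http, 1 conn, keepalive 0
http_keepalive_1 = deviation_pct(11783.5, 12906.5)    # http, 1 conn, keepalive 1
edge_keepalive_0 = deviation_pct(119625.0, 118971.0)  # edge, 1 conn, keepalive 0

print(http_keepalive_0, http_keepalive_1, edge_keepalive_0)
```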

Small-scale:
100 clients: 100 x 100 routes x 2 (edge and re-encrypt terminations) => up to 20K connections
mix termination @ 50 clients: 50 x 400 routes x 1.5 (edge and re-encrypt terminations) => up to 30K connections

@rsevilla87 rsevilla87 added the bug label Aug 9, 2022
Collaborator

@venkataanil venkataanil left a comment

LGTM

@sjug
Collaborator

sjug commented Aug 10, 2022

Is there a reason you're changing the small scale routes/clients?

@qiliRedHat
Collaborator

@rsevilla87

mix termination @ 5 clients: 5 x 2000 routes x 2 (edge and re-encrypt terminations) => up to 20K connections

In my test results for the large-scale mix scenario with 500 routes, 10 clients works fine too. https://docs.google.com/spreadsheets/d/1jNYCdTu2XvSs4xARk8PwQGoPZVgra0jOQORIlUKAdKg/edit#gid=1789221797

For the default 2 routers, capacity:
number of routers (default 2) x default maxConnections (20k) = 40k

For mix with 500 routes and 10 clients, real:
routes (500) x 2 (http and passthrough) x clients (10) + routes (500) x 2 (edge and re-encrypt) x 2 (connections per termination) x clients (10) = 30k

Real 30k < capacity 40k --> OK
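
The capacity check above can be written out as a short Python sketch. The constant and function names are hypothetical; the numbers (2 routers, maxconn 20k, one connection per http/passthrough route, two per edge/re-encrypt route) are taken from the calculation in this comment:

```python
ROUTERS = 2                  # default number of router pods
MAXCONN_PER_ROUTER = 20_000  # default maxconn per router


def mix_connections(routes_per_termination: int, clients: int) -> int:
    """Connections opened by the mix scenario across all four terminations."""
    # http and passthrough: one connection per route per client
    plain = routes_per_termination * 2 * clients
    # edge and re-encrypt: two connections per route per client
    tls = routes_per_termination * 2 * 2 * clients
    return plain + tls


capacity = ROUTERS * MAXCONN_PER_ROUTER
real = mix_connections(routes_per_termination=500, clients=10)

print(real, capacity, real < capacity)
```

Note that this compares against the aggregate capacity of both routers; it does not model uneven balancing between router pods, where a single pod could hit its own limit earlier.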

@rsevilla87 rsevilla87 changed the title from "Dropping the number of clients in the large-scale scenario" to "Dropping the number of clients in the ingress benchmark" Aug 11, 2022
@rsevilla87
Member Author

@rsevilla87

mix termination @ 5 clients: 5 x 2000 routes x 2 (edge and re-encrypt terminations) => up to 20K connections

In my test results for the large-scale mix scenario with 500 routes, 10 clients works fine too. https://docs.google.com/spreadsheets/d/1jNYCdTu2XvSs4xARk8PwQGoPZVgra0jOQORIlUKAdKg/edit#gid=1789221797

For the default 2 routers, capacity: number of routers (default 2) x default maxConnections (20k) = 40k

For mix with 500 routes and 10 clients, real: routes (500) x 2 (http and passthrough) x clients (10) + routes (500) x 2 (edge and re-encrypt) x 2 (connections per termination) x clients (10) = 30k

Real 30k < capacity 40k --> OK

I wanted to ensure that we don't run out of ports in any of the router pods because of uneven request balancing. Anyway, I've tested with the number of clients you suggested, with good results (just a negligible number of 0 status codes), and I've pushed the changes.

@rsevilla87
Member Author

rsevilla87 commented Aug 11, 2022

Is there a reason you're changing the small scale routes/clients?

The number of clients was also excessive in this scenario:
200 clients * 2 (edge & reencrypt terminations) * 100 routes = 40000 conns, which is basically the combined connection capacity of both router pods
80 clients * 1.5 (edge & reencrypt terminations) * 400 routes = 48000 conns, which is higher than the connection capacity
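
To see how far the old client counts were from the limit, the same formula can be inverted to get the largest client count that still fits. A rough Python sketch, assuming a combined capacity of 40000 connections (2 routers x maxconn 20000) and perfectly even balancing; the function name is hypothetical:

```python
import math

CAPACITY = 2 * 20_000  # two router pods with the default maxconn each


def max_clients(capacity: int, routes: int, conn_factor: float) -> int:
    """Largest client count whose worst-case connections fit in capacity."""
    return math.floor(capacity / (routes * conn_factor))


limit_100_routes = max_clients(CAPACITY, routes=100, conn_factor=2)
limit_mix_400_routes = max_clients(CAPACITY, routes=400, conn_factor=1.5)

print(limit_100_routes, limit_mix_400_routes)
```

Under these assumptions, the previous 200 clients sat exactly at the limit for 100 routes, and the previous 80 clients exceeded the limit for the 400-route mix case.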

If you still need something specific for your tests, I recommend overriding the required variables.

@qiliRedHat
Collaborator

@rsevilla87 It looks good. Shall we merge this?

Labels
bug Something isn't working
Development

Successfully merging this pull request may close these issues.

Use maxConnection=-1 in router-perf test to increase tps and reduce error connections
4 participants