Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance regression in version 0.6.15 #4

Open
chinaryov opened this issue Mar 29, 2023 · 17 comments
Open

Performance regression in version 0.6.15 #4

chinaryov opened this issue Mar 29, 2023 · 17 comments

Comments

@chinaryov
Copy link

Hello,

We recently decided to upgrade to the latest version of the library from 0.6.13. During testing, we found that the number of requests decreased by half, and the average request time doubled. We tested all intermediate versions and found that the problem appeared in version 0.6.15. We are looking for ideas on what could affect performance in this version. There are a significant number of changes in this version.

Please let me know if you need any further information.

Chart with metrics - https://www.evernote.com/shard/s113/sh/331c5ad5-4cc2-4050-886f-b7bc85ad15be/CZnRDD3TGvojYcOe4LvwZWAv6invDhIzNBVWqQ2R0B2k25YwoQ-3AL0eEw

Best regards,
Alexander

@rdobrik
Copy link
Contributor

rdobrik commented Mar 29, 2023

Hi Alexander,

In Release 15 we added BLS certificate verification on response. That might cause performance regression. This verification is on by default. If your use case does not require it you can turn it off on Agent instance with method agent.setVerify(false).
Let me know, if it helps.

We just released version 19, if you are interested to move there and test it. I has more new features and bug fixes. making it compatible with the latest ICP version.

Roman

@chinaryov
Copy link
Author

Thank you for your quick response. We have disabled the verification and upgraded to the latest version 19, but the problem still persists. Do you have any other ideas?

@rdobrik
Copy link
Contributor

rdobrik commented Mar 30, 2023

Hmm, let me go back to other changes I made that time. Which version of HTTP client are you using? Apache or OkHttp?

@chinaryov
Copy link
Author

chinaryov commented Mar 30, 2023

I have identified the issue in the Agent class constructor where the BLS.init() call is executed. After disabling this call, the average request time has returned to its previous value. Our application works with multiple canisters from different threads, and we frequently recreate the Agent, which contributed to the performance regression. Thank you for your help in resolving this issue. Was tested on the version 0.6.15.

@chinaryov
Copy link
Author

chinaryov commented Mar 30, 2023

I have tested the latest version 19 of the library. Вisabling the BLS.init() call has resolved the issue with the average request time.

@rdobrik
Copy link
Contributor

rdobrik commented Mar 30, 2023

Thank you Alexander! I will optimize the code, it should not initialize BLS if setVerify is false. Let's see if it will also work with static object so it will be initialized just once.

@rdobrik
Copy link
Contributor

rdobrik commented Mar 30, 2023

I might also add system property to set verify to false by default if needed.

@rdobrik
Copy link
Contributor

rdobrik commented Mar 30, 2023

Yeah it should work with static initializer. I will run more tests in different scenarios. Also add an option disable it completely, depending on system settings.

@rdobrik
Copy link
Contributor

rdobrik commented Apr 1, 2023

Hi Alexander,

Just dropped new release with the patch, it impacts only agent package.

implementation 'org.ic4j:ic4j-agent:0.6.19.1'
implementation 'org.ic4j:ic4j-candid:0.6.19'

In this patch we are initializing BLS statically, so it will take a hit only once per Java instance.
Also added system property, if you need to disable BLS verification completely per system. This way it will not initialize or verify.

to use it just set blsVerify property to false, in Java

System.setProperty("blsVerify", "false");

Roman

@chinaryov
Copy link
Author

Hello,

We have one more problem - significant increase in memory consumption with this upgrade.

With the new version we've noticed a significant increase in memory consumption, from 1GB to 3GB and more, as shown in the graph below.

https://photos.app.goo.gl/DzyLqy2Pskkc3dTe9

We've determined that the switch to the new version (5.1.5) of the httpcore5-h2 library is what's causing the increase in memory consumption. We've also tested version 5.1.4 and observed the same increase in memory consumption.

However, when we switch back to version 5.1.3 of the httpcore5-h2 library, the memory consumption issue disappears. We have not made any additional changes to the code or other libraries.

After inspecting the Java heap, we found a large number of byte[] objects sized 16777224 bytes. This issue was not observed in the previous version, as shown in the screenshot below.

Screenshot of increased memory consumption: https://photos.app.goo.gl/pzHjvvxbo8PvWSpA8
Screenshot of previous memory consumption: https://photos.app.goo.gl/AZtJaWehEbJR9oQv5

We would appreciate any ideas or suggestions on why this increase in memory consumption is happening. Thank you.

Best regards,
Alexander

@rdobrik
Copy link
Contributor

rdobrik commented Apr 3, 2023

Hi Alexander,

Let me see if I can reproduce it here. And if there is something we can do in IC4J Agent to prevent it. Meanwhile can you use 5.1.3 version? There is nothing we require in our Agent to use 5.1.4, we just updated to the latest version. Actually, I tried it with 5.2.1 but there are some incompatibilities with Java version 8. Will see if they will resolve it.
If there is some memory issue with 5.1.4 we should report bug to HTTP Client team. let me investigate too.

Roman

@rdobrik
Copy link
Contributor

rdobrik commented Apr 3, 2023

Looks like I am able to reproduce it here, in 5.1.4 I see object org.apache.hc.core5.http2.impl.nio.ClientH2StreamMultiplexer that retains those bytes, I do not see that object in 5.1.3.

Class Name | Shallow Heap | Retained Heap | Percentage

streamMultiplexer org.apache.hc.core5.http2.impl.nio.ClientH2StreamMultiplexer @ 0x641accf78| 144 | 16,846,992 | 57.52%

@chinaryov
Copy link
Author

I've traced the source of the issue to the applyRemoteSettings method in the AbstractH2StreamMultiplexer file. Specifically, there was a change made to this method in the GitHub commit below, which appears to be causing the output buffer to increase to 16777215 bytes:
apache/httpcomponents-core@b2a5a18

        final int maxFrameSize = remoteConfig.getMaxFrameSize();
        if (maxFrameSize > localConfig.getMaxFrameSize()) {
            outputBuffer.expand(maxFrameSize);
        }

H2Config remoteConfig - has default value of MaxFrameSize 16777215 bytes

@chinaryov
Copy link
Author

Additionally, I've found some strange values in our log file, specifically the output payload (MAX_HEADER_LIST_SIZE = 16777215) and input payload (MAX_FRAME_SIZE = 16777215). It's possible that these values are correct, but they look suspicious to me.
https://photos.app.goo.gl/tX8nC6WzhjqZSCua8

@chinaryov
Copy link
Author

The maximum size of a payload frame in HTTP/2 is defined by the protocol and is known as the MAX_FRAME_SIZE parameter. The default value of MAX_FRAME_SIZE is 2^14 bytes (16,384 bytes). This means that the maximum size of a single payload frame is 16,384 bytes.

However, the protocol allows endpoints to negotiate a different value for MAX_FRAME_SIZE during the initial connection setup, using the SETTINGS frame. The value of MAX_FRAME_SIZE can be any power of two between 2^14 and 2^24-1 bytes (16,384 and 16,777,215 bytes).

If an endpoint sends a payload frame that exceeds the maximum size negotiated by the peer, the peer will send a connection error (HTTP/2 error code PROTOCOL_ERROR) and terminate the connection. Therefore, it's important for clients and servers to agree on a reasonable value for MAX_FRAME_SIZE during the initial connection setup to avoid errors and ensure efficient data transfer.

Can we somehow set a limit to avoid using the maximum possible value?

@rdobrik
Copy link
Contributor

rdobrik commented Apr 4, 2023

Hi Alexander,

Good investigation! Just trying to apply some changes to our Apache ReplicaTransport. I thing it will be easier to create custom version of ReplicaTransport that will use MinimalHttpAsyncClient instead of CloseableHttpAsyncClient. There we can set H2Config properties.

Here is a sample how to use it. https://github.com/apache/httpcomponents-client/blob/master/httpclient5/src/test/java/org/apache/hc/client5/http/examples/AsyncClientH2Multiplexing.java

I will share with you that new ReplicaTransport when is ready and somehow works for testing.

@rdobrik
Copy link
Contributor

rdobrik commented Apr 4, 2023

Hmm, looks like those properties come remotely from server. I can override local settings, but not sure if I can do it from remote one. I did not find any hook to reset that settings without modifying original Apache code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants