Many DirectByteBuffer with high capacity when use netty shaded client #11314

Open
cudothanh-Nhan opened this issue Jun 24, 2024 · 9 comments

@cudothanh-Nhan

cudothanh-Nhan commented Jun 24, 2024

What version of gRPC-Java are you using?

1.60.0

What is your environment?

jdk-18.0.2.1-x64
Linux 3.10.0-1160.76.1.el7.x86_64

Client initialization?

        NettyChannelBuilder.forTarget(target)
            .withOption(ChannelOption.CONNECT_TIMEOUT_MILLIS, timeout)
            .defaultLoadBalancingPolicy("round_robin")
            .keepAliveTime(60, TimeUnit.SECONDS)
            .keepAliveWithoutCalls(true)
            .sslContext(
                GrpcSslContexts.forClient()
                    .trustManager(InsecureTrustManagerFactory.INSTANCE)
                    .build())
            .build();

JVM properties?

/zserver/java/jdk-18.0.2.1/bin/java --add-opens=java.base/jdk.internal.misc=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED -Dio.netty.tryReflectionSetAccessible=true -Dzappname=kiki-asr-streaming-websocket -Dzappprof=production -Dzconfdir=conf -Dzconffiles=config.ini -Djzcommonx.version=LATEST -Dzicachex.version=LATEST -Dzlogconffile=log4j2.yaml -Dlog4j2.configurationFile=conf/production.log4j2.yaml -Dlog4j2.contextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector -Dlog4j2.immediateFlush=false -Djava.net.preferIPv4Stack=true -XX:+AlwaysPreTouch -XX:+UseTLAB -XX:+ResizeTLAB -XX:+PerfDisableSharedMem -Xms1G -Xmx2G -XX:+UseG1GC -XX:MaxGCPauseMillis=500 -XX:InitiatingHeapOccupancyPercent=70 -XX:ParallelGCThreads=24 -XX:ConcGCThreads=24 -XX:+ParallelRefProcEnabled -XX:-ResizePLAB -XX:G1RSetUpdatingPauseTimePercent=5 -Dspring.config.location=optional:file:./conf/production.spring.yaml -Dorg.springframework.boot.logging.LoggingSystem=none -jar /zserver/java-projects/kiki-asr-streaming-websocket/dist/kiki-asr-streaming-websocket-1.3.1.jar

What did you expect to see?

Stable number of DirectByteBuffer objects

What did you see instead?

Increasing number of DirectByteBuffer objects.

This is my OQL query listing the capacities of about 1,832 objects:
[screenshot: OQL results showing DirectByteBuffer capacities]

These are the GC root references from a sample FastThreadLocalThread that holds a DirectByteBuffer with a capacity of about 2 MB; there are many objects like that:
[screenshot: GC root references from a FastThreadLocalThread]

Besides, I noticed that there are many DirectByteBuffer objects with a null cleaner. Is that the intended implementation in Netty?
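
For context, a hedged sketch of the Netty path that produces such cleaner-less buffers, using the non-shaded class names for brevity (the shaded equivalents live under io.grpc.netty.shaded); when reflection access is available, Netty prefers direct buffers without a Cleaner and frees the native memory explicitly itself:

    import java.nio.ByteBuffer;

    import io.netty.util.internal.PlatformDependent;

    public class NoCleanerProbe {
      public static void main(String[] args) {
        // With -Dio.netty.tryReflectionSetAccessible=true and the --add-opens flags above,
        // this typically reports true: Netty then prefers "noCleaner" direct buffers.
        System.out.println("useDirectBufferNoCleaner = "
            + PlatformDependent.useDirectBufferNoCleaner());

        if (PlatformDependent.useDirectBufferNoCleaner()) {
          // Such buffers are created without a java.nio Cleaner; Netty releases the
          // native memory explicitly instead, which is why the cleaner field is null.
          ByteBuffer buf = PlatformDependent.allocateDirectNoCleaner(1024);
          PlatformDependent.freeDirectNoCleaner(buf);
        }
      }
    }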

Steps to reproduce the bug

@ejona86
Member

ejona86 commented Jun 24, 2024

Increasing number of DirectByteBuffer objects.

That doesn't tell us much. And you only give us one data point.

Does your machine have many cores? #4317 and #5671 are about many threads. The screenshot shows details of an EpollEventLoop; we would expect there to be a cache there.

@cudothanh-Nhan
Author

cudothanh-Nhan commented Jun 24, 2024

Our app runs on a machine with 48 cores. I can give you the full heap dump here:
https://drive.google.com/file/d/1ycFKIrlkxqTIVYuciAIw0j2RyZR4pupS/view?usp=sharing

Granted, we expect there to be a cache for each EpollEventLoop, but it still seems like too much memory.
From the heap dump, you can see that one event loop holds about 16 DirectByteBuffers in the small subpage area, each with a capacity of 2 MB, meaning each event loop occupies roughly 16 × 2 MB = 32 MB, i.e. on the order of 32–40 MB.

Does it sound reasonable?
@ejona86
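
As a rough, illustrative extrapolation (assuming every event loop caches a similar amount, which would need to be confirmed from the dump):

    16 buffers per event loop × 2 MB ≈ 32 MB per event loop
    48 event loops × ~32 MB          ≈ ~1.5 GB of pooled direct memory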

@cudothanh-Nhan
Author

cudothanh-Nhan commented Jun 24, 2024

I also wonder whether there is a limit on the number of DirectByteBuffers inside each subpage area.

@ejona86
Member

ejona86 commented Jun 24, 2024

gRPC reduces the subpage size to 2 MiB to reduce memory. It also reduces the number of threads to the number of cores. I think what's hurting here is the number of threads. If we reduced the number of threads by half, would that get you into a reasonable state, or are you hoping for even more memory usage reduction?
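
For what it's worth, a minimal sketch of capping the event loop thread count on the client side, assuming the non-shaded grpc-netty artifact and its NettyChannelBuilder.eventLoopGroup/channelType options; the count of 4 is only an example:

    import io.grpc.ManagedChannel;
    import io.grpc.netty.NettyChannelBuilder;
    import io.netty.channel.nio.NioEventLoopGroup;
    import io.netty.channel.socket.nio.NioSocketChannel;

    public class FewEventLoopsChannel {
      public static ManagedChannel build(String target) {
        // Four event loop threads instead of one per core; fewer event loops means
        // fewer per-thread allocator caches. The caller owns this group and must
        // shut it down once the channel is no longer needed.
        NioEventLoopGroup group = new NioEventLoopGroup(4);
        return NettyChannelBuilder.forTarget(target)
            .eventLoopGroup(group)
            .channelType(NioSocketChannel.class)
            .build();
      }
    }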

@cudothanh-Nhan
Author

I mean that while the size of each subpage is only 2 MB, there is still potential memory pressure when there are many of them. Even if my server had only 1 core, one event loop can contain multiple subpages, 2 MB each. @ejona86

@cudothanh-Nhan
Author

cudothanh-Nhan commented Jun 25, 2024

After diving deep into the Netty implementation, I found that while the number of PoolChunk objects is stable (48 objects for 48 cores), there are many DirectByteBuffer objects referenced by PoolThreadCache (about 1,154 objects, as shown in the image below).

[screenshot: DirectByteBuffer objects referenced by PoolThreadCache]

Note that my gRPC client uses the default gRPC executor, which is, in turn, a cached thread pool executor.

Is the native memory occupied by a DirectByteBuffer freed once the executor thread no longer exists? I think not, because I see a lot of DirectByteBuffer objects being held in PoolThreadCache.
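
If the cached thread pool is a concern, a hedged sketch of swapping in a bounded application executor (the pool size of 8 is arbitrary, and whether this actually reduces the number of PoolThreadCache instances depends on which threads allocate from the pooled allocator):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    import io.grpc.ManagedChannel;
    import io.grpc.netty.NettyChannelBuilder;

    public class FixedExecutorChannel {
      public static ManagedChannel build(String target) {
        // Replace gRPC's default cached thread pool with a fixed-size executor so the set
        // of application threads (and any per-thread state they accumulate) stays bounded.
        ExecutorService appExecutor = Executors.newFixedThreadPool(8);
        return NettyChannelBuilder.forTarget(target)
            .executor(appExecutor)
            .build();
      }
    }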

@cudothanh-Nhan
Author

It seems that one PoolThreadCache can contain many DirectByteBuffer objects, so if one PoolThreadCache contains 40 SmallSubPageDirectCaches, it can consume up to 40 × 2 MB = 80 MB of native memory.

[screenshot: SmallSubPageDirectCaches inside a PoolThreadCache]

Am I right?

@hakusai22
Contributor

@ejona86 Hello, is there any progress on this issue?

@kannanjgithub
Contributor

What you are presenting is the state of how memory is handled in gRPC, but it is not necessarily indicative of a problem. There can be optimizations, but each comes with its own trade-offs.

  1. Changing the number of threads is one of the easier things to do. As mentioned by Eric above, the number of event loops is set equal to the number of cores available, but this can be further controlled via the system property io.netty.eventLoopThreads.

  2. If your problem is specifically with direct memory, use heap memory instead via -Dio.grpc.netty.shaded.io.netty.noPreferDirect=true.

  3. Netty also provides a way to specify a custom allocator via ChannelOption.ALLOCATOR, passed through the channel builder options (although in that case you shouldn't be using grpc-netty-shaded); a sketch follows below.
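
A minimal sketch of option 3, assuming the non-shaded grpc-netty artifact and the Netty 4.1 PooledByteBufAllocator constructor that takes small/normal cache sizes; the arena, chunk, and cache numbers are illustrative, not recommendations:

    import io.grpc.ManagedChannel;
    import io.grpc.netty.NettyChannelBuilder;
    import io.netty.buffer.PooledByteBufAllocator;
    import io.netty.channel.ChannelOption;

    public class CustomAllocatorChannel {
      public static ManagedChannel build(String target) {
        // Illustrative allocator: a few direct arenas, 2 MiB chunks (8 KiB pages, maxOrder 8),
        // and per-thread caching disabled so PoolThreadCache retains no buffers.
        PooledByteBufAllocator allocator = new PooledByteBufAllocator(
            /* preferDirect */ true,
            /* nHeapArena */ 1,
            /* nDirectArena */ 4,
            /* pageSize */ 8192,
            /* maxOrder */ 8,
            /* smallCacheSize */ 0,
            /* normalCacheSize */ 0,
            /* useCacheForAllThreads */ false);

        return NettyChannelBuilder.forTarget(target)
            .withOption(ChannelOption.ALLOCATOR, allocator)
            .build();
      }
    }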
