Extreme CPU Overload due to incomplete I/O operations #4602

Closed
ByteExceptionM opened this issue Apr 25, 2024 · 15 comments
Labels
Priority: High, Unconfirmed Bug/Not Currently Replicable (the bug reported is unconfirmed or unable to be replicated)

Comments

@ByteExceptionM
Contributor

ByteExceptionM commented Apr 25, 2024

Describe the bug

Since I updated from build 478 to the latest build 512 (for 1.20.80 support), I have had extreme CPU peaks. Diff:
b469904...2471de1

To Reproduce

Update to the latest build.

Expected behaviour

No CPU overload

Screenshots / Videos

[Four screenshots attached; images not preserved]

Server Version and Plugins

No response

Geyser Dump

No response

Geyser Version

2.2.3-SNAPSHOT 2471de1

Minecraft: Bedrock Edition Device/Version

No response

Additional Context

No response

@onebeastchris
Member

Please send a spark report - that should show what Geyser is doing. Here's how:

Spark is a plugin that helps you monitor performance for your server.
https://spark.lucko.me/download

To record performance on your server, use:
/spark profiler --thread * --timeout 60
This will run for 60 seconds and then stop automatically. It will probably lag the server a good deal, but it will give us a link we may be able to analyze.

@ByteExceptionM
Contributor Author

Geyser runs standalone, so I cannot start a spark profiler here.

@onebeastchris
Member

If you're able to compile it yourself, we do have a spark Geyser extension that you could use: https://github.com/GeyserMC/spark
If you're not able to compile it, I could send a build of it in a few hours.

@ByteExceptionM
Contributor Author

All right. I'll take care of it.

@ByteExceptionM
Contributor Author

I now have the Spark extension on the Geyser application. However, I cannot execute the command because it is not found. The following message appears in the logs:
[Screenshot of log message, not preserved]
But also this one:
[Screenshot of log message, not preserved]

I have also installed Spark on all sub-servers, and the command for Geyser Spark is also spark, so in practice I can't run it right now. I can't run anything in the Geyser console either (the Docker container was started without the correct attach and interactive flags). I can't restart the Geyser instances either, as there are currently a lot of players online. Are you able to debug it yourself?

@ByteExceptionM
Contributor Author

ByteExceptionM commented Apr 25, 2024

iotop:
[iotop screenshot, not preserved]

Geyser is running in a Docker container; the path is /data/server.jar.

As you can see in the screenshot, Geyser has many processes blocked on I/O operations that never complete. This leads to high CPU utilization.
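One way to quantify what iotop shows, without relying on screenshots, is to count a process's threads by scheduler state. The sketch below is a hypothetical helper, not part of Geyser or spark; it assumes a Linux host with procfs and that you know the PID of the Geyser JVM as seen from wherever the script runs. Threads in state D are in uninterruptible sleep, which is typically time spent waiting on I/O.

```python
import sys
from collections import Counter
from pathlib import Path


def thread_state(stat_line: str) -> str:
    """Return the one-letter state from a /proc/<pid>/task/<tid>/stat line.

    The line looks like "tid (comm) state ...". The thread name (comm) may
    itself contain spaces or ')', so split on the LAST ')' before reading the state.
    """
    return stat_line.rsplit(")", 1)[1].split()[0]


def count_states(stat_lines) -> Counter:
    """Count threads per state letter, e.g. R (running), S (sleeping), D (uninterruptible)."""
    return Counter(thread_state(line) for line in stat_lines)


def read_thread_stats(pid: int) -> list:
    """Read the stat line of every thread of `pid` (Linux procfs only)."""
    lines = []
    for stat_file in Path(f"/proc/{pid}/task").glob("*/stat"):
        try:
            lines.append(stat_file.read_text())
        except OSError:
            # The thread exited between listing the directory and reading its file.
            continue
    return lines


# Hypothetical usage: python3 thread_states.py <pid>
if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1].isdigit():
    states = count_states(read_thread_stats(int(sys.argv[1])))
    print(dict(states))
    print("threads in uninterruptible (D) sleep:", states.get("D", 0))
```

Running it once at low and once at peak player count would show whether the number of D-state threads actually changes with load.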

@onebeastchris
Member

I'm unable to try to replicate the issue at the moment, but I will try to fix the spark extension so that it can be used to gather proper data for resolving this issue.

@onebeastchris
Member

GeyserMC/spark#1 should resolve the issue with spark not working.
Here's a working build:
spark-1.10.0-geyser.zip

Does this issue occur at a specific player count? In any case, without some concrete data on what's causing the high usage uptick, it will be difficult to guess what the issue is caused by.

@ByteExceptionM
Contributor Author

Sounds great. I've already checked out your branch and am deploying it to start a profiler.

In any case, the CPU load increases with the number of players, though it doesn't go higher than 90%. The number of processes waiting on I/O doesn't change much either. Screenshots attached:

[Two screenshots, not preserved]

@Kas-tle
Member

Kas-tle commented Apr 25, 2024

At this point it would help quite a bit if you could isolate the issue to a specific commit, as the range you've provided is a lot to go through, especially given that we cannot reproduce the issue with your complex setup.
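Isolating the regression to one build is a binary search over the range between build 478 (known good) and build 512 (known bad). A minimal sketch, assuming a hypothetical is_bad(build) check that you carry out yourself (for example, deploy that build and watch CPU under normal player load), and assuming every build after the first bad one is also bad:

```python
from typing import Callable


def first_bad_build(good: int, bad: int, is_bad: Callable[[int], bool]) -> int:
    """Binary search for the first bad build in (good, bad].

    `good` must be a known-good build and `bad` a known-bad one. `is_bad` is a
    hypothetical check you perform per build; builds are assumed to stay bad
    once the regression appears.
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid   # the regression is at or before mid
        else:
            good = mid  # the regression is after mid
    return bad
```

For the 478 to 512 range there are 34 candidate builds, so this needs at most six test deployments (2^6 = 64 ≥ 34). git bisect performs the same search over individual commits.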

@ByteExceptionM
Contributor Author

ByteExceptionM commented Apr 25, 2024

I am already debugging with Chris in DMs. It's not really possible for me to search through your code or check what the problem is. The Geyser Spark extension had some problems, which Chris has now fixed. I currently have the Geyser traffic routed to another machine so I can make changes and debug there. We are a network with almost 5,000 different players, so unfortunately restarts are not that easy. As soon as I get more information, I'll come back. I'm on it!

@ByteExceptionM
Contributor Author

The error has not occurred again so far, and I could not trace its source. If it occurs again, I'll be sure to profile it properly with the fixed spark extension and reopen the issue, but I'm closing it for now.

ByteExceptionM closed this as not planned (won't fix, can't repro, duplicate, stale) on Apr 28, 2024
@nicolube

nicolube commented Sep 20, 2024

We're having the same behavior...

[Screenshot, not preserved]

We're currently running on an Advance-3 Gen 2 from OVH.
Hardware specs:

CPU

  • AMD Ryzen 9 5900X - 12c/24t - 3.7 GHz/4.8 GHz

RAM

  • 128 GB ECC 2666 MHz

Storage

  • 2×960 GB SSD NVMe
  • Soft RAID

We're running the latest Geyser version; it is updated automatically every day at 4 a.m.

We're running Geyser standalone with a resource pack.
Everything else is basically vanilla.

We also just added geyser-spark to it.

@onebeastchris
Member

@nicolube please open a new issue instead of commenting here. Also, please attach more information, such as a spark profiler run; server specs alone and rather vague screenshots are unfortunately not particularly helpful for debugging the issue. Thanks!

@nicolube

@onebeastchris
Hello, I opened a new issue and attached a spark profile; I just did not have one under load this morning.
#5050


4 participants