
Change Event and Aggregations threads to use only one threadpool #475

Merged 17 commits on Dec 3, 2020


Contributor

@tobiasake tobiasake commented Sep 8, 2020

Applicable Issues

Today EI consumes a lot of CPU and memory because it can consume a large number of events, which spawns a large number of threads and causes high JVM CPU and memory load.
Since each thread also has a MongoDB connector, this causes high load on MongoDB as well.

Description of the Change

In some deployments it has been reported that EI consumes too many resources (CPU, memory). This is because we use an unlimited number of threads for matching aggregations against subscriptions' JMESPath expressions, and at the same time these threads make a lot of MongoDB queries, which puts extra load on MongoDB.

With this change, the same thread that consumes the event from the MessageBus is used through the whole of EI.

By doing this, we avoid using too much memory and CPU.
If the system in use can handle more events and aggregations, increase the threadpool size configuration in the application.properties file.
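
A minimal sketch of what one bounded pool driven by those settings could look like, using only the JDK. The property keys are the ones in this PR's configuration; the class name `SinglePoolSketch`, the helper `fromProperties`, and the fallback values are assumptions for illustration, not EI's actual wiring.

```java
import java.util.Properties;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not EI code: builds a single bounded pool from properties.
final class SinglePoolSketch {

    // Keys match this PR's configuration; the fallback numbers are assumptions.
    static ThreadPoolExecutor fromProperties(Properties props) {
        int core = intProp(props, "threads.core.pool.size", 200);
        int max = intProp(props, "threads.max.pool.size", 250);
        int queue = intProp(props, "threads.queue.capacity", 7000);
        // A bounded queue caps how much pending work is held in memory.
        return new ThreadPoolExecutor(core, max, 60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<Runnable>(queue));
    }

    private static int intProp(Properties props, String key, int fallback) {
        String raw = props.getProperty(key);
        return raw == null ? fallback : Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("threads.core.pool.size", "4");
        props.setProperty("threads.max.pool.size", "8");
        props.setProperty("threads.queue.capacity", "100");
        ThreadPoolExecutor pool = fromProperties(props);
        System.out.println(pool.getCorePoolSize() + "/" + pool.getMaximumPoolSize());
        pool.shutdown();
    }
}
```

Raising the three values gives the pool more headroom on a larger host; lowering them keeps JVM and database load down.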

This includes the fix for the duplicated aggregations TTL fields in the Information REST API.
Old and faulty "/information" endpoint return value:
"objectHandler" : {
"aggregationsCollectionName" : "aggregations",
"databaseName" : "eiffel_intelligence",
"ttl" : 0,
"aggregationsTtl" : ""
},
With this PR, the return value is as it should be:
"objectHandler" : {
"aggregationsCollectionName" : "aggregations",
"databaseName" : "eiffel_intelligence",
"aggregationsTtl" : ""
},

Fixed MongoDbHandler so that the MongoDB connection is automatically restored if MongoDB goes down and comes back up.
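
As a generic illustration of that kind of recovery (not the actual MongoDbHandler change), a handle can drop a failed connection, open a new one, and retry once. `ReconnectingHandle` and the type parameter `C` are hypothetical stand-ins; no MongoDB driver API is used or implied here.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical wrapper; C stands in for any client type. Not EI's MongoDbHandler.
final class ReconnectingHandle<C> {
    private final Supplier<C> connector; // opens a fresh connection
    private C current;                   // null until first use or after a failure

    ReconnectingHandle(Supplier<C> connector) {
        this.connector = connector;
    }

    // Runs the operation; on a runtime failure, reconnects once and retries.
    synchronized <R> R call(Function<C, R> operation) {
        if (current == null) {
            current = connector.get();
        }
        try {
            return operation.apply(current);
        } catch (RuntimeException firstFailure) {
            current = null;                  // discard the possibly stale connection
            current = connector.get();       // reconnect
            return operation.apply(current); // a second failure propagates to the caller
        }
    }

    public static void main(String[] args) {
        int[] opens = {0};
        ReconnectingHandle<String> handle =
                new ReconnectingHandle<String>(() -> "conn" + (++opens[0]));
        System.out.println(handle.call(c -> c + ":ok"));
    }
}
```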

Alternate Designs

Benefits

Reduces load on the JVM and MongoDB and minimizes the risk of overloading MongoDB.

Possible Drawbacks

EI might consume events at a slower pace.

Sign-off

Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
have the right to submit it under the open source license
indicated in the file; or

(b) The contribution is based upon previous work that, to the best
of my knowledge, is covered under an appropriate open source
license and I have the right under that license to submit that
work with modifications, whether created in whole or in part
by me, under the same open source license (unless I am
permitted to submit under a different license), as indicated
in the file; or

(c) The contribution was provided directly to me by some other
person who certified (a), (b) or (c) and I have not modified
it.

(d) I understand and agree that this project and the contribution
are public and that a record of the contribution (including all
personal information I submit with it, including my sign-off) is
maintained indefinitely and may be redistributed consistent with
this project or the open source license(s) involved.

Signed-off-by: @tobiasake

@saif-ericsson saif-ericsson self-requested a review September 15, 2020 12:04
@m-linner-ericsson
Member

Why not just merge #465 into master?

@tobiasake
Contributor Author

Well, there have been several discussions and meetings about that over 3-4 months, since it was force-merged without any tests or verification. And meetings that I have not been part of, it seems.
And other issues have come up due to that change. I described one of them in a ticket here:
#469

This is another approach to solving it, where I removed the unlimited threads for the last aggregation and subscription matching step. We use the same thread from consuming the event from the MessageBus until aggregation and subscription matching is finished.
With this change we also have only one threadpool to control.
And events/aggregations are more persistent, since we don't queue work in memory (in threadpool queues etc.); instead these events/aggregations wait in the MessageBus queue until EI has resources to process them.
I will not go into more detail than this, since this has been discussed for a long time.
This is my suggestion for a solution that is more persistent, makes resources/load simpler to control, and has a simpler architecture.
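
A rough sketch of the idea, with a `BlockingQueue` standing in for the MessageBus queue: each consumer thread takes one event and carries it through aggregation and subscription matching before taking the next, so unprocessed events stay in the queue rather than in a second in-memory pool. The class and the `aggregate` and `matchSubscriptions` placeholders are hypothetical, not EI code.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; the queue stands in for the message bus.
final class ConsumeAndProcessSketch {

    // Placeholder steps, assumptions for illustration only.
    static String aggregate(String event) { return "agg(" + event + ")"; }
    static String matchSubscriptions(String aggregated) { return aggregated + " -> matched"; }

    // Starts a fixed number of consumers and waits until expectedEvents are handled.
    static List<String> drain(BlockingQueue<String> bus, int consumers, int expectedEvents) {
        List<String> results = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(expectedEvents);
        ExecutorService pool = Executors.newFixedThreadPool(consumers);
        for (int i = 0; i < consumers; i++) {
            pool.execute(() -> {
                try {
                    while (!Thread.currentThread().isInterrupted()) {
                        String event = bus.take(); // blocks until an event is available
                        // The same thread does every step; nothing is handed to another pool.
                        results.add(matchSubscriptions(aggregate(event)));
                        done.countDown();
                    }
                } catch (InterruptedException stop) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdownNow(); // interrupts consumers blocked in take()
        return results;
    }

    public static void main(String[] args) {
        BlockingQueue<String> bus = new LinkedBlockingQueue<>();
        bus.add("event-1");
        bus.add("event-2");
        bus.add("event-3");
        System.out.println(drain(bus, 2, 3));
    }
}
```

The number of consumers is the only knob here, which is the simplification argued for above.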

Member

@m-linner-ericsson m-linner-ericsson left a comment

I still don't get why we shouldn't use #465; it has been out there and tried. We don't have any automatic load tests, but it has been tested manually.

Also, could you explain the selection of the numbers below?

Comment on lines +78 to +81
threads.core.pool.size: 200
threads.queue.capacity: 7000
threads.max.pool.size: 250
scheduled.threadpool.size: 200
Member

Could you elaborate on where you got these values? You increased them, but why are the new values good?

Contributor Author

Failover tests and some other tests fail if we don't use a specific thread pool size. I don't know why; it seems to be timing issues, where things that should be completed are not completed in time. I don't know the tests or their details, so I can't answer why. Maybe those who wrote the tests can answer your question.
Since I removed the unlimited number of threads, the size of the single thread pool can also be increased.

Member

But where did you get the values?

Contributor Author

@tobiasake tobiasake Sep 21, 2020

I changed the values until the tests succeeded, the same as has been done earlier, I guess.
This question has been raised earlier, and no one has given any better answer, I think.

Contributor Author

When the application is deployed in production, the user can set other values.
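
For anyone tuning these numbers, it may help to see how core size, queue capacity and max size interact in a plain `java.util.concurrent.ThreadPoolExecutor`. Whether EI's properties map onto such an executor in exactly this way is an assumption; the class `PoolGrowthSketch` is hypothetical and uses small numbers so the behaviour is easy to observe.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo of JDK pool sizing rules; not EI code.
final class PoolGrowthSketch {

    // Returns {threads while the queue fills, threads once the queue is full,
    //          1 if a further task was rejected, else 0}.
    static int[] observe(int core, int max, int queueCapacity) {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                core, max, 1L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<Runnable>(queueCapacity));
        Runnable blocked = () -> {
            try {
                release.await(); // keep the worker busy until released
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        try {
            // Fill the core threads, then the queue: no extra threads are started yet.
            for (int i = 0; i < core + queueCapacity; i++) {
                pool.execute(blocked);
            }
            int whileQueueing = pool.getPoolSize();
            // Only with a full queue does the pool grow towards max.
            for (int i = 0; i < max - core; i++) {
                pool.execute(blocked);
            }
            int afterQueueFull = pool.getPoolSize();
            int rejected = 0;
            try {
                pool.execute(blocked); // queue full and pool at max
            } catch (RejectedExecutionException e) {
                rejected = 1;
            }
            return new int[] {whileQueueing, afterQueueFull, rejected};
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] seen = observe(2, 4, 3);
        System.out.println(seen[0] + " " + seen[1] + " " + seen[2]);
    }
}
```

If the values above feed an executor like this, a pool with core 200, queue 7000 and max 250 would start threads beyond 200 only once 7000 tasks are waiting.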

@tobiasake
Contributor Author

I still don't get why we shouldn't use #465; it has been out there and tried. We don't have any automatic load tests, but it has been tested manually.

EI still crashes. I was in a meeting 2 weeks ago and it still crashes.
They have got it working okay by setting the TTL on aggregated objects to 10 min, but the customer pipeline lasted longer than 10 min, so some events were never aggregated and no subscription was triggered.

I implemented this (or had looked at the code and was going to do the implementation) already in June, when you and Emily created the other solution.
We were going to test both this and the other solution.
I was about to set up a test environment in Kubernetes with both solutions in June, but in the middle of that the other solution was force-merged without any tests and so on.
And now there have been several issues since then.
And the customer is planning to upgrade to EI 3.0, so I thought we could try another approach that decreases the load, is simpler to control by having only one threadpool, and also gives more persistence for events, which stay in the MessageBus queue until EI can process them.

@m-linner-ericsson
Member

Do we still have problems with #465? Please note that that one is on the 2.x branch.

@tobiasake
Contributor Author

tobiasake commented Sep 21, 2020

Yep, I know that's on 2.x. But we still have the original solution with unlimited threads on master.
That's why I suggest another approach/solution on the master branch.

Member

@henning-roos henning-roos left a comment

All PRs in Eiffel should have a description that follows the provided template. Especially important is the sign-off at the end. Please provide this before merging.

@tobiasake
Contributor Author

To all of you who want more threads executing in parallel, as in the EI 2.x solution from the 2.0-maintenance branch, which causes high load on the JVM and MongoDB in some environments with 100000+ events:
can we wait and see how this solution works? If it works well, then maybe we can stay with this solution with only one thread pool.
If it does not work well, then those of you who prefer the EI 2.0 maintenance branch solution with two thread pools can make a PR with that change for EI 3.0.

@m-linner-ericsson
Member

In the solution on 2.0 we can set the thread pool through configuration; here it is hardcoded.

@tobiasake
Contributor Author

tobiasake commented Nov 30, 2020

No, it is configurable with these properties, provided as Java properties on the command line or set in an application.properties file, either in the folder Java executes from or in a config/ folder. These are the properties that configure the thread pool:
threads.core.pool.size
threads.queue.capacity
threads.max.pool.size

In the 2.0 maintenance branch releases, the second thread pool is hardcoded.


@raja-maragani raja-maragani left a comment

+1

6 participants