khepri_cluster: Use key metrics to determine if a Ra server is running #292
Codecov Report
Coverage diff, main vs. #292:
Coverage: 89.67% on both (unchanged)
Files: 21
Lines: 3187
Hits: 2858
Misses: 329
[Why]
The previous use of `ra:ping/2` was too expensive.

`khepri_cluster:is_store_running/1` has been used by `mnesia_to_khepri:is_migration_finished/2` and `mnesia_to_khepri:handle_fallback/5` since khepri_mnesia_migration 0.6.0, and we saw a performance regression in RabbitMQ because of this. Before that release, khepri_mnesia_migration used a very basic and incomplete version of `is_store_running()`, which is why the issue was not spotted earlier.

[How]
The new code uses `ra:key_metrics/2`, which simply checks whether the process is running and queries a few local counters. This is much faster because it does not send messages to the Ra server.
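To illustrate why a message-free check is cheaper, here is a minimal, self-contained sketch using only OTP primitives. It is not Ra's or Khepri's code: the module `liveness_sketch` and all of its functions are hypothetical, and the "ping" is a hand-rolled message exchange standing in for any check that has to go through the target process's mailbox.

```erlang
-module(liveness_sketch).
-export([start/1, stop/1, is_running_ping/2, is_running_local/1]).

%% Hypothetical stand-in for a server: a registered process that
%% answers ping messages until told to stop.
start(Name) ->
    Pid = spawn(fun loop/0),
    true = register(Name, Pid),
    Pid.

stop(Name) ->
    case whereis(Name) of
        undefined ->
            ok;
        Pid ->
            Ref = erlang:monitor(process, Pid),
            Pid ! stop,
            receive {'DOWN', Ref, process, Pid, _Reason} -> ok end
    end.

loop() ->
    receive
        {ping, From, Ref} ->
            From ! {pong, Ref},
            loop();
        stop ->
            ok
    end.

%% Message-based check: a full round trip through the target's mailbox.
%% The reply waits behind whatever the process is already handling.
is_running_ping(Name, Timeout) ->
    case whereis(Name) of
        undefined ->
            false;
        Pid ->
            Ref = make_ref(),
            Pid ! {ping, self(), Ref},
            receive
                {pong, Ref} -> true
            after Timeout ->
                false
            end
    end.

%% Local check: consults the process registry only; no message is sent,
%% so the cost does not depend on how busy the target process is.
is_running_local(Name) ->
    whereis(Name) =/= undefined.
```

The trade-off in this sketch is that the local check only tells you the process exists, not that it is responsive. That is usually enough for an "is it running" question asked on a hot path.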
eb270ef to b47b2fa
Would it not be even faster to avoid creating that map (which isn't used) and just do an erpc with a
I thought about that but that’s a bit too much knowledge of the Ra implementation. As far as Khepri is concerned, the server ID is opaque.
We could consider adding a I tried this branch in RabbitMQ and the change to use |
What about @the-mikedavis: I see you approved this pull request. Do you believe we should merge it as is instead of waiting for a more appropriate API in Ra? |
Since we need this to restore performance in RabbitMQ, and the performance is already pretty good, I think we should merge this as-is. I was thinking that we should make changes to Ra as a follow-up.
@the-mikedavis I agree with you. We can always optimize things some more in |
The |
Is it guaranteed to always be the registered name of the process (at least until the next major version)? Not that we would want to do this but Ra could change to use the first element in the tuple to lookup the actual process in an ETS table without changing the type of |
If we decided to introduce a means of dynamically discovering the remote pid of a server, it would be encoded as an explicit new server_id() type case, which newly declared servers would have to opt into. I don't see the current approach ever disappearing. We don't even have a reasonable way to do dynamic discovery, or any ideas of how to do it well (without depending on some other consensus system).
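For readers following the server ID discussion, the sketch below shows what a check that looks inside the ID could look like. It assumes the ID is a `{Name, Node}` tuple whose first element is the registered name of the process. The module `server_id_sketch` and the function `is_registered/2` are hypothetical and belong to neither Ra nor Khepri; only `whereis/1` and `erpc:call/5` are real OTP calls.

```erlang
-module(server_id_sketch).
-export([is_registered/2]).

%% Assumption: the server ID is {RegisteredName, Node}.
%% Local node: a registry lookup is enough, no message is sent.
is_registered({Name, Node}, _Timeout) when Node =:= node() ->
    whereis(Name) =/= undefined;
%% Remote node: run the same lookup over there with erpc.
is_registered({Name, Node}, Timeout) ->
    try erpc:call(Node, erlang, whereis, [Name], Timeout) of
        undefined -> false;
        Pid when is_pid(Pid) -> true
    catch
        %% Unreachable node, timeout, and similar erpc failures are
        %% treated as "not running" in this sketch.
        error:{erpc, _Reason} -> false
    end.
```

This ties the caller to the internal shape of the server ID, which is the coupling the thread argues Khepri should avoid by treating the ID as opaque.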