Reduce peer message traffic for ledger data #5126
base: develop
* Also log as warning when the state lowers
c357abb to 9e5487c
09c4156 to c69b443
* Allow a retry after 30s in case of peer or network congestion.
* Addresses RIPD-1870
* (Changes levelization. That is not desirable, and will need to be fixed.)
* Allow a retry after 15s in case of peer or network congestion.
* Collate duplicate TMGetLedger requests:
  * The requestCookie is ignored when computing the hash, thus increasing the chances of detecting duplicate messages.
  * With duplicate messages, keep track of the different requestCookies (or lack of cookie). When work is finally done for a given request, send the response to all the peers that are waiting on the request, sending a separate message for each requestCookie.
* Addresses RIPD-1871
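The collation described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: `LedgerRequest`, `PendingRequests`, `add`, and `finish` are hypothetical names, and a plain (hash, type) pair stands in for the real message hash. The point it shows is that the lookup key leaves out the request cookie, so requests that differ only by cookie collapse into one entry while every cookie is remembered.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <set>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for a TMGetLedger request.
struct LedgerRequest
{
    std::string ledgerHash;
    std::uint32_t itemType;
    std::optional<std::uint64_t> requestCookie;
};

// Hypothetical tracker for requests whose work has not finished yet.
class PendingRequests
{
public:
    using Cookie = std::optional<std::uint64_t>;
    // Assumption: the key omits requestCookie on purpose, so requests
    // that differ only by cookie are treated as duplicates.
    using Key = std::pair<std::string, std::uint32_t>;

    // Records the cookie (or lack of one). Returns true only for the
    // first request with this key, i.e. when work should be started.
    bool
    add(LedgerRequest const& r)
    {
        auto [it, isNew] = waiting_.try_emplace(Key{r.ledgerHash, r.itemType});
        it->second.insert(r.requestCookie);
        return isNew;
    }

    // Removes the entry and returns every cookie that was waiting on it.
    std::set<Cookie>
    finish(LedgerRequest const& r)
    {
        auto it = waiting_.find(Key{r.ledgerHash, r.itemType});
        if (it == waiting_.end())
            return {};
        auto cookies = std::move(it->second);
        waiting_.erase(it);
        return cookies;
    }

private:
    std::map<Key, std::set<Cookie>> waiting_;
};
```

In this sketch, the caller would start the work only when `add` returns true, and once the work completes it would call `finish` and send a response for each returned cookie.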
* Addresses RIPD-1869
Co-authored-by: Valentin Balaschenko
Co-authored-by: Ed Hennis
* When work is done for a given TMGetLedger request, send the response to all the peers that are waiting on the request, sending one message per peer, including all the cookies and a "directResponse" flag indicating the data is intended for the sender, too.
c69b443 to e490e57
include/xrpl/basics/CanProcess.h
Outdated
insert()
{
    std::unique_lock<Mutex> lock_(mtx_);
    bool exists = collection_.contains(item_);
This would avoid an extra lookup:
auto [_, inserted] = collection_.insert(item_);
return inserted;
Good catch. Fixed.
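For context, the suggestion relies on `insert` for set-like containers already reporting whether the item was new, so a separate `contains` call is redundant. Below is a minimal sketch of a guard built around that idea. `ProcessGuard` is a hypothetical name and its erase-on-destruction behavior is an assumption made for illustration; it is not the actual `CanProcess` class from the PR.

```cpp
#include <cassert>
#include <mutex>
#include <set>

// Hypothetical guard: registers an item on construction and reports
// whether this caller was the first to do so.
template <class Mutex, class Collection, class Item>
class ProcessGuard
{
public:
    ProcessGuard(Mutex& mtx, Collection& collection, Item const& item)
        : mtx_(mtx), collection_(collection), item_(item), canProcess_(insert())
    {
    }

    // Assumption of this sketch: only the guard that inserted the item
    // removes it again when it goes out of scope.
    ~ProcessGuard()
    {
        if (canProcess_)
        {
            std::unique_lock<Mutex> lock(mtx_);
            collection_.erase(item_);
        }
    }

    ProcessGuard(ProcessGuard const&) = delete;
    ProcessGuard&
    operator=(ProcessGuard const&) = delete;

    explicit operator bool() const
    {
        return canProcess_;
    }

private:
    bool
    insert()
    {
        std::unique_lock<Mutex> lock(mtx_);
        // A single call both inserts and reports whether the item was new.
        auto [_, inserted] = collection_.insert(item_);
        return inserted;
    }

    Mutex& mtx_;
    Collection& collection_;
    Item item_;
    bool canProcess_;  // declared last so insert() sees initialized members
};
```

A second guard for the same item evaluates to false while the first is alive, which is how a duplicate message could be skipped in this sketch.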
* Avoid an unnecessary lookup in CanProcess
* upstream/develop:
  Set version to 2.3.0-b4
  feat(SQLite): allow configurable database pragma values (5135)
  refactor: re-order PRAGMA statements (5140)
  fix(book_changes): add "validated" field and reduce RPC latency (5096)
  chore: fix typos in comments (5094)
  Set version to 2.2.3
  Update SQLite3 max_page_count to match current defaults (5114)
Only minor comments. Happy to approve once addressed.
@@ -623,6 +623,13 @@ to_string(base_uint<Bits, Tag> const& a)
    return strHex(a.cbegin(), a.cend());
}

template <std::size_t Bits, class Tag>
inline std::string
to_short_string(base_uint<Bits, Tag> const& a)
[nit]: Adding checks for to_short_string in base_uint_test, next to the existing to_string cases, would be good.
I wouldn't call that a nit. Missing test coverage is pretty significant. Thanks for catching it. Fixed.
src/xrpld/overlay/detail/PeerImp.cpp
Outdated
return ledger;
}

JLOG(p_journal_.trace())
[nit]: Should this be a warn instead of trace? Not having a peer to relay the request may indicate some configuration or environment issues.
Not necessarily, though. It could mean that the node has already sent the request to all its peers. Also note that the original message is trace.
But there's something odd here. It looks like this code block was somehow duplicated! I must have messed up resolving a conflict when I rebased from master to develop. I've removed the duplicate.
src/xrpld/overlay/detail/PeerImp.cpp
Outdated
@@ -2936,7 +3078,9 @@ getPeerWithLedger(
 void
 PeerImp::sendLedgerBase(
     std::shared_ptr<Ledger const> const& ledger,
-    protocol::TMLedgerData& ledgerData)
+    protocol::TMLedgerData& ledgerData,
+    std::map<std::shared_ptr<Peer>, std::set<std::optional<uint64_t>>> const&
std::map<std::shared_ptr<Peer>, std::set<std::optional<uint64_t>>> is mentioned in four places. It would be more readable to define an alias:
using PeerCookieMap = std::map<std::shared_ptr<Peer>, std::set<std::optional<uint64_t>>>;
Good suggestion! Fixed.
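To illustrate how such an alias reads in use, here is a rough sketch that walks a `PeerCookieMap` and builds one reply per peer, carrying all of that peer's cookies. `Peer`, `Reply`, and `buildReplies` are hypothetical stand-ins, not the types in the PR. The sketch also assumes that an empty cookie means the peer asked on its own behalf, which is how it decides to set a `directResponse` flag.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <optional>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for the overlay's Peer type.
struct Peer
{
    int id;
};

using PeerCookieMap =
    std::map<std::shared_ptr<Peer>, std::set<std::optional<std::uint64_t>>>;

// Hypothetical stand-in for an outgoing TMLedgerData message.
struct Reply
{
    std::shared_ptr<Peer> to;
    std::vector<std::uint64_t> cookies;
    bool directResponse = false;
};

// Builds one reply per waiting peer instead of one per cookie.
inline std::vector<Reply>
buildReplies(PeerCookieMap const& waiting)
{
    std::vector<Reply> out;
    for (auto const& [peer, cookies] : waiting)
    {
        Reply r;
        r.to = peer;
        for (auto const& cookie : cookies)
        {
            if (cookie)
                r.cookies.push_back(*cookie);
            else
                // Assumption: no cookie means the peer wants the data itself.
                r.directResponse = true;
        }
        out.push_back(std::move(r));
    }
    return out;
}
```

With the alias in place, a signature such as `buildReplies(PeerCookieMap const&)` stays on one line, which is the readability gain the review comment is after.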
* Add unit tests for to_short_string(base_uint)
* Remove duplicated code
* Use type aliases for cookie maps
* That's what I get for rushing to push
* upstream/develop:
  Expand Error Message for rpcInternal (4959)
  docs: clean up API-CHANGELOG.md (5064)
* upstream/develop:
  Consolidate definitions of fields, objects, transactions, and features (5122)
  Ignore reformat when blaming
  Reformat code with clang-format-18
  Update pre-commit hook
  Update clang-format settings
  Update clang-format workflow
* upstream/develop:
  Add hubs.xrpkuwait.com to bootstrap (5169)
  docs: Add protobuf dependencies to linux setup instructions (5156)
  fix: reject invalid markers in account_objects RPC calls (5046)
  Update RELEASENOTES.md (5154)
  Introduce MPT support (XLS-33d): (5143)
* upstream/develop: Add AMMClawback Transaction (XLS-0073d) (5142)
* upstream/develop: Fix unity build (5179)
* upstream/develop:
  Set version to 2.3.0-rc1
  Replace Uint192 with Hash192 in server_definitions response (5177)
  Fix potential deadlock (5124)
  Introduce Credentials support (XLS-70d): (5103)
  Fix token comparison in Payment (5172)
  Add fixAMMv1_2 amendment (5176)
High Level Overview of Change
Several changes to help reduce message traffic and improve logging and visibility.
* [...] TMGetLedger and TMLedgerData messages, reducing the overhead of processing those messages.
* [...] TMLedgerData message, which allows multiple identical requests to be replied to with one message.

These changes are organized into several commits, each logically separating one functional operation. They can be merged as-is, or squashed.
Context of Change
Analysis of the issue that led to #5115 identified heavy TMGetLedger request and TMLedgerData response traffic between nodes leading up to the syncing incidents. It was later determined that those messages were more a symptom of the problem, and not the root cause. However, leading up to identification of the root cause, these changes were being implemented to cut down on those messages, detect duplicates, etc. That reduction in unnecessary traffic is still valuable, so it's being included here.
Type of Change
API Impact
None.
Test Plan
Future Tasks
I am still working on a follow-up to #4764 that makes use of these changes and other improvements to reduce the number of requests initiated by a given node in the first place.