Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: enhanced reliability of eth RPC methods with null checks and retry mechanisms #3349

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

quiet-node
Copy link
Member

@quiet-node quiet-node commented Dec 20, 2024

Description:
Numerous requests for eth_getBlockByNumber and eth_getBlockByHash have been reported as failing with errors such as Cannot read properties of null and Received an invalid integer type: null.

Additionally, eth_getTransactionReceipt is also reported to be failing with Error invoking RPC: Invalid parameter: hashOrNumber.

These issues stem from regressions in the getBlock() method, which lacks adequate null checks in certain areas. This pull request addresses the problem by implementing proper null checks and enhancing fallback mechanisms and error handling.

Partially, the results of contracts and log responses from the Mirror Node occasionally contain undefined fields, such as transaction_index, block_hash, block_number, and log_index. This issue may arise because the Mirror Node database does not have sufficient time to save all the information before it is fetched. This PR adds a retry mechanism to the fetching methods that checks for these cases, waits for a specified duration, and then retries the fetch operation.

Related issue(s):

Fixes #3345
Fixes #3351

Notes for reviewer:

Checklist

  • Documented (Code comments, README, etc.)
  • Tested (unit, integration, etc.)

Copy link

github-actions bot commented Dec 20, 2024

Test Results

 20 files   -   4  312 suites   - 49   38m 13s ⏱️ - 22m 7s
613 tests  -   6  604 ✅ + 10  4 💤 ±0  5 ❌  - 16 
700 runs   - 201  688 ✅  - 182  4 💤 ±0  8 ❌  - 19 

For more details on these failures, see this check.

Results for commit 051f326. ± Comparison against base commit 90739e1.

This pull request removes 6 tests.
"after all" hook in "RPC Server Acceptance Tests" ‑ RPC Server Acceptance Tests "after all" hook in "RPC Server Acceptance Tests"
"after each" hook for "Subscribe to multiple contracts on same subscription" ‑ RPC Server Acceptance Tests Acceptance tests @web-socket-batch-3 eth_subscribe "after each" hook for "Subscribe to multiple contracts on same subscription"
"before all" hook for "emits an approval event" ‑ RPC Server Acceptance Tests Acceptance tests @erc20 Acceptance Tests HTS token should behave like erc20 transfer from when the token owner is not the zero address when the recipient is not the zero address when the spender has enough allowance "before all" hook for "emits an approval event"
"before all" hook for "reverts" ‑ RPC Server Acceptance Tests Acceptance tests @erc20 Acceptance Tests HTS token should behave like erc20 transfer from when the token owner is not the zero address when the recipient is not the zero address when the spender does not have enough allowance when the token owner has enough balance "before all" hook for "reverts"
"before each" hook for "reverts" ‑ RPC Server Acceptance Tests Acceptance tests @erc20 Acceptance Tests HTS token should behave like erc20 transfer from when the token owner is not the zero address when the recipient is not the zero address when the spender does not have enough allowance "before each" hook for "reverts"
"before each" hook for "reverts" ‑ RPC Server Acceptance Tests Acceptance tests @erc20 Acceptance Tests HTS token should behave like erc20 transfer from when the token owner is not the zero address when the recipient is the zero address "before each" hook for "reverts"

♻️ This comment has been updated with latest results.

@quiet-node quiet-node modified the milestones: 0.62.0, 0.63.0 Dec 20, 2024
@quiet-node quiet-node force-pushed the 3346-failed-request-alert-tuning-investigate-and-resolve-the-bug-causing-the-received-an-invalid-integer-type-null-error-in-ethimplgetblock-function branch from 675eb4a to d265305 Compare December 20, 2024 23:07
Copy link
Collaborator

@Nana-EC Nana-EC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.
Some considerations

packages/relay/src/lib/clients/mirrorNodeClient.ts Outdated Show resolved Hide resolved
@@ -334,7 +335,7 @@ export class CommonService implements ICommonService {
new Log({
address: log.address,
blockHash: toHash32(log.block_hash),
blockNumber: numberTo0x(log.block_number),
blockNumber: nullableNumberTo0x(log.block_number),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: Why would a block number be null?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is one of the strange cases where block_number, transaction_index, or blockHash are falsy, leading to 500 errors. The retry mechanism should address the issue. Should I revert this change and throw a more informative error message even if those fields remain falsy after the retry? Hmm maybe that would be better so the clients can resubmit the requests themselves other than having null in the response. Wdyt?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, if a retry is ensured to resolve the issue then maybe we can manage one iteration in the short term.
We should open up an issue in the mirror node repo to investigate this issue so this can be resolved.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not return a nullable (or casted to 0x) block number IMO. This might seriously mess up third-party tools like the graph.

packages/relay/src/receiptsRootUtils.ts Show resolved Hide resolved
@@ -620,7 +620,10 @@ export class MirrorNodeClient {
requestDetails,
);

await this.cacheService.set(cachedLabel, block, MirrorNodeClient.GET_BLOCK_ENDPOINT, requestDetails);
if (block) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if I understand correctly, but the errors described in the issue were because of caching null blocks?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an issue I noticed and an attempt securing the code. While it may or may not be directly related to the problem, we should ensure the block is not null before storing it in the cache. Otherwise, we risk repeatedly retrieving null from the cache, which likely has a TTL.

packages/relay/src/receiptsRootUtils.ts Show resolved Hide resolved
@quiet-node quiet-node force-pushed the 3346-failed-request-alert-tuning-investigate-and-resolve-the-bug-causing-the-received-an-invalid-integer-type-null-error-in-ethimplgetblock-function branch from 936f1e9 to b959fc7 Compare December 21, 2024 02:20
packages/relay/src/lib/clients/mirrorNodeClient.ts Outdated Show resolved Hide resolved
packages/relay/src/lib/clients/mirrorNodeClient.ts Outdated Show resolved Hide resolved
packages/relay/src/lib/eth.ts Outdated Show resolved Hide resolved
packages/relay/src/lib/eth.ts Show resolved Hide resolved
@quiet-node quiet-node force-pushed the 3346-failed-request-alert-tuning-investigate-and-resolve-the-bug-causing-the-received-an-invalid-integer-type-null-error-in-ethimplgetblock-function branch from 19d9838 to e1c8fd0 Compare December 21, 2024 23:14
@quiet-node quiet-node marked this pull request as draft December 21, 2024 23:14
@quiet-node quiet-node force-pushed the 3346-failed-request-alert-tuning-investigate-and-resolve-the-bug-causing-the-received-an-invalid-integer-type-null-error-in-ethimplgetblock-function branch 7 times, most recently from 3b96fc4 to aad493c Compare December 22, 2024 20:12
@quiet-node quiet-node force-pushed the 3346-failed-request-alert-tuning-investigate-and-resolve-the-bug-causing-the-received-an-invalid-integer-type-null-error-in-ethimplgetblock-function branch 2 times, most recently from a5a0e1f to 78fa644 Compare December 22, 2024 21:07
@quiet-node quiet-node force-pushed the 3346-failed-request-alert-tuning-investigate-and-resolve-the-bug-causing-the-received-an-invalid-integer-type-null-error-in-ethimplgetblock-function branch 7 times, most recently from bac3c74 to d7f5829 Compare December 23, 2024 15:22
Comment on lines +727 to +731
this.logger.warn(
e,
`${requestDetails.formattedRequestId} Transaction failed while executing transaction via the SDK: transactionId=${transaction.transactionId}, callerName=${callerName}, txConstructorName=${txConstructorName}`,
);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for Rereviewers:
This is an effort to enhance debugging.

Comment on lines +1675 to +1676
await relay.pollForValidTransactionReceipt(transactionHash);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for Rereviewers:
This is an effort to stablize the flaky test.

Signed-off-by: Logan Nguyen <[email protected]>

Revert "fix: reverted licenses"

This reverts commit d3c860a.

Reapply "fix: reverted licenses"

This reverts commit 50a9acd5031490f3f805042a70bfdae7c589146d.
@quiet-node quiet-node force-pushed the 3346-failed-request-alert-tuning-investigate-and-resolve-the-bug-causing-the-received-an-invalid-integer-type-null-error-in-ethimplgetblock-function branch from d6756f0 to 8687322 Compare December 23, 2024 18:19
@quiet-node quiet-node changed the title fix: added proper null checks for getBlock() method fix: enhanced reliability of eth RPC methods with null checks and retry mechanisms Dec 23, 2024
@quiet-node quiet-node modified the milestones: 0.63.0, 0.63.1 Dec 26, 2024
@@ -175,6 +177,20 @@ export class CommonService implements ICommonService {
returnLatest?: boolean,
): Promise<any> {
if (!returnLatest && this.blockTagIsLatestOrPending(blockNumberOrTagOrHash)) {
if (this.logger.isLevelEnabled('debug')) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: why is it needed to query for the log level? can we just interpolate the log message? https://github.com/pinojs/pino/blob/main/docs/api.md#interpolationvalues-any

This would be just to simplify the log logic.

requestDetails: RequestDetails,
): Promise<any> {
const shortDelay = 500;
const contractResult = await this[methodName](...args);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was this issue with calling getContractResult directly? I see all calls to getContractResultWithRetry use the same methodName=this.mirrorNodeClient.getContractResults.name, is there any other benefit of calling it indirectly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, there’s a rare case where the contract result contains null fields, as mentioned here: #3349 (comment). Calling getContractResultWithRetry() will invoke getContractResult, and if this rare case occurs, it will attempt the call again. The plan is to release this change first to gather more information about the getContractResult issue and eventually open a ticket on the MN side for further investigation.

@quiet-node quiet-node requested review from Nana-EC and acuarica January 6, 2025 18:11
@Nana-EC Nana-EC requested review from konstantinabl and nadezhdapopovaa and removed request for stoyanov-st January 7, 2025 06:29
Copy link
Collaborator

@natanasow natanasow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG, one question/thought that might be resolved before merging.

@quiet-node quiet-node force-pushed the 3346-failed-request-alert-tuning-investigate-and-resolve-the-bug-causing-the-received-an-invalid-integer-type-null-error-in-ethimplgetblock-function branch from 9f2e473 to 051f326 Compare January 8, 2025 00:15
Copy link

sonarqubecloud bot commented Jan 8, 2025

Quality Gate Failed Quality Gate failed

Failed conditions
19.3% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

Copy link

codecov bot commented Jan 8, 2025

Codecov Report

Attention: Patch coverage is 80.00000% with 11 lines in your changes missing coverage. Please review.

Project coverage is 84.83%. Comparing base (13d09e9) to head (051f326).
Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
packages/relay/src/lib/clients/mirrorNodeClient.ts 79.16% 2 Missing and 3 partials ⚠️
.../lib/services/ethService/ethCommonService/index.ts 70.58% 3 Missing and 2 partials ⚠️
packages/relay/src/receiptsRootUtils.ts 87.50% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3349      +/-   ##
==========================================
- Coverage   85.30%   84.83%   -0.47%     
==========================================
  Files          69       69              
  Lines        4688     4721      +33     
  Branches     1050     1063      +13     
==========================================
+ Hits         3999     4005       +6     
- Misses        374      399      +25     
- Partials      315      317       +2     
Flag Coverage Δ
config-service 98.14% <ø> (ø)
relay 79.50% <74.54%> (-0.23%) ⬇️
server 83.28% <ø> (ø)
ws-server 36.66% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
packages/relay/src/lib/clients/sdkClient.ts 67.81% <100.00%> (-3.31%) ⬇️
packages/relay/src/lib/eth.ts 86.14% <100.00%> (-0.54%) ⬇️
...kages/relay/src/lib/services/debugService/index.ts 85.71% <100.00%> (ø)
packages/relay/src/receiptsRootUtils.ts 96.07% <87.50%> (-1.80%) ⬇️
packages/relay/src/lib/clients/mirrorNodeClient.ts 88.62% <79.16%> (-1.13%) ⬇️
.../lib/services/ethService/ethCommonService/index.ts 90.14% <70.58%> (-3.76%) ⬇️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
4 participants