Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(connector-besu): do not crash if ledger unreachable - send HTTP 503 #3406

Open
petermetz opened this issue Jul 12, 2024 · 5 comments · May be fixed by #3573
Open

fix(connector-besu): do not crash if ledger unreachable - send HTTP 503 #3406

petermetz opened this issue Jul 12, 2024 · 5 comments · May be fixed by #3573
Assignees
Labels
Besu P2 Priority 2: High
Milestone

Comments

@petermetz
Copy link
Contributor

petermetz commented Jul 12, 2024

Description

Right now the API server will exit with a crash if the Besu connector receives a request and the backing ledger is not accessible.
Instead of a full-on crash, we need to treat this as a recoverable error and send back HTTP 503 - Service Unavailable
with a Retry-After header with exponential backoff where it starts from 1 second and the doubles every re-try.

The easiest way to reproduce is to follow this guide and remove the part of the command from step 2 that binds the API server to the host machine's network: --network=host \

https://github.com/hyperledger/cacti/tree/main/packages/cactus-plugin-ledger-connector-besu#testing-api-calls-with-the-container

Acceptance Criteria

  1. Test coverage included for this case.
  2. Test verifies that API server does not crash in the mentioned use-case.
@petermetz petermetz added Besu P2 Priority 2: High labels Jul 12, 2024
@petermetz petermetz added this to the v2.0.0-rc.3 milestone Jul 12, 2024
@jagpreetsinghsasan
Copy link
Contributor

jagpreetsinghsasan commented Jul 30, 2024

Looks like Orion isn't configured correctly in the cactus-besu-all-in-one:2024-07-04-8c030ae.
@ruzell22 please check if the issue persists on the latest tag as well or not. If the error still persists, we will increase the scope of this task with a potential fix for the same.

UPDATE: I will create an issue to remove tessera, orion base images from the besu aio (besu mp has it and is used for private tx)

@petermetz
Copy link
Contributor Author

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Retry-After
Retry-After - HTTP | MDN
The Retry-After response HTTP header indicates how long
the user agent should wait before making a follow-up request. There are three main cases
this header is used:

Exponential back-off - the delay between retries is increasing exponentially with every retry.
So first you wait 2 seconds, 4, 8, etc.

@petermetz
Copy link
Contributor Author

Let's say start with the run-transaction endpoint of the besu connector and do these things:

  1. Refactor the method within the connector so that it throw the specific kind of exception, let's say you create a new exception type called BesuLedgerInaccessibleException which is a sub-class of Error and you throw this when you detect the issue in the connector that a transaction was not possible to be executed because the network connection to the ledger is broken.
  2. Then in the HTTP endpoint implementation in the handleRequest method you can write some logic in the catch block that sets the retry-after headers (linked above) but it only should do this if the exception you caught (that was thrown by the connector) is an instanceof BesuLedgerInaccessibleException so that way we know for sure that this was the problem.

@petermetz petermetz modified the milestones: v2.0.0-rc.3, v2.0.0-rc.4 Aug 4, 2024
@petermetz
Copy link
Contributor Author

web3/web3.js#6900

@petermetz
Copy link
Contributor Author

petermetz commented Aug 20, 2024

@ruzell22 I figured it out! The issue is deep within the web3 library itself that was causing the crashes. I pushed a commit to your branch with fixes that make it work so that it doesn't crash the entire nodejs process anymore when the connection is down to the ledger.

The other issue with the code was that you were having the re-try logic in the catch block within the endpoint request handler and this was trying to modify the http headers after the response was sent which was another thing that was ALSO crashing the entire nodejs process (which is why seemingly none of my fixes worked at first that were targeted at the web3 library itself).

So now the code is like this:

  1. yarn patch to [email protected] so that it doesn't crash the nodejs process
  2. reverted your changes in the request handler so that those also don't crash the nodejs process
  3. in the current state it's not returning a 503 but a 500

What I need you to do to finish the task:

  1. Make sure that when the connection is down an http 503 code is returned (this part of your change was good)
  2. Add a static retry-after header value of 5 seconds (I'm dropping the requirement to have the exponential back-off for now because we've spent too much time on this task already so the exponential back-off is something we can address later)
  3. Add test cases asserting that when the ledger connection is down the API call fails with 503 and not a generic 500.

To verify the changes you make, take a look at this snippet (that I'll also add to the cmd-api-server readme). You can use this to mount your local changes on the besu connector code into the api server container and then configure the api server to not install from the registry but instead import the local package you have on your file-system (this is what I used to verify my fixes instead of publishing to npm every time)

  • Launch container with plugins imported from the local file-system when you wish to test your changes that are not on the package (npm/ghcr/etc.) registries yet. Two things to note here: First the --volume parameter that mounts the project directory from the host machine into the container. Second the packageSrc parameter pointing to the plugin's source directory on the file-system.
    docker run \
      --rm \
      --volume ${PWD}:/usr/src/cacti \
      --publish 3000:3000 \
      --publish 4000:4000 \
      --env AUTHORIZATION_PROTOCOL='NONE' \
      --env AUTHORIZATION_CONFIG_JSON='{}' \
      --env GRPC_TLS_ENABLED=false \
      --env API_TLS_CERT_PEM=- \
      --env API_TLS_CLIENT_CA_PEM=- \
      --env API_TLS_KEY_PEM=- \
      --env API_TLS_ENABLED=false \
      --env API_MTLS_ENABLED=false \
      --env API_HOST=0.0.0.0 \
      --env LOG_LEVEL=DEBUG \
      --env PLUGINS='[{"packageName": "@hyperledger/cactus-plugin-ledger-connector-besu", "type": "org.hyperledger.cactus.plugin_import_type.LOCAL", "action": "org.hyperledger.cactus.plugin_import_action.INSTALL",  "options": { "packageSrc": "/usr/src/cacti/packages/cactus-plugin-ledger-connector-besu", "rpcApiWsHost": "http://127.0.0.1:8546", "rpcApiHttpHost": "http://127.0.0.1:8545", "instanceId": "some-unique-besu-connector-instance-id"}}]' \
      cas

petermetz referenced this issue in petermetz/cacti Sep 4, 2024
This came up during a discussion here and I thought it best to document
it a little more thoroughly so that later it can be referenced for others
as well:
https://github.com/hyperledger/cacti/issues/3406#issuecomment-2299654552

Signed-off-by: Peter Somogyvari <[email protected]>
@petermetz petermetz modified the milestones: v2.0.0-rc.4, v2.0.0-rc.5 Sep 9, 2024
petermetz referenced this issue in petermetz/cacti Sep 17, 2024
This came up during a discussion here and I thought it best to document
it a little more thoroughly so that later it can be referenced for others
as well:
https://github.com/hyperledger/cacti/issues/3406#issuecomment-2299654552

Signed-off-by: Peter Somogyvari <[email protected]>
petermetz referenced this issue Sep 18, 2024
This came up during a discussion here and I thought it best to document
it a little more thoroughly so that later it can be referenced for others
as well:
https://github.com/hyperledger/cacti/issues/3406#issuecomment-2299654552

Signed-off-by: Peter Somogyvari <[email protected]>
@petermetz petermetz modified the milestones: v2.0.0-rc.5, v2.0.0-rc.6 Sep 30, 2024
ruzell22 added a commit to ruzell22/cactus that referenced this issue Oct 4, 2024
---------------
1. Yarn patch to [email protected] so that it does not crash nodejs process
2. It is returning a 503 instead of a 500
3. Added a static retry-after header value of 5 seconds

Fixes: hyperledger-cacti#3406

Co-authored-by: Peter Somogyvari <[email protected]>

Signed-off-by: ruzell22 <[email protected]>
Signed-off-by: Peter Somogyvari <[email protected]>
ruzell22 added a commit to ruzell22/cactus that referenced this issue Oct 4, 2024
Primary Changes
---------------
1. Yarn patch to [email protected] so that it does not crash nodejs process
2. It is returning a 503 instead of a 500
3. Added a static retry-after header value of 5 seconds

Fixes: hyperledger-cacti#3406

Co-authored-by: Peter Somogyvari <[email protected]>

Signed-off-by: ruzell22 <[email protected]>
Signed-off-by: Peter Somogyvari <[email protected]>
ruzell22 added a commit to ruzell22/cactus that referenced this issue Oct 16, 2024
Primary Changes
---------------
1. Yarn patch to [email protected] so that it does not crash nodejs process
2. It is returning a 503 instead of a 500
3. Added a static retry-after header value of 5 seconds

Without the catch block, the rejection is an unhandled rejection that bubbles
up to the top of the callstack where NodeJS itself catches it and then crashes
the entire process. (used to be that it just logged a warning but since some
of the newer versions it crashes which allows us to find these bugs in our
code / library's code)

Fixes: hyperledger-cacti#3406

Co-authored-by: Peter Somogyvari <[email protected]>

Signed-off-by: ruzell22 <[email protected]>
Signed-off-by: Peter Somogyvari <[email protected]>
@petermetz petermetz modified the milestones: v2.0.0-rc.6, v2.1.0 Oct 18, 2024
ruzell22 added a commit to ruzell22/cactus that referenced this issue Oct 28, 2024
Primary Changes
---------------
1. Yarn patch to [email protected] so that it does not crash nodejs process
Without the catch block, the rejection is an unhandled rejection that bubbles
up to the top of the callstack where NodeJS itself catches it and then crashes
the entire process. (used to be that it just logged a warning but since some
of the newer versions it crashes which allows us to find these bugs in our
code / library's code)
2. It is returning a 503 instead of a 500
3. Added a static retry-after header value of 5 seconds
4. Added test case in test-ledger which has run-transaction at the end that is
expected to give error 503 when the backing ledger is unavailable

Fixes: hyperledger-cacti#3406

Co-authored-by: Peter Somogyvari <[email protected]>

Signed-off-by: ruzell22 <[email protected]>
Signed-off-by: Peter Somogyvari <[email protected]>
ruzell22 added a commit to ruzell22/cactus that referenced this issue Nov 4, 2024
Primary Changes
---------------
1. Yarn patch to [email protected] so that it does not crash nodejs process
Without the catch block, the rejection is an unhandled rejection that bubbles
up to the top of the callstack where NodeJS itself catches it and then crashes
the entire process. (used to be that it just logged a warning but since some
of the newer versions it crashes which allows us to find these bugs in our
code / library's code)
2. It is returning a 503 instead of a 500
3. Added a static retry-after header value of 5 seconds
4. Added test case in test-ledger which has run-transaction at the end that is
expected to give error 503 when the backing ledger is unavailable

Fixes: hyperledger-cacti#3406

Co-authored-by: Peter Somogyvari <[email protected]>

Signed-off-by: ruzell22 <[email protected]>
Signed-off-by: Peter Somogyvari <[email protected]>
ruzell22 added a commit to ruzell22/cactus that referenced this issue Nov 4, 2024
Primary Changes
---------------
1. Yarn patch to [email protected] so that it does not crash nodejs process
Without the catch block, the rejection is an unhandled rejection that bubbles
up to the top of the callstack where NodeJS itself catches it and then crashes
the entire process. (used to be that it just logged a warning but since some
of the newer versions it crashes which allows us to find these bugs in our
code / library's code)
2. It is returning a 503 instead of a 500
3. Added a static retry-after header value of 5 seconds
4. Added test case in test-ledger which has run-transaction at the end that is
expected to give error 503 when the backing ledger is unavailable

Fixes: hyperledger-cacti#3406

Co-authored-by: Peter Somogyvari <[email protected]>

Signed-off-by: ruzell22 <[email protected]>
Signed-off-by: Peter Somogyvari <[email protected]>
ruzell22 added a commit to ruzell22/cactus that referenced this issue Nov 4, 2024
Primary Changes
---------------
1. Yarn patch to [email protected] so that it does not crash nodejs process
Without the catch block, the rejection is an unhandled rejection that bubbles
up to the top of the callstack where NodeJS itself catches it and then crashes
the entire process. (used to be that it just logged a warning but since some
of the newer versions it crashes which allows us to find these bugs in our
code / library's code)
2. It is returning a 503 instead of a 500
3. Added a static retry-after header value of 5 seconds
4. Added test case in test-ledger which has run-transaction at the end that is
expected to give error 503 when the backing ledger is unavailable

Fixes: hyperledger-cacti#3406

Co-authored-by: Peter Somogyvari <[email protected]>

Signed-off-by: ruzell22 <[email protected]>
Signed-off-by: Peter Somogyvari <[email protected]>
petermetz pushed a commit to ruzell22/cactus that referenced this issue Nov 5, 2024
Primary Changes
---------------
1. Yarn patch to [email protected] so that it does not crash nodejs process
Without the catch block, the rejection is an unhandled rejection that bubbles
up to the top of the callstack where NodeJS itself catches it and then crashes
the entire process. (used to be that it just logged a warning but since some
of the newer versions it crashes which allows us to find these bugs in our
code / library's code)
2. It is returning a 503 instead of a 500
3. Added a static retry-after header value of 5 seconds
4. Added test case in test-ledger which has run-transaction at the end that is
expected to give error 503 when the backing ledger is unavailable

Fixes: hyperledger-cacti#3406

Co-authored-by: Peter Somogyvari <[email protected]>

Signed-off-by: ruzell22 <[email protected]>
Signed-off-by: Peter Somogyvari <[email protected]>
ruzell22 added a commit to ruzell22/cactus that referenced this issue Nov 6, 2024
Primary Changes
---------------
1. Yarn patch to [email protected] so that it does not crash nodejs process
Without the catch block, the rejection is an unhandled rejection that bubbles
up to the top of the callstack where NodeJS itself catches it and then crashes
the entire process. (used to be that it just logged a warning but since some
of the newer versions it crashes which allows us to find these bugs in our
code / library's code)
2. It is returning a 503 instead of a 500
3. Added a static retry-after header value of 5 seconds
4. Added test case in test-ledger which has run-transaction at the end that is
expected to give error 503 when the backing ledger is unavailable

Fixes: hyperledger-cacti#3406

Co-authored-by: Peter Somogyvari <[email protected]>

Signed-off-by: ruzell22 <[email protected]>
Signed-off-by: Peter Somogyvari <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Besu P2 Priority 2: High
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants