cardano-graphql fatal errors don't fully fail the systemd service unit #792

johnalotoski · 2022-12-19T23:15:58Z

Summary

While running the cardano-graphql NixOS service as a systemd unit, at least in some cases a logged error appears to be fatal: the cardano-graphql process stops functioning, and no further activity appears in that systemd unit as it normally would. However, the process doesn't die, so systemd doesn't restart the service even though it's non-functional, and cardano-graphql stays blocked until manual intervention occurs.

This might be due to an exception in cardano-graphql that is eventually caught here: the cardano-graphql server then stops, but node continues running, so systemd believes the service is still running.

If so, logging a message that the server is exiting due to an exception after logging the error would be helpful. Logging the request associated with the exception would also be helpful.
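As a rough illustration of the logging suggested above, here is a minimal sketch that builds one structured record saying the server is exiting, why, and which request was involved. The `RequestError` shape, the `FatalLogRecord` fields and the `buildFatalRecord` name are all hypothetical; they are loosely modelled on the error in the log below, not taken from the cardano-graphql code.

```typescript
// Hypothetical error shape: an Error that may carry the failed request.
// Field names are assumptions for this sketch.
interface RequestError extends Error {
  request?: { query: string; variables?: unknown };
}

// Hypothetical log record; the real logger's fields may differ.
interface FatalLogRecord {
  msg: string;
  err: string;
  request?: string;
}

// Build a single record stating that the server is exiting, the error
// message, and (when available) the request associated with the error.
function buildFatalRecord(err: unknown): FatalLogRecord {
  const e: RequestError = err instanceof Error ? err : new Error(String(err));
  const record: FatalLogRecord = {
    msg: "Exiting due to unhandled exception",
    err: e.message,
  };
  const query = e.request?.query;
  if (query !== undefined) {
    // Collapse whitespace so the query fits on one log line.
    record.request = query.replace(/\s+/g, " ").trim();
  }
  return record;
}
```

Emitting such a record right after the existing error log would make it clear from the journal alone that the server stopped, rather than having to infer it from the absence of further activity.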

Steps to reproduce the bug

Run cardano-graphql in an explorer stack under load. An example of the problem, which occurs randomly: watch the Hasura client initialize, then see an error thrown, after which no further activity happens in the process.

cardano-graphql-start[$PID]: {"name":"cardano-graphql","hostname":"explorer-a","pid":$PID,"level":30,"module":"HasuraClient","msg":"Initializing","time":"$TIMESTAMP","v":0}
cardano-graphql-start[$PID]: {"name":"cardano-graphql","hostname":"explorer-c","pid":$PID,"level":50,"msg":"database query error: {\"response\":{\"errors\":[{\"extensions\":{\"code\":\"unexpected\",\"path\":\"$\"},\"message\":\"database query error\"}],\"status\":200},\"request\":{\"query\":\"query {\\n          epochs (limit: 1, order_by: { number: desc }) {\\n              adaPots {\\n                  reserves\\n              }\\n          }\\n          rewards_aggregate {\\n              aggregate {\\n                  sum {\\n                      amount\\n                  }\\n              }\\n          }\\n          utxos_aggregate {\\n              aggregate {\\n                  sum {\\n                      value\\n                  }\\n              }\\n          }\\n          withdrawals_aggregate {\\n              aggregate {\\n                  sum {\\n                      amount\\n                  }\\n              }\\n          }\\n      }\"}}","time":"$TIMESTAMP","v":0}

Actual Result

Manual intervention is required to restart a non-functional cardano-graphql service.

Expected Result

A fatal error which renders the cardano-graphql process non-functional should cause it to exit completely with a failure code, so that systemd recognizes a unit failure and takes its predetermined action.
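The expected behaviour could look something like the sketch below: a handler that logs the fatal error once and then exits with a non-zero code. The names `FatalDeps`, `makeFatalHandler` and `FATAL_EXIT_CODE` are hypothetical, and the logger and exit function are injected because how cardano-graphql actually wires its shutdown path is not something this issue establishes.

```typescript
// Injected dependencies, so the sketch assumes nothing about the real
// server's logger or shutdown path. All names here are hypothetical.
interface FatalDeps {
  logError: (msg: string) => void;
  exit: (code: number) => void;
}

// Any non-zero code signals failure; 1 is an arbitrary choice.
const FATAL_EXIT_CODE = 1;

function makeFatalHandler(deps: FatalDeps): (err: unknown) => void {
  let exiting = false;
  return (err: unknown): void => {
    if (exiting) return; // ignore further errors once shutdown has started
    exiting = true;
    const reason = err instanceof Error ? err.message : String(err);
    deps.logError(`Fatal error, exiting with code ${FATAL_EXIT_CODE}: ${reason}`);
    deps.exit(FATAL_EXIT_CODE);
  };
}

// Possible wiring in a Node.js entry point (left as a comment so the
// sketch stays self-contained):
//   const onFatal = makeFatalHandler({
//     logError: console.error,
//     exit: (code) => process.exit(code),
//   });
//   process.on("uncaughtException", onFatal);
//   process.on("unhandledRejection", onFatal);
```

With the process exiting non-zero, a unit configured with a restart policy such as `Restart=on-failure` would see the failure and restart it; whether the NixOS service sets such a policy is not covered here.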

Environment

Cardano-graphql 7.0.X and newer unreleased test branches

Platform

Linux (Ubuntu)
Linux (Other)
macOS
Windows

Platform version

NixOS 21.11

Runtime

Node.js
Docker

Runtime version

v12.15.0


johnalotoski added the BUG label Dec 19, 2022
