
cardano-graphql fatal errors don't fully fail the systemd service unit #792

Open
2 of 6 tasks
johnalotoski opened this issue Dec 19, 2022 · 0 comments

@johnalotoski
Contributor

Summary

When cardano-graphql runs as a NixOS service under systemd, a logged error can, at least in some cases, leave the process in a non-functional state: no further activity appears in the unit's logs. However, the process does not exit, so systemd does not restart the service, and cardano-graphql stays blocked until someone intervenes manually.

This might be due to an exception in cardano-graphql that is eventually caught here: cardano-graphql stops working, but the Node.js process continues running, so systemd believes the service is still active.

If so, logging a message that the server is exiting due to an exception after logging the error would be helpful. Logging the request associated with the exception would also be helpful.
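To make the suggestion concrete, here is a minimal sketch of a fatal-error handler that logs the error together with the associated request, logs that the server is exiting, and then exits with a non-zero code. All names (`handleFatalError`, `installFatalHandlers`, `FatalContext`, `Logger`) are hypothetical and are not taken from the cardano-graphql codebase. The sketch also assumes the failure surfaces as an uncaught exception or unhandled rejection, which has not been confirmed.

```typescript
// Hypothetical sketch: none of these names exist in cardano-graphql.

// Minimal logger shape, loosely modelled on a bunyan-style logger (assumption).
type Logger = {
  error: (fields: Record<string, unknown>, msg: string) => void
}

interface FatalContext {
  // The request being processed when the error occurred, if known
  request?: unknown
}

// Log the error and the request, log that the process is exiting,
// then exit with a failure code. `exit` is injectable for testing.
function handleFatalError (
  error: unknown,
  context: FatalContext,
  logger: Logger,
  exit: (code: number) => void = (code) => process.exit(code)
): void {
  const message = error instanceof Error ? error.message : String(error)
  logger.error({ err: message, request: context.request }, 'Fatal error')
  logger.error({ exitCode: 1 }, 'Exiting due to fatal error')
  exit(1)
}

// Route otherwise-unhandled failures through the handler above.
function installFatalHandlers (logger: Logger): void {
  process.on('uncaughtException', (err) => {
    handleFatalError(err, {}, logger)
  })
  process.on('unhandledRejection', (reason) => {
    handleFatalError(reason, {}, logger)
  })
}
```

A non-zero exit code marks the unit as failed, so a restart policy configured on the systemd unit can then take effect.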

Steps to reproduce the bug

Run cardano-graphql in an explorer stack under load. The problem occurs at random: the Hasura client initializes, then an error is thrown, after which the process shows no further activity. Example:

cardano-graphql-start[$PID]: {"name":"cardano-graphql","hostname":"explorer-a","pid":$PID,"level":30,"module":"HasuraClient","msg":"Initializing","time":"$TIMESTAMP","v":0}
cardano-graphql-start[$PID]: {"name":"cardano-graphql","hostname":"explorer-c","pid":$PID,"level":50,"msg":"database query error: {\"response\":{\"errors\":[{\"extensions\":{\"code\":\"unexpected\",\"path\":\"$\"},\"message\":\"database query error\"}],\"status\":200},\"request\":{\"query\":\"query {\\n          epochs (limit: 1, order_by: { number: desc }) {\\n              adaPots {\\n                  reserves\\n              }\\n          }\\n          rewards_aggregate {\\n              aggregate {\\n                  sum {\\n                      amount\\n                  }\\n              }\\n          }\\n          utxos_aggregate {\\n              aggregate {\\n                  sum {\\n                      value\\n                  }\\n              }\\n          }\\n          withdrawals_aggregate {\\n              aggregate {\\n                  sum {\\n                      amount\\n                  }\\n              }\\n          }\\n      }\"}}","time":"$TIMESTAMP","v":0}

Actual Result

  • Manual intervention required to restart a non-functional cardano-graphql service

Expected Result

  • A fatal error that renders the cardano-graphql process non-functional should cause it to exit with a failure code, so that systemd recognizes the unit failure and takes its configured action.

Environment

Cardano-graphql 7.0.X and newer unreleased test branches

Platform

  • Linux (Ubuntu)
  • Linux (Other)
  • macOS
  • Windows

Platform version

NixOS 21.11

Runtime

  • Node.js
  • Docker

Runtime version

v12.15.0
