[Feature Request] Make it easier to diagnose failures to connect to Temporal Cloud due to incorrect serverRootCACertificate #1431

Open
maxramqvist opened this issue May 29, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@maxramqvist

What are you really trying to do?

I'm trying to connect to Temporal Cloud with this SDK's client, but my troubleshooting stalled because the error message pointed me in the wrong direction.

Describe the bug

I'm trying to connect to a namespace in Temporal Cloud. The worker and the client are using the exact same TLS configuration (I use the same function to generate the TLS config in my actual code).
The worker connects successfully to Temporal Cloud, but the client fails with the message "Failed to connect before the deadline".

It turns out that I had misconfigured serverRootCACertificate: I had set a CA certificate that wasn't valid in this case.

The error message threw me off during troubleshooting. I had expected a message related to TLS validation or similar, not a failure to connect before the deadline.

It would be great to have an error message explaining the actual problem, if that is possible.

Minimal Reproduction

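        import * as fs from "node:fs"
        import { Connection } from "@temporalio/client"

        // `config` is the application's own configuration object (not shown here).
        // The client certificate and key for mTLS are stored base64-encoded.
        const crt = Buffer.from(config.temporal.clientCertB64, "base64")
        const key = Buffer.from(config.temporal.clientKeyB64, "base64")

        const clientConnection = await Connection.connect({
            address: config.temporal.address,
            tls: {
                clientCertPair: {
                    crt,
                    key,
                },
                // just set an invalid CA cert here
                serverRootCACertificate: process.env["NODE_EXTRA_CA_CERTS"]
                    ? fs.readFileSync(process.env["NODE_EXTRA_CA_CERTS"])
                    : undefined,
            },
        })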
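
For anyone reproducing this, it can help to look at what the file behind NODE_EXTRA_CA_CERTS actually contains before handing it to the client. The following is only a sketch using Node's built-in `crypto.X509Certificate`; `describeCaBundle` is a hypothetical helper name, not something the SDK provides. It cannot tell you whether the CA is the right one for Temporal Cloud, only which certificates are in the bundle.

```typescript
import { X509Certificate } from "node:crypto";

// Hypothetical helper (not part of the Temporal SDK): lists the certificates
// found in a PEM bundle so a wrong or unexpected CA is easier to spot.
function describeCaBundle(pem: Buffer | string): string[] {
  const text = pem.toString();
  const blocks: string[] =
    text.match(/-----BEGIN CERTIFICATE-----[\s\S]+?-----END CERTIFICATE-----/g) ?? [];
  if (blocks.length === 0) {
    throw new Error("No PEM certificate blocks found in the CA bundle");
  }
  return blocks.map((block) => {
    // Throws if the block is not a parseable X.509 certificate.
    const cert = new X509Certificate(block);
    // `subject` is a multi-line string; flatten it for one-line output.
    const subject = cert.subject.replace(/\n/g, ", ");
    return `subject=[${subject}] ca=${cert.ca} validTo=${cert.validTo}`;
  });
}
```

Calling `describeCaBundle(fs.readFileSync(process.env["NODE_EXTRA_CA_CERTS"]))` and printing the result shows, for example, whether the bundle only holds a corporate proxy CA rather than a CA that could have signed the server's certificate.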

Environment/Versions

TypeScript SDK 1.9.3 and 1.10.1, Node.js 22.0 and 22.2

@maxramqvist maxramqvist added the bug Something isn't working label May 29, 2024
@mjameswh
Contributor

I fully agree. This is a common friction point, and we want to improve on this.

Unfortunately, and a bit counter-intuitively, gRPC's robustness against transport-level failures makes it very difficult to identify such errors. Essentially, by design, the gRPC layer treats connection-level errors as transient, and will simply retry transparently for as long as it is allowed to do so (hence the "before the deadline" part of that error message). There are also difficulties related to the fact that the underlying TLS libraries, on both the client side and the server side, will often close a connection eagerly, with very little detail, if any, when they fail to complete an mTLS handshake sequence, which is the specific case you faced.

I think this comment from the @grpc/grpc-js project maintainer explains this clearly:

The current behavior is intentional. Fundamentally, the gRPC library assumes that users are inputting their correct information, and that failures to establish connections are the result of the inherently asynchronous nature of networks. A DNS resolution failure can mean that the DNS config hasn't been updated yet. A TCP connection failure can mean that the server process happens to be restarting at the moment. And a certificate validation error can mean that the server is rotating its certs and the client picked up the new roots a little early.

This is also not something that can be conclusively checked beforehand. The root certificates file does not contain information about every host you might connect to. The default file contains certificate information for certificate authorities. So, you can check whether a particular certificate is signed by a CA that you recognize, but you can't check locally whether or not you will be able to connect to a particular host.

Still, we are looking internally at ways to improve the user experience in this regard. I'll therefore keep this ticket open to track this as a feature request.
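
In the meantime, one possible workaround is to run a plain TLS handshake with Node's built-in `tls` module before calling `Connection.connect`, because `tls.connect` reports certificate validation failures directly (for example with error codes such as `UNABLE_TO_VERIFY_LEAF_SIGNATURE` or `SELF_SIGNED_CERT_IN_CHAIN`). The sketch below is an assumption about how such a check could look, not an SDK feature: `probeTls`, `splitAddress`, and `TlsProbeOptions` are hypothetical names, and the address is assumed to be a `host:port` string with a DNS host name. It mainly surfaces server certificate problems like the one in this issue; a rejected client certificate may still only show up later.

```typescript
import * as tls from "node:tls";

// Hypothetical options, mirroring the fields used in the reproduction above.
interface TlsProbeOptions {
  address: string; // "host:port", the same value passed to Connection.connect
  serverRootCACertificate?: Buffer;
  clientCertPair?: { crt: Buffer; key: Buffer };
  timeoutMs?: number;
}

// Splits "host:port"; falls back to 7233, Temporal's default gRPC port.
function splitAddress(address: string): { host: string; port: number } {
  const idx = address.lastIndexOf(":");
  if (idx === -1) return { host: address, port: 7233 };
  return { host: address.slice(0, idx), port: Number(address.slice(idx + 1)) };
}

// Resolves if the TLS handshake succeeds; rejects with the underlying
// Node.js error (including certificate validation errors) otherwise.
function probeTls(opts: TlsProbeOptions): Promise<void> {
  const { host, port } = splitAddress(opts.address);
  return new Promise<void>((resolve, reject) => {
    const socket = tls.connect({
      host,
      port,
      servername: host, // SNI; assumes a DNS name, not an IP address
      ca: opts.serverRootCACertificate, // undefined means Node's default roots
      cert: opts.clientCertPair?.crt,
      key: opts.clientCertPair?.key,
      ALPNProtocols: ["h2"], // gRPC runs over HTTP/2
    });
    socket.setTimeout(opts.timeoutMs ?? 5000, () => {
      socket.destroy(new Error(`TLS probe to ${host}:${port} timed out`));
    });
    socket.once("secureConnect", () => {
      socket.end();
      resolve();
    });
    // Rejecting after the promise has already resolved is a no-op.
    socket.on("error", (err) => reject(err));
  });
}
```

Awaiting `probeTls({ address, serverRootCACertificate, clientCertPair: { crt, key } })` with the same values used for the connection, and logging the rejection's `code` and `message`, should point at a bad root CA much more directly than the deadline error does.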

@mjameswh mjameswh changed the title [Bug] Client to Temporal Cloud: "Failed to connect before the deadline" - but its actually the wrong serverRootCACertificate [Feature Request] Make it easier to diagnose failures to connect to Temporal Cloud due to incorrect serverRootCACertificate May 29, 2024
@mjameswh mjameswh added enhancement New feature or request and removed bug Something isn't working labels May 29, 2024