
DNS resolution errors with Broker host names returned by Pulsar lookups #199

Open
lhotari opened this issue Apr 27, 2022 · 2 comments


@lhotari
Contributor

lhotari commented Apr 27, 2022

There's currently a conflict between the Pulsar k8s deployment and the way Pulsar load balancing works.

When a Pulsar broker starts, it registers itself with the internal Pulsar load balancer. The load balancer might immediately assign namespace bundles to the broker, and the topics in those bundles might immediately receive requests.

The conflict is that, with the current settings, DNS resolution of the broker's host name fails until the broker's readiness probe succeeds.

Pulsar might already return the host name of a specific broker to a client while the client still cannot resolve that DNS name, because the broker's readiness probe hasn't passed yet. This causes extra delays, and also bugs when connecting to topics after a load balancing event. Pulsar clients usually back off and retry. For Admin API HTTP requests, clients might not handle the error properly; for example, Pulsar Proxy fails the request when there's a DNS lookup issue.
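To make the "back off and retry" behavior concrete, here is a minimal Python sketch of a client-side lookup that retries DNS resolution with exponential backoff. It is not Pulsar client code: resolve_with_backoff and its parameters are hypothetical, and the resolver and sleep functions are injectable so the logic can be exercised without a cluster.

```python
import socket
import time
from typing import Any, Callable, List


def resolve_with_backoff(
    host: str,
    port: int,
    resolver: Callable[[str, int], List[Any]] = socket.getaddrinfo,
    max_attempts: int = 5,
    initial_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Any]:
    """Resolve host, retrying with exponential backoff on DNS failures.

    Hypothetical helper for illustration only, not the Pulsar client's logic.
    """
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return resolver(host, port)
        except socket.gaierror:
            # The name doesn't resolve yet (e.g. the broker's DNS record isn't
            # published because its readiness probe hasn't passed).
            if attempt == max_attempts:
                raise  # out of attempts: surface the DNS error to the caller
            sleep(delay)
            delay = min(delay * 2, max_delay)  # double the wait, up to a cap
    raise ValueError("max_attempts must be at least 1")
```

Every failed lookup costs at least one backoff interval, which is where the extra delay comes from. A caller that doesn't retry at all corresponds to max_attempts=1: the first DNS error fails the request.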

Solution:
The broker StatefulSet's service should use publishNotReadyAddresses: true.
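As a rough illustration of that setting, the following Python sketch builds a Service manifest as a plain dict. It assumes the broker StatefulSet's service is headless (clusterIP: None), as a StatefulSet's governing service normally is; the helper name, service name, namespace and labels are hypothetical and not taken from the chart. Ports 6650 and 8080 are Pulsar's default broker ports.

```python
from typing import Any, Dict, List


def broker_service(
    name: str,
    namespace: str,
    selector: Dict[str, str],
    ports: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Build a headless Service manifest that publishes not-ready pods.

    Hypothetical helper; the real chart renders its own templates.
    """
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Headless service: per-pod DNS records for the StatefulSet.
            "clusterIP": "None",
            # Publish DNS records for pods before their readiness probe passes.
            "publishNotReadyAddresses": True,
            "selector": selector,
            "ports": ports,
        },
    }


if __name__ == "__main__":
    import json

    svc = broker_service(
        "pulsar-broker",  # hypothetical service name
        "pulsar",  # hypothetical namespace
        {"component": "broker"},  # hypothetical pod labels
        [{"name": "pulsar", "port": 6650}, {"name": "http", "port": 8080}],
    )
    print(json.dumps(svc, indent=2))
```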

There's useful information about StatefulSets and the publishNotReadyAddresses setting in:
k8ssandra/cass-operator#18

There's an alternative solution in #198, which works when TLS is disabled for brokers. When TLS is enabled, stable host names are required so that hostname verification of the broker certificates can succeed.
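The stable host names in question follow the Kubernetes pattern for pods of a StatefulSet: <statefulset>-<ordinal>.<service>.<namespace>.svc.<cluster-domain>. The Python sketch below generates those names and checks them against a certificate name using a simplified wildcard rule (exact match, or "*." covering exactly one leftmost label). The helper names and the example service and namespace are hypothetical; real TLS libraries apply fuller matching rules.

```python
from typing import List


def statefulset_pod_fqdns(
    sts_name: str,
    service_name: str,
    namespace: str,
    replicas: int,
    cluster_domain: str = "cluster.local",
) -> List[str]:
    """Stable per-pod DNS names: <sts>-<ordinal>.<service>.<ns>.svc.<domain>."""
    return [
        f"{sts_name}-{ordinal}.{service_name}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


def matches_san(hostname: str, san: str) -> bool:
    """Simplified certificate name check: exact match, or a wildcard that
    covers exactly one leftmost label (e.g. *.example.com)."""
    hostname, san = hostname.lower(), san.lower()
    if san.startswith("*."):
        first_label, _, rest = hostname.partition(".")
        # The wildcard must replace exactly one non-empty label.
        return bool(first_label) and bool(rest) and rest == san[2:]
    return hostname == san
```

These names stay the same across pod restarts, which is what makes them usable in a certificate. With the current settings they only resolve once the pod's readiness probe passes.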

@lhotari
Contributor Author

lhotari commented Apr 27, 2022

I experimented with adding a new service and making the broker StatefulSet use it: 259341c

The problem is that the serviceName of an existing StatefulSet can't be changed:

Error: UPGRADE FAILED: cannot patch "pulsar-testenv-pulsar-broker" with kind StatefulSet: StatefulSet.apps "pulsar-testenv-pulsar-broker" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', 'updateStrategy', 'persistentVolumeClaimRetentionPolicy' and 'minReadySeconds' are forbidden
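The rule in that error message can be checked before an upgrade. Below is a minimal Python sketch, assuming the old and new StatefulSet specs are available as plain dicts (for example, parsed from rendered manifests). The helper name is hypothetical, the allowed-field list is copied from the error message above, and this is not how the Kubernetes API server implements its validation.

```python
from typing import Any, Dict, List

# Fields the error message lists as updatable on an existing StatefulSet.
UPDATABLE_STS_SPEC_FIELDS = {
    "replicas",
    "template",
    "updateStrategy",
    "persistentVolumeClaimRetentionPolicy",
    "minReadySeconds",
}


def forbidden_sts_spec_changes(
    old_spec: Dict[str, Any], new_spec: Dict[str, Any]
) -> List[str]:
    """Return spec fields that differ but are not in the updatable set."""
    changed = []
    for key in sorted(set(old_spec) | set(new_spec)):
        if key in UPDATABLE_STS_SPEC_FIELDS:
            continue  # changes to these fields are accepted on update
        if old_spec.get(key) != new_spec.get(key):
            changed.append(key)
    return changed
```

For the experiment above, changing serviceName is exactly such a forbidden change, so the helm upgrade is rejected.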

We would like to have 2 services for the broker StatefulSet:

  • 1 service that uses publishNotReadyAddresses: true
  • another service that doesn't use publishNotReadyAddresses: true. This one would route traffic sent to the service only to brokers that pass the readiness probe.

It doesn't seem possible to keep backwards compatibility for existing deployments while meeting the above requirements.

@cdbartholomew
Contributor

@lhotari To support the upgrade path, can you switch the purpose of the services? That way you don't have to modify the StatefulSet: keep the existing name for the service that uses the publishNotReadyAddresses: true setting, and add a new service that doesn't use it. The proxy should point to the service that only routes traffic to brokers that are ready, so that the proxy doesn't send traffic to a broker that can't handle it.
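A rough Python sketch of that layout follows, building both Service manifests as dicts. Assumptions: the existing service (the one the StatefulSet's serviceName already points to) is already headless, since a Service's clusterIP can't be switched in place; the "-ready" suffix is a hypothetical name; and using a regular ClusterIP service for the proxy-facing service is a choice made for illustration, not something settled in this thread.

```python
from typing import Any, Dict, List


def _service(
    name: str,
    namespace: str,
    selector: Dict[str, str],
    ports: List[Dict[str, Any]],
    headless: bool,
    publish_not_ready: bool,
) -> Dict[str, Any]:
    spec: Dict[str, Any] = {
        "selector": selector,
        "ports": ports,
        "publishNotReadyAddresses": publish_not_ready,
    }
    if headless:
        spec["clusterIP"] = "None"
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


def broker_services(
    existing_name: str,
    namespace: str,
    selector: Dict[str, str],
    ports: List[Dict[str, Any]],
    ready_suffix: str = "-ready",  # hypothetical naming convention
) -> List[Dict[str, Any]]:
    return [
        # Keeps the name the StatefulSet references in spec.serviceName, so the
        # StatefulSet is untouched; publishes not-ready pods so broker host
        # names returned by lookups resolve right away.
        _service(existing_name, namespace, selector, ports,
                 headless=True, publish_not_ready=True),
        # New service for the proxy: only brokers passing the readiness probe
        # become endpoints.
        _service(existing_name + ready_suffix, namespace, selector, ports,
                 headless=False, publish_not_ready=False),
    ]
```

Only the Services change in this layout; the StatefulSet spec, including serviceName, stays the same, which avoids the forbidden-update error.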

pgier pushed a commit to pgier/datastax-pulsar-helm-chart that referenced this issue Jul 13, 2022