Rolling restart unable to restart broker #239
Comments
From the broker's SSH log, it looks like the start and stop commands actually reach the broker, but the start command is unable to start the broker.
The problem with the rolling restart script is that the Kafka stop and start commands are executed almost simultaneously, with very little delay between them, so the start command runs before the stop command has finished its task. This is my assumption. This behavior can be emulated by executing the following command from the terminal.
Is it possible to inject a time delay between the execution of the two commands (without using sleep)?
Hi @DwijadasDey, what you are saying seems reasonable to me. I'll try to reproduce your issue internally so we can come up with a good fix.
@DwijadasDey I think we never ran into the issue internally because we use the default start/stop commands (i.e. systemd, or upstart previously), which probably have some logic to avoid the situation you are describing. Is this possible for your deployment? I think using a service manager such as systemd would be better overall than running the command daemonized. Otherwise, you might want to include a prestarttask in your invocation that waits until the Kafka process has been stopped.
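A task that waits for the Kafka process to stop could look roughly like the sketch below. This is only an illustration, not part of kafka-utils: the function name `wait_for_exit` and its timeout argument are hypothetical, and it assumes the caller already knows the broker's PID and is allowed to signal that process. It polls for process exit rather than inserting one fixed delay.

```shell
#!/bin/sh
# Hypothetical helper (not part of kafka-utils): block until the process
# with the given PID has exited, or give up after a timeout in seconds.
# Usage: wait_for_exit PID [TIMEOUT_SECONDS]
wait_for_exit() {
    pid="$1"
    timeout="${2:-60}"   # default timeout of 60 seconds is an arbitrary choice
    elapsed=0
    # kill -0 sends no signal; it only reports whether the PID can be signalled.
    # Assumption: the caller has permission to signal the process. Without
    # permission, kill -0 fails even though the process is still alive.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "process $pid still running after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

One possible use is to look up the broker PID before issuing the stop command and then call `wait_for_exit "$pid" 120` before the start command runs. How the PID is obtained (a PID file, `pgrep`, or the service manager) depends on the deployment and is left open here.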
What happens to your SSH connection after the broker is stopped?
I have the same problem:
/usr/lib/python2.7/site-packages/paramiko/kex_ecdh_nist.py:111: CryptographyDeprecationWarning: encode_point has been deprecated on EllipticCurvePublicNumbers and will be removed in a future version. Please use EllipticCurvePublicKey.public_bytes to obtain both compressed and uncompressed point encoding.
Hi
I am trying to use the rolling restart script (latest) along with Jolokia (jolokia-jvm-1.6.2-agent.jar), which is attached to the Kafka service script running on the broker nodes (passed via KAFKA_OPTS).
KAFKA_OPTS="-javaagent:/home/kafka/prometheus/jmx_prometheus_javaagent-0.3.1.jar=8080:/home/kafka/prometheus/kafka-0-8-2.yml -javaagent:/home/kafka/jolokia/jolokia-agent.jar=host=*"
I am able to get Jolokia metrics from the remote broker nodes using the following curl command.
curl bro1:8778/jolokia/read/kafka.server:name=UnderReplicatedPartitions,type=ReplicaManager/Value | jq
When I run the rolling restart script, it detects all the brokers and, after confirmation, stops the first broker. It then waits forever for broker 1 to restart, with the following messages:
I tried the following command as well:
[kfk@admin-node ~]$ kafka-rolling-restart --cluster-type kafka --start-command "sudo service kafka start " --stop-command "sudo service kafka stop" --check-count 3
On inspecting broker node 1, I found that Kafka was stopped. After I manually restarted broker 1, the rolling restart script stopped the second broker and again waited forever for broker 2 to come up. I have tested all the service commands for Kafka (start, stop, restart) manually on the broker nodes, and all of them work.
It looks like the rolling restart script is able to stop the Kafka broker but unable to restart it.
What could be the issue?
Kafka version: confluent-5.2.1-2.12