Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generic handling of _COMMAND substitutions #337

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

2m
Copy link

@2m 2m commented Jun 5, 2018

I found a need for another _COMMAND substitution (other than HOSTNAME_COMMAND and PORT_COMMAND) so this PR modifies start-kafka.sh to handle any environment variable ending with _COMMAND generically and replace the placeholder with the command value after evaluation.

My use case was to have a kafka cluster working with scale >1 and without hardcoded port forwarding and using the new LISTENERS kafka configuration.

The command substitution needed for such scenario is:

  • HOSTNAME_COMMAND (already supported) for outside hostname in KAFKA_ADVERTISED_LISTENERS
  • PORT_COMMAND (already supported) for outside port in KAFKA_ADVERTISED_LISTENERS
  • INSIDE_HOSTNAME_COMMAND for inside hostname in KAFKA_ADVERTISED_LISTENERS which evaluates to the IP assigned to the docker container.

This PR also adds a docker-compose file (docker-compose-scale.yaml) which can be used for such scenario.

@sscaling
Copy link
Collaborator

sscaling commented Jun 5, 2018

Hi - I can see some advantages of making this generic, but I'm also wondering if you have a general use-case. Typically, for a multi broker setup you would run Kafka on separate nodes with access to separate physical resources - requiring a completely different configuration.

This would lead me to believe this is for some local testing (scaling multiple nodes on one machine). In which case you can either use an empty host name or specify 0.0.0.0 to bind to all addresses depending on your requirements (as per https://kafka.apache.org/documentation/#configuration)

@2m
Copy link
Author

2m commented Jun 5, 2018

Yes, this is for testing. More concretely the use-case is to start a kafka cluster of more than one docker container in Travis CI and then run the tests, that send/receive messages to/from Kafka.

In which case you can either use an empty host name or specify 0.0.0.0 to bind to all addresses

Do you mean use that for inside hostname? As far as I understand (but my knowledge is very limited here) inside hostname is used by other kafka nodes to find each other and connect to when forming a cluster. So that needs to be a hostname that can be used from one node to access another. Therefore 0.0.0.0 might not work here. But I'll try.

@sscaling
Copy link
Collaborator

sscaling commented Jun 5, 2018

You are right - I misread your docker-compose file, I thought you were just configuring listeners. It is not valid to advertise 0.0.0.0 for advertised.listeners.

Have you looked at the tests in this repo - this is exactly what they do (2 Kafka brokers):

The main caveat to these tests is that the test evaluates from inside the docker network (i.e. tests are run from another docker container so they are already in the same network namespace).

Because of the restrictions of running multiple brokers on one machine (i.e. you cannot share the same port multiple times) I'm assuming it's all running inside the same network namespace anyway.

@2m
Copy link
Author

2m commented Jun 5, 2018

The main caveat to these tests is that the test evaluates from inside the docker network (i.e. tests are run from another docker container so they are already in the same network namespace).

I see. That means that there is no need to configure multiple listeners (outside and inside, for example).

However in a usecase like mine, were tests are being run outside of container, then I am not able to leave KAFKA_ADVERTISED_LISTENERS unset (like in the test/docker-compose.yml).

@sscaling
Copy link
Collaborator

sscaling commented Jun 6, 2018

I think (if docker port forwarding is bound to the same interface as the hostname) you could also just configure advertised listeners with INSIDE://_{HOSTNAME_COMMAND}:9092,OUTSIDE://kafka:9094 and set the hostname in the travis config.

The negatives I see in the proposed change is the generic behaviour. The more generic and abstract things are - the larger the surface area for things to go wrong.

If something is not well defined, I think some consideration into the naming should be given _COMMAND seems very generic, and could have other collisions (we already have issues with KAFKA_ collisions on kubernetes for example).

So as I said at the outset, I can see some benefits in this change - I think we just need to think through the implications (Also I would need to see some tests for varying different use-cases. A scenario test would be a better place for the included docker-compose.yml as well - to show it working end-to-end).

There are also some larger changes due (#335 : re-working our deployment model) so I would like to get those in and settled before considering merging this

@2m
Copy link
Author

2m commented Jun 6, 2018

I think (if docker port forwarding is bound to the same interface as the hostname) you could also just configure advertised listeners with INSIDE://_{HOSTNAME_COMMAND}:9092,OUTSIDE://kafka:9094 and set the hostname in the travis config.

The INSIDE://_{HOSTNAME_COMMAND}:9092 is not correct IMO. _{HOSTNAME_COMMAND} gets replaced to the hostname of the host machine. However that does not resolve inside the container and kafka nodes can not find eachother and form a cluster.

When checking this out I have found something else. It is possible to omit the hostname part completely from the INSIDE definition in the advertised listeners like so:

KAFKA_ADVERTISED_LISTENERS: INSIDE://:9092,OUTSIDE://_{HOSTNAME_COMMAND}:_{PORT_COMMAND}

Then kafka will automatically use the hostname of the container. This is what is stored in zookeeper when hostname is omitted:

... "endpoints":["INSIDE://f42261bd84ab:9092","OUTSIDE://donata:32845"] ...

(donata here is the hostname of the host machine)

Which is exactly what I wanted to achieve with the INSIDE_HOSTNAME_COMMAND and solves my usecase without the changes in this PR.

Therefore I am fine with closing this PR. If at a later time you would like to generalise the COMMAND substitution behaviour you are free to use the changes from here.

@sscaling
Copy link
Collaborator

sscaling commented Jun 6, 2018

The INSIDE://_{HOSTNAME_COMMAND}:9092 is not correct IMO. _{HOSTNAME_COMMAND} gets replaced to the hostname of the host machine. However that does not resolve inside the container and kafka nodes can not find eachother and form a cluster.

The implication was that you'd swap your INSIDE_HOSTNAME_COMMAND statement over to the HOSTNAME_COMMAND behaviour.

KAFKA_ADVERTISED_LISTENERS: INSIDE://:9092,OUTSIDE://{HOSTNAME_COMMAND}:{PORT_COMMAND}

That is what I was thinking of when I suggested binding to 0.0.0.0.

Glad it works. As I say - I think there is some value in this change - it's generic nature worries me a little about what other things may break (e.g. naming collisions etc). I think it's worth leaving open for now to see if anyone else has any strong feelings / feedback.

Thanks for the PR, feedback and responses.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants