Skip to content
This repository has been archived by the owner on Mar 24, 2021. It is now read-only.

Tornado #731

Open
mrocklin opened this issue Oct 2, 2017 · 4 comments
Open

Tornado #731

mrocklin opened this issue Oct 2, 2017 · 4 comments

Comments

@mrocklin
Copy link

mrocklin commented Oct 2, 2017

I would like to consume and produce messages using PyKafka from within a Tornado application. I am potentially willing to contribute code for this. I have a few questions.

  1. Are the maintainers of this library comfortable adding tornado as an optional dependency? This would bloat your testing stack a bit. I'm also happy to do this externally if preferred.
  2. Is there already a mechanism to run a callback whenever data arrives? Using the non-blocking consume will work in most cases, but when it returns None we'll want to be triggered when data does arrive. Sometimes frameworks like this provide a mechanism for a callback function, which in our case could be setting an Event object. I'm particularly interested in the librdkafka consumer. As far as I can tell looking at the code there is no such mechanism exposed at the Python level, but I thought I would check. If forced we can always use a separate thread for this.

Separately, I noticed that there is no librdkafka solution for balanced consumers. Is this a limitation of librdkafka or is it that pykafka has not wrapped this functionality yet? If the latter then is there any near-term plan to support this?

@mrocklin
Copy link
Author

mrocklin commented Oct 2, 2017

On the producer side is there any back pressure? There doesn't seem to be any block= option when calling the produce(...) method. Is there a way to emit a message in a robustly non-blocking way and err if blocking would be inevitable?

@emmettbutler
Copy link
Contributor

emmettbutler commented Oct 2, 2017

In general I'd like to avoid adding additional dependencies where possible, but it will be helpful to understand exactly what a Tornado dependency would enable. If there's a good tradeoff between tornado-specific code in pykafka and ease of use for users of both, it could be worth the added dependency. Without knowing what this will look like, though, I'm wary of adding even more to our already somewhat bloated test requirements.

Some work was started a while ago by @mikepk on a callback mechanism similar to the one you're describing but for produced messages in #506. If it seems necessary, we can either adapt that work or start from scratch on a callback interface that would meet your needs. I think when we chat tomorrow we'll have a better understanding of the specific requirements around this.

You might be noticing that there's no balanced_consumer.py in the rdkafka directory. You can still use rdkafka with balanced consumers through the use_rdkafka kwarg on BalancedConsumer.

@jeffwidman
Copy link
Contributor

I noticed that there is no librdkafka solution for balanced consumers

That's because those should go the way of the dodo bird. Much better to use Kafka's native Consumer Group API's which are supported by librdkafka and also exposed by pykafka as ManagedBalancedConsumers. Pykafka's BalancedConsumer was built before these native Kafka API's existed, and (IMHO based on our production usage) has fairly serious problems such as #354

@emmettbutler
Copy link
Contributor

To clarify, the callback that would solve this issue would be called here when a message becomes available on one or more partitions.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants