Tornado #731

mrocklin · 2017-10-02T00:00:34Z

I would like to consume and produce messages using PyKafka from within a Tornado application. I am potentially willing to contribute code for this. I have a few questions.

Are the maintainers of this library comfortable adding tornado as an optional dependency? This would bloat your testing stack a bit. I'm also happy to do this externally if preferred.
Is there already a mechanism to run a callback whenever data arrives? Using the non-blocking consume will work in most cases, but when it returns None we'll want to be notified when data does arrive. Frameworks like this sometimes accept a callback function, which in our case could set an Event object. I'm particularly interested in the librdkafka consumer. As far as I can tell from the code, no such mechanism is exposed at the Python level, but I thought I would check. If forced, we can always use a separate thread for this.
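For reference, the "separate thread" fallback could look roughly like the sketch below. It uses only the standard library: a worker thread calls a blocking `consume()` and hands each message to an asyncio event loop (recent Tornado releases run on the asyncio loop, so the same idea should carry over). `FakeConsumer`, `start_bridge`, and `collect` are hypothetical names for illustration and are not part of pykafka; a real consumer would take the place of `FakeConsumer`.

```python
import asyncio
import threading


class FakeConsumer:
    """Hypothetical stand-in for a consumer with a blocking consume()."""

    def __init__(self, messages):
        self._messages = list(messages)

    def consume(self):
        # Return the next message, or None once the stream is exhausted.
        return self._messages.pop(0) if self._messages else None


def start_bridge(consumer, loop, out_queue):
    """Call consumer.consume() in a thread and forward results to the loop."""

    def worker():
        while True:
            msg = consumer.consume()
            # call_soon_threadsafe is the supported way to touch the loop
            # (and an asyncio.Queue) from another thread.
            loop.call_soon_threadsafe(out_queue.put_nowait, msg)
            if msg is None:  # None doubles as the end-of-stream sentinel here
                break

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


async def collect(messages):
    """Consume everything from a FakeConsumer without blocking the loop."""
    loop = asyncio.get_running_loop()
    out_queue = asyncio.Queue()
    start_bridge(FakeConsumer(messages), loop, out_queue)
    received = []
    while True:
        msg = await out_queue.get()
        if msg is None:
            break
        received.append(msg)
    return received
```

The coroutine only ever awaits the queue, so the event loop stays free while the worker thread blocks inside `consume()`.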

Separately, I noticed that there is no librdkafka solution for balanced consumers. Is this a limitation of librdkafka or is it that pykafka has not wrapped this functionality yet? If the latter then is there any near-term plan to support this?

mrocklin · 2017-10-02T00:04:17Z

On the producer side is there any back pressure? There doesn't seem to be any block= option when calling the produce(...) method. Is there a way to emit a message in a robustly non-blocking way and err if blocking would be inevitable?
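To make the question concrete, here is a minimal standard-library sketch of the behaviour being asked for: a bounded buffer in front of the producer, where `block=False` raises instead of waiting once the buffer is full. `BoundedProducer` and its methods are hypothetical and do not describe pykafka's actual producer API.

```python
import queue


class BoundedProducer:
    """Hypothetical wrapper illustrating back pressure; not part of pykafka."""

    def __init__(self, max_pending):
        # At most max_pending messages may wait to be sent.
        self._pending = queue.Queue(maxsize=max_pending)

    def produce(self, message, block=True, timeout=None):
        # With block=False this raises queue.Full immediately when the
        # buffer is full, rather than waiting for space.
        self._pending.put(message, block=block, timeout=timeout)

    def drain_one(self):
        # Stand-in for the background sender removing one message.
        return self._pending.get_nowait()
```

A caller that must never block would call `produce(msg, block=False)` and treat `queue.Full` as the signal to slow down or drop the message.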

emmettbutler · 2017-10-02T17:12:38Z

In general I'd like to avoid adding additional dependencies where possible, but it will be helpful to understand exactly what a Tornado dependency would enable. If there's a good tradeoff between tornado-specific code in pykafka and ease of use for users of both, it could be worth the added dependency. Without knowing what this will look like, though, I'm wary of adding even more to our already somewhat bloated test requirements.

Some work was started a while ago by @mikepk on a callback mechanism similar to the one you're describing but for produced messages in #506. If it seems necessary, we can either adapt that work or start from scratch on a callback interface that would meet your needs. I think when we chat tomorrow we'll have a better understanding of the specific requirements around this.

You might be noticing that there's no balanced_consumer.py in the rdkafka directory. You can still use rdkafka with balanced consumers through the use_rdkafka kwarg on BalancedConsumer.
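A rough sketch of what that looks like in practice, assuming pykafka is built with the rdkafka extension and a cluster is reachable. The host addresses, topic name, and consumer group below are placeholders, and the helper name is hypothetical; pykafka is imported inside the function so the definition itself does not require it.

```python
def make_rdkafka_balanced_consumer(hosts, zookeeper, topic_name, group):
    """Return a BalancedConsumer backed by librdkafka (sketch only)."""
    # Imported lazily: requires pykafka with the rdkafka extension installed.
    from pykafka import KafkaClient

    client = KafkaClient(hosts=hosts)
    topic = client.topics[topic_name]
    # Extra keyword arguments are passed through to BalancedConsumer.
    return topic.get_balanced_consumer(
        consumer_group=group,
        zookeeper_connect=zookeeper,
        use_rdkafka=True,
    )
```

For example, `make_rdkafka_balanced_consumer("127.0.0.1:9092", "127.0.0.1:2181", b"test", b"my-group")` would be the call against a local single-node setup, with all four values being assumptions.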

jeffwidman · 2017-10-03T01:42:33Z

> I noticed that there is no librdkafka solution for balanced consumers

That's because those should go the way of the dodo bird. Much better to use Kafka's native Consumer Group APIs, which are supported by librdkafka and also exposed by pykafka as ManagedBalancedConsumers. Pykafka's BalancedConsumer was built before these native Kafka APIs existed, and (IMHO based on our production usage) has fairly serious problems such as #354.

emmettbutler · 2017-10-05T16:07:18Z

To clarify, the callback that would solve this issue would be called here when a message becomes available on one or more partitions.

emmettbutler added the enhancement label Oct 2, 2017
