Rebalancing problem with Faust Streaming consumer #594

Open
arcanjo45 opened this issue Dec 20, 2023 · 0 comments
Checklist

  • I have included information about relevant versions
  • I have verified that the issue persists when using the master branch of Faust.

Steps to reproduce

Hello everyone, I hope this finds you well!

I'm facing an odd situation when using Faust Streaming in my consumer app. I have a Kafka consumer that connects to my Kafka instance on GCP in my dev environment. However, whenever my Kafka instance restarts or goes down due to lack of resources, the consumer gets stuck in a loop while trying to rebalance, logging the following errors:

[2023-12-20 10:23:47,912] [11] [INFO] Discovered coordinator 2 for group myapp-dev-processor 
[2023-12-20 10:23:47,912] [11] [INFO] (Re-)joining group myapp-dev-processor 
[2023-12-20 10:23:47,915] [11] [WARNING] Marking the coordinator dead (node 2)for group myapp-dev-processor. 

This is happening frequently for us, but only in our dev environment. We are investigating the root cause and how to tackle it, so that if it ever occurs in prod we have a way to act fast. We know the consumer connects to our Kafka instance successfully, but then this error appears and it stays stuck in an endless loop. We searched for error logs on our Kafka instances and found nothing, so we suspect the problem may somehow be within the library.

I can also say that the only fix we have found so far is to re-deploy both the Kafka instance on GCP and our consumer project, which is something we can't do in production if this situation occurs. I don't have detailed knowledge of the project internals, but it seems to be a problem related to the topic the app creates to handle task distribution, because the topic name appearing in the logs is not the one we consume data from but one created by the app itself.
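For context, here is a minimal sketch of how the consumer app is wired up. This is a sketch only: the broker address, topic name, timeout values and processing logic are placeholders, not our real configuration; only the app id matches the group id shown in the logs above.

```python
import faust

# Minimal sketch of the consumer setup (illustrative only: broker address,
# topic name and timeout values are placeholders, not our real config).
app = faust.App(
    "myapp-dev-processor",            # matches the group id in the logs above
    broker="kafka://<gcp-broker-host>:9092",   # placeholder broker address
    # Documented Faust settings that control the heartbeat/coordinator
    # behaviour involved in the rebalance loop (example values only):
    broker_session_timeout=60.0,      # seconds before the coordinator drops the member
    broker_heartbeat_interval=3.0,    # seconds between heartbeats to the coordinator
    broker_request_timeout=90.0,      # should stay above broker_session_timeout
)

source_topic = app.topic("my-source-topic")   # placeholder source topic


@app.agent(source_topic)
async def process(stream):
    async for event in stream:
        ...  # actual processing omitted


if __name__ == "__main__":
    app.main()
```

The broker_session_timeout / broker_heartbeat_interval / broker_request_timeout settings are the knobs that govern the coordinator-dead / re-join cycle seen in the logs, in case that helps narrow things down.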

Does anyone have any idea why this might be happening? We searched the project repo, ChatGPT, and Stack Overflow, but without any luck.

Expected behavior

The consumer should rebalance partitions normally.

Actual behavior

Full traceback

Versions

faust-aioeventlet==0.6
faust-streaming==0.10.14
confluent-kafka==2.1.1
Python 3.12
