User Defined Async Source - "Readiness probe failed" when there are no more messages #128

Open
tolmanam opened this issue Dec 27, 2023 · 4 comments
Labels
bug (Something isn't working)

Comments

@tolmanam
Contributor

tolmanam commented Dec 27, 2023

Description

This is probably just me not understanding how things are supposed to work.

I have created a user-defined source, based on the async-source example, that exposes a REST API: incoming requests execute database queries, and the results become Numaflow messages for a pipeline to work on.

I am not sure what the read_handler function should return when there aren't any results to pass on (which could simply be because we are waiting for the next REST request).

I tried just breaking out of the iterator, but that resulted in a "Readiness probe" failure, so K8s restarts the pod.
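
For reference, the shape of my source is roughly the following (a trimmed sketch, not my actual code): a REST handler, not shown here, pushes serialized query results onto a shared queue, and read_handler drains it. The ReadRequest/Message/Offset names are the ones I see in the 0.6.0 async-source example and may differ in other SDK versions.

```python
import asyncio
from datetime import datetime
from typing import AsyncIterable

# ReadRequest/Message/Offset follow the numaflow-python 0.6.0 async-source
# example as far as I can tell; treat them as assumptions if your SDK
# version differs.
from pynumaflow.sourcer import ReadRequest, Message, Offset


class RestFedSource:
    def __init__(self):
        # The REST handler (not shown) puts query-result rows, already
        # serialized to bytes, on this queue.
        self.results: asyncio.Queue = asyncio.Queue()
        self.read_idx = 0

    async def read_handler(self, datum: ReadRequest) -> AsyncIterable[Message]:
        for _ in range(datum.num_records):
            if self.results.empty():
                # Nothing to pass on yet -- this is the case I don't know
                # how to handle without tripping the readiness probe.
                return
            row = await self.results.get()
            yield Message(
                payload=row,
                offset=Offset(offset=str(self.read_idx).encode(), partition_id="0"),
                event_time=datetime.now(),
            )
            self.read_idx += 1
```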

To Reproduce

Steps to reproduce the behavior:

  1. Modify the async-source example.py so that the read_handler returns after some number of messages, rather than running forever. Quick and dirty, change:

     ```python
     for x in range(datum.num_records):
     ```

     to:

     ```python
     for x in range(self.read_idx, datum.num_records):
     ```

     (The surrounding handler is sketched after these steps.)
  2. Build the image.
  3. Deploy the pipeline.
  4. Monitor the deployment (k9s).
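
For context, the changed line sits inside the example's read_handler roughly like this (same imports as the sketch above; reconstructed from memory of the 0.6.0 example, so details may differ). Once self.read_idx reaches datum.num_records the loop yields nothing, and that is when the readiness probe failures start:

```python
class ModifiedExampleSource:
    def __init__(self):
        self.read_idx = 0

    async def read_handler(self, datum: ReadRequest) -> AsyncIterable[Message]:
        # Original example: for x in range(datum.num_records):
        # Modified so the handler stops producing after the first batch.
        for x in range(self.read_idx, datum.num_records):
            yield Message(
                payload=str(x).encode(),
                offset=Offset(offset=str(x).encode(), partition_id="0"),
                event_time=datetime.now(),
            )
            self.read_idx += 1
```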

Expected behavior

I expected the source to stop producing messages, the pipeline to drain its queues, and the vertex to then wait for more work (which will never come in this test case, but could in the REST API scenario described above).

Environment

  • Kubernetes: v1.27.6+k3s1
  • Numaflow: quay.io/numaproj/numaflow:v1.1.1
  • Numalogic: unknown (please advise where I might find this information)
  • Numaflow-python: 0.6.0

Message from the maintainers:

Impacted by this bug? Give it a 👍. We often sort issues this way to know what to prioritize.

@tolmanam added the bug label on Dec 27, 2023
@tolmanam
Contributor Author

Is the expected behavior for the read_handler to run forever and simply block while there is no data to pass along? I always worry about waiting on things indefinitely.
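
One pattern I've been sketching (purely hypothetical; build_message is a stand-in helper, not an SDK call) is to block with a cap instead of forever: wait on the queue for a couple of seconds and yield nothing on timeout. Whether an empty batch like that is acceptable to Numaflow is really my question:

```python
import asyncio


class BoundedWaitSource:
    def __init__(self):
        self.results: asyncio.Queue = asyncio.Queue()

    async def read_handler(self, datum):
        # Block with a cap rather than indefinitely: wait briefly for the
        # first item, then give up and yield nothing for this batch.
        try:
            first = await asyncio.wait_for(self.results.get(), timeout=2)
        except asyncio.TimeoutError:
            return  # nothing arrived in time; no messages this round
        yield self.build_message(first)
        # Drain whatever else is already buffered, up to the batch size.
        for _ in range(datum.num_records - 1):
            if self.results.empty():
                break
            yield self.build_message(self.results.get_nowait())

    def build_message(self, row):
        # Hypothetical helper: wrap a row in whatever Message type the SDK expects.
        raise NotImplementedError
```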

@tolmanam
Contributor Author

FWIW -

I also see this same "Readiness probe failed" if the read_handler takes too long to respond.

Rather than limiting the number of responses as described above, you can just add a long sleep (longer than the readiness probe window) inside the loop.
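
Concretely, adding a sleep like this to the example's loop (imports and Message/Offset as in the earlier sketches) reproduces it; 300 seconds is arbitrary, anything past the probe's failure threshold works:

```python
async def read_handler(self, datum: ReadRequest) -> AsyncIterable[Message]:
    for x in range(datum.num_records):
        # Stall longer than the readiness probe allows before yielding;
        # this alone reproduces the "Readiness probe failed" events.
        await asyncio.sleep(300)
        yield Message(
            payload=str(x).encode(),
            offset=Offset(offset=str(x).encode(), partition_id="0"),
            event_time=datetime.now(),
        )
```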

@kohlisid
Contributor

Hey @tolmanam
I was trying to replicate the issue with the steps you provided and had a quick question:
were you seeing the pipeline's pods disappear due to autoscaling down to 0 because of no traffic, or was a crash seen on your end?

@tolmanam
Contributor Author

I believe it was Kubernetes killing the pod because it failed the "Readiness probe".

Consider the use case where you want to run a database query that generates X messages every 10 minutes; you wouldn't want autoscaling to drop the vertex.

FWIW - I swapped out the UDF source for the built-in HTTP source, and it runs happily without adding any messages to the pipeline until it receives a POST, so the behavior I would like is compatible with Numaflow; I just don't appear to know how to build a user-defined source.
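
To make the 10-minute use case concrete, this is the kind of producer I have in mind feeding the source's queue; run_query is a stand-in for whatever database call applies and is assumed to return an iterable of bytes payloads:

```python
import asyncio


async def poll_database(queue: asyncio.Queue) -> None:
    # Every 10 minutes, run a query and buffer the resulting rows for
    # read_handler to drain. run_query() is hypothetical here.
    while True:
        for row in await run_query():
            await queue.put(row)
        await asyncio.sleep(600)  # 10 minutes between queries
```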
