User Defined Async Source - "Readiness probe failed" when there are no more messages #128

Open
tolmanam opened this issue Dec 27, 2023 · 4 comments
Labels
bug (Something isn't working)

Comments

@tolmanam
Contributor

tolmanam commented Dec 27, 2023

Description

This is probably just me not understanding how things are supposed to work.

I have created a user-defined source, based on the async-source example, that exposes a REST API: incoming requests execute database queries, and the results become Numaflow messages for a pipeline to work on.

I am not sure what the read_handler function should return when there aren't any results to pass on (which could simply be because we are waiting for the next REST request).

I tried just breaking out of the iterator, but that resulted in a "Readiness probe" failure, so K8s restarts the pod.
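
For reference, the shape of my source is roughly the following (a trimmed sketch, not my actual code): a REST handler, not shown here, pushes serialized query results onto a shared queue, and read_handler drains it. The ReadRequest/Message/Offset names are the ones I see in the 0.6.0 async-source example and may differ in other SDK versions.

```python
import asyncio
from datetime import datetime
from typing import AsyncIterable

# ReadRequest/Message/Offset follow the numaflow-python 0.6.0 async-source
# example as far as I can tell; treat them as assumptions if your SDK
# version differs.
from pynumaflow.sourcer import ReadRequest, Message, Offset


class RestFedSource:
    def __init__(self):
        # The REST handler (not shown) puts query-result rows, already
        # serialized to bytes, on this queue.
        self.results: asyncio.Queue = asyncio.Queue()
        self.read_idx = 0

    async def read_handler(self, datum: ReadRequest) -> AsyncIterable[Message]:
        for _ in range(datum.num_records):
            if self.results.empty():
                # Nothing to pass on yet -- this is the case I don't know
                # how to handle without tripping the readiness probe.
                return
            row = await self.results.get()
            yield Message(
                payload=row,
                offset=Offset(offset=str(self.read_idx).encode(), partition_id="0"),
                event_time=datetime.now(),
            )
            self.read_idx += 1
```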

To Reproduce

Steps to reproduce the behavior:

  1. Modify the async-source example.py so that the read_handler returns after some number of messages, rather than running forever. Quick and dirty, change:

     ```python
     for x in range(datum.num_records):
     ```

     to:

     ```python
     for x in range(self.read_idx, datum.num_records):
     ```

     (The surrounding handler is sketched after these steps.)
  2. Build the image.
  3. Deploy the pipeline.
  4. Monitor the deployment (k9s).
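
For context, the changed line sits inside the example's read_handler roughly like this (same imports as the sketch above; reconstructed from memory of the 0.6.0 example, so details may differ). Once self.read_idx reaches datum.num_records the loop yields nothing, and that is when the readiness probe failures start:

```python
class ModifiedExampleSource:
    def __init__(self):
        self.read_idx = 0

    async def read_handler(self, datum: ReadRequest) -> AsyncIterable[Message]:
        # Original example: for x in range(datum.num_records):
        # Modified so the handler stops producing after the first batch.
        for x in range(self.read_idx, datum.num_records):
            yield Message(
                payload=str(x).encode(),
                offset=Offset(offset=str(x).encode(), partition_id="0"),
                event_time=datetime.now(),
            )
            self.read_idx += 1
```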

Expected behavior

I expected the source to stop producing messages, the pipeline to drain its queues, and the vertex to then wait for more work (which will never come in this test case, but could in the REST API scenario described above).

Environment

  • Kubernetes: v1.27.6+k3s1
  • Numaflow: quay.io/numaproj/numaflow:v1.1.1
  • Numalogic: unknown (please advise where I might find this information)
  • Numaflow-python: 0.6.0

Message from the maintainers:

Impacted by this bug? Give it a 👍. We often sort issues this way to know what to prioritize.

@tolmanam added the bug label on Dec 27, 2023
@tolmanam
Contributor Author

Is the expected behavior for the read_handler to run forever and simply block while there is no data to pass along? I always worry about waiting on things indefinitely.
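
One pattern I've been sketching (purely hypothetical; build_message is a stand-in helper, not an SDK call) is to block with a cap instead of forever: wait on the queue for a couple of seconds and yield nothing on timeout. Whether an empty batch like that is acceptable to Numaflow is really my question:

```python
import asyncio


class BoundedWaitSource:
    def __init__(self):
        self.results: asyncio.Queue = asyncio.Queue()

    async def read_handler(self, datum):
        # Block with a cap rather than indefinitely: wait briefly for the
        # first item, then give up and yield nothing for this batch.
        try:
            first = await asyncio.wait_for(self.results.get(), timeout=2)
        except asyncio.TimeoutError:
            return  # nothing arrived in time; no messages this round
        yield self.build_message(first)
        # Drain whatever else is already buffered, up to the batch size.
        for _ in range(datum.num_records - 1):
            if self.results.empty():
                break
            yield self.build_message(self.results.get_nowait())

    def build_message(self, row):
        # Hypothetical helper: wrap a row in whatever Message type the SDK expects.
        raise NotImplementedError
```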

@tolmanam
Contributor Author

FWIW -

I also see this same "Readiness probe failed" if the read_handler takes too long to respond.

Rather than limiting the number of responses as described above, you can just add a long sleep (longer than the readiness probe window) inside the loop.
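
Concretely, adding a sleep like this to the example's loop (imports and Message/Offset as in the earlier sketches) reproduces it; 300 seconds is arbitrary, anything past the probe's failure threshold works:

```python
async def read_handler(self, datum: ReadRequest) -> AsyncIterable[Message]:
    for x in range(datum.num_records):
        # Stall longer than the readiness probe allows before yielding;
        # this alone reproduces the "Readiness probe failed" events.
        await asyncio.sleep(300)
        yield Message(
            payload=str(x).encode(),
            offset=Offset(offset=str(x).encode(), partition_id="0"),
            event_time=datetime.now(),
        )
```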

@kohlisid
Contributor

Hey @tolmanam
I was trying to replicate the issue with the steps you provided and had a quick question:
were you seeing the pipeline's pods disappear due to autoscaling down to 0 because of no traffic, or was a crash seen on your end?

@tolmanam
Contributor Author

I believe it was Kubernetes killing the pod because it failed the "Readiness probe".

Consider the use case where you want to run a database query that generates X messages every 10 minutes; you wouldn't want autoscaling to drop the vertex.

FWIW - I swapped out the UDF source for the built-in HTTP source, and it runs happily without adding any messages to the pipeline until it receives a POST, so the behavior I would like is compatible with Numaflow; I just don't appear to know how to build a user-defined source.
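
To make the 10-minute use case concrete, this is the kind of producer I have in mind feeding the source's queue; run_query is a stand-in for whatever database call applies and is assumed to return an iterable of bytes payloads:

```python
import asyncio


async def poll_database(queue: asyncio.Queue) -> None:
    # Every 10 minutes, run a query and buffer the resulting rows for
    # read_handler to drain. run_query() is hypothetical here.
    while True:
        for row in await run_query():
            await queue.put(row)
        await asyncio.sleep(600)  # 10 minutes between queries
```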
