This repository has been archived by the owner on Feb 16, 2024. It is now read-only.

[BAHIR-295] Added backpressure & ratelimit support #101

Open
wants to merge 5 commits into master

Conversation

iammehrabalam

No description provided.

@eskabetxe
Member

@iammehrabalam thanks for your contribution
@lresende could you check?

@lresende
Member

@iammehrabalam What is the backward compatibility story for this change? Also, should the sample be using the new capabilities to demonstrate the new functionality?

@iammehrabalam
Author

The behaviour will be exactly the same as before if the following Spark Streaming configs are not set:

  • spark.streaming.backpressure.initialRate
  • spark.streaming.receiver.maxRate
  • spark.streaming.backpressure.pid.minRate

So it is backward compatible.

Added a test case which demonstrates rate and batch size.
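
For reference, a minimal configuration sketch (the keys match the list above, plus spark.streaming.backpressure.enabled; the values are illustrative, not defaults or recommendations):

```scala
// Illustrative only: the keys are standard Spark Streaming settings, the values are made up.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("pubsub-backpressure-demo")
  // Let the PID estimator drive the receiving rate
  .set("spark.streaming.backpressure.enabled", "true")
  // Initial rate (records/sec per receiver) before the estimator has feedback
  .set("spark.streaming.backpressure.initialRate", "1000")
  // Hard cap on records/sec per receiver
  .set("spark.streaming.receiver.maxRate", "1500")
  // Lower bound the estimator will not go below
  .set("spark.streaming.backpressure.pid.minRate", "100")
```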

@lresende

@iammehrabalam
Author

@lresende @eskabetxe reminder

@datasherlock

datasherlock commented Feb 4, 2023

The backpressure implementation isn't working as expected. My understanding is that the backpressure mechanism will control the input rate but never exceed spark.streaming.receiver.maxRate. But this doesn't seem to be honoured: we're noticing that the receiver input rate breaches spark.streaming.receiver.maxRate every now and then, which puts a lot of pressure on the pipeline.

Context: I created a Spark Scala app with 900 receivers, spark.streaming.receiver.maxRate=1500 and batchInterval=60s. My understanding is that the total number of records per batch should not exceed 900 * 1500 * 60 = 81,000,000 records. But I am noticing that some batches go as high as 776,732,455 records, where the processing time is >>> batchInterval.
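
A quick sanity check of that expectation, using the numbers reported above:

```scala
// Back-of-the-envelope check of the expected per-batch cap described above.
val receivers      = 900
val maxRatePerRecv = 1500L  // spark.streaming.receiver.maxRate, records/sec per receiver
val batchIntervalS = 60     // seconds

val expectedCap = receivers * maxRatePerRecv * batchIntervalS  // 81,000,000 records per batch
val observed    = 776732455L                                   // largest batch reported above
println(f"observed / cap = ${observed.toDouble / expectedCap}%.1fx")  // ~9.6x over the cap
```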

@datasherlock

Based on https://spark.apache.org/docs/latest/streaming-custom-receivers.html#receiver-reliability, the rate control mechanism will have to be implemented by the receiver (if reliable). I do not see any logic that caps the input rates to the maxRate in the code. Could that be the reason why the backpressure limits are not honoured?

@LeonardMeyer

LeonardMeyer commented Mar 3, 2023

Just stumbled upon this PR. For anyone interested, my guess is that the correct implementation should use Spark Streaming's BlockGenerator class. It would give the whole process spark.streaming.backpressure.enabled support for free since its RateLimiter implementation can be notified by Spark.
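
If I read that right, the "for free" path looks roughly like this (a sketch, not this connector's actual receiver; `fetch` is a stand-in for whatever pulls messages from Pub/Sub):

```scala
// Sketch only: a receiver that stores records one at a time, so the supervisor's
// BlockGenerator (and its RateLimiter, which Spark updates when
// spark.streaming.backpressure.enabled=true) throttles ingestion for us.
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

class SingleRecordReceiver(fetch: () => Iterator[String])
  extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  override def onStart(): Unit = {
    new Thread("single-record-receiver") {
      override def run(): Unit = {
        while (!isStopped()) {
          // store(record) goes through the rate-limited BlockGenerator path;
          // store(iterator) would write a whole block and bypass the limiter.
          fetch().foreach(record => store(record))
        }
      }
    }.start()
  }

  override def onStop(): Unit = ()
}
```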

@iammehrabalam
Author

iammehrabalam commented Jun 4, 2023

@LeonardMeyer you are right, but the rate limit is only applied when a single record is written into the store (https://github.com/apache/spark/blob/595ad30e6259f7e4e4252dfee7704b73fd4760f7/streaming/src/main/scala/org/apache/spark/streaming/receiver/Receiver.scala#L118). When an iterator (i.e. a block) is written directly, the rate limit is not applied by default.

In the Pub/Sub receiver, the iterator store method is called, which is where we added the rate limit (the same rate limit that is generated based on backpressure).

@iammehrabalam
Author

@datasherlock Ideally it should work. If possible, share your Spark configuration so I can help.
For the rate-limit logic, you can check the updateRateLimit and pushToStoreAndAck methods in this PR.
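
For anyone skimming, the general shape of that pattern is roughly the following (an illustrative sketch only, not the PR's actual code; Guava's RateLimiter stands in for whatever limiter the receiver maintains, and the method names mirror the ones mentioned above):

```scala
// Illustrative sketch: throttle a bulk store() manually, since Spark only
// rate-limits the single-record store path.
import com.google.common.util.concurrent.RateLimiter

class ThrottledPush(initialRatePerSec: Double) {
  private val limiter = RateLimiter.create(initialRatePerSec)

  // Called whenever backpressure publishes a new rate (records/sec).
  def updateRateLimit(newRatePerSec: Double): Unit =
    limiter.setRate(math.max(newRatePerSec, 1.0))

  // Acquire one permit per record before pushing the block, then ack.
  def pushToStoreAndAck(records: Seq[String],
                        storeBlock: Iterator[String] => Unit,
                        ack: () => Unit): Unit = {
    limiter.acquire(records.size) // blocks until the batch fits the current rate
    storeBlock(records.iterator)  // bulk store() itself is not throttled by Spark
    ack()
  }
}
```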

@irajhedayati

This change was suggested two years ago. Is there any plan to push it through?
