BAAS-28597: Add counter for limiter wait calls #117

Merged · 13 commits into mongodb-forks:realm · Mar 11, 2024

Conversation

Calvinnix (Author):

No description provided.

@@ -213,6 +214,10 @@ func (self *Runtime) Ticks() uint64 {
 	return self.ticks
 }
 
+func (self *Runtime) LimiterWaitCount() uint64 {
+	return self.limiterWaitCount
+}

Calvinnix (Author):

@arahmanan the BAAS counterpart will be to log the LimiterWaitCount when the function execution finishes, similar to how we log Ticks(). (This could also be a Prometheus counter potentially, or perhaps I can piggyback on the Ticks log and include limiterWaitCount... we can discuss in the BAAS PR though.)

arahmanan:

oh I like the idea of adding it to the ticks log! If we do that, adding a new prom metric won't be necessary.
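For context, a minimal sketch of what that logging could look like on the BAAS side. Everything here is illustrative only: the package, logger, message, field names, and the execStats interface are assumptions; only Ticks() and the LimiterWaitCount() accessor added in this PR come from the source.

package baaslog

import "log/slog"

// execStats is the minimal surface this sketch needs from the runtime;
// Ticks() already exists and LimiterWaitCount() is added in this PR.
type execStats interface {
	Ticks() uint64
	LimiterWaitCount() uint64
}

// logExecutionFinished is a hypothetical helper: it emits the new counter
// on the same log line as the existing tick count once execution finishes.
func logExecutionFinished(rt execStats) {
	slog.Info("function execution finished",
		"ticks", rt.Ticks(),
		"limiterWaitCount", rt.LimiterWaitCount(),
	)
}

Whether this stays a log field or also becomes a Prometheus counter is the question deferred to the BAAS PR.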

runtime.go (outdated, resolved)
vm.go (outdated)
@@ -617,6 +617,7 @@ func (vm *vm) run() {
 		ctx = context.Background()
 	}
 
+	vm.r.limiterWaitCount++

arahmanan:

We aren't actually waiting every time we call WaitN. To determine whether we are actually waiting, take a look at this as an example: we only wait if the delay is > 0.

[opt] In addition to the number of times we waited, I think it would also be valuable to track the total delay for the entire execution.

Calvinnix (Author):

Oh, I see the distinction. Does that come down to timing?

We have a rate limit of 10_000_000 and a burst value of 250_000, which I think is effectively 50_000 when you account for the burst divisor.

Aside from a lot of process yielding, I'm not sure I fully understand what scenario we actually wait and for how long.

If we use 10_000_000 ticks in half a second would we wait another half a second because that is the amount left until we can execute 50_000 ticks (provided by fillBucket)?

If we use 10_000_000 ticks with only 1ms left in the "second" window, would we only wait that 1ms before we can execute 50_000 ticks?

Calvinnix (Author):

If the above is true I assume we could do something like:

	reservation := vm.r.limiter.ReserveN(time.Now(), vm.r.limiterTicksLeft)
	// accumulate reservation.Delay() into a total-wait counter
	// increment the wait counter only if Delay() > 0

Calvinnix (Author):

			if r := vm.r.limiter.ReserveN(time.Now(), vm.r.limiterTicksLeft); r.OK() {
				waitDelayMS := r.Delay().Milliseconds()
				if waitDelayMS > 0 {
					vm.r.limiterWaitTotalMS += waitDelayMS
					vm.r.limiterWaitCount++
				}
			}
			if waitErr := vm.r.limiter.WaitN(ctx, vm.r.limiterTicksLeft); waitErr != nil {

arahmanan:

From the comment above WaitN:

WaitN blocks until lim permits n events to happen.

In other words, we aren't going to wait if the limit allows those n events to be processed. To be able to determine if we're actually waiting or not, you want to do something along these lines:

	now := time.Now()
	r := vm.r.limiter.ReserveN(now, vm.r.limiterTicksLeft)
	if !r.OK() {
		panic("")
	}

	// Wait only if the reservation requires it
	delay := r.DelayFrom(now)
	if delay > 0 {
		vm.r.limiterWaitCount++

		err := util.Sleep(ctx, delay)
		if err != nil {
			r.Cancel()
			panic(err)
		}
	}

> If we use 10_000_000 ticks in half a second would we wait another half a second because that is the amount left until we can execute 50_000 ticks (provided by fillBucket)?

Not exactly. We would approximately wait until enough time has passed to process the next 50k ticks. i.e. (50_000 / 10_000_000) => 0.005s => 5ms. You can find more about that here.

> If we use 10_000_000 ticks with only 1ms left in the "second" window, would we only wait that 1ms before we can execute 50_000 ticks?

This can't happen. The rate limiter won't process more than 10_000_000 ticks / s. So it would just take 1s to process the first 10MM ticks. In other words, since we process 50k ticks at a time, a function can process at most 50k ticks every 5ms.

These docs do a decent job at explaining how this works. Let me know if you have more questions about it. I'm happy to also hop on a quick call.
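For concreteness, here is a small standalone sketch of that arithmetic using the standard golang.org/x/time/rate package with the numbers quoted above (10_000_000 ticks/s, burst of 250_000, ticks consumed in 50_000-tick chunks). This assumes the fork's limiter follows the same token-bucket semantics as the standard package.

package main

import (
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	// 10M ticks per second, burst of 250k, as discussed above.
	lim := rate.NewLimiter(rate.Limit(10_000_000), 250_000)
	now := time.Now()

	// Drain the initial burst so the next reservation has to wait.
	lim.ReserveN(now, 250_000)

	// Reserving the next 50k-tick chunk now carries a delay of
	// 50_000 / 10_000_000 s = 5ms, matching the math above.
	r := lim.ReserveN(now, 50_000)
	fmt.Println(r.DelayFrom(now)) // prints 5ms
}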

arahmanan left a comment:

LGTM pending benchmark results and one super minor comment.

vm.go (outdated)
	delay := r.DelayFrom(now)
	if delay > 0 {
		vm.r.limiterWaitCount++
		vm.r.limiterWaitTotalTime += delay.Nanoseconds()

arahmanan:

[nit] thoughts on changing limiterWaitTotalTime to be of type time.Duration? That makes what's returned by LimiterWaitTotalTime a little more explicit without having to read the docstring.

Calvinnix (Author):

Ooo I like that! Great idea
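A rough sketch of the change being agreed on here, assuming field and accessor names matching the diff (the comment wording and exact placement are mine):

// runtime.go: storing the total as a time.Duration makes the unit part of
// the type, so the accessor no longer needs a docstring to explain it.
func (self *Runtime) LimiterWaitTotalTime() time.Duration {
	return self.limiterWaitTotalTime
}

On the vm.go side, delay is already a time.Duration, so the accumulation becomes vm.r.limiterWaitTotalTime += delay with no Nanoseconds() conversion.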

vm.go (outdated)
Comment on lines 636 to 638:
	select {
	case <-ctx.Done():
		panic(ctx.Err())

arahmanan:

[opt] I don't believe this extra check is necessary if delay == 0. We already interrupt the VM here, when the context times out / is canceled.

Calvinnix (Author):

Hmm, this was added in response to this test failing. I still have test failures, though, so I need to investigate what is happening.

arahmanan:

Sorry, I'm not sure I follow. Did you figure out what caused the test to fail? I still think we can get rid of this extra select statement. Correct me if I'm wrong.

Calvinnix (Author):

Ok, tests are finally passing with these changes; the test-api-other and test-vm-goja tasks do not finish without the extra context check (Evergreen).

This check also exists in the WaitN function, so we're just pulling that over here.
[screenshot of the context check inside WaitN]

Calvinnix (Author):

Let me know if you'd like to chat about this.

Calvinnix (Author), Feb 28, 2024:

Specifically, the TestFunctionExecTimeLimit test is the one that never finishes.

arahmanan:

I see. [opt] The wait function has the select statement before it performs the reserve/wait operations. Should we do the same here? i.e. have this select statement right after we define ctx.

Calvinnix (Author):

Yeah I think that's a good idea to stay consistent.
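Putting this thread together, the shape being converged on would look roughly like the following in the run loop. This is a sketch, not the exact merged diff: the if ctx == nil guard is inferred from the hunk quoted above, util.Sleep is the context-aware sleep helper referenced earlier in the review, and limiterWaitTotalTime is assumed to already be a time.Duration per the nit above.

	if ctx == nil {
		ctx = context.Background()
	}

	// Bail out immediately if the context is already done; this mirrors
	// the check WaitN performed before it was replaced by an explicit
	// reservation.
	select {
	case <-ctx.Done():
		panic(ctx.Err())
	default:
	}

	now := time.Now()
	r := vm.r.limiter.ReserveN(now, vm.r.limiterTicksLeft)
	if !r.OK() {
		panic("failed to make reservation")
	}

	// Only count (and sleep) when the reservation actually forces a wait.
	if delay := r.DelayFrom(now); delay > 0 {
		vm.r.limiterWaitCount++
		vm.r.limiterWaitTotalTime += delay

		if err := util.Sleep(ctx, delay); err != nil {
			r.Cancel()
			panic(err)
		}
	}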

vm.go (outdated)
	now := time.Now()
	r := vm.r.limiter.ReserveN(now, vm.r.limiterTicksLeft)
	if !r.OK() {
		panic("failed to make reservation")

arahmanan:

Suggested change:
-	panic("failed to make reservation")
+	panic(context.DeadlineExceeded)

This will keep the same behavior as the if strings.Contains(waitErr.Error(), "would exceed") { check. That being said, I don't believe this can ever happen at the moment.
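As a side note on when !r.OK() can fire at all: with the standard golang.org/x/time/rate semantics, a reservation only comes back not-OK when the requested n exceeds the limiter's burst (for a finite rate), which is presumably why this branch is unreachable today. A tiny illustration, assuming the fork's limiter matches that behavior:

package main

import (
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	lim := rate.NewLimiter(rate.Limit(10_000_000), 250_000)

	// A request within the burst always gets a reservation
	// (possibly with a delay attached).
	fmt.Println(lim.ReserveN(time.Now(), 250_000).OK()) // true

	// A request larger than the burst can never be satisfied at once.
	fmt.Println(lim.ReserveN(time.Now(), 300_000).OK()) // false
}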

arahmanan left a comment:

LGTM! Just an optional comment.


@Calvinnix merged commit 06a8a67 into mongodb-forks:realm on Mar 11, 2024.
2 of 6 checks passed.