Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BAAS-32231: Track memory of temporary value created in Array.prototype.map calls #123

Merged
merged 4 commits into from
Jun 26, 2024

Conversation

Calvinnix
Copy link

This change allows us to prevent the following function from OOMing pods.

exports = async function () {
  let variants = [];
  for (let i = 0; i < 10; i++) { // array of size 1000 would easily OOM a pod
      variants.push({})
  }
  variants = variants.map((variant) => {
      return {
          a: 'a'.repeat(100 * 1024 * 1024), // 100mb,
      }
  })
  // this function creates 1GB of memory, and succeeds
};

vm.go Outdated Show resolved Hide resolved
@@ -786,6 +786,7 @@ func (r *Runtime) arrayproto_map(call FunctionCall) Value {
if _, stdSrc := o.self.(*arrayObject); stdSrc {
if arr, ok := a.self.(*arrayObject); ok {
values := make([]Value, length)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A concern I had about this approach was the potential for double counting memory, I'm including a breakdown about why this is not an issue.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

values variable memory utilization goes untracked.

@@ -786,6 +786,7 @@ func (r *Runtime) arrayproto_map(call FunctionCall) Value {
if _, stdSrc := o.self.(*arrayObject); stdSrc {
if arr, ok := a.self.(*arrayObject); ok {
values := make([]Value, length)
r.vm.tmpValues = values
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We assign a reference to the values array so that we are now tracking memory utilization of this temporary values array.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any chance this might be used concurrently (I don't think so but you might know better at this point)? I haven't looked into it but just thought about it as part of the changes we made recently in our http2 package and wanted to be careful since this is now shared from the runtime.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah so this is one of the main reasons that I steered away from a previous approach that consisted of putting the temporary objects directly onto the stack. I suspect this is possible but not something we'd commonly encounter.

I think an operation like the one below might give us concurrent usage like you're suggesting

Promise.all([arr1.map(asyncFunc), arr2.map(asyncFunc)])

The end result is that we'd only be able to track a single map operation at a time. Unless we do something like create a unique hash for the map operation and track the values array under that specific hash.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless we do something like create a unique hash for the map operation and track the values array under that specific hash.

Well technically since these are references we'd probably be fine with something like [][]Values and the memory tracker might also need to strip out the nil references since those are no longer needed.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JS should be single-threaded, so this function should never be used concurrently within the same runtime unless we have another bug. I think it's safe to assume that this code will run on a single thread. Other members would run into data races as well if that wasn't the case.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern was that even though it is single threaded, that it would not finish the entire map operation and bounce around.

Copy link
Author

@Calvinnix Calvinnix Jun 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait.. I found a way to recreate what I was worried about

exports = async function(arg){
  var arr = [1,2,3,4,5]

  await Promise.all([
    arr.map(async x => { setTimeout(() => { console.log(x) }, 1000 - x) }),
    arr.map(async x => { setTimeout(() => { console.log(x) }, 1000 - x) }),
    arr.map(async x => { setTimeout(() => { console.log(x) }, 1000 - x) }),
  ])
};
> took 1.272834522s
> logs: 
5
5
5
4
3
4
4
2
2
3
3
2
1
1
1

Also TIL that these map functions execute in parallel for the entire array 👁️👄👁️

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to create a follow-up to address this concern so that we can go ahead and address the immediate concern of map memory usage because we know we've OOMed because of this.

cc: @arahmanan for visibility because I know we briefly discussed potential solutions.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -794,6 +795,7 @@ func (r *Runtime) arrayproto_map(call FunctionCall) Value {
values[k] = callbackFn(fc)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[important] the corresponding value that is created is tracked within the callbackFn function because the value is pushed on the stack BUT that value is popped off the stack at the end of the function call.

The only difference is now instead of just stopping tracking this value altogether, we are moving it to be tracked in the tmpValues array.

@@ -794,6 +795,7 @@ func (r *Runtime) arrayproto_map(call FunctionCall) Value {
values[k] = callbackFn(fc)
}
}
r.vm.tmpValues = nil
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nil out the tmpValues array because the values array will now be tracked when this function returns and the returned array object is pushed onto the stack.

vm.go Outdated Show resolved Hide resolved
builtin_array.go Show resolved Hide resolved
Copy link

@Gabri3l Gabri3l left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice approach, do you mind also adding some comments in code so it's easier to track it down in the future and understand why we needed this?

@@ -786,6 +786,7 @@ func (r *Runtime) arrayproto_map(call FunctionCall) Value {
if _, stdSrc := o.self.(*arrayObject); stdSrc {
if arr, ok := a.self.(*arrayObject); ok {
values := make([]Value, length)
r.vm.tmpValues = values
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any chance this might be used concurrently (I don't think so but you might know better at this point)? I haven't looked into it but just thought about it as part of the changes we made recently in our http2 package and wanted to be careful since this is now shared from the runtime.

vm.go Outdated Show resolved Hide resolved
checkMem();
return "a"
})`,
expectedSizeDiff: 4 * (1 + SizeString),
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The notable thing here is that we are now seeing a size diff (this is 0 before this fix)

@@ -561,6 +561,60 @@ func TestStashMemUsage(t *testing.T) {
}
}

func TestTmpValuesMemUsage(t *testing.T) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy pasta from the stack test 😄

Copy link

@arahmanan arahmanan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just some minor stuff from my end.

checkMem();
return "a"
})`,
expectedSizeDiff: 4 * (1 + SizeString),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we multiplying by 4 because we're checking the memory before returning the last element (number 5)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's correct

vm.go Outdated
return ValuesMemUsage(s, ctx)
}

func ValuesMemUsage(s []Value, ctx *MemUsageContext) (memUsage uint64, err error) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to export this function?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah goja is under the same package, yeah I'll make this a private function, nice catch!

vm_test.go Outdated
name: "should exit early given tmpValues over the memory limit",
val: []Value{valueInt(99), valueInt(99), valueInt(99), valueInt(99)},
memLimit: 0,
expectedMem: SizeInt,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] thoughts on adding a similar test without the limit set to 0 to make sure that we're iterating over all values.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the test above to account for multiple values, does that work for you?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯 perfect!

Copy link

@Gabri3l Gabri3l left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super minor comments, 🚀

memory_test.go Show resolved Hide resolved
vm.go Show resolved Hide resolved
@Calvinnix Calvinnix requested a review from Gabri3l June 26, 2024 18:57
Copy link

@kpatel71716 kpatel71716 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Nice stuff! I'm also good pushing the follow-up into BAAS-32408

Copy link

@Gabri3l Gabri3l left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@Calvinnix Calvinnix merged commit fbae8e7 into mongodb-forks:realm Jun 26, 2024
2 of 6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants