kola: Add --max-machines #1161
base: master
Conversation
Force-pushed from ffb769c to f1a758f
Tweaked this to default
A couple questions: What happens if:
Nice, good catch!
Probably we should automatically skip tests that have a required machine count > the configured maximum.

I think we could at least get in the leaks and fixes for now, right? That alone should greatly reduce memory stress on the nodes we're allocated on. WDYT about splitting those out as a separate PR?
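A minimal sketch of that skip check, assuming the cap is plumbed through as a plain integer and that each test declares how many machines it needs (the names here are illustrative, not kola's actual API):

```go
package main

import "fmt"

// shouldSkip is a hypothetical pre-flight check: rather than letting a
// multi-machine test block forever waiting for slots that can never
// all be free at once, skip any test whose required machine count
// exceeds the configured cap.
func shouldSkip(clusterSize, maxMachines int) (bool, string) {
	if maxMachines > 0 && clusterSize > maxMachines {
		return true, fmt.Sprintf("test requires %d machines but --max-machines is %d",
			clusterSize, maxMachines)
	}
	return false, ""
}

func main() {
	if skip, reason := shouldSkip(4, 2); skip {
		fmt.Println("SKIP:", reason)
	}
}
```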
Yeah, something like that.
OK, done in [link]. But I think we'll need this PR to make the pipeline truly reliable in the face of hard memory caps.
Force-pushed from f1a758f to e2e5a03
Added a few reviewers, though it sounds like the above is needed to avoid timeouts before this is ready. Is it fair to mark this WIP for now?
And I think it might make sense to redirect this to the cosa repo.
This is only implemented for qemu at the moment, though it'd
be a mostly mechanical change to propagate it to the other
providers.
For our pipeline testing, we need to have a hard cap on the number
of qemu instances we spawn, otherwise we can go over the RAM
allocated to the pod.
Actually the FCOS pipeline today doesn't impose a hard cap, and
my test pipeline in the coreosci (nested GCP virt) ended up bringing
down the node via the OOM killer.
There were a few bugs here; first, we were leaking the spawned
qemu instance. We also need to invoke `Wait()` synchronously
in destruction.
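As a rough illustration of the leak fix, here is a minimal, self-contained sketch; `qemuInstance` and its field are hypothetical stand-ins for the real kola types:

```go
package main

import (
	"os/exec"
	"time"
)

// qemuInstance is a hypothetical stand-in for kola's qemu wrapper.
type qemuInstance struct {
	qemu *exec.Cmd
}

// Destroy tears the instance down. The synchronous Wait() is the key
// part: killing the process without reaping it leaves the child as a
// zombie, so the instance is effectively leaked until kola itself exits.
func (inst *qemuInstance) Destroy() {
	if inst.qemu == nil {
		return
	}
	// Best-effort kill; the process may already have exited.
	inst.qemu.Process.Kill()
	// Reap the child before returning so teardown is deterministic.
	inst.qemu.Wait()
	inst.qemu = nil
}

func main() {
	cmd := exec.Command("sleep", "60") // stand-in for a spawned qemu
	if err := cmd.Start(); err != nil {
		panic(err)
	}
	inst := &qemuInstance{qemu: cmd}
	time.Sleep(100 * time.Millisecond)
	inst.Destroy()
}
```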
Then, add a dependency on the `golang/x/semaphore` library,
and use it to implement a max limit.
Closes: https://github.com/coreos/mantle/issues/1157
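For reference, a minimal sketch of the cap itself, using the weighted semaphore from `golang.org/x/sync/semaphore` (the `launcher` type and its methods are hypothetical, not mantle's actual API):

```go
package main

import (
	"context"

	"golang.org/x/sync/semaphore"
)

// launcher caps how many machines may exist at once; maxMachines
// would come from the hypothetical --max-machines flag.
type launcher struct {
	sem *semaphore.Weighted
}

func newLauncher(maxMachines int64) *launcher {
	return &launcher{sem: semaphore.NewWeighted(maxMachines)}
}

// spawn blocks until a slot is free, so the total number of live
// machines can never exceed the cap (and hence the pod's RAM budget).
func (l *launcher) spawn(ctx context.Context, start func() error) error {
	if err := l.sem.Acquire(ctx, 1); err != nil {
		return err // context cancelled while waiting for a slot
	}
	if err := start(); err != nil {
		l.sem.Release(1) // boot failed: give the slot back
		return err
	}
	return nil
}

// destroy releases the slot once the machine has been torn down.
func (l *launcher) destroy() {
	l.sem.Release(1)
}

func main() {
	l := newLauncher(2)
	ctx := context.Background()
	for i := 0; i < 2; i++ {
		if err := l.spawn(ctx, func() error { return nil }); err != nil {
			panic(err)
		}
	}
	// Both slots are now held; a third spawn would block here until
	// one of the running machines calls destroy().
	l.destroy()
	l.destroy()
}
```

The key design point in this sketch is that a slot is released on destroy, not when spawn returns, so each machine holds its slot for its entire lifetime.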