Fix: reduce “Waiting on concurrency group” #8
When setting the attribute concurrency_group on deploy steps, set concurrency_method: eager. By default (ordered), a BuildKite build’s deploy steps will wait for all the smoke_test and build steps of a previous build. This is a problem in rollbacks, which run much faster than normal builds because they have no smoke_test or build steps, so they are likely to reach their deploy steps before the previous build does.
Set the concurrency and concurrency_group attributes only on deploy steps, not validation_test, so that a validation_test step does not block another BuildKite build’s deploy step. Note that one build’s validation test may fail if another build deploys some other code, but that is something we will just have to be aware of when two deploys happen right after each other.
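As a rough illustration of the first change described above, a deploy step with an eager concurrency group might look like the sketch below. The label, command, and group name are placeholders assumed for the example and are not taken from this repository's pipeline.

```yaml
steps:
  # Hypothetical deploy step; label, command, and group name are placeholders.
  - label: "deploy"
    command: "scripts/deploy.sh"
    # Allow only one job in this group to run at a time.
    concurrency: 1
    concurrency_group: "deploy/prod"
    # "eager" lets a job start as soon as the group has capacity, rather than
    # waiting for jobs created by earlier builds (the default, "ordered").
    concurrency_method: eager
```

With the default ordered method, the same step in a later build would wait until the earlier build's job in the group had run, which is the wait this PR is trying to reduce for rollbacks.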
On this point, I believe this was introduced to solve that very problem, but what was the reason for having to revert? If a validation test is running but we have a deployment we want to do immediately (for a rollback, etc.) and do not want to be blocked, then I think we should probably cancel the validation test at that time. Is there some other reason?
Yes, just to make deploys not wait for the previous validation_test.
Yes, that works too. I am willing to revert that commit.
Okay, thank you. I think we would want to keep the current behavior if possible. Do we need to set |
This reverts commit c25c111. This was a minor speedup and we don't need it since we can easily cancel validation test steps if necessary.
Yes, I believe we should set As for what happens when there are two steps in sequence that are both In testing, I observed that when you use concurrency_group with eager, the jobs can interleave:
Thanks a lot for testing. So, if I understand correctly, if |
Correct. I think it will usually do the right thing and delay the next deployment until the previous validation_test finishes. But the next deploy (or some steps within a multistep deploy) could still interleave before the previous validation_test. I think making validation_test concurrency: 1 still does some good even though it is not reliable anymore. For our use case, I think “ordered” is what we want most of the time. One build should deploy to environment:qa (and the environment:prod build gets blocked) before the next build. It’s only rollback deploys that should cut in line before the previous build’s deploy job. I don’t know a way of letting only rollbacks run before the previous build’s deploy other than manually cancelling the previous build or changing to “eager”.
@yonran, could we force "ordered" for deploys to QA but not for deploys to prod? Since rollbacks are prod-only, would that allow a rollback-only exception?
Yes we could do that. That would ensure that automatic qa deploys and tests are still ordered, while cautious prod deploys use the riskier str for “Deploy #\d+ to prod.example.com” builds. Thinking further, perhaps we could actually make the prod deploy + validation_test ordered too if we dynamically insert barriers when a deploy is starting (see Buildkite Blog (2020-11): Concurrency Gates or BuildKite Docs: Controlling concurrency: Concurrency and parallelism) (add one barrier after the |
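For readers unfamiliar with the concurrency gate pattern mentioned in this comment, a minimal sketch follows. It is modelled on the general pattern in the Buildkite documentation, not on this repository's pipeline: the step keys, commands, and group name are assumptions, and it does not show the dynamic barrier insertion proposed here.

```yaml
steps:
  # Opening gate: with the default ordered method, a later build's opening
  # gate waits until the earlier build's closing gate has run.
  - command: echo "start of gate"
    key: start-gate
    concurrency: 1
    concurrency_group: "gate/prod"  # placeholder group name

  - wait

  # Steps between the gates run without their own concurrency limits.
  - command: "scripts/deploy.sh"  # placeholder command
    key: deploy
    depends_on: start-gate

  - command: "scripts/validation_test.sh"  # placeholder command
    key: validation-test
    depends_on: deploy

  - wait

  # Closing gate: same group as the opening gate.
  - command: echo "end of gate"
    key: end-gate
    concurrency: 1
    concurrency_group: "gate/prod"
```

The idea is that only the two gate steps belong to the concurrency group, so the deploy and validation_test between them are serialized across builds as a unit.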
Given that, why is |
fix: make deploy steps from different builds unordered
When setting the attribute concurrency_group on deploy steps, set concurrency_method: eager. By default (ordered), a BuildKite build’s deploy steps will wait for all the smoke_test and build steps of a previous build. This is a problem in rollbacks, which run much faster than normal builds because they have no smoke_test or build steps, so they are likely to reach their deploy steps before the previous build does.
This will make concurrency: 1 act in a similar manner to controlling concurrency by limiting the number of deploy queues. See BuildKite Docs: Controlling Concurrency: Controlling command Order.
fix: Allow concurrent validation_test steps (this commit was removed)
Set the concurrency and concurrency_group attributes only on deploy steps, not validation_test, so that a validation_test step does not block another BuildKite build’s deploy step. Note that one build’s validation_test may fail if another build deploys some other code, but that is something we will just have to be aware of when two deploys happen right after each other.
This will make the behavior more similar to the previous behavior. validation_test runs on a different queue than deploy so it never used to block deploy before we added concurrency_group.