unstage on status going from success to failure, and unconditionally unstage on sync #950

xmo-odoo · 2024-09-13T15:11:32Z

odoo/odoo#165931 had a rough life: everything seemed fine and it chilled for a week or two then it got staged. At this point, it unwittingly started failing stagings. Which happens. It's ok.

However things then took a turn for the worse: a well meaning individual looking for the guilty rebuilt it, which caused a failure but that failure was a false negative (a non-deterministic test): https://runbot.odoo.com/runbot/build/68162127 so everything looked OK.

However getting a failure notification the reviewer updated the pull request: odoo/odoo#165931 (comment) which did reveal the true failure also present on stagings: https://runbot.odoo.com/runbot/build/68174601

Except... the PR was still breaking stagings hours later and had to be r-'d: odoo/odoo#165931 (comment)

That is due to the combination of two... assumptions? Issues? Oversight?

statuses are assumed to go forwards, without the PR getting updated a status normally doesn't go from success to failure, as a result there is no unstaging on that path:

runbot/runbot_merge/models/pull_requests.py

Lines 1165 to 1185 in 146564a

    
               @api.depends( 
        
                   'statuses', 'overrides', 'target', 'parent_id', 
        
                   'repository.status_ids.context', 
        
                   'repository.status_ids.branch_filter', 
        
                   'repository.status_ids.prs', 
        
               ) 
        
               def _compute_statuses(self): 
        
                   for pr in self: 
        
                       statuses = {**json.loads(pr.statuses), **pr._get_overrides()} 
        
                       pr.statuses_full = json.dumps(statuses, indent=4) 
        
                       st = 'success' 
        
                       for ci in pr.repository.status_ids._for_pr(pr): 
        
                           v = (statuses.get(ci.context) or {'state': 'pending'})['state'] 
        
                           if v in ('error', 'failure'): 
        
                               st = 'failure' 
        
                               break 
        
                           if v == 'pending': 
        
                               st = 'pending' 
        
                       pr.status = st

however on update a PR is only unstaged if it's ready (after all how could it be staged and not ready):

runbot/runbot_merge/controllers/__init__.py

Lines 288 to 289 in 3bc5b4e

    
           if pr_obj.state == 'ready': 
        
               pr_obj.unstage("updated by %s", event['sender']['login'])

as it turns out this is not an optimisation (though it looks like one) but more of an artefact of the original structure where an update didn't fully reset PRs:

runbot/runbot_merge/controllers/__init__.py

Lines 154 to 160 in 5c4018b

    
           elif pr_obj.state == 'ready': 
        
               pr_obj.state = 'approved' 
        
               pr_obj.staging_id.cancel( 
        
                   "Updated PR %s:%s, removing staging %s", 
        
                   pr_obj.repository.name, pr_obj.number, 
        
                   pr_obj.staging_id, 
        
               )

when the behaviour was later changed in ec5d60d this bit wasn't touched

The issue is that the failed rebuild transitioned the PR back into "approved" (reviewed but no or failed statuses), so by the time the PR was updated it was not ready anymore, and thus didn't get unstaged on that second step either.

the PR should be unstaged unconditionally on update, the optimisation is very limited as unstaging just checks if the PR is staged, somethign which isn't free (we need to search through the o2m between batches and stagings) but probably not the worst
CI failure should probably unstage, in this case the failure was illegitimate but there could be scenarios where it's very legit e.g.
- trigger a time-based failure during a rebuild (so that's an actual failure, just inconsistent)
- rebase-and-rebuild failing assuming that sends statuses on the original commit, doing this would have caught the issue (which looks to be an incompatibility between this PR and an other change merged since it'd last been built)
- remove an override set in error

The text was updated successfully, but these errors were encountered:

xmo-odoo · 2024-09-13T15:11:56Z

/cc @d-fence

And unconditionally unstage when the HEAD of a PR is synchronised. While a rebuild on a PR which was previously staged can be a false positive (e.g. because it hit a non-derministic test the second time around) it can also be legitimate e.g. auto-rebase of an old PR to check it out. In that case the PR should be unstaged. Furthermore, as soon as the PR gets rebuilt it goes back into `approved` (because the status goes to pending and currently there's no great way to suppress that in the rebuild case without also fucking it up for the sync case). This would cause a sync on the PR to be missed (in that it would not unstage the PR), which is broken. Fix that by not checking the state of the PR beforehand, it seems to be an unnecessary remnant of older code, and not really an optimisation (or at least one likely not worth bothering with, especially as we then proceed to perform a full PR validation anyway). Fixes #950

xmo-odoo added the mergebot label Sep 13, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

unstage on status going from success to failure, and unconditionally unstage on sync #950

unstage on status going from success to failure, and unconditionally unstage on sync #950

xmo-odoo commented Sep 13, 2024

xmo-odoo commented Sep 13, 2024

unstage on status going from success to failure, and unconditionally unstage on sync #950

unstage on status going from success to failure, and unconditionally unstage on sync #950

Comments

xmo-odoo commented Sep 13, 2024

xmo-odoo commented Sep 13, 2024