opensuse builders have failures #967

hannesm · 2024-08-28T17:23:35Z

as seen on https://ocaml.ci.dev/github/mirage/mirage-vnetif/commit/bda581dd8b89f9832277a06470fe668ddc9fcd46/variant/opensuse-15.6-4.14_opam-2.2 and a bunch of other PRs

+ /usr/bin/sudo "zypper" "--non-interactive" "refresh"
- Retrieving repository 'Update repository of openSUSE Backports' metadata [...error]
- Repository 'Update repository of openSUSE Backports' is invalid.
- [repo-backports-update|http://download.opensuse.org/update/leap/15.6/backports/] Valid metadata not found at specified URL
- History:
-  - [|] Error trying to read from 'http://download.opensuse.org/update/leap/15.6/backports/'
-  - Location 'http://download.opensuse.org/update/leap/15.6/backports/repodata/repomd.xml' is temporarily unaccessible.
- 
- Please check if the URIs defined for this repository are pointing to a valid repository.
- Skipping repository 'Update repository of openSUSE Backports' because of the above error.
- Retrieving repository 'Non-OSS Repository' metadata [...error]
- Repository 'Non-OSS Repository' is invalid.
- [repo-non-oss|http://download.opensuse.org/distribution/leap/15.6/repo/non-oss/] Valid metadata not found at specified URL
- History:
-  - [|] Error trying to read from 'http://download.opensuse.org/distribution/leap/15.6/repo/non-oss/'
-  - Location 'http://download.opensuse.org/distribution/leap/15.6/repo/non-oss/repodata/repomd.xml' is temporarily unaccessible.
- 
- Please check if the URIs defined for this repository are pointing to a valid repository.
- Skipping repository 'Non-OSS Repository' because of the above error.
- Repository 'Main Repository' is up to date.
- Retrieving repository 'Update repository with updates from SUSE Linux Enterprise 15' metadata [.........
- ..error]
- Repository 'Update repository with updates from SUSE Linux Enterprise 15' is invalid.
- [repo-sle-update|http://download.opensuse.org/update/leap/15.6/sle/] Valid metadata not found at specified URL
- History:
-  - Location 'http://download.opensuse.org/update/leap/15.6/sle/repodata/5ab09dba36b7095eff4841729dadf66c553b39ead7045e25271149dbcb929206-primary.xml.gz' is temporarily unaccessible.
- 
- Please check if the URIs defined for this repository are pointing to a valid repository.
- Skipping repository 'Update repository with updates from SUSE Linux Enterprise 15' because of the above error.
- Retrieving repository 'Main Update Repository' metadata [...error]
- Repository 'Main Update Repository' is invalid.
- [repo-update|http://download.opensuse.org/update/leap/15.6/oss/] Valid metadata not found at specified URL
- History:
-  - [|] Error trying to read from 'http://download.opensuse.org/update/leap/15.6/oss/'
-  - Location 'http://download.opensuse.org/update/leap/15.6/oss/repodata/repomd.xml' is temporarily unaccessible.
- 
- Please check if the URIs defined for this repository are pointing to a valid repository.
- Skipping repository 'Main Update Repository' because of the above error.
- Retrieving repository 'Update Repository (Non-Oss)' metadata [...error]
- Repository 'Update Repository (Non-Oss)' is invalid.
- [repo-update-non-oss|http://download.opensuse.org/update/leap/15.6/non-oss/] Valid metadata not found at specified URL
- History:
-  - [|] Error trying to read from 'http://download.opensuse.org/update/leap/15.6/non-oss/'
-  - Location 'http://download.opensuse.org/update/leap/15.6/non-oss/repodata/repomd.xml' is temporarily unaccessible.
- 
- Please check if the URIs defined for this repository are pointing to a valid repository.
- Skipping repository 'Update Repository (Non-Oss)' because of the above error.
- Some of the repositories have not been refreshed because of an error.
Fatal error: System package update failed with exit code 4 at command:
    sudo zypper --non-interactive refresh
"/usr/bin/env" "bash" "-c" "opam update --depexts && opam install --cli=2.2 --depext-only -y mirage-vnetif.dev $DEPS" failed with exit status 99
2024-08-28 15:47.21: Job failed: Failed: Build failed


shonfeder · 2024-08-28T17:54:45Z

This appears to happen while updating depexts, so I'm not sure this is an ocaml-ci issue rather than a problem with OpenSUSE (or with opam's depext logic on OpenSUSE).

A search of the error pointed me to https://forums.opensuse.org/t/zypper-broken-repository-is-invalid/142206, but that page currently reports a service outage.

However, https://forum.rockstor.com/t/zypper-repository-error-message/8676 notes this could be a transient issue that occurs during repository updates. I'll poke around the image and report back what I find.

hannesm · 2024-08-28T18:12:28Z

Thanks for your investigations.

shonfeder · 2024-08-28T18:13:17Z

Thanks for the report :)

hannesm · 2024-08-28T18:18:56Z

I guess it is not very clear to me when to report an issue and when not - of course there are temporary failures from other systems that you can't fix in ocaml-ci. But then, there are some issues (e.g. DNS resolution errors, HTTP limit exceeded for GitHub downloads) that should be addressed by ocaml-ci.

In the end, I guess I'm still dreaming of a CI system where such a temporary failure from another system, as (maybe?) the issue reported here, is captured with a special exit code by the CI system, and it presents a warning sign (instead of a failure), and schedules another build in the future (when hopefully the other system is back in action). I've talked about this many times over the years, but it seems that either nobody else would like to have such a system, or it is too hard to implement. Sadly so...
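
A minimal sketch of that behaviour, assuming a job is a callable that returns an exit code and its captured output. The names `Outcome`, `run_with_retries`, and `is_transient` are hypothetical and not part of ocaml-ci; how a transient failure is recognised is left to the caller.

```python
import time
from enum import Enum
from typing import Callable, Tuple


class Outcome(Enum):
    PASSED = "passed"
    FAILED = "failed"        # a real failure: report it as such
    TRANSIENT = "transient"  # another system was down: show a warning


def run_with_retries(
    run_job: Callable[[], Tuple[int, str]],
    is_transient: Callable[[int, str], bool],
    max_attempts: int = 3,
    delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[Outcome, int]:
    """Run a job, rescheduling it while its failure looks transient.

    Returns the final outcome and the number of attempts made.
    """
    for attempt in range(1, max_attempts + 1):
        exit_code, output = run_job()
        if exit_code == 0:
            return Outcome.PASSED, attempt
        if not is_transient(exit_code, output):
            return Outcome.FAILED, attempt
        if attempt < max_attempts:
            # Exponential backoff: delay_s, 2 * delay_s, 4 * delay_s, ...
            sleep(delay_s * 2 ** (attempt - 1))
    # Still failing transiently after all attempts: warn, do not fail.
    return Outcome.TRANSIENT, max_attempts
```

A caller could pass, for example, `is_transient=lambda code, out: "is temporarily unaccessible" in out`, which matches the zypper messages in the log above.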

shonfeder · 2024-08-28T18:44:25Z

I tried to reproduce this locally using the docker instructions, and it went through fine. Also, rebuilding the failed jobs on the PR you linked succeeded: https://ocaml.ci.dev/github/mirage/mirage-vnetif/commit/da7409bd0152b0a01692df9016e2cc724849ce3b

Seems this is indeed a transient issue with OpenSUSE.

> guess I'm still dreaming of a CI system where such a temporary failure from another system, as (maybe?) the issue reported here, is captured with a special exit code by the CI system, and it presents a warning sign (instead of a failure), and schedules another build in the future (when hopefully the other system is back in action).

Better and more fine-grained error handling is definitely desirable (see, e.g., ocurrent/opam-repo-ci#328), and scheduling retries around known failure points that may be flaky is also a good idea. That said, as you can see from this case, the stacks we are working on are pretty deep (here: ocurrent -> ocluster -> obuilder -> docker -> opam -> OpenSUSE -> zypper, with plenty of calls over the wire between various steps). Adding a mechanism that would allow us to tag certain kinds of errors as retryable is easy, but actually building up the catalogue of such errors for all the different levels is a virtually endless task :)

Still, getting that first part in place would definitely be helpful.
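
As a rough illustration of what that first part could look like, here is a sketch of a catalogue of retryable errors, keyed by the layer of the stack that emitted them. Everything here (`RetryableRule`, `CATALOGUE`, `tag_retryable`) is hypothetical and not an existing ocaml-ci API; the single rule is taken from the zypper log in this issue, and a real catalogue would need entries for every other layer.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RetryableRule:
    layer: str           # which level of the stack emitted the error
    pattern: re.Pattern  # regex matched against each log line
    note: str            # why a retry is expected to help


# Only one entry, based on the log above; extending this list for
# each layer is the open-ended part of the work.
CATALOGUE = [
    RetryableRule(
        layer="zypper",
        pattern=re.compile(r"Location '.*' is temporarily unaccessible"),
        note="repository metadata briefly unavailable",
    ),
]


def tag_retryable(log: str) -> Optional[RetryableRule]:
    """Return the first catalogue rule matching any log line, else None."""
    for line in log.splitlines():
        for rule in CATALOGUE:
            if rule.pattern.search(line):
                return rule
    return None
```

A job whose log yields a rule from `tag_retryable` could be retried or flagged with a warning, while a `None` result would be treated as an ordinary failure.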

shonfeder self-assigned this Aug 28, 2024

shonfeder closed this as completed Aug 28, 2024
