opensuse builders have failures #967

Closed
hannesm opened this issue Aug 28, 2024 · 5 comments

@hannesm
Contributor

hannesm commented Aug 28, 2024

As seen on https://ocaml.ci.dev/github/mirage/mirage-vnetif/commit/bda581dd8b89f9832277a06470fe668ddc9fcd46/variant/opensuse-15.6-4.14_opam-2.2 and a bunch of other PRs:

+ /usr/bin/sudo "zypper" "--non-interactive" "refresh"
- Retrieving repository 'Update repository of openSUSE Backports' metadata [...error]
- Repository 'Update repository of openSUSE Backports' is invalid.
- [repo-backports-update|http://download.opensuse.org/update/leap/15.6/backports/] Valid metadata not found at specified URL
- History:
-  - [|] Error trying to read from 'http://download.opensuse.org/update/leap/15.6/backports/'
-  - Location 'http://download.opensuse.org/update/leap/15.6/backports/repodata/repomd.xml' is temporarily unaccessible.
- 
- Please check if the URIs defined for this repository are pointing to a valid repository.
- Skipping repository 'Update repository of openSUSE Backports' because of the above error.
- Retrieving repository 'Non-OSS Repository' metadata [...error]
- Repository 'Non-OSS Repository' is invalid.
- [repo-non-oss|http://download.opensuse.org/distribution/leap/15.6/repo/non-oss/] Valid metadata not found at specified URL
- History:
-  - [|] Error trying to read from 'http://download.opensuse.org/distribution/leap/15.6/repo/non-oss/'
-  - Location 'http://download.opensuse.org/distribution/leap/15.6/repo/non-oss/repodata/repomd.xml' is temporarily unaccessible.
- 
- Please check if the URIs defined for this repository are pointing to a valid repository.
- Skipping repository 'Non-OSS Repository' because of the above error.
- Repository 'Main Repository' is up to date.
- Retrieving repository 'Update repository with updates from SUSE Linux Enterprise 15' metadata [.........
- ..error]
- Repository 'Update repository with updates from SUSE Linux Enterprise 15' is invalid.
- [repo-sle-update|http://download.opensuse.org/update/leap/15.6/sle/] Valid metadata not found at specified URL
- History:
-  - Location 'http://download.opensuse.org/update/leap/15.6/sle/repodata/5ab09dba36b7095eff4841729dadf66c553b39ead7045e25271149dbcb929206-primary.xml.gz' is temporarily unaccessible.
- 
- Please check if the URIs defined for this repository are pointing to a valid repository.
- Skipping repository 'Update repository with updates from SUSE Linux Enterprise 15' because of the above error.
- Retrieving repository 'Main Update Repository' metadata [...error]
- Repository 'Main Update Repository' is invalid.
- [repo-update|http://download.opensuse.org/update/leap/15.6/oss/] Valid metadata not found at specified URL
- History:
-  - [|] Error trying to read from 'http://download.opensuse.org/update/leap/15.6/oss/'
-  - Location 'http://download.opensuse.org/update/leap/15.6/oss/repodata/repomd.xml' is temporarily unaccessible.
- 
- Please check if the URIs defined for this repository are pointing to a valid repository.
- Skipping repository 'Main Update Repository' because of the above error.
- Retrieving repository 'Update Repository (Non-Oss)' metadata [...error]
- Repository 'Update Repository (Non-Oss)' is invalid.
- [repo-update-non-oss|http://download.opensuse.org/update/leap/15.6/non-oss/] Valid metadata not found at specified URL
- History:
-  - [|] Error trying to read from 'http://download.opensuse.org/update/leap/15.6/non-oss/'
-  - Location 'http://download.opensuse.org/update/leap/15.6/non-oss/repodata/repomd.xml' is temporarily unaccessible.
- 
- Please check if the URIs defined for this repository are pointing to a valid repository.
- Skipping repository 'Update Repository (Non-Oss)' because of the above error.
- Some of the repositories have not been refreshed because of an error.
Fatal error: System package update failed with exit code 4 at command:
    sudo zypper --non-interactive refresh
"/usr/bin/env" "bash" "-c" "opam update --depexts && opam install --cli=2.2 --depext-only -y mirage-vnetif.dev $DEPS" failed with exit status 99
2024-08-28 15:47.21: Job failed: Failed: Build failed
@shonfeder shonfeder self-assigned this Aug 28, 2024
@shonfeder
Contributor

shonfeder commented Aug 28, 2024

This appears to be happening when attempting to update depexts, so I'm not sure that this is an ocaml-ci issue rather than a problem with OpenSUSE (or with opam's depext logic on OpenSUSE).

A search of the error pointed me to https://forums.opensuse.org/t/zypper-broken-repository-is-invalid/142206, but that page currently reports a service outage.

However, https://forum.rockstor.com/t/zypper-repository-error-message/8676 notes this could be a transient issue that occurs during repository updates. I'll poke around the image and report back what I find.

@hannesm
Contributor Author

hannesm commented Aug 28, 2024

Thanks for your investigations.

@shonfeder
Contributor

Thanks for the report :)

@hannesm
Contributor Author

hannesm commented Aug 28, 2024

I guess it is not very clear to me when to report an issue and when not. Of course there are temporary failures from other systems that you can't fix in ocaml-ci. But there are also some issues (e.g. DNS resolution errors, HTTP limit exceeded for GitHub downloads) that should be addressed by ocaml-ci.

In the end, I guess I'm still dreaming of a CI system where such a temporary failure from another system, as (maybe?) the issue reported here, is captured with a special exit code by the CI system, and it presents a warning sign (instead of a failure), and schedules another build in the future (when hopefully the other system is back in action). I've talked about this many times over the years, but it seems that either nobody else would like to have such a system, or it is too hard to implement. Sadly so...
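
A minimal sketch of the behaviour described above, in Python for illustration only. Everything here is an assumption rather than anything ocaml-ci does today: the names `Status`, `Verdict` and `verdict_for` are hypothetical, and reusing exit code 75 (`EX_TEMPFAIL` from BSD `sysexits.h`) as the "temporary failure elsewhere" marker is just one possible convention.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

# Assumption: a build step signals "temporary failure in another system"
# with exit code 75 (EX_TEMPFAIL in BSD sysexits.h).
EX_TEMPFAIL = 75


class Status(Enum):
    PASSED = "passed"
    FAILED = "failed"
    WARNING = "warning"  # shown as a warning sign, not as a failure


@dataclass(frozen=True)
class Verdict:
    status: Status
    retry_in_seconds: Optional[int] = None  # set only when a rebuild is scheduled


def verdict_for(exit_code: int, attempt: int,
                max_attempts: int = 3, base_delay: int = 600) -> Verdict:
    """Map a job's exit code to a status and, if transient, a retry delay.

    `attempt` is 1 for the first run. The delay doubles on each attempt.
    """
    if exit_code == 0:
        return Verdict(Status.PASSED)
    if exit_code == EX_TEMPFAIL and attempt < max_attempts:
        return Verdict(Status.WARNING, base_delay * 2 ** (attempt - 1))
    # Either a genuine failure, or the transient failure outlived its retries.
    return Verdict(Status.FAILED)
```

With these defaults, a first transient failure yields a warning and a rebuild after 600 seconds, a second one after 1200 seconds, and a third is reported as an ordinary failure.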

@shonfeder
Contributor

shonfeder commented Aug 28, 2024

I tried to reproduce this locally using the Docker instructions, and it went through fine. Rebuilding the failed jobs on the PR you linked also succeeded: https://ocaml.ci.dev/github/mirage/mirage-vnetif/commit/da7409bd0152b0a01692df9016e2cc724849ce3b

Seems this is indeed a transient issue with OpenSUSE.

> guess I'm still dreaming of a CI system where such a temporary failure from another system, as (maybe?) the issue reported here, is captured with a special exit code by the CI system, and it presents a warning sign (instead of a failure), and schedules another build in the future (when hopefully the other system is back in action).

Better and more fine-grained error handling is definitely desirable (see, e.g., ocurrent/opam-repo-ci#328), and scheduling retries around known failure points that may be flaky is also a good idea. That said, as you can see from this case, the stacks we are working on are pretty deep (here: ocurrent -> ocluster -> obuilder -> docker -> opam -> OpenSUSE -> zypper, with plenty of calls over the wire between various steps). Adding a mechanism that would allow us to tag certain kinds of errors as retryable is easy, but actually building up the catalogue of such errors for all the different levels is a virtually endless task :)

Still, getting that first part in place would definitely be helpful.
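
As a rough illustration of that "first part", a tagging mechanism could be as small as a table of per-layer log patterns. This is a sketch under stated assumptions, not ocaml-ci code: `Rule`, `RETRYABLE_RULES` and `first_retryable` are hypothetical names, and the only catalogue entries are two zypper messages taken from the log in this issue. Filling in the other layers is the open-ended part.

```python
import re
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    layer: str           # which level of the stack emitted the message
    pattern: re.Pattern  # regex matched against a single log line


# Hypothetical starter catalogue, seeded from the log quoted in this issue.
RETRYABLE_RULES = [
    Rule("zypper", re.compile(r"is temporarily unaccessible")),
    Rule("zypper", re.compile(r"Some of the repositories have not been refreshed")),
]


def first_retryable(log: str) -> Optional[Rule]:
    """Return the first catalogue rule matching any log line, else None."""
    for line in log.splitlines():
        for rule in RETRYABLE_RULES:
            if rule.pattern.search(line):
                return rule
    return None
```

A job whose log matches a rule would be tagged retryable and attributed to that rule's layer. A log with no match keeps its normal failure status, so an incomplete catalogue only ever errs towards reporting a failure.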
