opensuse builders have failures #967
Comments
This appears to happen while updating depexts, so I'm not sure this is an ocaml-ci issue rather than a problem with OpenSUSE (or with opam's depext logic on OpenSUSE). A search for the error pointed me to https://forums.opensuse.org/t/zypper-broken-repository-is-invalid/142206, but that page currently reports a service outage. However, https://forum.rockstor.com/t/zypper-repository-error-message/8676 notes that this could be a transient issue that occurs during repository updates. I'll poke around the image and report back what I find.
Thanks for your investigations.
Thanks for the report :)
I guess it's not very clear to me when to report an issue and when not. Of course there are temporary failures in other systems that you can't fix in ocaml-ci. But then there are some issues (e.g. DNS resolution errors, HTTP limit exceeded for GitHub downloads) that should be addressed by ocaml-ci. In the end, I guess I'm still dreaming of a CI system where a temporary failure in another system, as (maybe?) the one reported here, is captured by the CI system with a special exit code, shown as a warning sign (instead of a failure), and followed by another build scheduled for later (when hopefully the other system is back in action). I've talked about this many times over the years, but it seems that either nobody else wants such a system, or it is too hard to implement. Sadly so...
I tried to reproduce this locally using the docker instructions, and it went through fine. Rebuilding the failed jobs on the PR you linked also succeeded: https://ocaml.ci.dev/github/mirage/mirage-vnetif/commit/da7409bd0152b0a01692df9016e2cc724849ce3b It seems this is indeed a transient issue with OpenSUSE.
Better and more fine-grained error handling is definitely desirable (see, e.g., ocurrent/opam-repo-ci#328), and scheduling retries around known failure points that may be flaky is also a good idea. That said, as you can see from this case, the stacks we are working on are pretty deep (here: ocurrent -> ocluster -> obuilder -> docker -> opam -> OpenSUSE -> zypper, with plenty of calls over the wire between various steps). Adding a mechanism that would allow us to tag certain kinds of errors as retryable is easy, but actually building up the catalogue of such errors for all the different levels is a virtually endless task :) Still, getting that first part in place would definitely be helpful.
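To make the "tag certain kinds of errors as retryable" idea concrete, here is a rough sketch in Python. It is not how ocaml-ci or ocurrent work today. Every name in it (`Outcome`, `RetryableError`, `CATALOGUE`, `classify`, `run_with_retries`) is hypothetical, and the log patterns are illustrative guesses rather than real zypper, DNS, or GitHub messages. It only shows the shape: a small catalogue keyed by stack layer, a classifier that turns a matching failure into a "transient" outcome instead of a hard failure, and a bounded retry loop.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, Tuple


class Outcome(Enum):
    PASSED = "passed"
    TRANSIENT = "transient"  # would be shown as a warning and retried later
    FAILED = "failed"


@dataclass(frozen=True)
class RetryableError:
    layer: str    # hypothetical label for the level of the stack
    pattern: str  # regex matched against the job log


# Hypothetical catalogue: the patterns are illustrative, not real log lines.
CATALOGUE = [
    RetryableError("zypper", r"repository .* is invalid"),
    RetryableError("network", r"could not resolve host"),
    RetryableError("github", r"rate limit exceeded"),
]


def classify(exit_code: int, log: str) -> Tuple[Outcome, Optional[RetryableError]]:
    """Map a job result to an outcome, tagging known flaky errors as transient."""
    if exit_code == 0:
        return Outcome.PASSED, None
    for entry in CATALOGUE:
        if re.search(entry.pattern, log, re.IGNORECASE):
            return Outcome.TRANSIENT, entry
    return Outcome.FAILED, None


def run_with_retries(
    job: Callable[[], Tuple[int, str]], max_attempts: int = 3
) -> Tuple[Outcome, int]:
    """Rerun the job while its outcome is transient; return (outcome, attempts used)."""
    outcome = Outcome.FAILED
    for attempt in range(1, max_attempts + 1):
        exit_code, log = job()
        outcome, _ = classify(exit_code, log)
        if outcome is not Outcome.TRANSIENT:
            return outcome, attempt
    return outcome, max_attempts
```

A real scheduler would wait between attempts (or queue the rebuild for later) rather than loop immediately, and the hard part remains filling in the catalogue for each layer.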
as seen on https://ocaml.ci.dev/github/mirage/mirage-vnetif/commit/bda581dd8b89f9832277a06470fe668ddc9fcd46/variant/opensuse-15.6-4.14_opam-2.2 and a bunch of other PRs