Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Github CI Anomaly Zoo #1394

Open
andres-erbsen opened this issue Oct 1, 2022 · 44 comments
Open

Github CI Anomaly Zoo #1394

andres-erbsen opened this issue Oct 1, 2022 · 44 comments

Comments

@andres-erbsen
Copy link
Contributor

@JasonGross
Copy link
Collaborator

JasonGross commented Oct 2, 2022

To shed a bit more light on this

If you go to the PR at mit-plv/bedrock2#280, we see
image
and the actions log shows
image
so it looks like the overall status of the commit includes runs of that commit across all branches, but the displayed checks are only for master or something?

@JasonGross
Copy link
Collaborator

I'm thinking this might actually be an OOM kill, especially given that we're running the checker while replacing vm_compute with the lazy reduction machine.

@JasonGross
Copy link
Collaborator

Plausibly related: actions/runner-images#736 (comment) (but this is only about mac)

Maybe worth asking about at https://github.com/orgs/community/discussions/

@andres-erbsen
Copy link
Contributor Author

andres-erbsen commented Oct 13, 2022

@andres-erbsen
Copy link
Contributor Author

andres-erbsen commented Oct 13, 2022

https://github.com/mit-plv/fiat-crypto/actions/runs/3245312644/jobs/5322679386

Cloning into 'boringssl'...
  error: 44224 bytes of body are still expected
  fetch-pack: unexpected disconnect while reading sideband packet
  fatal: early EOF
  fatal: fetch-pack: invalid index-pack output
  + exit 128

@andres-erbsen
Copy link
Contributor Author

@JasonGross
Copy link
Collaborator

https://github.com/mit-plv/fiat-crypto/actions/runs/3245312653/jobs/5322679307

This is new

# ocamlfind: Package `findlib' not found
# Error while running 'D:\a\fiat-crypto\fiat-crypto\_opam\bin/ocamlfind.exe query findlib -format %v' (exit code 2)
# Configuration script failed!

I've seen a bunch of errors about dynlink failing, but not this one

@JasonGross JasonGross pinned this issue Oct 14, 2022
@JasonGross
Copy link
Collaborator

We should include snippets of the logs in the comments; GH deletes logs after a couple months

@JasonGross
Copy link
Collaborator

https://github.com/mit-plv/fiat-crypto/actions/runs/3242704485/jobs/5316358457

COQFLAGS="-Q src/bedrock2 bedrock2 -Q src/bedrock2Examples bedrock2Examples -Q /github/workspace/rupicola/bedrock2/deps/coqutil/src/coqutil coqutil " ../etc/bytedump.sh bedrock2.PrintListByte.allBytes > special/BytedumpTest.out.tmp
make[2]: *** [Makefile:64: special/BytedumpTest.out] Error 1
make[1]: *** [Makefile:105: bedrock2_ex] Error 2

How do we debug this? Maybe we shouldn't suppress stderr in bytedump?

@JasonGross
Copy link
Collaborator

validate failed on bytedump after previous steps succeeded:
https://github.com/mit-plv/fiat-crypto/actions/runs/3247910630/jobs/5328487093#step:17:3485

@JasonGross
Copy link
Collaborator

Command exited with non-zero status 2
/Users/runner/work/fiat-crypto/fiat-crypto/rupicola/bedrock2/deps/coqutil/src/coqutil/Tactics/syntactic_unify.vo (real: 0.35, user: 0.09, sys: 0.09, mem: 57556 ko)
make[3]: *** [/Users/runner/work/fiat-crypto/fiat-crypto/rupicola/bedrock2/deps/coqutil/src/coqutil/Tactics/syntactic_unify.vo] Error 2

This suspiciously started happening when I started not trying to rebuild on Mac OS

@JasonGross
Copy link
Collaborator

Can anyone reproduce the (seemingly consistent?) issue that make -j2 fails on Mac OS the first time it's run after a clean checkout? c.f. #1458

@andres-erbsen
Copy link
Contributor Author

# Usage: coqworker.opt --kind=[proof|query|tactic] $args
# got -main-channel
# File ".\src/Curves/Edwards/XYZT/Basic.v", line 124, characters 6-336:
# Error:
# Anomaly
# "Uncaught exception Failure("The spawned process did not connect back in 10.0s")."
# Please report at [http://coq.inria.fr/bugs/.](http://coq.inria.fr/bugs/)
# 
# make: *** [Makefile.coq:793: src/Curves/Edwards/XYZT/Basic.vo] Error 129

https://github.com/mit-plv/fiat-crypto/actions/runs/3281473086/jobs/5403532417#step:17:1812

@andres-erbsen
Copy link
Contributor Author

@JasonGross
Copy link
Collaborator

Sometimes the version of Coq changes between one step and the next in the docker action
coq-community/docker-coq-action#80
https://github.com/mit-plv/fiat-crypto/actions/runs/3287380197/jobs/5416490558#step:17:501

This results in a bytedump failure because the bedrock2 files are compiled with an outdated version of Coq and so we get inconsistent assumptions

@andres-erbsen
Copy link
Contributor Author

@andres-erbsen
Copy link
Contributor Author

@andres-erbsen
Copy link
Contributor Author

@andres-erbsen
Copy link
Contributor Author

@andres-erbsen
Copy link
Contributor Author

@JasonGross
Copy link
Collaborator

JasonGross commented Nov 9, 2022

gcc: fatal error: no input files https://github.com/mit-plv/fiat-crypto/actions/runs/3420869283/jobs/5696244304#step:17:453

That's probably just me messing up how to get the version of gcc; I guess it's not gcc -v? The line to be fixed is

The actual failure here is

COQFLAGS="-Q src/bedrock2 bedrock2 -Q src/bedrock2Examples bedrock2Examples -Q /github/workspace/rupicola/bedrock2/deps/coqutil/src/coqutil coqutil " ../etc/bytedump.py bedrock2.PrintListByte.allBytes > special/BytedumpTest.out.tmp

Coq < Coq < Toplevel input, characters 1-55:
> Require bedrock2.PrintListByte bedrock2.PrintListByte.
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Error:
Compiled library bedrock2.PrintListByte (in file /github/workspace/rupicola/bedrock2/bedrock2/src/bedrock2/PrintListByte.vo) makes inconsistent assumptions over library Coq.Init.Ltac

Coq < 
Coq < 
Unnamed_thm < 
Unnamed_thm < 
Unnamed_thm < Toplevel input, characters 32-63:
>   PrintListByte.print_list_byte bedrock2.PrintListByte.allBytes.
>                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Error: The reference bedrock2.PrintListByte.allBytes was not found in the current environment.

Unnamed_thm < 
Coq < 
Error: make[2]: *** [Makefile:64: special/BytedumpTest.out] Error 2
Error: make[1]: *** [Makefile:105: bedrock2_ex] Error 2
make: *** No rule to make target 'bedrock2', needed by '.Makefile.coq.d'.  Stop.

which presumably is #1394 (comment)

@JasonGross
Copy link
Collaborator

https://github.com/mit-plv/fiat-crypto/actions/runs/3446654504/jobs/5751826046 Windows: Retrieve the Cygwin cache
Error: getaddrinfo ENOTFOUND www.cygwin.com

@JasonGross
Copy link
Collaborator

https://github.com/mit-plv/fiat-crypto/actions/runs/3457159795/jobs/5770405148#step:6:116

** Fatal error: Cannot run cygpath -m "libshell32" "libshell32.lib" "libshell32.dll.a" "libshell32.a" "D:/a/fiat-crypto/fiat-crypto/_opam/lib/ocaml\libshell32" "D:/a/fiat-crypto/fiat-crypto/_opam/lib/ocaml\libshell32.lib" "D:/a/fiat-crypto/fiat-crypto/_opam/lib/ocaml\libshell32.dll.a" "D:/a/fiat-crypto/fiat-crypto/_opam/lib/ocaml\libshell32.a" "lib\libshell32" "lib\libshell32.lib" "lib\libshell32.dll.a" "lib\libshell32.a" "D:/a/fiat-crypto/fiat-crypto/_opam/lib/ocaml\flexdll\libshell32" "D:/a/fiat-crypto/fiat-crypto/_opam/lib/ocaml\flexdll\libshell32.lib" "D:/a/fiat-crypto/fiat-crypto/_opam/lib/ocaml\flexdll\libshell32.dll.a" "D:/a/fiat-crypto/fiat-crypto/_opam/lib/ocaml\flexdll\libshell32.a" "/usr/lib/gcc/x86_64-w64-mingw32/11/libshell32" "/usr/lib/gcc/x86_64-w64-mingw32/11/libshell32.lib" "/usr/lib/gcc/x86_64-w64-mingw32/11/libshell32.dll.a" "/usr/lib/gcc/x86_64-w64-mingw32/11/libshell32.a" "/usr/x86_64-w64-mingw32/lib/x86_64-w64-mingw32/11/libshell32" "/usr/x86_64-w64-mingw32/lib/x86_64-w64-mingw32/11/libshell32.lib" "/usr/x86_64-w64-mingw32/lib/x86_64-w64-mingw32/11/libshell32.dll.a" "/usr/x86_64-w64-mingw32/lib/x86_64-w64-mingw32/11/libshell32.a" "/usr/x86_64-w64-mingw32/lib/libshell32" "/usr/x86_64-w64-mingw32/lib/libshell32.lib" "/usr/x86_64-w64-mingw32/lib/libshell32.dll.a" "/usr/x86_64-w64-mingw32/lib/libshell32.a" "/usr/x86_64-w64-mingw32/sys-root/mingw/lib/x86_64-w64-mingw32/11/libshell32" "/usr/x86_64-w64-mingw32/sys-root/mingw/lib/x86_64-w64-mingw32/11/libshell32.lib" "/usr/x86_64-w64-mingw32/sys-root/mingw/lib/x86_64-w64-mingw32/11/libshell32.dll.a" "/usr/x86_64-w64-mingw32/sys-root/mingw/lib/x86_64-w64-mingw32/11/libshell32.a" "/usr/x86_64-w64-mingw32/sys-root/mingw/lib/libshell32" "/usr/x86_64-w64-mingw32/sys-root/mingw/lib/libshell32.lib" "/usr/x86_64-w64-mingw32/sys-root/mingw/lib/libshell32.dll.a" "/usr/x86_64-w64-mingw32/sys-root/mingw/lib/libshell32.a"
# File "caml_startup", line 1:
# Error: Error during linking (exit code 2)
# make[1]: *** [Makefile.common:184: _build/default/coq-core.install] Error 1
# make[1]: *** Waiting for unfinished jobs....
# make[1]: Leaving directory '/cygdrive/d/a/fiat-crypto/fiat-crypto/_opam/.opam-switch/build/coq.8.15.2'
# make: *** [Makefile.make:122: submake] Error 2

@andres-erbsen
Copy link
Contributor Author

   2 |  opam exec -- cat ~/.opam/log/*
     |  ~~~~
     | The term 'opam' is not recognized as a name of a cmdlet, function, script file, or executable program. Check the
     | spelling of the name, or if a path was included, verify that the path is correct and try again.

https://github.com/mit-plv/fiat-crypto/actions/runs/3470364042/jobs/5798475008

@JasonGross
Copy link
Collaborator

   2 |  opam exec -- cat ~/.opam/log/*
     |  ~~~~
     | The term 'opam' is not recognized as a name of a cmdlet, function, script file, or executable program. Check the
     | spelling of the name, or if a path was included, verify that the path is correct and try again.

https://github.com/mit-plv/fiat-crypto/actions/runs/3470364042/jobs/5798475008

This is because I have the cat step set to run even if earlier steps failed. The actual issue here is that opam failed to install in an earlier step with

connect ETIMEDOUT 185.199.110.133:443
  Waiting 20 seconds before trying again
  connect ETIMEDOUT 185.199.110.133:443
  Waiting 14 seconds before trying again
  Error: connect ETIMEDOUT 185.199.110.133:443

@JasonGross
Copy link
Collaborator

validate times out in https://github.com/mit-plv/fiat-crypto/actions/runs/3470357503/jobs/5798461079 and https://github.com/mit-plv/fiat-crypto/actions/runs/3470364029/jobs/5798476434 because changed Coq version between steps results in project rebuild. We should really have a better way of dealing with Coq changing versions under us... (no progress on coq-community/docker-coq-action#80 yet...)

@JasonGross
Copy link
Collaborator

boringssl test failure at https://github.com/mit-plv/fiat-crypto/actions/runs/3524875493/jobs/5910863477#step:7:111 on a PR changing exact _ to refine _ in a Qed'd proof (aka a PR that cannot have changed anything impacting BoringSSL)

 FAILED: crypto_test_data.cc /home/runner/work/fiat-crypto/fiat-crypto/boringssl/build/crypto_test_data.cc 
  cd /home/runner/work/fiat-crypto/fiat-crypto/boringssl && /opt/hostedtoolcache/go/1.13.15/x64/bin/go run util/embed_test_data.go -file-list /home/runner/work/fiat-crypto/fiat-crypto/boringssl/build/embed_test_data_args.txt > /home/runner/work/fiat-crypto/fiat-crypto/boringssl/build/crypto_test_data.cc
  # command-line-arguments
  Error: util/embed_test_data.go:79:16: undefined: os.ReadFile
  Error: util/embed_test_data.go:129:16: undefined: os.ReadFile

@JasonGross
Copy link
Collaborator

Package g++-7 is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

E: Package 'g++-7' has no installation candidate

https://github.com/mit-plv/fiat-crypto/actions/runs/3524875493/jobs/5936926857#step:3:67

@andres-erbsen
Copy link
Contributor Author

https://github.com/mit-plv/bedrock2/actions/runs/3653687403/jobs/6176913824 timed out for no reason I can see

@JasonGross
Copy link
Collaborator

https://github.com/mit-plv/bedrock2/actions/runs/3653687403/jobs/6176913824 timed out for no reason I can see

Windows does this sometimes, I think: it fails to quit terminal after all scripts complete

@JasonGross
Copy link
Collaborator

@andres-erbsen
Copy link
Contributor Author

https://github.com/mit-plv/fiat-crypto/actions/runs/7700820211/job/20985462543?pr=1811

error: failed retrieving file 'ocaml-compiler-libs-5.1.0-1-x86_64.pkg.tar.zst.sig' from geo.mirror.pkgbuild.com : Operation too slow. Less than 1 bytes/sec transferred the last 10 seconds
 coq-8.18.0-2-x86_64 downloading...
error: failed retrieving file 'ghc-9.0.2-3-x86_64.pkg.tar.zst.sig' from geo.mirror.pkgbuild.com : Operation too slow. Less than 1 bytes/sec transferred the last 10 seconds
warning: failed to retrieve some files
error: failed to commit transaction (unexpected error)

@andres-erbsen
Copy link
Contributor Author

Connecting to gitlab.alpinelinux.org (172.105.69.85:443)
wget: server returned error: HTTP/1.1 404 Not Found
ssl_client: write: Broken pipe
Error: Process completed with exit code 1.

https://github.com/mit-plv/rewriter/actions/runs/7705023867/job/20998315736?pr=148

@andres-erbsen
Copy link
Contributor Author

error: failed retrieving file 'perl-5.38.2-1-x86_64.pkg.tar.zst.sig' from geo.mirror.pkgbuild.com : Operation too slow. Less than 1 bytes/sec transferred the last 10 seconds
https://github.com/mit-plv/fiat-crypto/actions/runs/8750960028/job/24015572653?pr=1885

@JasonGross
Copy link
Collaborator

https://github.com/mit-plv/fiat-crypto/actions/runs/9056914029/job/24884198262#step:7:29

Run jirutka/setup-alpine@v1
Run sudo -E ./setup-alpine.sh
Prepare rootfs directory
Download static apk-tools
  ▷ Downloading https://gitlab.alpinelinux.org/api/v4/projects/5/packages/generic/v2.14.0/x86_64/apk.static
  curl: (7) Failed to connect to gitlab.alpinelinux.org port 443 after 9945 ms: Connection timed out
  
  Error occurred at line 176:
    173 | APK="$RUNNER_TEMP/apk"
    174 | 
    175 | info "Downloading ${INPUT_APK_TOOLS_URL%\#*}"
  > 176 | download_file "$INPUT_APK_TOOLS_URL" "$APK"
    177 | chmod +x "$APK"
    178 | 
    179 | 
  Error: Error occurred at line 176: download_file "$INPUT_APK_TOOLS_URL" "$APK" (see the job log for more information)
  Error: Process completed with exit code 1.

@JasonGross
Copy link
Collaborator

[1 of 2] Compiling Main             ( src/ExtractionHaskell/bedrock2_fiat_crypto.hs, src/ExtractionHaskell/bedrock2_fiat_crypto.o )
panic! (the 'impossible' happened)
  GHC version 9.10.1:
	heap overflow

Please report this as a GHC bug:  https://www.haskell.org/ghc/reportabug

https://github.com/mit-plv/fiat-crypto/actions/runs/9138561025/job/25130899016?pr=1913#step:4:4949

Maybe we need to bump -M7G to -M8G? (And maybe we should upstream a bug report to not call heap overflows impossible?)

@andres-erbsen
Copy link
Contributor Author

@JasonGross
Copy link
Collaborator

And maybe we should upstream a bug report to not call heap overflows impossible?

I've reported https://gitlab.haskell.org/ghc/ghc/-/issues/24927 and https://gitlab.haskell.org/ghc/ghc/-/issues/20683 about the memory issues

JasonGross added a commit that referenced this issue Jun 17, 2024
JasonGross added a commit that referenced this issue Jun 18, 2024
@JasonGross
Copy link
Collaborator

JasonGross commented Aug 5, 2024

[intern /home/coq/.opam/4.13.1+flambda/lib/coq/theories/btauto/Reflect.vo ...
Context: vmlibrary/fld=1/vm_to_patch/fld=3/vm_non_subst_reloc/tag=8/fld=0/vm_caml_prim/tag=0
#0(#0(#0(1,1),#0(0,0),#0(1,0),#0(2,2),#0(3,2),#0(4,1),#0(5,2),#0(6,3)),1,9)
 failed!]

Fatal Error: Failure: Validation failed: sum: unexpected tag (in tag=0).
             Please report at [http://coq.inria.fr/bugs/.](http://coq.inria.fr/bugs/)
Command exited with non-zero status 129

https://github.com/mit-plv/fiat-crypto/actions/runs/10189997626/job/28191187326#step:8:3829
logs_26665157425.zip
coq/coq#19455

@JasonGross
Copy link
Collaborator

The Debian CI has started dying with Error: Process completed with exit code 137. And no other info
https://github.com/mit-plv/fiat-crypto/actions/runs/10401520403/job/28804231157

OCAMLC src/ExtractionJsOfOCaml/bedrock2_fiat_crypto.ml -o src/ExtractionJsOfOCaml/bedrock2_fiat_crypto.byte
Warning: Stack size (16384) may be too small.
Warning: To avoid stack overflows in OCaml compilation, setting stack size to the recommended minimum of 66520 kbytes
src/ExtractionJsOfOCaml/bedrock2_fiat_crypto.byte (real: 25.09, user: 22.67, sys: 2.15, mem: 3301724 ko)
OCAMLC src/ExtractionJsOfOCaml/with_bedrock2_fiat_crypto.ml -o src/ExtractionJsOfOCaml/with_bedrock2_fiat_crypto.byte
Warning: Stack size (16384) may be too small.
Warning: To avoid stack overflows in OCaml compilation, setting stack size to the recommended minimum of 66520 kbytes
src/ExtractionJsOfOCaml/with_bedrock2_fiat_crypto.byte (real: 25.49, user: 22.96, sys: 2.17, mem: 3301512 ko)
Error: Process completed with exit code 137.

@JasonGross
Copy link
Collaborator

JasonGross commented Sep 20, 2024

On Windows:

opam exec -- etc/ci/github-actions-make.sh -j2 deps
Fatal error: exception C:\hostedtoolcache\windows\opam\2.2.1\x86_64\opam.exe: "create_process" failed on etc/ci/github-actions-make.sh: Exec format error

Probably related to upgrading setup-ocaml?

Edit: Reported upstream as ocaml/setup-ocaml#866

@JasonGross
Copy link
Collaborator

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants