Error on 1.10 ubuntu long #3184
Comments
No Idea |
This looks like GC corruption that occurs while the Serialization/IPC tests are running. It seems related to Julia tasks, but I haven't been able to reproduce it locally. I have the long testset running in a loop with So far I got only one other crash but in the test group |
The same crash as described by @benlorenz happened also in the corresponding test run for #3018 after the changes that were pushed yesterday. |
The second backtrace reported in here by @benlorenz involves Singular.jl and the primdec library function minAssGTZ -- specifically the code in Singular.jl which converts its return value to Julia. Maybe there is a |
After digging into the first backtrace again, this is a GC corruption error (https://github.com/oscar-system/Oscar.jl/actions/runs/7364038493/job/20044077891#step:7:4949), so this could be due to the same issue. |
I have a preliminary fix for the crash I reported ( |
The original error ( |
Another occurrence (on macOS): https://github.com/oscar-system/Oscar.jl/actions/runs/7585991305/job/20663114060?pr=3213 |
In both recent occurrences, the crash happened shortly after we see
which I think means it is probably in the middle of testing |
Specifically, if we add a "Starting tests..." message before loading IPC.jl, and also force a full GC before that message, then perhaps we can get a better idea as to whether the corruption happens before IPC.jl, or during it? |
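A minimal sketch of the suggestion above, not the actual Oscar test code: `run_with_marker` is a hypothetical helper name, and the block body is only a stand-in for loading IPC.jl. It forces a full collection first, so that corruption which already exists has a chance to surface before the marker is printed, then flushes the marker to the log so it is still visible if the process crashes afterwards.

```julia
# Hypothetical helper for illustration; name and message text are assumptions.
function run_with_marker(f, label::AbstractString; io::IO=stdout)
    GC.gc(true)                                   # full collection before the marker
    println(io, "Starting tests for ", label, "...")
    flush(io)                                     # get the marker into the log before a possible crash
    return f()
end

# Stand-in for the real test file; in a test suite this would be an include(...) call.
result = run_with_marker("IPC.jl") do
    1 + 1
end
```

If the crash shows up before the marker appears in the log, the corruption predates IPC.jl; if it shows up after, the IPC.jl tests are at least involved in triggering it.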
I can add the message, but I would like to hold off a bit with adding something like an explicit GC now since we just started doing the tests with libsingular_julia 0.40.11 which is the first version including my sleftv fix. (At least until we see another error with that version...) |
It still happens with the new libsingular, and even with the explicit GC call it happens within the IPC.jl tests: https://github.com/oscar-system/Oscar.jl/actions/runs/7638814195/job/20810486432?pr=3229#step:8:4959 |
It also happened here: https://github.com/oscar-system/Oscar.jl/actions/runs/7638161558/job/20808482695?pr=3226 Could it be that it can again only be reproduced on a memory-starved machine, with 7-8 GB RAM? |
The workers should be less memory-starved now; they were recently upgraded to have 4 CPUs and 16 GB of memory. |
A recent crash is reported at https://github.com/oscar-system/Oscar.jl/actions/runs/7642653452/job/20822790076?pr=3236 |
I have opened a PR to disable the IPC test for now while I try to debug this further: #3246 |
And here is an instance of the crash with Julia 1.9: https://github.com/oscar-system/Oscar.jl/actions/runs/7665378425/job/20891166477?pr=3247 |
Thanks for noticing. That is interesting, it turns out that the effect of doing |
Our CI looks a lot better now without the IPC.jl tests, which should help with development. But I am continuing to look into this. I just found this one during
from https://github.com/oscar-system/Oscar.jl/actions/runs/7679187557/job/20929824694?pr=3212#step:8:1790 |
After some more debugging I found that the error will almost certainly be gone once 1.10.1 is released, fixed via JuliaLang/julia@ In this workflow I have about 150 successful runs of the long group including the IPC.jl tests, with an intermediate Julia build from the So once that is released I will try to reactivate these tests and hopefully close this ticket. |
…lved (oscar-system#3246)" (oscar-system#3368)
* Revert "Serialization: disable IPC test until oscar-system#3184 is solved (oscar-system#3246)"
  This reverts commit 67ccc93.
* tests: remove GC.gc() before IPC tests
This is back: https://github.com/Nemocas/Nemo.jl/actions/runs/8546742962/job/23417708965?pr=1700 (This downstream test run only checks Oscar.) |
If one looks at https://github.com/oscar-system/Oscar.jl/commits/master/, one sees that often "Run tests / test (~1.10.0-0, long, ubuntu-latest) (push)" fails. The error looks scary, e.g. in https://github.com/oscar-system/Oscar.jl/actions/runs/7364038493/job/20044077891#step:7:4952 and https://github.com/oscar-system/Oscar.jl/actions/runs/7364038493/job/20044077891#step:7:26094:
Does anyone have an idea where that might be coming from? I have not tried to reproduce it locally. It does not look like #2441.
CC: @lgoettgens @benlorenz