Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minimal associated primes #3705

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

HechtiDerLachs
Copy link
Collaborator

An attempt to make the specialized functionality in Singular for zero dimensional ideals available.

This seemed to be useful for @simonbrandhorst in some examples, but now I can't even get the tests to terminate. Let's see what the CI says.

@wdecker : The documentation reads as if only QQ was allowed as a coefficient ring. Do you remember whether this is the case? Because it seems to have run also over number fields.

@HechtiDerLachs HechtiDerLachs marked this pull request as draft May 8, 2024 12:12
@HechtiDerLachs
Copy link
Collaborator Author

@ederc : Could you have a look at the tests? I don't understand what's going wrong. But maybe minimal_primes now called singular_generators before and that lead to some false caching?

@simonbrandhorst
Copy link
Collaborator

Thanks for wrapping this @HechtiDerLachs !

@ederc
Copy link
Member

ederc commented May 8, 2024

Well, the tests fail because you try to apply assPrimes from Singular on a zero dimensional ideal over a finite field, but assPrimes assumes that the ideal lives over QQ. AFAIK there is a modular std call behind the scenes in assPrimes.

@HechtiDerLachs
Copy link
Collaborator Author

The problem I meant occured in an earlier test run. Some call to dim was complaining that singular_groebner_generators did not return a groebner basis. But it doesn't seem to be reproduced anymore; sorry for calling you.

But what you say about minAss is good to know. I will restrict the cases where it's called.

@HechtiDerLachs
Copy link
Collaborator Author

@ederc : Now the error is back, see this test.

@ederc
Copy link
Member

ederc commented May 9, 2024

There was a missing caching of the isGB flag if a GB was computed during small_generating_set. I pushed a fix to your branch.

@simonbrandhorst
Copy link
Collaborator

is there anything holding this PR up @HechtiDerLachs ?

@HechtiDerLachs
Copy link
Collaborator Author

The tests were still failing. I just had a look and it seems there is another bug in minimal_primes (due to me, unfortunately). I will have a look.

@HechtiDerLachs
Copy link
Collaborator Author

@ederc : I'm sorry, but it looks like I accidentally overwrote your fixes to this branch when doing a rebase. Do you still have them somewhere? If yes, could you push them here again? Thx!

@ederc
Copy link
Member

ederc commented Jun 29, 2024

I do not have this code anymore, we need to look again where the isGB flag needs to be set.

@benlorenz
Copy link
Member

@ederc : I'm sorry, but it looks like I accidentally overwrote your fixes to this branch when doing a rebase. Do you still have them somewhere? If yes, could you push them here again? Thx!

The Github UI shows HechtiDerLachs force-pushed the minimal_associated_primes branch from 9aa5177 to 2cd696e, and clicking on that 9aa5177 commit should get you to the last commit before the force push, and the history is also available. With that commit hash known you can also try doing git log 9aa5177 locally since git will keep lost commits for a few days.
You could also look through git reflog locally.

@ederc
Copy link
Member

ederc commented Jun 30, 2024

@HechtiDerLachs Caching isGB is back in.

@HechtiDerLachs
Copy link
Collaborator Author

HechtiDerLachs commented Jun 30, 2024

Thanks a lot @ederc and @benlorenz ! I was hoping that something like this was possible, but didn't know how.

Unfortunately it seems that a lot of tests time out. Or something else goes wrong which I do not fully understand, yet.

Edit: I checked two of the failing tests locally around the point where they were cancelled and they run just fine on my machine.

Copy link

codecov bot commented Jul 1, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.61%. Comparing base (0fc7ce7) to head (8914d1f).
Report is 61 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3705      +/-   ##
==========================================
- Coverage   84.63%   79.61%   -5.02%     
==========================================
  Files         600      615      +15     
  Lines       82622    86506    +3884     
==========================================
- Hits        69927    68873    -1054     
- Misses      12695    17633    +4938     
Files with missing lines Coverage Δ
src/Rings/mpoly-ideals.jl 91.63% <100.00%> (-1.69%) ⬇️

... and 262 files with indirect coverage changes

@HechtiDerLachs
Copy link
Collaborator Author

I really can't make sense of the failing tests. Everything that I tried on my local machine about things where the CI gets stuck really goes through for me. And in some cases I can't even find the error messages.

@HechtiDerLachs HechtiDerLachs marked this pull request as ready for review July 1, 2024 09:01
@fingolfin
Copy link
Member

Triage talked about it yesterday and we suspect that it's the same problem as in #3905 i.e. Singular tries to start subprocesses and gets stuck. So hopefully @hannes14 can take a look when he's back from vacation.

@fieker
Copy link
Contributor

fieker commented Aug 28, 2024

Its still under Hans aegis:

  • it is fixed (in Singular)
  • it needs the release cascade
  • (maybe)

@fingolfin fingolfin closed this Sep 4, 2024
@fingolfin fingolfin reopened this Sep 4, 2024
@fingolfin
Copy link
Member

@hannes14 updated Singular_jll and just told me that Singular.jl should automatically use that. So I've now restarted the tests, let's see if it helps.

@benlorenz
Copy link
Member

benlorenz commented Sep 4, 2024

I looked into the errors / warnings a bit, so far I have found the following:

│    Can't call method "instance_for_owner" on an undefined value at /Users/aaruni/Desktop/oscar-runners/runner-4/julia-depot/scratchspaces/d720cf60-89b5-51f5-aff5-213f193123e7/polymake_15999920327334887436_1.10_depstree_v2/share/polymake/perllib/Polymake/Core/BigObject.pm line 775 during global destruction.

This is triggered from an atexit handler from the perl code which tries to clean up polymake objects. Not totally sure why this error doesn't appear during normal process exit, maybe because some other finalizers need to run first but these are not triggered correctly for these processes.

polymake:  WARNING: Automatic update of the interface definition file /home/runner/.julia/scratchspaces/d720cf60-89b5-51f5-aff5-213f193123e7/polymake_11299506659249835626_1.10_userdir/wrappers.0/apps/fan/cpperl/wrap-check_fan.cpperl refused:
2024-09-04T09:36:57.9086549Z Timestamp file /home/runner/.julia/scratchspaces/d720cf60-89b5-51f5-aff5-213f193123e7/polymake_11299506659249835626_1.10_userdir/wrappers.0/build/Opt/.apps.built is missing unxpectedly.

This is also during process exit, maybe because multiple processes are trying to modify the same files at the same time (and this is not really a place where we can add a pidlock). All these processes are trying to save the same new auto-generated code that was probably triggered in the original process.

      From worker 4:	libc++abi: terminating due to uncaught exception of type std::__1::bad_function_call: std::exception
      From worker 4:	
      From worker 4:	[68228] signal (6): Abort trap: 6
      From worker 4:	in expression starting at /Users/aaruni/Desktop/oscar-runners/runner-3/_work/Oscar.jl/Oscar.jl/test/AlgebraicGeometry/Schemes/resolutions.jl:1
      From worker 4:	__pthread_kill at /usr/lib/system/libsystem_kernel.dylib (unknown line)
      From worker 4:	Allocations: 509000079 (Pool: 508678466; Big: 321613); GC: 90

This is in a macos job and I don't really know what exactly triggers this, but this could also be due to code running at exit without some julia exit handlers running before this, maybe related to this:
https://github.com/oscar-system/Polymake.jl/blob/master/src/Polymake.jl#L317-L322

Note:
All these subprocesses dying during exit should not really cause the tests to fail (only the doctests due to the printing).

Many of the CI jobs also timed out, something got a lot slower here or got stuck somewhere?

@benlorenz
Copy link
Member

benlorenz commented Sep 5, 2024

I ran the Oscar tests on this branch locally and after a while it also got stuck with many processes (22 processes including the main one), all of them seem to be sleeping / waiting.
(The PC has 12 Cores / 24 Threads, AMD Ryzen 9 5900X)

The main process (28895) is deep inside some singular code, called from minimal_primes:

Test Summary:             | Pass  Total  Time
mpolyquo-localizations.jl |   37     37  2.1s

======================================================================================
Information request received. A stacktrace will print followed by a 1.0 second profile
======================================================================================

cmd: /net/site-local.linux64/julia/julia-1.10.5/bin/julia 28895 running 1 of 1

signal (10): User defined signal 1
__poll at /lib64/libc.so.6 (unknown line)
_Z12slStatusSsiLP6slistsiPi at /home/datastore/lorenz/software/julia/depot/artifacts/178cf93ab258b57deb473bfa44f78c5994f8628c/lib/libSingular.so (unknown line)
_Z10jjWAIT1ST1P6sleftvS0_ at /home/datastore/lorenz/software/julia/depot/artifacts/178cf93ab258b57deb473bfa44f78c5994f8628c/lib/libSingular.so (unknown line)
_Z15iiExprArith1TabP6sleftvS0_iPK8sValCmd1iPK13sConvertTypes at /home/datastore/lorenz/software/julia/depot/artifacts/178cf93ab258b57deb473bfa44f78c5994f8628c/lib/libSingular.so (unknown line)
_Z7yyparsev at /home/datastore/lorenz/software/julia/depot/artifacts/178cf93ab258b57deb473bfa44f78c5994f8628c/lib/libSingular.so (unknown line)
_Z10iiAllStartP8procinfoPKc13feBufferTypesi at /home/datastore/lorenz/software/julia/depot/artifacts/178cf93ab258b57deb473bfa44f78c5994f8628c/lib/libSingular.so (unknown line)
_Z8iiPStartP5idrecP6sleftv at /home/datastore/lorenz/software/julia/depot/artifacts/178cf93ab258b57deb473bfa44f78c5994f8628c/lib/libSingular.so (unknown line)
_Z11iiMake_procP5idrecP11sip_packageP6sleftv at /home/datastore/lorenz/software/julia/depot/artifacts/178cf93ab258b57deb473bfa44f78c5994f8628c/lib/libSingular.so (unknown line)
_Z6jjPROCP6sleftvS0_S0_ at /home/datastore/lorenz/software/julia/depot/artifacts/178cf93ab258b57deb473bfa44f78c5994f8628c/lib/libSingular.so (unknown line)
_ZL21iiExprArith2TabInternP6sleftvS0_iS0_iPK8sValCmd2iiPK13sConvertTypes.part.85 at /home/datastore/lorenz/software/julia/depot/artifacts/178cf93ab258b57deb473bfa44f78c5994f8628c/lib/libSingular.so (unknown line)
...
_Z10iiAllStartP8procinfoPKc13feBufferTypesi at /home/datastore/lorenz/software/julia/depot/artifacts/178cf93ab258b57deb473bfa44f78c5994f8628c/lib/libSingular.so (unknown line)
_Z8iiPStartP5idrecP6sleftv at /home/datastore/lorenz/software/julia/depot/artifacts/178cf93ab258b57deb473bfa44f78c5994f8628c/lib/libSingular.so (unknown line)
_Z11iiMake_procP5idrecP11sip_packageP6sleftv at /home/datastore/lorenz/software/julia/depot/artifacts/178cf93ab258b57deb473bfa44f78c5994f8628c/lib/libSingular.so (unknown line)
_Z15ii_CallLibProcMPKcPPvPiP8ip_sringRi at /home/datastore/lorenz/software/julia/depot/artifacts/178cf93ab258b57deb473bfa44f78c5994f8628c/lib/libSingular.so (unknown line)
_Z31call_singular_library_procedureNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEP8ip_sringN5jlcxx8ArrayRefIP11_jl_value_tLi1EEE at /home/datastore/lorenz/software/julia/depot/artifacts/9e729ade26a239c52713155b6c560dd4615b31a5/lib/libsingular_julia.so (unknown line)
_ZNSt17_Function_handlerIFP11_jl_value_tNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEP8ip_sringN5jlcxx8ArrayRefIS1_Li1EEEEPSD_E9_M_invokeERKSt9_Any_dataOS7_OS9_OSC_ at /home/datastore/lorenz/software/julia/depot/artifacts/9e729ade26a239c52713155b6c560dd4615b31a5/lib/libsingular_julia.so (unknown line)
_ZN5jlcxx6detail11CallFunctorIP11_jl_value_tJNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEP8ip_sringNS_8ArrayRefIS3_Li1EEEEE5applyEPKvNS_13WrappedCppPtrESH_P10jl_array_t at /home/datastore/lorenz/software/julia/depot/artifacts/9e729ade26a239c52713155b6c560dd4615b31a5/lib/libsingular_julia.so (unknown line)
call_singular_library_procedure at /home/datastore/lorenz/software/julia/depot/packages/CxxWrap/5IZvn/src/CxxWrap.jl:624
unknown function (ip: 0x148fccf9d9e0)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
low_level_caller at /home/datastore/lorenz/software/julia/depot/packages/Singular/A4riV/src/caller.jl:399
assPrimes at /home/datastore/lorenz/software/julia/depot/packages/Singular/A4riV/src/Meta.jl:44
unknown function (ip: 0x148fccf9d2b5)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
#minimal_primes#346 at /home/datastore/lorenz/software/julia/Oscar.jl/src/Rings/mpoly-ideals.jl:1072
minimal_primes at /home/datastore/lorenz/software/julia/Oscar.jl/src/Rings/mpoly-ideals.jl:1068 [inlined]
__compute_is_prime__ at /home/datastore/lorenz/software/julia/Oscar.jl/src/Rings/mpoly-ideals.jl:1663
#373 at /home/datastore/lorenz/software/julia/depot/packages/AbstractAlgebra/sxDV6/src/Attributes.jl:357 [inlined]
get! at ./dict.jl:479
get_attribute! at /home/datastore/lorenz/software/julia/depot/packages/AbstractAlgebra/sxDV6/src/Attributes.jl:230 [inlined]
is_prime at /home/datastore/lorenz/software/julia/Oscar.jl/src/Rings/mpoly-ideals.jl:1662
unknown function (ip: 0x148fca651095)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:3077
__compute_is_prime__ at /home/datastore/lorenz/software/julia/Oscar.jl/src/Rings/mpolyquo-localizations.jl:2023
#1001 at /home/datastore/lorenz/software/julia/depot/packages/AbstractAlgebra/sxDV6/src/Attributes.jl:357 [inlined]
get! at ./dict.jl:479
get_attribute! at /home/datastore/lorenz/software/julia/depot/packages/AbstractAlgebra/sxDV6/src/Attributes.jl:230 [inlined]
is_prime at /home/datastore/lorenz/software/julia/Oscar.jl/src/Rings/mpolyquo-localizations.jl:2022
unknown function (ip: 0x148fa7ea52f5)
_jl_invoke at /cache/build/builder-amdci4-4/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]

One Process (1803) is in a wait call:

0x0000148febf0fc1f in wait4 () from /lib64/libc.so.6
#0  0x0000148febf0fc1f in wait4 () from /lib64/libc.so.6
#1  0x0000148f53324606 in ssiClose(sip_link*) () from /home/datastore/lorenz/software/julia/depot/artifacts/178cf93ab258b57deb473bfa44f78c5994f8628c/lib/libSingular.so
#2  0x0000148f533232e4 in slClose(sip_link*) () from /home/datastore/lorenz/software/julia/depot/artifacts/178cf93ab258b57deb473bfa44f78c5994f8628c/lib/libSingular.so
#3  0x0000148f532c9445 in iiExprArith1Tab(sleftv*, sleftv*, int, sValCmd1 const*, int, sConvertTypes const*) () from /home/datastore/lorenz/software/julia/depot/artifacts/178cf93ab258b57deb473bfa44f78c5994f8628c/lib/libSingular.so
#4  0x0000148f532a0977 in yyparse() () from /home/datastore/lorenz/software/julia/depot/artifacts/178cf93ab258b57deb473bfa44f78c5994f8628c/lib/libSingular.so
#5  0x0000148f532e414c in iiAllStart(procinfo*, char const*, feBufferTypes, int) () from /home/datastore/lorenz/software/julia/depot/artifacts/178cf93ab258b57deb473bfa44f78c5994f8628c/lib/libSingular.so
...

One Process (1843) is in a futex_wait (called from some openmp exit handlers?):

Using host libthread_db library "/lib64/libthread_db.so.1".
futex_wait (val=120, addr=0x14b2b2044) at /workspace/srcdir/gcc-13.2.0/libgomp/config/linux/x86/futex.h:97
97      /workspace/srcdir/gcc-13.2.0/libgomp/config/linux/x86/futex.h: No such file or directory.
#0  futex_wait (val=120, addr=0x14b2b2044) at /workspace/srcdir/gcc-13.2.0/libgomp/config/linux/x86/futex.h:97
#1  do_wait (val=120, addr=<optimized out>) at /workspace/srcdir/gcc-13.2.0/libgomp/config/linux/wait.h:67
#2  gomp_team_barrier_wait_end (bar=0x14b2b2040, state=120) at /workspace/srcdir/gcc-13.2.0/libgomp/config/linux/bar.c:112
#3  0x0000148feaa8c00e in gomp_team_barrier_wait_final (bar=bar@entry=0x14b2b2040) at /workspace/srcdir/gcc-13.2.0/libgomp/config/linux/bar.c:136
#4  0x0000148feaa8a96d in gomp_team_end () at /workspace/srcdir/gcc-13.2.0/libgomp/team.c:956
#5  0x0000148febe7bbd9 in __run_exit_handlers () from /lib64/libc.so.6
#6  0x0000148febe7bd6a in exit () from /lib64/libc.so.6
#7  0x0000148f53313bc2 in m2_end () from /home/datastore/lorenz/software/julia/depot/artifacts/178cf93ab258b57deb473bfa44f78c5994f8628c/lib/libSingular.so
#8  0x0000148f53328e10 in ssiRead1(sip_link*) () from /home/datastore/lorenz/software/julia/depot/artifacts/178cf93ab258b57deb473bfa44f78c5994f8628c/lib/libSingular.so
#9  0x0000148f5332c404 in ssiOpen(sip_link*, short, sleftv*) () from /home/datastore/lorenz/software/julia/depot/artifacts/178cf93ab258b57deb473bfa44f78c5994f8628c/lib/libSingular.so
#10 0x0000148f53323174 in slOpen(sip_link*, short, sleftv*) () from /home/datastore/lorenz/software/julia/depot/artifacts/178cf93ab258b57deb473bfa44f78c5994f8628c/lib/libSingular.so
#11 0x0000148f532c9445 in iiExprArith1Tab(sleftv*, sleftv*, int, sValCmd1 const*, int, sConvertTypes const*) () from /home/datastore/lorenz/software/julia/depot/artifacts/178cf93ab258b57deb473bfa44f78c5994f8628c/lib/libSingular.so

And finally there are further 19 processes which are waiting in a read like this:

0x0000148fec03a754 in read () from /lib64/libpthread.so.0
#0  0x0000148fec03a754 in read () from /lib64/libpthread.so.0
#1  0x0000148f53ed2f89 in s_getc(s_buff_s*) () from /home/datastore/lorenz/software/julia/depot/artifacts/178cf93ab258b57deb473bfa44f78c5994f8628c/lib/libpolys.so
#2  0x0000148f53ed31e4 in s_readint(s_buff_s*) () from /home/datastore/lorenz/software/julia/depot/artifacts/178cf93ab258b57deb473bfa44f78c5994f8628c/lib/libpolys.so
#3  0x0000148f53328dae in ssiRead1(sip_link*) () from /home/datastore/lorenz/software/julia/depot/artifacts/178cf93ab258b57deb473bfa44f78c5994f8628c/lib/libSingular.so
#4  0x0000148f53323421 in slRead(sip_link*, sleftv*) () from /home/datastore/lorenz/software/julia/depot/artifacts/178cf93ab258b57deb473bfa44f78c5994f8628c/lib/libSingular.so
#5  0x0000148f532af29e in jjREAD(sleftv*, sleftv*) () from /home/datastore/lorenz/software/julia/depot/artifacts/178cf93ab258b57deb473bfa44f78c5994f8628c/lib/libSingular.so
#6  0x0000148f532c9445 in iiExprArith1Tab(sleftv*, sleftv*, int, sValCmd1 const*, int, sConvertTypes const*) () from /home/datastore/lorenz/software/julia/depot/artifacts/178cf93ab258b57deb473bfa44f78c5994f8628c/lib/libSingular.so
...

I can keep those processes alive for a while if there is anything I should check, they don't need much memory and all are idle anyway.

@benlorenz benlorenz closed this Sep 13, 2024
@benlorenz benlorenz reopened this Sep 13, 2024
@fingolfin
Copy link
Member

The atexit related crashes are fixed in @hannes14 most recent update from last week. But there are still errors / crashes, just again different ones

@benlorenz
Copy link
Member

There are still exit handlers running

│    Can't call method "flags" on an undefined value at /Users/aaruni/Desktop/oscar-runners/runner-2/julia-depot/scratchspaces/d720cf60-89b5-51f5-aff5-213f193123e7/polymake_6360604600323823479_1.10_depstree_v2/share/polymake/perllib/Polymake/Core/BigObject.pm line 1247 during global destruction.

which I think is problematic. But at least I don't see any stuck jobs that ran into a timeout.

Many of the failing jobs don't have any logs for the Run tests step, some show an exit code of 143 (sigterm) which might indicate the workers running out of memory? (due to multiple processes?)

@fingolfin
Copy link
Member

@hannes14 so I guess that ball is in your court again ;-).

Out of curiosity, how does Singular determine how many processes to fork? Does it take the number of cores and/or RAM into account?

@hannes14
Copy link
Member

Singular uses sysconf(_SC_NPROCESSORS_ONLN) resp. sysconf(_SC_NPROCESSORS_CONF),
RAM is not considered

result = typeof(I)[]
# `unique!` does not work for lists of ideals. I don't know why, but for the moment we need the
# following workaround.
for p in filter!(!is_one, vcat([minimal_primes(j; algorithm, factor_generators=false) for j in J]...))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for p in filter!(!is_one, vcat([minimal_primes(j; algorithm, factor_generators=false) for j in J]...))
for p in filter!(!is_one, reduce(vcat, [minimal_primes(j; algorithm, factor_generators=false) for j in J]))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants