semop lock error during 3D classification #1177
Comments
Did you read and try the suggestions in #738?
If this is the same issue as described in #738, this is caused by the OS destroying the semaphores. Are you logging out of the machine whilst RELION is running? And on the cluster, did you log into and then log out of the node the job was running on?
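One way to test whether logging out is really what destroys the semaphores, without touching a running RELION job, is to create a throwaway System V semaphore yourself, log out and back in, and see whether it survives. A rough sketch using the standard util-linux tools ipcmk, ipcs and ipcrm (the <semid> placeholder is whatever id ipcmk prints):

# Create a throwaway semaphore array and note the id it prints
ipcmk -S 1
# Confirm it shows up under your user
ipcs -s
# Fully log out of the node, log back in, and list again
ipcs -s
# If the throwaway array has disappeared, the OS is removing SysV IPC on logout
# Clean up afterwards (use the id printed by ipcmk)
ipcrm -s <semid>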
@biochem-fan this was not an issue in RELION-4 because this code was omitted due to the …
Thanks for the response. #738 suggests setting the coarse search option to yes? I had that turned off; I set it to on and am re-running the job. I'll update my post when I know whether it worked.

Regarding the logging in/out: yes I am. Well, sort of. I have a GPU node reserved which I log into using turboVNC, so it's a remote-access node which runs constantly. The session is running constantly and I log in from home/work to check on my job; however, to regain access to the node each time I must SSH directly into the node and reset my password. When it was run on the cluster I don't recall whether I logged into the node, but I doubt it. I'll try running it again that way and ensure I do not log in to that node.

I ran ipcs -s, results below. I am running on two GPUs. Should I run this again if/when the job fails?

------ Semaphore Arrays --------
You should try running this when you "login from home/work to check on my job". In the case I saw in #738 what I did was:
The workaround was to use …
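(The exact steps referenced above are cut off in this copy of the thread.) As a rough sketch of the kind of check being described, you could keep a timestamped record of the semaphore arrays while the job runs and compare it against the times you log in from home/work; the log path and interval below are arbitrary choices:

# Append a timestamped listing of semaphore arrays once a minute while the job runs
while true; do
    date >> ~/relion_semaphore_log.txt
    ipcs -s >> ~/relion_semaphore_log.txt
    sleep 60
done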
If you have admin rights, you could also test if adding …
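The suggestion above is also cut off, so it is not stated which setting was meant. One setting commonly associated with this symptom, purely as an assumption about what was intended, is systemd-logind's RemoveIPC option, which when enabled removes a user's System V semaphores and shared memory once their last session ends. With admin rights you could check it and, for testing, turn it off:

# Check whether RemoveIPC is set (the upstream systemd default is yes)
grep -i RemoveIPC /etc/systemd/logind.conf
# To test, set RemoveIPC=no in /etc/systemd/logind.conf, then restart logind
sudo systemctl restart systemd-logind

If that turns out to be the cause, bear in mind that changing it affects all users on the node.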
Update: it made it to the second iteration! So adding the coarse search option made a difference. Nice.
Update: this just happened again. Same dataset, this time during ab initio. It got to iteration 113, then crashed. See below. I definitely did NOT log into the node this time while it was processing.
This is happening now for all of my refine jobs. At least I can get to about iteration 10 before it crashes. Any idea what is causing this?
Original issue report:
Running this command interactively on a GPU node with two 2080Ti cards. The same error occurs when submitting to the Slurm cluster on our HPC.
Running RELION 5 beta 3, commit 6331fe.
command:
mpirun --np 5 --oversubscribe relion_refine_mpi --o Class3D/job055/run --ios Extract/job025/optimisation_set.star --gpu "" --ref InitialModel/box40_bin8_invert.mrc --firstiter_cc --trust_ref_size --ini_high 60 --dont_combine_weights_via_disc --pool 3 --pad 2 --ctf --iter 25 --tau2_fudge 1 --particle_diameter 440 --K 1 --flatten_solvent --zero_mask --oversampling 1 --healpix_order 2 --offset_range 5 --offset_step 2 --sym C1 --norm --scale --j 1 --pipeline_control Class3D/job055/
error: