BOWTIE2 task failing when scratch set as /tmp #465
Comments
I get the same issue on Ubuntu 22.04.01. Setting 'export NXF_DEBUG=3', I can see that the process runs nxf_mktemp against /dev/shm but then looks for the files in /tmp, which is what causes the failure. Is there an option in Nextflow to disable using /dev/shm completely? I have tried removing the /dev/shm mount point, but Nextflow still tries to use it. .nextflow.log
|
I'm trying this now |
I can confirm the bug:
I wonder if it's something to do with the container... While googling I came across this: nf-core/eager#894 (comment). Maybe you can check whether a Singularity configuration similar to the one recorded there as the problem applies here? |
I tried the "mount tmp = no" option and sadly didn't get far, but I should revisit it on a virtual machine where I can test a bit deeper. If I rename the fastp container to the bowtie container then I don't get the same error, just the expected "bowtie command not found", which suggests that /tmp may be working with that container. I still can't see the folder in /tmp or /dev/shm, but with the test data maybe it's failing and cleaning up too quickly? |
If I go in the opposite direction and rename the bowtie2 image to fastp, I can recreate the "destination doesn't exist" error, so I would agree that it's the container that's the issue (stupidly, I didn't think of swapping container names to test this before). The other containers work correctly with /tmp, so I will have to check what's different about this container. |
To be honest, we need to replace the local bowtie2 modules with the official nf-core module versions. We are getting there, but MAG is a very complicated pipeline so it's going slowly, unfortunately... |
That does sound like a bit of a nightmare with the number of modules required; let us know if you need someone to help with testing. Just to confirm, it is the Singularity images that are the issue: looking inside the container, it seems to be this symbolic link on /tmp that causes the problem. To replicate:
I found that by also binding the /tmp folder to /run (this is where the link points) I could get the container to run correctly. I tried setting this in runOptions in the Nextflow config, but Nextflow adds the extra bind option at the end of the command, and the /tmp binding earlier in the line still fails. I managed to adapt the nf-core/eager/issues/894 comment to turn off autoMounts and manually mount /tmp and $PWD. Here is my example test config file in case anyone else has a similar issue (obviously just a test example):
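The attached config is not reproduced in this thread. As a rough sketch only, a config along the lines described (autoMounts off, manual binds for /tmp and $PWD) might be generated as below. The helper name `write_workaround_config` and the exact bind list are assumptions; `process.scratch`, `singularity.autoMounts` and `singularity.runOptions` are standard Nextflow config settings, and `-B` is Singularity's bind flag.

```shell
#!/usr/bin/env bash
# Write a test-only Nextflow config that disables Singularity auto-mounts and
# binds paths by hand. The bind list is an assumption, not the original file.
write_workaround_config() {
    local out="${1:-local.config}"
    # Quoted heredoc: $PWD is written literally so that it is expanded later,
    # when the config is read, rather than by this shell.
    cat > "$out" <<'EOF'
process {
    scratch = '/tmp'
}

singularity {
    enabled    = true
    autoMounts = false
    // Assumed binds: host /tmp onto /run (where the link points) plus the launch directory.
    runOptions = "-B /tmp:/run -B $PWD"
}
EOF
}
```

Calling `write_workaround_config local.config` and then passing the file with `-c local.config`, as in the command from the report, would be one way to try it. Which binds are actually needed is likely to depend on the image.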
|
I did have a look at swapping in the newer Singularity images from the official nf-core modules, but some of these still have a similar issue, for example these 2 images:
A lot of the others are working. It seems that images from 2020/2021 have the /run -> /tmp symbolic link, while images from late 2021 onwards no longer follow that trend. |
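To check whether a given image carries the symbolic link discussed above, a small helper like the following could be used. This is a sketch: `check_link` is a hypothetical name, and `image.img` in the comment is a placeholder rather than an image from this thread.

```shell
#!/usr/bin/env bash
# Print where a path points if it is a symbolic link, otherwise say it is not one.
check_link() {
    local path="$1"
    if [ -L "$path" ]; then
        printf '%s -> %s\n' "$path" "$(readlink "$path")"
    else
        printf '%s is not a symlink\n' "$path"
    fi
}

# Inside a container the same question can be answered with plain ls, e.g.
# (image.img is a placeholder):
#   singularity exec image.img ls -ld /tmp /run
```

The function form is handy for checking an unpacked image directory on the host, while the commented `singularity exec` line inspects the image in place.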
I have also just encountered this exact same bug running on a new cluster, where I have large amounts of scratch space I'd like to make use of. For me, the pipeline was failing at FASTQC_RAW, so I haven't had a chance to see which other containers are affected.
That's interesting - I am also having some trouble with an older container from that period: #462. I've rebuilt it (I still need to test it), but this might warrant rebuilds of a good chunk of biocontainers from that period? |
To come back to this: is the 'best solution' to just update all the container versions? |
I think it's definitely worth a try, but it will probably depend on what the base image for the container is (busybox/ubuntu? I don't know which of these are affected), and if tools are old/haven't received a recent update, the container images will probably have to be rebuilt before the fix works by updating the build number in the bioconda recipe. |
OK, then maybe we can schedule in 2.6 a mass update of all containers? (And if required request the custom ones...) |
Sounds good. Is there a quick way to dump a list of all the container modules? Happy to take some time and make a list of all the ones that are 2021 or older |
Assuming a cutoff date for replacement of 01/01/2022:
|
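One possible way to dump the container references from the module files is sketched below. It assumes the modules are `.nf` files under a directory such as `modules/` and that images are referenced by `depot.galaxyproject.org/singularity/` or `quay.io/biocontainers/` URIs; `list_containers` is a hypothetical helper name. The build date of an image cannot be read from the URI, so the cutoff check would still need to be done separately.

```shell
#!/usr/bin/env bash
# List the unique container references found in Nextflow module files.
# Assumption: images are named with depot.galaxyproject.org or
# quay.io/biocontainers URIs inside quoted strings.
list_containers() {
    local module_dir="${1:-modules}"
    # -r recurse, -h drop file names, -o print only the matching part.
    grep -rhoE --include='*.nf' \
        "(https://depot\.galaxyproject\.org/singularity/|quay\.io/biocontainers/)[^'\" ]+" \
        "$module_dir" | sort -u
}
```

Running `list_containers modules` from the pipeline root would print one image reference per line, ready to be compared against the registry by hand.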
Nice. I guess the next thing to do is to identify which local modules are pretty close to official nf-core ones, and also switch them out (they will presumably have the latest container too). |
Before I turned off automounts I started swapping containers out for alternative versions, there are some official nf-core versions that have the same issue (fastqc and metabat as an example), but they do have alternative versions on https://depot.galaxyproject.org/singularity which worked correctly. I got stuck when I hit the mulled-v2 images as some of these didn't have alternative versions I could easily swap in. |
I think all that's needed is a PR to the multi-package-containers repo with an update to the relevant hash.tsv line, increasing the build number (last entry on the line). Though this might be base image dependent. |
Description of the bug
We are trying to redirect the scratch to /tmp on the BOWTIE2 task but it fails with:
This process works correctly on other tasks such as FASTP with same config (when /tmp set as scratch).
Looking in the /tmp directory the folder is not being created as part of this process, the issue seems to be before singularity is even called.
The nxf_mktemp() function looks correct when compared to a working task (same as FASTP).
If I set the scratch folder to something other than /tmp then it works correctly for BOWTIE step.
The function is working correctly for fastqc/fastp/Busco (and the other processes) using /tmp for scratch.
Can anyone try running the test profile with singularity and /tmp as scratch to confirm?
Command used and terminal output
nextflow run nf-core/mag -profile test,singularity -c local.config --outdir output
Relevant files
config file example:
.nextflow.log
System information
Nextflow : Version: 22.10.3 build 5834
Hardware : tested on HPC system (EPYC CPU) and local VM (AMD processor, 10 CPU, 15GB RAM)
OS : RockyLinux 8.7
nf-core-mag-2.3.2
singularity-ce version 3.9.8
Executor : local
$TMPDIR is set to /tmp