resource requested too high on Rackham #138

gbdias · 2024-11-08T12:10:17Z

In the feature_hic_scaffolding branch, many processes request more than 20 CPUs when using the Uppmax profile. This causes a job submission error and Nextflow exits the pipeline.

This happened with FASTK_FASTK (requested 36 CPUs). I changed it in the config file, and then the same happened with HIFIASM (requested 46 CPUs). I am using the uppmax,singularity profiles.

[36/4c3ea7] Submitted process > BUILD_MERYL_HIFI_DATABASE:MERYL_COUNT (Tetrastemma_melanocephalum)
ERROR ~ Error executing process > 'ASSEMBLE:ASSEMBLE_HIFI:HIFIASM (hifiasm-raw-default)'

Caused by:
  Failed to submit process to grid scheduler for execution

Command executed:

  sbatch .command.run

Command exit status:
  1

Command output:
  sbatch: error: CPU count per node can not be satisfied
  sbatch: error: Batch job submission failed: Requested node configuration is not available

Work dir:
  /crex/proj/snic2021-6-194/nobackup/BGE_Tetrastemma_melanocephalum_assembly/analyses/01_ebp_pipeline/nxf-work/de/48c7588ca07cb3a07f85338f96dc7e

The uppmax.config file has

resourceLimits = [
        memory: 125.GB,
        cpus: 16,
        time: 10.d
    ]

but it doesn't seem that this is being honoured.

mahesh-panchal · 2024-11-08T12:20:55Z

What's your Nextflow version?

mahesh-panchal · 2024-11-08T12:22:45Z

Also note that you only need -profile uppmax.

Are you using the project template?

gbdias · 2024-11-08T12:23:58Z

Nextflow version 23.10.1
Yes using the project template.

Could the singularity profile be causing this?

mahesh-panchal · 2024-11-08T12:27:17Z

No, the profile shouldn't cause this. It could be the Nextflow version since I'm not sure when resourceLimits was introduced.

Update the conda environment to use the latest nextflow.

Are you using the latest version of the project template? You should then only need to run

BRANCH=feature_hic_scaffolding ./run_nextflow.sh

https://github.com/NBISweden/assembly-project-template/blob/main/analyses/01_ebp-assembly-workflow/run_nextflow.sh
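
On the version question: the resourceLimits directive was added in Nextflow 24.04.0, so a 23.10.x run cannot honour the limits in uppmax.config. Below is a minimal sketch of how a config could guard against older releases. The manifest block is illustrative and not taken from this repository.

```groovy
// nextflow.config (illustrative sketch, not the pipeline's actual manifest)
manifest {
    // The leading '!' makes Nextflow stop instead of only warning
    // when the running version is older than 24.04.0, the release
    // that introduced the resourceLimits directive.
    nextflowVersion = '!>=24.04.0'
}
```

Without the '!' prefix, Nextflow only prints a warning and carries on, which would still let jobs be submitted with uncapped resource requests.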

gbdias · 2024-11-08T12:50:11Z

Updated Nextflow to 24.10.0 and updated run_nextflow.sh to the version you pointed to. Now I get an error about the requested node configuration not being available. Maybe there are no 1TB nodes on Rackham?

BRANCH=feature_hic_scaffolding ./run_nextflow.sh
N E X T F L O W  ~  version 24.10.0
WARN: It appears you have never run this project before -- Option `-resume` is ignored
Launching `/home/guibo205/git/NBIS/Earth-Biogenome-Project-pilot/main.nf` [gloomy_mccarthy] DSL2 - revision: 13e9362f94

    Running NBIS Earth Biogenome Project Assembly workflow.

WARN: There's no process matching config selector: QUARTO -- Did you mean: QUAST?
WARN: There's no process matching config selector: MULTIQC
[skipping] Stored process > DECONTAMINATE:FCSGX_FETCHDB (1)
Staging foreign file: ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
[skipping] Stored process > PREPARE_INPUT:UNTAR_TAXONOMY (1)
[a5/e486bb] Submitted process > PREPARE_INPUT:TAXONKIT_NAME2LINEAGE (Tetrastemma melanocephalum)
[ba/256c82] Submitted process > PREPARE_INPUT:GOAT_TAXONSEARCH (Tetrastemma_melanocephalum)
[88/38b3f2] Submitted process > ASSEMBLY_REPORT:TOL_SEARCH (Taxid: 307678)
ERROR ~ Error executing process > 'BUILD_MERYL_HIC_DATABASE:MERYL_COUNT (Tetrastemma_melanocephalum)'

Caused by:
  Failed to submit process to grid scheduler for execution


Command executed:

  sbatch .command.run

Command exit status:
  1

Command output:
  sbatch: error: Batch job submission failed: Requested node configuration is not available

Work dir:
  /crex/proj/snic2021-6-194/nobackup/BGE_Tetrastemma_melanocephalum_assembly/analyses/01_ebp_pipeline/nxf-work/50/d1884374d80638ef5b899aeb2ba856

Container:
  /proj/snic2021-6-194/nobackup/ebp-singularity-cache/depot.galaxyproject.org-singularity-meryl-1.4.1--h4ac6f70_0.img

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

 -- Check '.nextflow.log' file for details

            Thank you for using the NBIS Earth Biogenome Project Assembly workflow.
            The workflow completed unsuccessfully.

            Please read over the error message. If you are unable to solve it, please
            post an issue at https://github.com/NBISweden/Earth-Biogenome-Project-pilot/issues
            where we will do our best to help.

WARN: Killing running tasks (1)

mahesh-panchal · 2024-11-08T13:02:53Z

There are 1TB nodes, but Meryl should not be requesting that. What's in your .command.run header?

gbdias · 2024-11-08T13:17:23Z

It also seems that memory gets defined twice (--mem and -C mem1TB):

#!/bin/bash
#SBATCH -J nf-BUILD_MERYL_HIC_DATABASE_MERYL_COUNT_(Tetrastemma_melanocephalum)
#SBATCH -o /crex/proj/snic2021-6-194/BGE_Tetrastemma_melanocephalum_assembly/nobackup/01_ebp_pipeline/nxf-work/a0/050f796b5960abc768db059744e3ea/.command.log
#SBATCH --no-requeue
#SBATCH --signal B:USR2@30
#SBATCH -c 16
#SBATCH -t 48:00:00
#SBATCH --mem 128000M
#SBATCH -A naiss2024-5-420 -p core -C mem1TB
NXF_CHDIR=/crex/proj/snic2021-6-194/BGE_Tetrastemma_melanocephalum_assembly/nobackup/01_ebp_pipeline/nxf-work/a0/050f796b5960abc768db059744e3ea

gbdias · 2024-11-08T13:24:55Z

Ok, the problem was a nextflow.config file I had in the same directory. I thought it would get ignored since I'm not passing it as a parameter in the nextflow call inside run_nextflow.sh, but apparently it gets picked up regardless?

Deleting it seems to solve it.

mahesh-panchal · 2024-11-08T14:17:22Z

Yes, that's how you can change settings per project when running from a remote.

https://www.nextflow.io/docs/latest/config.html#configuration-file (i.e. number 4 picks up the nextflow.config in the directory you run from; number 3 is the nextflow.config in the repository).
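
To illustrate the point about the launch directory, here is a hypothetical nextflow.config that would add the extra constraint seen in the posted .command.run header. The contents are an assumption made for illustration; the actual file that was deleted is not shown in this thread.

```groovy
// ./nextflow.config in the directory where run_nextflow.sh is executed
// (hypothetical contents, for illustration only)
process {
    // Nextflow loads this file automatically, without -c, and merges it
    // over the repository's nextflow.config. clusterOptions is written
    // into .command.run as its own '#SBATCH' line, so this would put
    // '-C mem1TB' on every job the pipeline submits.
    clusterOptions = '-A naiss2024-5-420 -p core -C mem1TB'
}
```

Because this file has higher priority than the repository config, any value set here replaces the corresponding value from the uppmax profile.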

I still need to check what the resource limits are on Uppmax, but be aware that if Martin's change limited them to a standard node, you'll need to increase the resourceLimits for a specific process, request more resources for it, and add the appropriate flag to its clusterOptions.
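
A rough sketch of what such a per-process override might look like in a project-level nextflow.config. The process name, the numbers, and the project placeholder are assumptions; the -C mem1TB flag is copied from the header posted above. Check the partition and constraint against the cluster's own documentation.

```groovy
// Project-level nextflow.config (sketch; all values are placeholders)
process {
    // Override a single process instead of raising the limits globally.
    withName: 'HIFIASM' {
        // Raise the ceiling for this process only, otherwise the
        // request below is capped at the profile-wide 125.GB limit.
        resourceLimits = [ memory: 1000.GB, cpus: 16, time: 10.d ]

        // Request more memory than a standard node provides.
        memory = 500.GB

        // Ask Slurm for a large-memory node. Setting clusterOptions here
        // replaces the profile's value for this process, so the account
        // has to be repeated. '<project>' is a placeholder.
        clusterOptions = '-A <project> -C mem1TB'
    }
}
```

All three settings are needed together: the raised limit allows the larger request, the memory directive makes it, and the cluster option directs the job to a node that can satisfy it.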
