resource requested too high on Rackham #138

Open
gbdias opened this issue Nov 8, 2024 · 9 comments

Comments

@gbdias
Contributor

gbdias commented Nov 8, 2024

In the feature_hic_scaffolding branch, many processes request more than 20 CPUs when using the uppmax profile. Job submission then fails and Nextflow exits the pipeline.

This happened first with FASTK_FASTK (requested 36 CPUs). I changed it in the config file, and then the same happened with HIFIASM (requested 46 CPUs). I am using the uppmax,singularity profiles.

[36/4c3ea7] Submitted process > BUILD_MERYL_HIFI_DATABASE:MERYL_COUNT (Tetrastemma_melanocephalum)
ERROR ~ Error executing process > 'ASSEMBLE:ASSEMBLE_HIFI:HIFIASM (hifiasm-raw-default)'

Caused by:
  Failed to submit process to grid scheduler for execution

Command executed:

  sbatch .command.run

Command exit status:
  1

Command output:
  sbatch: error: CPU count per node can not be satisfied
  sbatch: error: Batch job submission failed: Requested node configuration is not available

Work dir:
  /crex/proj/snic2021-6-194/nobackup/BGE_Tetrastemma_melanocephalum_assembly/analyses/01_ebp_pipeline/nxf-work/de/48c7588ca07cb3a07f85338f96dc7e

The uppmax.config file has

resourceLimits = [
    memory: 125.GB,
    cpus: 16,
    time: 10.d
]

but it doesn't seem that this is being honoured.

@mahesh-panchal
Collaborator

What's your Nextflow version?

@mahesh-panchal
Collaborator

Also note that you only need -profile uppmax.

Are you using the project template?

@gbdias
Contributor Author

gbdias commented Nov 8, 2024

Nextflow version 23.10.1
Yes using the project template.

Could profile singularity be causing this?

@mahesh-panchal
Collaborator

No, the profile shouldn't cause this. It could be the Nextflow version since I'm not sure when resourceLimits was introduced.

Update the conda environment to use the latest nextflow.
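A quick way to rule out the version question is to compare the installed release against the one that introduced resourceLimits. The sketch below assumes that release is 24.04.0, which is what the Nextflow documentation lists as far as I know; check the docs for your installation before relying on it. `version_ge`, `check_resource_limits_support`, and `MIN_VERSION` are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# Returns success (0) when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Assumption: the release that introduced the resourceLimits directive.
MIN_VERSION="24.04.0"

# Prints "supported" or "unsupported" for the version string passed as $1.
check_resource_limits_support() {
    if version_ge "$1" "$MIN_VERSION"; then
        echo "supported"
    else
        echo "unsupported"
    fi
}
```

Pass in the version that Nextflow prints in its run banner, for example `check_resource_limits_support 23.10.1`. An older release would not be expected to apply the caps in uppmax.config.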

Are you using the latest version of the project template? You should then only need to run

BRANCH=feature_hic_scaffolding ./run_nextflow.sh

https://github.com/NBISweden/assembly-project-template/blob/main/analyses/01_ebp-assembly-workflow/run_nextflow.sh

@gbdias
Contributor Author

gbdias commented Nov 8, 2024

Updated Nextflow to 24.10.0 and updated run_nextflow.sh to the version you pointed to. Now I get an error about the requested node configuration not being available. Maybe there are no 1TB nodes on Rackham?

BRANCH=feature_hic_scaffolding ./run_nextflow.sh
N E X T F L O W  ~  version 24.10.0
WARN: It appears you have never run this project before -- Option `-resume` is ignored
Launching `/home/guibo205/git/NBIS/Earth-Biogenome-Project-pilot/main.nf` [gloomy_mccarthy] DSL2 - revision: 13e9362f94

    Running NBIS Earth Biogenome Project Assembly workflow.

WARN: There's no process matching config selector: QUARTO -- Did you mean: QUAST?
WARN: There's no process matching config selector: MULTIQC
[skipping] Stored process > DECONTAMINATE:FCSGX_FETCHDB (1)
Staging foreign file: ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
[skipping] Stored process > PREPARE_INPUT:UNTAR_TAXONOMY (1)
[a5/e486bb] Submitted process > PREPARE_INPUT:TAXONKIT_NAME2LINEAGE (Tetrastemma melanocephalum)
[ba/256c82] Submitted process > PREPARE_INPUT:GOAT_TAXONSEARCH (Tetrastemma_melanocephalum)
[88/38b3f2] Submitted process > ASSEMBLY_REPORT:TOL_SEARCH (Taxid: 307678)
ERROR ~ Error executing process > 'BUILD_MERYL_HIC_DATABASE:MERYL_COUNT (Tetrastemma_melanocephalum)'

Caused by:
  Failed to submit process to grid scheduler for execution


Command executed:

  sbatch .command.run

Command exit status:
  1

Command output:
  sbatch: error: Batch job submission failed: Requested node configuration is not available

Work dir:
  /crex/proj/snic2021-6-194/nobackup/BGE_Tetrastemma_melanocephalum_assembly/analyses/01_ebp_pipeline/nxf-work/50/d1884374d80638ef5b899aeb2ba856

Container:
  /proj/snic2021-6-194/nobackup/ebp-singularity-cache/depot.galaxyproject.org-singularity-meryl-1.4.1--h4ac6f70_0.img

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

 -- Check '.nextflow.log' file for details

            Thank you for using the NBIS Earth Biogenome Project Assembly workflow.
            The workflow completed unsuccessfully.

            Please read over the error message. If you are unable to solve it, please
            post an issue at https://github.com/NBISweden/Earth-Biogenome-Project-pilot/issues
            where we will do our best to help.

WARN: Killing running tasks (1)

@mahesh-panchal
Collaborator

There are 1TB nodes, but Meryl should not be requesting that. What's in your .command.run header?

@gbdias
Contributor Author

gbdias commented Nov 8, 2024

It seems that memory also gets defined twice (once with --mem and once with -C mem1TB):

#!/bin/bash
#SBATCH -J nf-BUILD_MERYL_HIC_DATABASE_MERYL_COUNT_(Tetrastemma_melanocephalum)
#SBATCH -o /crex/proj/snic2021-6-194/BGE_Tetrastemma_melanocephalum_assembly/nobackup/01_ebp_pipeline/nxf-work/a0/050f796b5960abc768db059744e3ea/.command.log
#SBATCH --no-requeue
#SBATCH --signal B:USR2@30
#SBATCH -c 16
#SBATCH -t 48:00:00
#SBATCH --mem 128000M
#SBATCH -A naiss2024-5-420 -p core -C mem1TB
NXF_CHDIR=/crex/proj/snic2021-6-194/BGE_Tetrastemma_melanocephalum_assembly/nobackup/01_ebp_pipeline/nxf-work/a0/050f796b5960abc768db059744e3ea
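To compare what a task actually requested against the profile's limits, the scheduler directives can be pulled out of `.command.run`. The sketch below assumes the wrapper uses `#SBATCH` lines as in the header above; `sbatch_directives` and `memory_related_directives` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Print only the scheduler directives from a Nextflow task wrapper script.
# $1 = path to a .command.run file.
sbatch_directives() {
    grep '^#SBATCH' "$1"
}

# Print the directives that affect memory, either as a size (--mem)
# or as a node feature constraint (-C). The match is case-sensitive,
# so the CPU flag "-c" is not reported.
memory_related_directives() {
    sbatch_directives "$1" | grep -E -- '--mem|-C '
}
```

Run from a task work directory, `memory_related_directives .command.run` would list both the `--mem` line and the line carrying `-C mem1TB`, which makes it easier to spot a constraint coming from an unexpected config file.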

@gbdias
Contributor Author

gbdias commented Nov 8, 2024

Ok, the problem was a nextflow.config file I had in the same directory. I thought it would get ignored since I'm not passing it as a parameter in the nextflow call inside run_nextflow.sh, but apparently it gets picked up regardless?

Deleting it seems to solve it.

@mahesh-panchal
Collaborator

Yes, that's how you can change settings per project when running from a remote repository.

https://www.nextflow.io/docs/latest/config.html#configuration-file (i.e. number 4 picks up the nextflow.config in the directory you run from; number 3 is the nextflow.config in the repository).
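Since Nextflow reads these files without being asked, it can help to list which ones exist before launching. This is a minimal sketch that only checks the three automatic locations (user config, repository, launch directory). `list_auto_configs` is a hypothetical helper, and the paths you pass in are your own.

```shell
#!/usr/bin/env bash
# List the config files Nextflow would find automatically for a run.
# $1 = launch directory, $2 = project (repository) directory.
list_auto_configs() {
    local launch_dir="$1" project_dir="$2" f
    for f in "$HOME/.nextflow/config" \
             "$project_dir/nextflow.config" \
             "$launch_dir/nextflow.config"; do
        # Print only the files that are actually present.
        [ -f "$f" ] && echo "$f"
    done
    return 0
}
```

For example, `list_auto_configs "$PWD" /path/to/Earth-Biogenome-Project-pilot` would have reported the stray nextflow.config in the launch directory. Running `nextflow config` from the same directory prints the merged result.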

I still need to check what the resource limits are in the Uppmax profile, but be aware that if Martin's change limited them to a standard node, you'll need to increase the resourceLimits for a specific process, request more resources for it, and add the appropriate flag in the clusterOptions.
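As a rough illustration of that kind of per-process override, the sketch below writes a launch-directory nextflow.config. Everything in it is an assumption: the process name is taken from the error above, the memory and time values are placeholders, `<project>` and `<partition>` must be replaced with real values, and `write_highmem_override` is a hypothetical helper. It has not been tested on Rackham.

```shell
#!/usr/bin/env bash
# Write a nextflow.config into directory $1 that raises limits for one process.
# All values are placeholders, not verified settings for any cluster.
write_highmem_override() {
    local out_dir="$1"
    cat > "$out_dir/nextflow.config" <<'EOF'
process {
    withName: 'HIFIASM' {
        // Raise the cap for this process only (hypothetical values).
        resourceLimits = [ memory: 1.TB, cpus: 16, time: 10.d ]
        // Request more memory than the profile-wide cap would allow.
        memory = 500.GB
        // Scheduler flags for a high-memory node; substitute your own.
        clusterOptions = '-A <project> -p <partition> -C mem1TB'
    }
}
EOF
}
```

Calling `write_highmem_override "$PWD"` in the launch directory makes the override apply to that project only, which is the same mechanism that picked up the stray file earlier in this thread.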
