Issue on low performance running of RepeatMasker on HPC cluster #276

manighanipoor opened this issue Sep 10, 2024 · 7 comments

@manighanipoor

I am running RepeatMasker on a snake genome with a de novo TE library (created by RepeatModeler2) on an HPC cluster using 20 CPUs. The speed is very low and I encounter node failures. I contacted HPC support and they believe this is caused by system overhead from the high number of batches. I noticed that RepeatMasker performs better if I increase the "-frag" option to 1000000, since that reduces the number of batches. Do you think this would affect TE identification sensitivity or accuracy?

Cheers,
Mani

@rmhubley
Member

We use clusters at UCSC, Texas Tech, and the University of Arizona, and I haven't seen an issue with batch overhead, but perhaps your cluster has some restrictive quotas that are interfering with the runs. With any cluster I would recommend making sure you are running on a local disk (local to the machine) for speed, and breaking your sequence up into batches of 50 MB (or larger) and running them independently through RepeatMasker on different nodes, leaving the default -frag parameter (a rough sketch of this is below). We have a Nextflow script that does this for you on SLURM-based clusters ( https://github.com/Dfam-consortium/RepeatMasker_Nextflow ).

If you change the -frag parameter, you increase the size of the window over which the GC background value is determined. That value is used to select the appropriate scoring matrix for aligning the consensus sequences, so if you increase it too much you will probably lose some lower-scoring annotations from your output.
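
A rough sketch of that manual split-and-run approach, in case it helps. It assumes a SLURM scheduler and that UCSC's faSplit is on your PATH; the genome and library file names are placeholders, and the Nextflow pipeline handles the splitting and result merging more carefully than this:

# Split the assembly into files of roughly 50 MB each. faSplit 'about' only
# breaks between records, so scaffolds larger than 50 MB stay in one piece.
faSplit about snake_genome.fa 50000000 chunk_

# Submit each chunk as an independent RepeatMasker job, leaving -frag at its
# default. With RMBlast each -pa search job uses roughly 4 cores, so -pa 5
# is a reasonable fit for a 20-CPU allocation.
for f in chunk_*.fa; do
  sbatch --cpus-per-task=20 --wrap \
    "RepeatMasker -pa 5 -lib snake_TE_library.fa -dir ${f%.fa}_RM $f"
done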

@manighanipoor
Author

Thanks,

How can we configure the RepeatMasker_Nextflow script to run batches on different nodes? It doesn't seem to be preconfigured for that. Should I ask HPC support to do it?

Cheers,
Mani

@rmhubley
Member

That's exactly what it's meant to do. We regularly run it on hundreds of nodes. There is an option "--cluster" that currently accepts either "local" or one of several cluster names that we use. You will need to edit the RepeatMasker_Nextflow.nf file and configure it for your needs. For instance, look in the script for where quanah is defined:

///////                
/////// CUSTOMIZE CLUSTER ENVIRONMENT HERE BY ADDING YOUR OWN
/////// CLUSTER NAMES OR USE 'local' TO RUN ON THE CURRENT 
/////// MACHINE.
///////                                              
// No cluster...just local execution
if ( params.cluster == "local" ) {
...
}else if ( params.cluster == "quanah" || params.cluster == "nocona" ){
  thisExecutor = "slurm"
  thisQueue = params.cluster                                                                   
  thisOptions = "--tasks=1 -N 1 --cpus-per-task=${proc} --exclude=cpu-23-1"
  thisAdjOptions = "--tasks=1 -N 1 --cpus-per-task=2 --exclude=cpu-23-1"       
  ucscToolsDir="/lustre/work/daray/software/ucscTools"             
  repeatMaskerDir="/lustre/work/daray/software/RepeatMasker-4.1.2-p1"    
  thisScratch = false                                                
}

You would modify this block to accept the name of your cluster and set its parameters. Nextflow supports quite a few cluster job managers; the example above uses SLURM. Once you have made your changes, simply use the "--cluster myclustername" option when you run.
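
For example, once you have added a branch for "myclustername", the launch might look something like this. Only --cluster is confirmed above; the input and library parameter names here are placeholders, so check the params defaults at the top of RepeatMasker_Nextflow.nf for the real ones:

nextflow run RepeatMasker_Nextflow.nf \
    --cluster myclustername \
    --inputSequence snake_genome.fa \
    --library snake_TE_library.fa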

@manighanipoor
Author

Thanks for your help.

@manighanipoor
Author

You mentioned:
With any cluster I would recommend making sure you are running on a local disk (local to the machine) for speed

Just wondering, how can we run it on a local disk on the cluster?

@rmhubley
Member

rmhubley commented Oct 3, 2024

This depends on your cluster's architecture. Most often the individual compute nodes have a hard drive (or SSD) attached, though the administrators may have decided not to make it accessible to jobs running on the node. On many clusters, the admins have set up a scratch area on those local drives where files can be copied and processes can create temporary files more efficiently than over NFS. If your cluster supports this, the Nextflow script can take advantage of it.
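
A sketch of what that can look like in a SLURM job script (the node-local path is cluster-specific; $TMPDIR here is an assumption, and the file names are placeholders):

#!/bin/bash
#SBATCH --cpus-per-task=20
# Stage inputs onto node-local scratch, run there, then copy the results back.
# $TMPDIR is an assumption; the node-local path may be /tmp, /scratch/local,
# or something else entirely on your cluster, so ask your admins.
WORKDIR="${TMPDIR:-/tmp}/rm_${SLURM_JOB_ID}"
mkdir -p "$WORKDIR"
cp snake_chunk.fa snake_TE_library.fa "$WORKDIR"
cd "$WORKDIR"
RepeatMasker -pa 5 -lib snake_TE_library.fa snake_chunk.fa
# RepeatMasker writes its outputs (.out, .masked, .tbl, .cat.gz) next to the input.
cp snake_chunk.fa.* "$SLURM_SUBMIT_DIR"/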

@manighanipoor
Author

Thanks for the comment
