ABySS Users FAQ
- I am getting an error that says Kmer::setLength(unsigned int): Assertion `length <= 64' failed
- My ABySS assembly jobs hang when I run them with high k values! (e.g. k=250)
- My ABySS MPI job with a large number of processors (over 1000) is using much more memory than expected. What's up?
- My ABySS assembly fails and I get an error that says abyss-fixmate: error: All reads are mateless. This can happen when first and second read IDs do not match.
- Why do I count more contigs than abyss-fac that are larger than 500 bp?
1. I am getting an error that says Kmer::setLength(unsigned int): Assertion `length <= 64' failed
ABySS has a compile-time parameter for the maximum value of k. As of ABySS 1.3.7, the maximum k value is 64 by default. To run assemblies with higher k values, you must compile ABySS from source and pass the --enable-maxk option to the configure step, for example:
$ ./configure --enable-maxk=128
$ make
$ make install
The value of --enable-maxk should be a multiple of 4. ABySS needs to know the maximum value of k so that it can minimize the amount of memory it uses to represent the de Bruijn graph. If memory usage is not a concern, you may set --enable-maxk as high as you like.
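As a small illustration of the multiple-of-4 rule stated above, the following sketch rounds a desired k up to the nearest value that satisfies it. The function name round_up_maxk is hypothetical and is not part of ABySS; it only does the arithmetic.

```shell
#!/bin/sh
# Hypothetical helper (not part of ABySS): round a desired k up to the
# next multiple of 4, following the rule stated in this FAQ.
round_up_maxk() {
    k=$1
    # Integer division truncates, so adding 3 first rounds up.
    echo $(( (k + 3) / 4 * 4 ))
}

round_up_maxk 250   # prints 252
round_up_maxk 96    # prints 96
```

The printed value could then be passed to configure, e.g. ./configure --enable-maxk=252 for k=250.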
Users sometimes compile ABySS with the appropriate --enable-maxk value and still see the length <= 64 error when they run assembly jobs. This usually happens because the PATH environment variable has not been set correctly in the cluster job script. abyss-pe is a Makefile that invokes a number of different ABySS binaries (e.g. ABYSS-P, abyss-scaffold), and it uses whichever binaries it finds first on PATH. For example, consider the following script:
#!/bin/sh
PATH=/home/joe/bin:$PATH
/home/joe/abyss-1.3.7/maxk_96/bin/abyss-pe k=96 name=assembly in='read1.fastq read2.fastq'
The user might expect the binaries they have compiled and installed under /home/joe/abyss-1.3.7/maxk_96/bin to be used for the assembly job. However, if another set of ABySS binaries was compiled without the --enable-maxk=96 option and installed to /home/joe/bin, those will be used instead.
To debug this sort of problem, it helps to put a which command in the job script and then check the log output of the cluster job to see where it is getting its ABySS binaries from, e.g.
#!/bin/sh
PATH=/home/joe/bin:$PATH
which ABYSS-P
/home/joe/abyss-1.3.7/maxk_96/bin/abyss-pe k=96 name=assembly in='read1.fastq read2.fastq'
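The same check can be extended to several binaries at once. The sketch below is one possible way to do it, using the POSIX command -v builtin instead of which. The function name report_binaries is hypothetical, and the list of programs is only the set named in this FAQ, not a complete list of what abyss-pe invokes.

```shell
#!/bin/sh
# Hypothetical helper: report where each named program resolves on PATH.
report_binaries() {
    for prog in "$@"; do
        if path=$(command -v "$prog"); then
            echo "$prog -> $path"
        else
            echo "$prog -> NOT FOUND on PATH"
        fi
    done
}

# Programs mentioned in this FAQ; extend the list as needed.
report_binaries abyss-pe ABYSS-P abyss-scaffold
```

If any line in the job log points to a directory other than the intended install prefix, PATH is picking up a different build.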
2. My ABySS assembly jobs hang when I run them with high k values! (e.g. k=250)
Open MPI handles messages differently once the message size exceeds a threshold called the eager send limit. In ABySS, message size depends directly on k, and when the eager send limit is exceeded, assembly jobs will deadlock.
The best workaround for this problem is to set the eager send limit explicitly. This can be done by setting an environment variable called mpirun in your cluster job script.
Example:
#!/bin/sh
PATH=/home/joe/abyss-1.3.7/maxk_96/bin:$PATH
export mpirun='mpirun --mca btl_sm_eager_limit 16000 --mca btl_openib_eager_limit 16000'
abyss-pe k=96 name=assembly in='read1.fastq read2.fastq'
The values for btl_sm_eager_limit and btl_openib_eager_limit are in bytes, and it is usually fine to set them both to the same value. The formula for determining the appropriate value is:
eager_limit >= (max_k/4 + 32) * 100
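As a worked example of the formula above, the following sketch computes the minimum limit for a given max_k. It assumes that max_k is the value ABySS was compiled with via --enable-maxk; the function name min_eager_limit is hypothetical.

```shell
#!/bin/sh
# Minimum eager send limit in bytes, per the formula (max_k/4 + 32) * 100.
# Assumes max_k is a multiple of 4, so the integer division is exact.
min_eager_limit() {
    max_k=$1
    echo $(( (max_k / 4 + 32) * 100 ))
}

min_eager_limit 96    # prints 5600
min_eager_limit 256   # prints 9600
```

Both results are below the 16000 bytes used in the example job script, so that setting satisfies the formula for either build.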
3. My ABySS MPI job with a large number of processors (over 1000) is using much more memory than expected. What's up?
The default parameters of Open MPI allocate a large amount of memory to communication buffers. The following options will reduce the amount of memory allocated to buffers.
mpirun --mca btl_openib_receive_queues X,128,256,192,128:X,4096,256,128,32:X,12288,256,128,32:X,65536,256,128,3
4. My ABySS assembly fails and I get an error that says abyss-fixmate: error: All reads are mateless. This can happen when first and second read IDs do not match.
During the contig and scaffold stages of an assembly, ABySS aligns the paired end reads to the sequences that have been assembled so far (e.g. unitigs), so that it can link them into larger sequences (e.g. contigs). In order to be able to do this, ABySS needs to be able to correctly match up reads that belong to the same pair. If you are seeing this error, please check that:
- The input read files are sorted by read name.
- Reads that belong to the same pair either have identical names or have the same prefix followed by /1 and /2.
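The two checks above can be approximated with a quick script before starting an assembly. This is a minimal sketch, not how abyss-fixmate itself matches reads. It assumes the pairs are stored in two separate, uncompressed FASTQ files with four lines per record, in the same order, and with no tab characters in the header lines. The function name count_mismatched_ids is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count records whose read IDs differ between two
# FASTQ files, after dropping any comment and a trailing /1 or /2.
# Assumes 4-line FASTQ records and the same read order in both files.
count_mismatched_ids() {
    paste "$1" "$2" | awk -F '\t' '
        NR % 4 == 1 {
            id1 = $1; id2 = $2
            sub(/ .*/, "", id1); sub(/ .*/, "", id2)          # drop comment after first space
            sub(/\/[12]$/, "", id1); sub(/\/[12]$/, "", id2)  # drop /1 or /2 suffix
            if (id1 != id2) n++
        }
        END { print n + 0 }'
}

# Usage: count_mismatched_ids read1.fastq read2.fastq
```

A result of 0 means every pair of header lines agrees under these assumptions. A non-zero count suggests the files are out of order, differ in length, or use a naming scheme other than the two described above.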
5. Why do I count more contigs than abyss-fac that are larger than 500 bp?
abyss-fac does not count Ns toward the 500 bp threshold, whereas samtools faidx counts all symbols. See the ABySS stats file format.
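The difference between the two counting rules can be seen with a short awk sketch. This is only an illustration of the rule described above, not the actual implementation of abyss-fac or samtools faidx. The function name count_long_contigs is hypothetical, and the input is assumed to be a plain, uncompressed FASTA file.

```shell
#!/bin/sh
# Hypothetical helper: count sequences of at least MIN bp in a FASTA file,
# once counting every symbol and once ignoring N/n.
# Usage: count_long_contigs contigs.fa 500
count_long_contigs() {
    awk -v min="$2" '
        # Record the sequence just finished under both counting rules.
        function tally() {
            if (seen) {
                if (total >= min) all++
                if (acgt >= min) non_n++
            }
        }
        /^>/ { tally(); seen = 1; total = 0; acgt = 0; next }
        {
            total += length($0)
            s = $0
            gsub(/[Nn]/, "", s)   # remove Ns before measuring
            acgt += length(s)
        }
        END {
            tally()
            printf "all_symbols=%d excluding_N=%d\n", all, non_n
        }
    ' "$1"
}
```

A scaffold with 400 bp of sequence and a 200 bp run of Ns is counted under all_symbols but not under excluding_N, which is the kind of discrepancy this question describes.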