ABySS Users FAQ
1. I am getting an error that says "Kmer::setLength(unsigned int): Assertion `length <= 64' failed"
ABySS has a compile-time parameter for the maximum value of k. As of ABySS 1.3.7, the maximum k value is 64 by default. To run assemblies with higher k values, you must compile ABySS from source and pass the ``--enable-maxk`` option during the ``configure`` step, e.g.
$ ./configure --enable-maxk=128
$ make
$ make install
The value of ``--enable-maxk`` should be a multiple of 4. ABySS needs to know the maximum value of k so that it can minimize the amount of memory it uses to represent the de Bruijn graph. If memory usage is not a concern, you may set ``--enable-maxk`` as high as you like.
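As a small illustration of the multiple-of-4 rule, the sketch below rounds a desired k up to the next multiple of 4. The helper name `round_maxk` is hypothetical and not part of ABySS; it only does the arithmetic.

```shell
#!/bin/sh
# Round a desired k up to the next multiple of 4, the granularity
# recommended above for --enable-maxk.
# round_maxk is a hypothetical helper, not an ABySS tool.
round_maxk() {
    k=$1
    echo $(( (k + 3) / 4 * 4 ))
}

round_maxk 250   # prints 252
round_maxk 96    # prints 96
```

The printed value can then be passed to ``./configure --enable-maxk=...``.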
Users sometimes encounter a problem where they have compiled ABySS with the appropriate ``--enable-maxk`` value and yet they still see the ``length <= 64`` error when they try to run assembly jobs. This is usually because the ``PATH`` environment variable has not been set correctly in their cluster job script. ``abyss-pe`` is a Makefile that invokes a number of different ABySS binaries (e.g. ``ABYSS-P``, ``abyss-scaffold``), and it will use whichever binaries it finds first on ``PATH``. For example, consider the following script:
#!/bin/sh
PATH=/home/joe/bin:$PATH
/home/joe/abyss-1.3.7/maxk_96/bin/abyss-pe k=96 name=assembly in='read1.fastq read2.fastq'
The user might expect the binaries they have compiled and installed under ``/home/joe/abyss-1.3.7/maxk_96/bin`` to be used for the assembly job. However, if another set of ABySS binaries, compiled without the ``--enable-maxk=96`` option, is installed in ``/home/joe/bin``, those will be used instead, because that directory comes first on ``PATH``.
To debug this sort of problem, it helps to put a ``which`` command in the job script and then look at the log output of the cluster job to check where it is getting its ABySS binaries from, e.g.
#!/bin/sh
PATH=/home/joe/bin:$PATH
which ABYSS-P
/home/joe/abyss-1.3.7/maxk_96/bin/abyss-pe k=96 name=assembly in='read1.fastq read2.fastq'
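The same check can be extended to several binaries at once. The following is a minimal sketch that uses the POSIX ``command -v`` builtin in place of ``which``. The function name `report_abyss_path` is hypothetical, and the list of programs is only a sample of what ``abyss-pe`` invokes.

```shell
#!/bin/sh
# Report where each named program would be loaded from on the current PATH.
# report_abyss_path is a hypothetical helper, not part of ABySS.
report_abyss_path() {
    for prog in "$@"; do
        # command -v prints the resolved path, or nothing if not found
        location=$(command -v "$prog" 2>/dev/null || true)
        if [ -n "$location" ]; then
            echo "$prog -> $location"
        else
            echo "$prog -> not found on PATH"
        fi
    done
}

# Sample list only; abyss-pe calls other binaries as well.
report_abyss_path abyss-pe ABYSS-P abyss-scaffold
```

If the reported paths do not all point into the directory of the build you intended to use, the ``PATH`` setting in the job script is the likely culprit.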
2. *My ABySS assembly jobs hang when I run them with high k values! (e.g. k=250)*
The way that Open MPI handles messages changes when the message size exceeds a certain threshold called the "eager send limit". In ABySS, message size depends directly on k, and when the eager send limit is exceeded, assembly jobs will deadlock.
The best workaround for this problem is to explicitly set the eager send limit. This can be done by setting an environment variable called ``mpirun`` in your cluster job script.
Example:
#!/bin/sh
PATH=/home/joe/abyss-1.3.7/maxk_96/bin:$PATH
export mpirun="mpirun --mca btl_sm_eager_limit 16000 --mca btl_openib_eager_limit 16000"
abyss-pe k=96 name=assembly in='read1.fastq read2.fastq'
The values for the ``btl_sm_eager_limit`` and ``btl_openib_eager_limit`` are in bytes, and it is usually fine to set them both to the same value. The formula for determining the appropriate value is:
eager_limit >= (max_k/4 + 32) * 100
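As a worked instance of the formula, the sketch below computes the minimum eager limit for a given ``max_k``. The helper name `min_eager_limit` is hypothetical. It assumes ``max_k`` is the ``--enable-maxk`` value used at compile time and is a multiple of 4, so that the integer division is exact.

```shell
#!/bin/sh
# Minimum eager limit in bytes for a given max_k, using the formula above:
#   eager_limit >= (max_k/4 + 32) * 100
# min_eager_limit is a hypothetical helper, not an ABySS tool.
# Assumes max_k is a multiple of 4 (shell arithmetic truncates otherwise).
min_eager_limit() {
    max_k=$1
    echo $(( (max_k / 4 + 32) * 100 ))
}

min_eager_limit 96    # prints 5600
min_eager_limit 256   # prints 9600
```

For ``max_k`` = 96 the minimum is 5600 bytes, so the value of 16000 used in the example script leaves a comfortable margin.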
3. My ABySS MPI job with a large number of processors (over 1000) is using much more memory than expected. What's up?
The default parameters of Open MPI allocate a large amount of memory to communication buffers. The following options will reduce the amount of memory allocated to buffers.
mpirun --mca btl_openib_receive_queues X,128,256,192,128:X,4096,256,128,32:X,12288,256,128,32:X,65536,256,128,3
4. My ABySS assembly fails and I get an error that says "abyss-fixmate: error: All reads are mateless. This can happen when first and second read IDs do not match."
During the contig and scaffold stages of an assembly, ABySS aligns the paired end reads to the sequences that have been assembled so far (e.g. unitigs), so that it can link them into larger sequences (e.g. contigs). In order to be able to do this, ABySS needs to be able to correctly match up reads that belong to the same pair.
If you are seeing this error, please check that:
1. The input read files are sorted by read name.
2. Reads that belong to the same pair either have identical names or share the same prefix followed by "/1" and "/2".
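A quick sanity check for the second condition can be sketched with ``awk``. This is not an ABySS tool. The function name `count_mismatched_pairs` is hypothetical, and the sketch assumes two uncompressed FASTQ files with exactly four lines per record and mates listed in the same order. It holds the names from the first file in memory, so it is best run on a subset of a large data set.

```shell
#!/bin/sh
# Count read pairs whose names disagree between two FASTQ files.
# count_mismatched_pairs is a hypothetical helper, not part of ABySS.
# Assumes 4-line FASTQ records and mates listed in the same order.
count_mismatched_pairs() {
    awk '
        FNR % 4 != 1 { next }                    # keep header lines only
        {
            name = substr($1, 2)                 # drop "@" and any comment
            sub(/\/[12]$/, "", name)             # drop a trailing /1 or /2
        }
        NR == FNR { first[FNR] = name; n1++; next }   # first file: remember
        { n2++; if (first[FNR] != name) bad++ }       # second file: compare
        END {
            if (n1 > n2) bad += n1 - n2          # reads missing from file 2
            print bad + 0
        }
    ' "$1" "$2"
}

# Usage: count_mismatched_pairs read1.fastq read2.fastq
```

A result of 0 means every read in the first file has a mate with a matching name at the same position in the second file.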
5. Why do I count more contigs than `abyss-fac` that are larger than 500 bp?
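`abyss-fac` does not count Ns toward the 500 bp threshold, whereas `samtools faidx` counts all symbols, including Ns. A contig that reaches 500 bp only because of its Ns is therefore included in a count based on `samtools faidx` lengths but excluded by `abyss-fac`.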
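The difference between the two counting rules can be illustrated with a short ``awk`` sketch. It does not reproduce the exact behaviour of either tool. The function name `count_contigs` is hypothetical, and treating both "N" and "n" as ambiguous bases is an assumption.

```shell
#!/bin/sh
# Count FASTA sequences that reach a minimum length, once counting all
# symbols and once ignoring Ns.
# count_contigs is a hypothetical helper; it illustrates the two counting
# rules only and assumes both "N" and "n" are to be ignored.
count_contigs() {
    awk -v min="$2" '
        function tally() {
            if (seen) {
                if (total >= min) with_n++
                if (total - n >= min) without_n++
            }
        }
        /^>/ { tally(); seen = 1; total = 0; n = 0; next }   # new sequence
        { total += length($0); n += gsub(/[Nn]/, "") }       # sequence line
        END {
            tally()
            printf "all_symbols=%d excluding_N=%d\n", with_n, without_n
        }
    ' "$1"
}

# Usage: count_contigs assembly-contigs.fa 500
```

When the two numbers differ, the gap is the set of contigs that pass the threshold only because of their Ns.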