Shaun Jackman edited this page Jun 10, 2014 · 18 revisions
1. *I am getting an error that says* ``Kmer::setLength(unsigned int): Assertion `length <= 64' failed``

   ABySS has a compile-time parameter for the maximum value of k. As of ABySS 1.3.7, the maximum k value is 64 by default. To run assemblies with higher k values, you must compile ABySS from source and pass the ``--enable-maxk`` option during the configure step, for example::

      $ ./configure --enable-maxk=128
      $ make
      $ make install

   
   The value of ``--enable-maxk`` should be a multiple of 4.  ABySS needs to know the maximum value of k so that it can minimize the amount of memory it uses to represent the de Bruijn graph.  If memory usage is not a concern, you may set ``--enable-maxk`` as high as you like.
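The multiple-of-4 rule is easy to check before running ``configure``. The sketch below assumes that ABySS packs each base into 2 bits (four bases per byte), which would make a k-mer sized for ``maxk`` bases take ``maxk/4`` bytes. ``check_maxk`` is a hypothetical helper for illustration, not part of ABySS.

```sh
#!/bin/sh
# Hypothetical helper (not part of ABySS): validate a candidate
# --enable-maxk value and print the bytes per k-mer it implies.
# ASSUMPTION: 2 bits per base, i.e. 4 bases per byte.
check_maxk() {
    maxk=$1
    # Reject empty input, non-digits, zero and leading zeros
    # (a leading zero would be read as octal in $(( ))).
    case $maxk in
        ''|*[!0-9]*|0*)
            echo "maxk must be a positive integer" >&2
            return 1 ;;
    esac
    if [ $((maxk % 4)) -ne 0 ]; then
        echo "maxk=$maxk is not a multiple of 4" >&2
        return 1
    fi
    # Bytes per k-mer under the 2-bit-per-base assumption.
    echo $((maxk / 4))
}

# Example: a build configured with --enable-maxk=96
check_maxk 96
```

Under that assumption, each step of 4 in ``maxk`` adds one byte to every stored k-mer, which is why an unnecessarily large value costs memory across the whole graph.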

   Users sometimes compile ABySS with the appropriate ``--enable-maxk`` value and yet still see the ``length <= 64`` error when they run assembly jobs. This is usually because the ``PATH`` environment variable has not been set correctly in their cluster job script. ``abyss-pe`` is a Makefile that invokes a number of different ABySS binaries (e.g. ``ABYSS-P``, ``abyss-scaffold``), and it uses whichever copies of those binaries it finds first on ``PATH``. For example, consider the following script::

      #!/bin/sh
      PATH=/home/joe/bin:$PATH
      /home/joe/abyss-1.3.7/maxk_96/bin/abyss-pe k=96 name=assembly in='read1.fastq read2.fastq'


   The user might expect the binaries compiled and installed under ``/home/joe/abyss-1.3.7/maxk_96/bin`` to be used for the assembly job. However, running ``abyss-pe`` by its full path does not change where it looks for the other binaries: if another set of ABySS binaries compiled without the ``--enable-maxk=96`` option is installed in ``/home/joe/bin``, which comes first on ``PATH``, those binaries will be used instead.

   To debug this sort of problem, it helps to put a ``which`` command in the job script and then check the log output of the cluster job to see where it is getting its ABySS binaries from, e.g.::

      #!/bin/sh
      PATH=/home/joe/bin:$PATH
      which ABYSS-P
      /home/joe/abyss-1.3.7/maxk_96/bin/abyss-pe k=96 name=assembly in='read1.fastq read2.fastq'
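Rather than reading ``which`` output by eye, a job script can put the intended build first on ``PATH`` and fail early if any binary resolves somewhere else. This is a minimal sketch: ``ABYSS_BIN`` reuses the example path from above, ``check_abyss_bins`` is a hypothetical helper, and the binary names you pass to it are illustrative rather than the complete set ``abyss-pe`` may call.

```sh
#!/bin/sh
# Example install directory (taken from the FAQ example; adjust to yours).
ABYSS_BIN=/home/joe/abyss-1.3.7/maxk_96/bin

# Prepend, not append, so this directory is searched before /home/joe/bin.
PATH=$ABYSS_BIN:$PATH
export PATH

# Hypothetical helper: print where each named program resolves on PATH and
# return non-zero if any of them is missing or lives outside $1.
check_abyss_bins() {
    expected=$1
    shift
    status=0
    for prog in "$@"; do
        found=$(command -v "$prog") || found=""
        echo "$prog -> ${found:-not found}"
        if [ "$found" != "$expected/$prog" ]; then
            status=1
        fi
    done
    return $status
}
```

In a job script, call ``check_abyss_bins "$ABYSS_BIN" abyss-pe ABYSS-P abyss-scaffold || exit 1`` just before running ``abyss-pe``, so a misconfigured ``PATH`` stops the job with a readable message instead of the ``length <= 64`` assertion.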


2. *My ABySS assembly jobs hang when I run them with high k values! (e.g. k=250)*

   Open MPI changes the way it handles a message when the message size exceeds a threshold called the "eager send limit". In ABySS, message size depends directly on k, and when the eager send limit is exceeded, assembly jobs will deadlock.

   The best workaround for this problem is to set the eager send limit explicitly. You can do this by setting an environment variable called ``mpirun`` in your cluster job script; ``abyss-pe`` uses its value as the command that launches the MPI job.

   Example::

      #!/bin/sh
      PATH=/home/joe/abyss-1.3.7/maxk_96/bin:$PATH
      export mpirun="mpirun --mca btl_sm_eager_limit 16000 --mca btl_openib_eager_limit 16000"
      abyss-pe k=96 name=assembly in='read1.fastq read2.fastq'


   The values of ``btl_sm_eager_limit`` and ``btl_openib_eager_limit`` are in bytes, and it is usually fine to set both to the same value. Choose a value that satisfies the following, where ``max_k`` is the maximum k value ABySS was compiled for (the ``--enable-maxk`` setting)::

      eager_limit >= (max_k/4 + 32) * 100
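As a worked example, for ``max_k`` = 96 the formula gives (96/4 + 32) * 100 = 5600 bytes, so the 16000 used in the example above is comfortably large enough. The sketch below computes the bound and builds the ``mpirun`` variable from it. ``eager_limit`` is a hypothetical helper, and ``MAXK`` is assumed to be the ``--enable-maxk`` value of your build.

```sh
#!/bin/sh
# ASSUMPTION: MAXK is the value your ABySS build was configured with
# (--enable-maxk); 96 is only an example.
MAXK=96

# Hypothetical helper: smallest value satisfying
#   eager_limit >= (max_k/4 + 32) * 100
# max_k/4 is rounded up so the bound still holds if max_k is not a
# multiple of 4.
eager_limit() {
    echo $(( (($1 + 3) / 4 + 32) * 100 ))
}

LIMIT=$(eager_limit "$MAXK")
export mpirun="mpirun --mca btl_sm_eager_limit $LIMIT --mca btl_openib_eager_limit $LIMIT"
echo "$mpirun"
```

The formula gives a lower bound, so rounding up to a larger figure, as the example above does with 16000, is harmless.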


3. *My ABySS MPI job with a large number of processors (over 1000) is using much more memory than expected. What's up?*

   Open MPI's default parameters allocate a large amount of memory to communication buffers. The following option reduces the amount of memory allocated to these buffers::

      mpirun --mca btl_openib_receive_queues X,128,256,192,128:X,4096,256,128,32:X,12288,256,128,32:X,65536,256,128,3
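Since ``abyss-pe`` launches MPI through the ``mpirun`` environment variable described in question 2, the receive-queue setting can be passed the same way. A minimal sketch; the queue string is copied verbatim from the command above.

```sh
#!/bin/sh
# Receive-queue specification, copied verbatim from this FAQ entry.
QUEUES='X,128,256,192,128:X,4096,256,128,32:X,12288,256,128,32:X,65536,256,128,3'

# abyss-pe uses $mpirun as its MPI launch command.
export mpirun="mpirun --mca btl_openib_receive_queues $QUEUES"
echo "$mpirun"

# Then start the assembly as usual, e.g.:
# abyss-pe k=96 name=assembly in='read1.fastq read2.fastq'
```

If a job also needs the eager-limit settings from question 2, the ``--mca`` options can be combined in the same ``mpirun`` string.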
