LaRA 2 is an improved version of LaRA, a tool for sequence-structure alignment of RNA sequences. It...
- computes all pairwise sequence-structure alignments of the input sequences
- produces files that can be processed with T-Coffee or MAFFT to compute a multiple sequence-structure alignment
- employs methods from combinatorial optimization to compute feasible solutions for an integer linear program
- can read many input formats for RNA structure, e.g. Dot-bracket notation, Stockholm, Vienna format
- uses multiple threads on your machine and therefore runs very fast
- has a vectorized alignment kernel, which computes the results even faster
- is based on the SeqAn library, currently version 2
- is well-documented and easy to use
Clone the repository and use the --recurse-submodules option for downloading SeqAn and Lemon as submodules.
% git clone --recurse-submodules https://github.com/seqan/lara.git
Alternatively, you can download a zip package of the repository via the green button at the top of the GitHub page. If you do so, please unzip the file into a new subdirectory named lara and download the dependencies separately.
- platforms: Linux, MacOS
- compiler: gcc ≥ 5 or clang ≥ 3.8 or icc ≥ 17
- cmake ≥ 3.8
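LaRA depends on the following libraries, which are downloaded as submodules when you clone with --recurse-submodules:
- SeqAn
- Lemon
To process the output for multiple alignments (3 or more sequences), you need either
- T-Coffee, or
- MAFFT
Optionally, LaRA can predict the RNA structures for you if you provide
- ViennaRNA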
Note: Users reported problems with installing ViennaRNA, so we provide some hints here.
- Install the GNU MPFR Library first.
- Exclude unnecessary components of ViennaRNA:
./configure --without-swig --without-kinfold --without-forester --without-rnalocmin --without-gsl
- If you have linker issues use
./configure --disable-lto
- If your system supports SSE4.1 instructions then we recommend
./configure --enable-sse
If you have further suggestions, we are happy to add them here.
Please create a new directory and build the program for your platform.
% mkdir bin
% cd bin
% cmake ../lara
% make
% cd ..
After building the program binary, running LaRA is as simple as
% bin/lara -i sequences.fasta
With the -i parameter you can pass one of the following formats to LaRA. The filename must end with one of the specified suffixes, because the suffix determines the correct format parser.
- FASTA sequence format (.fa, .fasta, .faa, .ffn, .fna, .frn)
- FASTQ sequence with quality annotation (.fq, .fastq)
- Raw sequence format (.raw)
- EMBL sequence format (.embl)
- Genbank sequence format (.gbk)
- Dot-bracket notation, with support for various bracket types (.dbn)
- Vienna format, dot-bracket without pseudoknot (.dbv)
- Stockholm format (.sth)
- Connectivity Table (.ct)
- Bpseq format (.bpseq)
- Extended Bpseq, with support for base pair probabilities (.ebpseq)
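To check a filename before handing it to LaRA, the suffix rule can be sketched as a simple lookup. This is only an illustration in Python, not LaRA's own parser selection: the names `SUFFIX_TO_FORMAT` and `guess_format` are hypothetical, the format labels are descriptive only, and the sketch assumes an exact, case-sensitive suffix match.

```python
from pathlib import Path

# Suffix table transcribed from the list above.
# The labels are descriptive only (hypothetical, not LaRA identifiers).
SUFFIX_TO_FORMAT = {
    ".fa": "FASTA", ".fasta": "FASTA", ".faa": "FASTA",
    ".ffn": "FASTA", ".fna": "FASTA", ".frn": "FASTA",
    ".fq": "FASTQ", ".fastq": "FASTQ",
    ".raw": "Raw",
    ".embl": "EMBL",
    ".gbk": "Genbank",
    ".dbn": "Dot-bracket",
    ".dbv": "Vienna",
    ".sth": "Stockholm",
    ".ct": "Connectivity Table",
    ".bpseq": "Bpseq",
    ".ebpseq": "Extended Bpseq",
}


def guess_format(filename: str) -> str:
    """Return the format label for a filename, or raise ValueError."""
    # Assumption: only the last suffix counts and matching is case-sensitive.
    suffix = Path(filename).suffix
    try:
        return SUFFIX_TO_FORMAT[suffix]
    except KeyError:
        raise ValueError(f"unsupported suffix {suffix!r} in {filename!r}") from None
```

For example, `guess_format("sequences.fasta")` returns `"FASTA"`, while a file named `notes.txt` raises a `ValueError`.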
Note that for some formats you need the ViennaRNA dependency, as the program must predict base pair probabilities.
Alternatively, you can pass two or more dot plot files, each containing the base pair probabilities for a single sequence.
Important: RNAfold must be executed with -p in order to retrieve a _dp.ps dot plot file!
% bin/lara -d seq1_dp.ps -d seq2_dp.ps
The pairwise structural alignments are printed to stdout in the T-Coffee Library format (see below). If you want to store the result in a file, please use the -w option or redirect the output.
% bin/lara -i sequences.fasta -w results.lib
% bin/lara -i sequences.fasta > results.lib
We recommend specifying the number of threads with the -j option, e.g. -j 4 to execute 4 alignments in parallel. If you specify -j 0, the program tries to detect the maximum number of threads available on your machine.
% bin/lara -i sequences.fasta -j 4
For a list of options, please see the help message:
% bin/lara --help
Each output format is sorted primarily by the first and subsequently by the second sequence index.
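The ordering can be pictured with a small Python sketch. It assumes zero-based sequence indices and that each unordered pair (i, j) with i < j is aligned exactly once; neither detail is stated here, and `pair_order` is a hypothetical helper, not part of LaRA.

```python
from itertools import combinations
from typing import List, Tuple


def pair_order(num_sequences: int) -> List[Tuple[int, int]]:
    """List index pairs sorted by first index, then by second index."""
    # combinations() already yields pairs in this lexicographic order.
    return list(combinations(range(num_sequences), 2))


print(pair_order(4))
# [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```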
The result of LaRA is a T-Coffee library file and its format is documented here. It contains the structural scores for each residue pair of each computed sequence pair. This file is the input for T-Coffee, which computes the multiple alignment based on the scores:
% bin/t_coffee -lib results.lib
LaRA has an additional output format that can be read by the MAFFT framework. Each pairwise alignment produces three lines: a description line composed of the two sequence ids, followed by the two gapped sequences of the alignment, one per line.
> first id && second id
AACCG-UU
-ACCGGUU
> first id && third id
AA-CCGUU
AAGCCGUU
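If you want to inspect this output yourself, the three-line records can be read with a few lines of Python. This is a minimal sketch based only on the example above: it assumes the two ids are separated by `&&`, that ids do not contain `&&` themselves, and that each gapped sequence occupies exactly one line. `PairAlignment` and `parse_pairs` are hypothetical names.

```python
from typing import Iterator, NamedTuple


class PairAlignment(NamedTuple):
    first_id: str
    second_id: str
    first_row: str
    second_row: str


def parse_pairs(text: str) -> Iterator[PairAlignment]:
    """Yield one record per three-line block (description, row, row)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) % 3 != 0:
        raise ValueError("expected three lines per pairwise alignment")
    for k in range(0, len(lines), 3):
        header, row1, row2 = lines[k:k + 3]
        if not header.startswith(">"):
            raise ValueError(f"missing '>' in description line: {header!r}")
        # Assumption: '&&' separates the two sequence ids.
        first_id, sep, second_id = header[1:].partition("&&")
        if not sep:
            raise ValueError(f"missing '&&' in description line: {header!r}")
        if len(row1) != len(row2):
            raise ValueError("gapped sequences differ in length")
        yield PairAlignment(first_id.strip(), second_id.strip(), row1, row2)


EXAMPLE = """\
> first id && second id
AACCG-UU
-ACCGGUU
> first id && third id
AA-CCGUU
AAGCCGUU
"""

for aln in parse_pairs(EXAMPLE):
    print(aln.first_id, "|", aln.second_id, "|", len(aln.first_row), "columns")
```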
MAFFT invokes LaRA with the option -o pairs for receiving this output format.
LaRA can produce the aligned FastA format, which is recommended for a single pairwise alignment. It looks like a normal FastA file with gap symbols in the sequences:
> first id
AACCG-UU
> second id
-ACCGGUU
To get this output format, pass the option -o fasta to the LaRA call.
LaRA prints a warning if you use this format with more than two sequences. Using this format with 3 or more sequences is possible but not recommended, because additional pairwise alignments will simply be appended to the file, and it may be hard to distinguish the pairs later. In addition, this can confuse other programs, which expect a single multiple sequence alignment as produced by MAFFT or T-Coffee.
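If you do end up with several pairwise alignments in one aligned FastA file, the pairs can be recovered by position. The Python sketch below assumes that every alignment contributes exactly two consecutive records and that a sequence may wrap over several lines, as in ordinary FastA; `read_aligned_fasta` and `group_into_pairs` are hypothetical helpers, not part of LaRA.

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (sequence id, gapped sequence)


def read_aligned_fasta(text: str) -> List[Record]:
    """Read FastA records, joining wrapped sequence lines."""
    records: List[List[str]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:].strip(), ""])
        elif records:
            records[-1][1] += line
        else:
            raise ValueError("sequence data before the first description line")
    return [(name, seq) for name, seq in records]


def group_into_pairs(records: List[Record]) -> List[Tuple[Record, Record]]:
    """Group consecutive records into pairwise alignments."""
    # Assumption: records 0 and 1 form the first alignment, 2 and 3 the next.
    if len(records) % 2 != 0:
        raise ValueError("odd number of records, cannot form pairs")
    pairs = [(records[k], records[k + 1]) for k in range(0, len(records), 2)]
    for (_, row1), (_, row2) in pairs:
        if len(row1) != len(row2):
            raise ValueError("gapped sequences of a pair differ in length")
    return pairs


EXAMPLE = "> first id\nAACCG-UU\n> second id\n-ACCGGUU\n"
print(group_into_pairs(read_aligned_fasta(EXAMPLE)))
```

Because the pairing relies on record order alone, a reordered or truncated file cannot be repaired this way, which is one more reason to prefer the T-Coffee or MAFFT output for 3 or more sequences.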
LaRA 2 is being developed by Jörg Winkler and Gianvito Urgese, but it incorporates a lot of work from other members of the SeqAn project.
You can ask questions and report bugs on the GitHub tracker. Please also subscribe and/or star us!
You can also follow SeqAn on Twitter to receive updates on LaRA.
Icons on this page by Austin Andrews: https://github.com/Templarian/WindowsIcons