LaRA 2: Lagrangian Relaxed structural Alignment

LaRA 2 is an improved version of LaRA, a tool for sequence-structure alignment of RNA sequences. It...

computes all pairwise sequence-structure alignments of the input sequences
produces files that can be processed with T-Coffee or MAFFT to compute a multiple sequence-structure alignment
employs methods from combinatorial optimization to compute feasible solutions for an integer linear program
can read many input formats for RNA structure, e.g. Dot-bracket notation, Stockholm, Vienna format
uses multiple threads on your machine and therefore runs very fast
has a vectorized alignment kernel, which computes the results even faster
is based on the SeqAn library, currently version 2
is well-documented and easy to use

Download instructions

Clone the repository and use the --recurse-submodules option for downloading SeqAn and Lemon as submodules.

% git clone --recurse-submodules https://github.com/seqan/lara.git

Alternatively, you can download a zip package of the repository via the green button at the top of the GitHub page. If you do so, please unzip the file into a new subdirectory named lara and download the dependencies (SeqAn and Lemon) separately.

Requirements

platforms: Linux, macOS
compiler: gcc ≥ 5 or clang ≥ 3.8 or icc ≥ 17
cmake ≥ 3.8

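LaRA depends on the following libraries, which are downloaded as submodules when you clone with --recurse-submodules:

SeqAn
Lemon

To process the output for multiple alignments (3 or more sequences), you need either

T-Coffee or
MAFFT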

Optionally, LaRA can predict the RNA structures for you if you provide

ViennaRNA 2

Note: Users reported problems with installing ViennaRNA, so we provide some hints here.

Install the GNU MPFR Library first.
Exclude unnecessary components of ViennaRNA: ./configure --without-swig --without-kinfold --without-forester --without-rnalocmin --without-gsl
If you have linker issues use ./configure --disable-lto
If your system supports SSE4.1 instructions then we recommend ./configure --enable-sse

If you have further suggestions, we are happy to add them here.

Build instructions

Please create a new directory and build the program for your platform.

% mkdir bin
% cd bin
% cmake ../lara
% make
% cd ..

Usage

After building the program binary, running LaRA is as simple as

% bin/lara -i sequences.fasta

With the -i parameter you can pass one of the following formats to LaRA. The filename must end with one of the specified suffixes, because the suffix determines the correct format parser.

FASTA sequence format (.fa, .fasta, .faa, .ffn, .fna, .frn)
FASTQ sequence with quality annotation (.fq, .fastq)
Raw sequence format (.raw)
EMBL sequence format (.embl)
Genbank sequence format (.gbk)
Dot-bracket notation, with support for various bracket types (.dbn)
Vienna format, dot-bracket without pseudoknot (.dbv)
Stockholm format (.sth)
Connectivity Table (.ct)
Bpseq format (.bpseq)
Extended Bpseq, with support for base pair probabilities (.ebpseq)
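
Because the suffix alone selects the parser, it can help to check file names before starting a long run. The following Python sketch only mirrors the list above; it is not part of LaRA, the function name guess_format is made up for this example, and it assumes that suffixes are matched exactly as written (lower case).

```python
from pathlib import Path

# Suffix-to-format table copied from the list above.
SUFFIX_FORMATS = {
    ".fa": "FASTA", ".fasta": "FASTA", ".faa": "FASTA",
    ".ffn": "FASTA", ".fna": "FASTA", ".frn": "FASTA",
    ".fq": "FASTQ", ".fastq": "FASTQ",
    ".raw": "Raw",
    ".embl": "EMBL",
    ".gbk": "Genbank",
    ".dbn": "Dot-bracket",
    ".dbv": "Vienna",
    ".sth": "Stockholm",
    ".ct": "Connectivity Table",
    ".bpseq": "Bpseq",
    ".ebpseq": "Extended Bpseq",
}


def guess_format(filename):
    """Return the format name for a file name, or None if the suffix is not listed.

    Assumption: exact, case-sensitive suffix match; LaRA itself may behave differently.
    """
    return SUFFIX_FORMATS.get(Path(filename).suffix)


print(guess_format("sequences.fasta"))
print(guess_format("notes.txt"))
```

A result of None means that the file should be renamed to one of the listed suffixes before it is passed with -i.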

Note that for some formats you need the ViennaRNA dependency, because the program must predict the base pair probabilities. Alternatively, you can pass at least two dot plot files, each containing the base pair probabilities for a single sequence. Important: RNAfold must be executed with -p in order to produce a _dp.ps dot plot file!

% bin/lara -d seq1_dp.ps -d seq2_dp.ps

The pairwise structural alignments are printed to stdout in the T-Coffee Library format (see below). If you want to store the result in a file, please use the -w option or redirect the output.

% bin/lara -i sequences.fasta -w results.lib
% bin/lara -i sequences.fasta > results.lib

We recommend specifying the number of threads with the -j option, e.g. -j 4 to execute 4 alignments in parallel. If you specify -j 0, the program tries to detect the maximal number of threads available on your machine.

% bin/lara -i sequences.fasta -j 4
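
If you call LaRA from a script, the options described so far can be assembled programmatically. This is a minimal Python sketch, not an official wrapper: the helper name lara_command and its keyword arguments are invented for this example, and it only uses the options -i, -d, -w and -j shown above. Whether -i and -d may be combined in one call is not stated here, so the sketch does not forbid it.

```python
def lara_command(input_file=None, dotplots=(), output_file=None,
                 threads=None, binary="bin/lara"):
    """Build an argument list for a LaRA call (hypothetical helper)."""
    # Either one sequence/structure file or at least two dot plot files are needed.
    if input_file is None and len(dotplots) < 2:
        raise ValueError("pass an input file or at least two dot plot files")
    cmd = [binary]
    if input_file is not None:
        cmd += ["-i", input_file]
    for dotplot in dotplots:
        cmd += ["-d", dotplot]          # one -d per dot plot file
    if output_file is not None:
        cmd += ["-w", output_file]      # write the result to a file instead of stdout
    if threads is not None:
        if threads < 0:
            raise ValueError("threads must be 0 (auto-detect) or positive")
        cmd += ["-j", str(threads)]
    return cmd


print(lara_command("sequences.fasta", threads=4))
```

The returned list can be handed to subprocess.run from the Python standard library once the binary has been built.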

For a list of options, please see the help message:

% bin/lara --help

Output format

Each output format is sorted primarily by the first and subsequently by the second sequence index.
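
To illustrate this ordering, the following Python sketch enumerates the sequence pairs of an input with four sequences. It assumes 0-based indices and that each unordered pair appears once with the smaller index first; neither detail is specified in this document, so treat them as assumptions.

```python
from itertools import combinations


def pair_order(num_sequences):
    """List index pairs sorted by the first and then by the second index.

    Assumption: 0-based indices, each pair (i, j) reported once with i < j.
    """
    return list(combinations(range(num_sequences), 2))


print(pair_order(4))
# [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```

For n input sequences this gives n * (n - 1) / 2 pairwise alignments.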

for multiple alignments with T-Coffee

The result of LaRA is a T-Coffee library file and its format is documented here. It contains the structural scores for each residue pair of each computed sequence pair. This file is the input for T-Coffee, which computes the multiple alignment based on the scores:

% bin/t_coffee -lib results.lib

for multiple alignments with MAFFT

LaRA has an additional output format that can be read by the MAFFT framework. Each pairwise alignment produces three lines: a description line composed of the two sequence ids, followed by the two gapped sequences of the alignment, one per line.

> first id && second id
AACCG-UU
-ACCGGUU
> first id && third id
AA-CCGUU
AAGCCGUU

MAFFT invokes LaRA with the option -o pairs for receiving this output format.
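
If you want to inspect this output yourself, the three-line records are easy to read. The Python sketch below is not part of LaRA or MAFFT; the function name parse_pairs is made up, and it assumes that the two ids are separated by && exactly as in the example above and that the ids themselves do not contain &&.

```python
def parse_pairs(text):
    """Parse three-line records: '> id1 && id2', gapped sequence 1, gapped sequence 2."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) % 3 != 0:
        raise ValueError("expected a multiple of three non-empty lines")
    records = []
    for k in range(0, len(lines), 3):
        header, row1, row2 = lines[k:k + 3]
        if not header.startswith(">") or "&&" not in header:
            raise ValueError(f"malformed description line: {header!r}")
        # Assumption: ids are separated by the first '&&' on the description line.
        id1, id2 = (part.strip() for part in header[1:].split("&&", 1))
        if len(row1) != len(row2):
            raise ValueError("gapped sequences differ in length")
        records.append((id1, id2, row1, row2))
    return records


EXAMPLE = """\
> first id && second id
AACCG-UU
-ACCGGUU
> first id && third id
AA-CCGUU
AAGCCGUU
"""

for id1, id2, row1, row2 in parse_pairs(EXAMPLE):
    print(id1, "|", id2, "|", row1, "|", row2)
```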

for pairwise alignments

LaRA can produce the aligned FASTA format, which is recommended for a single pairwise alignment. It looks like a normal FASTA file with gap symbols in the sequences:

> first id
AACCG-UU
> second id
-ACCGGUU

To get this output format, pass the option -o fasta to the LaRA call.

LaRA prints a warning if you use this format with more than two sequences. Using this format with 3 or more sequences is possible but not recommended, because additional pairwise alignments will simply be appended to the file, and it may be hard to distinguish the pairs later. In addition, this can confuse other programs, which expect a single multiple sequence alignment as produced by MAFFT or T-Coffee.
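
As a small illustration of working with this format, the Python sketch below reads an aligned FASTA text and counts matching, mismatching and gapped columns of a pairwise alignment. The function names are invented for this example, and it assumes that - is the only gap symbol and that each record may span several lines.

```python
def read_aligned_fasta(text):
    """Return a list of (id, gapped_sequence) tuples from aligned FASTA text."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:].strip(), ""])
        elif records:
            records[-1][1] += line      # sequences may be wrapped over several lines
        else:
            raise ValueError("sequence data before the first '>' line")
    return [(name, seq) for name, seq in records]


def column_summary(row1, row2, gap="-"):
    """Count alignment columns by type (assumes '-' marks a gap)."""
    if len(row1) != len(row2):
        raise ValueError("aligned sequences must have equal length")
    counts = {"match": 0, "mismatch": 0, "gap": 0}
    for a, b in zip(row1, row2):
        if a == gap or b == gap:
            counts["gap"] += 1
        elif a == b:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1
    return counts


EXAMPLE = """\
> first id
AACCG-UU
> second id
-ACCGGUU
"""

(id1, row1), (id2, row2) = read_aligned_fasta(EXAMPLE)
print(id1, id2, column_summary(row1, row2))
# first id second id {'match': 6, 'mismatch': 0, 'gap': 2}
```

If the file contains more than two records, consecutive records have to be grouped into pairs by hand, which is one reason why this format is only recommended for a single pairwise alignment.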

Authorship & Copyright

LaRA 2 is being developed by Jörg Winkler and Gianvito Urgese, but it incorporates a lot of work from other members of the SeqAn project.

Feedback & Updates

You can ask questions and report bugs on the GitHub tracker. Please also subscribe and/or star us!
You can also follow SeqAn on Twitter to receive updates on LaRA.

Icons on this page by Austin Andrews: https://github.com/Templarian/WindowsIcons
