Changelog
Hannes Hauswedell edited this page Aug 26, 2024 · 52 revisions
This includes the changes used in https://doi.org/10.1093/bioinformatics/btae097
- Two new profiles: `pairs-default` and `pairs-sensitive` that are useful in combination with higher `--num-matches`. 6cd1d33
- Improvements to the core algorithm. 40b49a0
- In edge cases, the alignment coordinates may have been calculated wrongly. f1ebc9a
- Proper exception handling. aad68eb
- Accept `.fna` and `.faa` as extensions for FASTA files. 05262cd
- The command line interface is compatible with 3.0.0.
- The on-disk format is compatible with 3.0.0.
- The output generated for most input files has changed slightly since 3.0.0.
- same as for 3.0.0
- New program mode for searching bisulphite data.
- The nucleotide mode has received much more testing and optimisation.
- Huge overhaul of the algorithm; Lambda3 is up to 3x faster than Lambda2 and uses less memory.
- Use `--profile fast` or `--profile sensitive` to select fine-tuned parameter combinations that are faster or more sensitive than the default.
- The command line interface is very similar to Lambda2's, but some options have been added and some removed.
- The on-disk format of the index has changed. You need to recreate your index files or download new ones from the wiki. Indexes are now single files and may be used in compressed state.
- GCC-11 or later and `-std=c++20` are required to build.
- Requires 64-bit Intel or AMD CPU with `SSE4` and `POPCNT` instructions.
- SeqAn 2.4.1; SeqAn3 is not used anymore
- BioC++ Core 0.7.1
- BioC++ IO
- Cereal 1.3.2
- ShargParser-1.0.0
- fmindex-collection
- all of these requirements are only required at build-time.
- use `--bit-score` in addition to or instead of `--e-value`
- fix 32bit builds
- bug in BLASTN evalue calculation (slightly changed values)
- various typos and documentation fixes
- fix dispatcher script on macOS
- the command line interface is identical to `lambda >= 1.9.4`
- the on-disk index format is compatible with `lambda >= 1.9.3`
- requires `seqan >= 2.3.1`; binary packages based on `seqan-2.4.0`
- compatible with C++17 if used with `seqan >= 2.4.0`
This is the 2.0 stable release of lambda. It is identical to 1.9.5. The on-disk format is guaranteed to be preserved, and all command line options and internal parameters will remain fixed, unless there are bugs.
- BLASTN was broken (#115)
- wrong escaping of `"` and `'` in command line arguments (#116)
- mixed lower-case / upper-case letters led to error on indexing (#117)
- divide-by-zero with very small databases (#118)
- lca-computation error with certain sequences that have no taxonomy information (#119)
- the command line interface is identical to `lambda >= 1.9.4`
- the on-disk index format is compatible with `lambda >= 1.9.3`
- requires `seqan >= 2.3.1`; binary packages based on `seqan-2.4.0`
- compatible with C++17 if used with `seqan >= 2.4.0`
- fix build with `seqan-2.4.0` (#114)
- both the interface and the generated index files are fully compatible with the `0.9.*` and `1.0.*` series
- requires `seqan-2.2.0` or later; packages built from `seqan-2.4.0`
- all new single-executable interface with sub-commands (like `git`); see the LAMBDA → lambda2-guide (#88, #94)
  - the executable is now called `lambda2`
  - the subcommands `mkindexp`, `mkindexn`, `searchp` and `searchn` are currently supported
  - the `-p/--program` parameter has disappeared, instead one chooses via the command between nucleotide and protein search (#96)
  - many options are now auto-detected from the files, including all index-options and the source alphabets (DNA vs AminoAcid) (#6, #60)
  - all short options are now single-letter, some short-option specifiers were removed (#108)
- man-pages are now automatically generated and included in the packages
- generic and optimised binaries are now shipped within the same package and automatically selected
- index generation is now 30% – 50% faster (#112)
- crash on empty query sequences, fix requires `seqan >= 2.4.0` (#111)
- crash if the output file is placed in a non-existent or non-writable directory (#113)
- the command line interface has changed considerably, please see the LAMBDA → lambda2-guide
- the on-disk index format is compatible with 1.9.3
- requires `seqan >= 2.3.1`; binary packages based on `seqan-2.4.0`
- compatible with C++17 if used with `seqan >= 2.4.0`
- everything from 1.0.2, including checks for updates
- faster searches through
- smaller indexes again, approx. the size before 1.9.2 (#93)
- species annotation (#76):
- support for extracting RefSeq and UniParc accession IDs
- support for UniProt `.dat` mapping files in addition to NCBI's `.accession2taxid`
- the expected memory usage is pre-calculated and checked against the available memory (#86)
- fixed subject position error in some matches in `1.9.0 - 1.9.2` (#87)
- stricter parameter checking that prevents wrong usage with obscure errors (#92)
- better exception handling (#100)
- some fixes to BLASTN and TBLASTX modes (#95, #102, #104)
- rare crash on FreeBSD
- stack overflow on long exact matches (especially BLASTP, BLASTN)
- the on-disk index format has changed and is not compatible with any previous version
- requires `seqan >= 2.3.1`
- various fixes from new SeqAn versions
- both the interface and the generated index files are fully compatible with the `0.9.*` and `1.0.*` series
- requires `seqan-2.2.0`
- everything from 1.0.1
- SIMD parallelization for short query sequences, e.g. Illumina; not yet default (#58)
- species annotation of subject sequences, see Taxonomic Workflows (#63)
- compile time option that enables larger protein database sequences (#70)
- bi-directional indexes supported; not yet default (#74)
- taxonomic binning, see Taxonomic Workflows (#77)
- crash when encountering input sequences that are shorter than a seed (#66)
- the on-disk index format has changed and is not compatible with any previous version
- some of the optional `--sam-bam-tags` were renamed, see the wiki (#79)
- requires `seqan-2.3.1`
- can use `.sam` and `.bam` also as sequence input (via `seqan-2.3`)
- minor spelling and documentation fixes (#65)
- make Lambda build on 32bit platforms (#68)
- make `Release` the default `CMAKE_BUILD_TYPE` again. If you built Lambda yourself and you didn't set this, please rebuild (#71)
- Lambda is now available in Debian as `lambda-align`
- it is also built on many non-x86 platforms, including PowerPC and Sparc64
- both the interface and the generated index files are fully compatible with the `0.9.*` and `1.0.*` series
- requires `seqan-2.2.0`
- fix SAM and BAM output (#61)
- new variable length seeding and new seeding strategy, much faster (#17)
- new FM index with EPR dictionaries (faster, but bigger) (#57)
- early support for SIMD operations in extension phase (only faster for small reads) (#58)
- new database/index format, files now moved to sub-directory, better diagnostics (#7)
- support for `.seg` files and masking was removed, it yielded poor results and is superseded by variable length seeding (#47)
- the new index format is incompatible with previous releases and it now uses the `-i` parameter (#7)
- both the command line options and the index format are subject to change during the `1.9.*` cycle!
- wrong handling of empty databases or databases with empty sequences (#54)
- removed the git-subtree and retroactively replaced it with git submodule (#55)
  - much smaller repository to clone, if you still want SeqAn with Lambda, add `--recursive`
  - the `1.0.*` series now depends on `SeqAn-2.2.0` and not any longer on a development version
  - previous git clones have been invalidated and must be force-pulled or newly cloned
- significant decrease in binary size (6.3MB vs 30MB) and compile time (1.5m vs 10m) (#49)
- improved continuous integration (#52)
- both the interface and the generated index files are fully compatible with the `0.9.*` series
- support for soft-clipping in SAM/BAM IO via `--sam-bam-clip` (#43)
- missing or redundant hard-clip indicators in SAM/BAM cigar strings (#51)
- LLVM/Clang compiler >= 3.8.0 now supported (#27)
- reduced build times on GCC via parallelization (#45)
- continuous integration for OS X (#46)
- OpenBSD is now supported as platform (although slower than other Unixes) (#48)
- Intel Compiler >= 16.0.2 now supported (#50)
- using indexes that are read-only now works (#38)
- fixed a crash in TBlastN-mode (#40)
- fixed a crash in BlastN-mode in combination with suffix array index type (#41)
- support for SAM and BAM output formats (#15), see also Output-Formats
- truncate subject IDs by default to save lots of space (can be deactivated) (#37)
- using SEG files works again (#33)
- compiler specific parts linked statically on Mac OS X (#34)
- BLASTN indexes were incorrectly classified as old format (#35)
- if an outdated index was detected the value 200 is returned by lambda, so scripts can automatically recreate indexes if using them fails (#35)
- mmapped IO for the database enabling faster startup and memory sharing between instances (#3)
- radixsort suffix array creation (originally by @meiers) resulting in over 30% less RAM and up to 30% speed-up (#11)
- all previous algorithms based on sorting superseded by radixsort, please remove e.g. `-a quicksort` from your scripts
- build in release mode by default (#29)
- improved progress reporting during indexing (#31)
- detect most cases where the index is incompatible (#32)
- ported to SeqAn 2.0 bringing in lots of smaller changes (#1)
- support for column reordering and more columns in tabular output (#2)
- gzipped and bzipped input and output files supported (#19)
- some checks are performed on input data to detect wrong alphabets (#21)
- previously generated indexes are unfortunately not compatible with 0.9.*
- hide most options by default (visible again with `--full-help`)
- error in man-page (#5)
- erroneously report "unexpected extension failure" (#9)
- crash in TBlastX mode (#10)
- parameter for number of matches not working (#13)
- crash when putative duplicates heuristic is turned off (#22)
- many small improvements to BlastIO
- fix build on Darwin / MacOS X
- `lambda_indexer` now has a different suffix array construction algorithm
- this works on larger files (where the old algorithm sometimes failed) and is fully parallelized
- there is also a rough progress indication when indexing
- default index type was not set to FM in `lambda_indexer`
- new default mode with 30-80% speed gains and up to 75% memory reduction over published version
- double-indexing mode with speed gains > 100%
- sensitivity slightly increased at the same time (1-2%)
- renamed many parameters and changed some defaults
- please look at `lambda --help` to see all the changes!!
- better control of verbosity with `-v` parameter
- threads now controlled with `-t` instead of environment variable
- BlastN mode now usable again and proper parameter-handling added for it
- added percent identity cutoff in addition to e-value cutoff (`-id`)
- added a limit for maximum number of matches per query sequence (`-nm`)
- added abundance heuristic (`-pa`) and prioritization of hits to not look at all hits if number of hits >> chosen limit
- single-indexing mode which has huge memory advantages (`-qi none`) [now default]
- FM-Index is now also default
- removed Lambda-Alphabets, since they currently provide little benefit over Murphy10
- indexes with different settings (index type, alphabet) can now be created on the same fasta file without conflicts between them
- changed pre-scoring heuristic to include region around match (`-ps` and `-pt`)
- fixed build issues with gcc-4.8.x
- FastQ support fixed
- Speed increased by ~20%
- Suffix-Array index memory consumption reduced from 16x to 6x input database size
- experimental support for FM-index as index (instead of SA) [not widely tested, yet]
- small bugs in BLAST output formats corrected
- source-code .tar.gz
- source-code git commit d41b4b58749282dbca838a7f8506c0b378767b1b
- multiple optimizations
- added option to partition the query sequences
- added overlapping seeds capability
- source-code .tar.gz
- source-code git commit b8ca36432d0530dd5d39560f8e2dc2cffb7c5d9d
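
One entry in this changelog states that lambda returns the value 200 when an outdated index is detected (#35), so that scripts can recreate the index automatically. A minimal Python sketch of such a wrapper might look like the following. The function name and the two command lists are hypothetical placeholders, not real lambda invocations; the actual indexer and search command lines depend on the Lambda version in use.

```python
import subprocess
import sys

# Exit status that, according to the changelog entry (#35), signals an outdated index.
OUTDATED_INDEX_STATUS = 200


def run_with_index_rebuild(search_cmd, rebuild_cmd):
    """Run search_cmd; if it exits with 200, run rebuild_cmd and retry once.

    Both arguments are complete argument lists supplied by the caller
    (hypothetical placeholders here, not real lambda command lines).
    Returns the exit status of the last search attempt.
    """
    status = subprocess.run(search_cmd).returncode
    if status != OUTDATED_INDEX_STATUS:
        # Success or an unrelated failure: pass the status through unchanged.
        return status
    # The index was reported as outdated: recreate it, then search again.
    subprocess.run(rebuild_cmd, check=True)
    return subprocess.run(search_cmd).returncode
```

The wrapper retries only once, so a search that still reports 200 after rebuilding is handed back to the caller instead of looping.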
If anything is unclear, don't hesitate to contact me.