Version 1.4.0: long read binning
The big change is a new binning algorithm for assemblies from long-read datasets.
The overall structure of the pipeline is still similar to what was described in the manuscript, but the clustering step no longer uses Infomap; it uses a different procedure instead (an iterative version of DBSCAN).
Use the flag --sequencing-type=long_read to enable an alternative clustering procedure that works better with long reads.
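To make the idea of an "iterative version of DBSCAN" more concrete, here is a minimal Python sketch. It assumes the iteration runs plain DBSCAN over an increasing list of eps values, keeps every cluster found in a pass, and removes those points before the next pass. SemiBin's actual procedure (how eps values are chosen, which clusters are kept, which distance is used on the contig embeddings) may differ, and the function names dbscan and iterative_dbscan are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]


def _neighbors(points: List[Point], idx: int, eps: float, active: List[int]) -> List[int]:
    """Indices of active points within eps (Euclidean) of points[idx], including idx."""
    return [j for j in active if math.dist(points[idx], points[j]) <= eps]


def dbscan(points: List[Point], eps: float, min_samples: int, active: List[int]) -> Dict[int, int]:
    """Plain DBSCAN restricted to the indices in `active`.

    Returns {point_index: local_cluster_id}; noise points are left out.
    """
    labels: Dict[int, int] = {}
    visited = set()
    cluster_id = 0
    for i in active:
        if i in visited:
            continue
        visited.add(i)
        seeds = _neighbors(points, i, eps, active)
        if len(seeds) < min_samples:
            continue  # not a core point; may still join a cluster as a border point
        labels[i] = cluster_id
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if j not in labels:
                labels[j] = cluster_id  # core or border point reached from this cluster
            if j in visited:
                continue
            visited.add(j)
            nbrs = _neighbors(points, j, eps, active)
            if len(nbrs) >= min_samples:
                queue.extend(nbrs)  # j is a core point: keep expanding
        cluster_id += 1
    return labels


def iterative_dbscan(points: List[Point], eps_values: List[float], min_samples: int) -> Dict[int, int]:
    """Run DBSCAN repeatedly with growing eps, removing clustered points after each pass.

    Assumption for illustration: every cluster found in a pass is accepted.
    """
    remaining = list(range(len(points)))
    assignment: Dict[int, int] = {}
    next_id = 0
    for eps in eps_values:
        if not remaining:
            break
        local = dbscan(points, eps, min_samples, remaining)
        local_to_global: Dict[int, int] = {}
        for idx, cid in local.items():
            if cid not in local_to_global:
                local_to_global[cid] = next_id
                next_id += 1
            assignment[idx] = local_to_global[cid]
        # Only unclustered points take part in the next, looser pass.
        remaining = [i for i in remaining if i not in assignment]
    return assignment


pts = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (10, 10), (10.9, 10)]
print(iterative_dbscan(pts, [0.2, 1.0], min_samples=2))
```

With two tight groups and one looser pair, the first pass (eps=0.2) recovers the tight groups, while the last two points are noise at that radius; the second pass (eps=1.0) then clusters them on their own, without the already-binned points absorbing them.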
Other user-visible improvements
- Better error checking at multiple steps in the pipeline, so that processes that would otherwise crash are caught as early as possible
- Add the --allow-missing-mmseqs2 flag to the check_install subcommand (eventually, self-supervision will be the default and MMseqs2 will be an optional dependency)
Command line parameter deprecations
The previous arguments should continue to work, but going forward, the newer arguments are probably a better API.
- Selecting self-supervised learning is now done with the --self-supervised flag (instead of --training-type=self)
- Training from multiple samples is now enabled with the --train-from-many flag (instead of --mode=several)
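For scripts that still pass the deprecated arguments, the mapping in the list above can be applied mechanically. The sketch below is a hypothetical helper (not part of SemiBin) that rewrites an argument list, handling both the --mode=several and the space-separated --mode several spellings.

```python
from typing import Dict, List

# Deprecated argument -> replacement, as listed in these release notes.
# Hypothetical helper table; not part of SemiBin itself.
DEPRECATED: Dict[str, str] = {
    "--training-type=self": "--self-supervised",
    "--mode=several": "--train-from-many",
}


def modernize_args(argv: List[str]) -> List[str]:
    """Return a copy of argv with deprecated flags replaced by their newer equivalents."""
    out: List[str] = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg in DEPRECATED:
            # "--flag=value" spelling
            out.append(DEPRECATED[arg])
            i += 1
            continue
        if i + 1 < len(argv):
            joined = f"{arg}={argv[i + 1]}"
            if joined in DEPRECATED:
                # "--flag value" spelling: consume both tokens
                out.append(DEPRECATED[joined])
                i += 2
                continue
        out.append(arg)
        i += 1
    return out


print(modernize_args(["--mode", "several", "--training-type=self", "-o", "out"]))
# ['--train-from-many', '--self-supervised', '-o', 'out']
```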
Bugfixes
- The output table sometimes contained the wrong path in v1.3; this has been fixed
- Prodigal is now run in a more robust manner when using multiple threads (#106)