Release Version 1.4.0: long read binning · BigDataBiology/SemiBin

The big change is a new binning algorithm for assemblies from long-read datasets.

The overall structure of the pipeline is still similar to what was described in the manuscript, but for clustering, the long-read algorithm uses an iterative version of DBSCAN instead of Infomap.
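These notes do not spell out the iterative procedure, so the following is only a generic sketch of what "iterative DBSCAN" can mean, not SemiBin's implementation. It assumes that each round runs DBSCAN at several radii, keeps the single best-scoring cluster, removes its points, and repeats. The names `dbscan`, `iterative_dbscan`, `eps_values` and `score` are all hypothetical, and `score=len` is a placeholder for whatever bin-quality estimate a real binner would use.

```python
from math import dist


def dbscan(points, eps, min_samples):
    """Plain DBSCAN. Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels


def iterative_dbscan(points, eps_values, min_samples, score=len):
    """Repeatedly extract the best-scoring cluster until none can be formed.

    `score` is a placeholder (assumption): it receives a list of point
    indices and returns a number, higher being better.
    """
    remaining = list(range(len(points)))
    bins = []
    while remaining:
        subset = [points[i] for i in remaining]
        best = None
        for eps in eps_values:
            labels = dbscan(subset, eps, min_samples)
            for c in set(labels) - {-1}:
                members = [remaining[k] for k, lab in enumerate(labels) if lab == c]
                if best is None or score(members) > score(best):
                    best = members
        if best is None:
            break  # only noise is left
        bins.append(best)
        chosen = set(best)
        remaining = [i for i in remaining if i not in chosen]
    return bins


# Toy data: two tight groups and one isolated point.
toy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
toy_bins = iterative_dbscan(toy, eps_values=[1.5], min_samples=2)
```

With a size-based score, the largest radius tends to win every round, which is why a real quality measure matters more than the clustering loop itself.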

Use the flag --sequencing-type=long_read to enable an alternative clustering that works better with long reads.
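As a rough illustration, a call with the new flag could be assembled from Python as below. Only `--sequencing-type=long_read` comes from these notes; the `single_easy_bin` subcommand, the `-i`/`-b`/`-o` options and the file names are assumptions made for the example, so check `SemiBin --help` for the options your installed version accepts.

```python
import shlex


def long_read_binning_command(contigs, bam, output_dir):
    """Build (but do not run) an argument list for a long-read binning call.

    The subcommand and the -i/-b/-o options are assumptions for this sketch.
    """
    return [
        "SemiBin", "single_easy_bin",
        "-i", contigs,        # assembled contigs (FASTA)
        "-b", bam,            # reads mapped to the contigs
        "-o", output_dir,     # output directory
        "--sequencing-type=long_read",  # flag introduced in this release
    ]


cmd = long_read_binning_command("contigs.fa", "mapped.bam", "semibin_out")
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it; keeping the arguments as a list avoids shell-quoting problems with unusual file names.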

Other user-visible improvements

Better error checking at multiple steps in the pipeline so that processes that will crash are caught as early as possible
Add --allow-missing-mmseqs2 flag to check_install subcommand (eventually, self-supervision will be the default and mmseqs2 will be an optional dependency)

Command line parameter deprecations

The previous arguments should continue to work, but going forward, the newer arguments are probably a better API.

Selecting self-supervised learning is now done with the --self-supervised flag (instead of --training-type=self)
Training from multiple samples is now enabled with the --train-from-many flag (instead of --mode=several)
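For scripts that still pass the older arguments, a small helper along these lines could rewrite them to the newer spellings. The two mappings are taken from the list above; the helper itself (`modernize_args`) is hypothetical and not part of SemiBin, and it assumes the old options were written either as `--name=value` or as `--name value`.

```python
# Old spelling -> new spelling, as listed in these release notes.
DEPRECATED = {
    "--training-type=self": "--self-supervised",
    "--mode=several": "--train-from-many",
}


def modernize_args(argv):
    """Return a copy of argv with deprecated arguments replaced.

    Handles both '--mode=several' and the two-token form '--mode several'.
    Everything else is passed through unchanged.
    """
    out = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        # Two-token form joined with '=' so it can be looked up the same way.
        pair = f"{arg}={argv[i + 1]}" if i + 1 < len(argv) else None
        if arg in DEPRECATED:
            out.append(DEPRECATED[arg])
            i += 1
        elif pair in DEPRECATED:
            out.append(DEPRECATED[pair])
            i += 2
        else:
            out.append(arg)
            i += 1
    return out


old = ["--training-type=self", "--mode", "several", "-o", "out"]
new = modernize_args(old)
```

Since the previous arguments should continue to work, such a rewrite is optional; it mainly helps keep pipelines on the spellings that are preferred going forward.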

Bugfixes

The output table sometimes had the wrong path in v1.3. This has been fixed
Prodigal is now run in a more robust manner when using multiple threads (#106)
