
Version 1.4.0: long-read binning

luispedro released this on 15 Dec at 03:31

The main change is a new binning algorithm for assemblies from long-read datasets.

The overall structure of the pipeline is still similar to what was described in the manuscript, but the clustering step no longer uses Infomap. Instead, it uses a different procedure (an iterative version of DBSCAN).

Use the flag --sequencing-type=long_read to enable an alternative clustering that works better with long reads.
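These notes do not spell out what "iterative DBSCAN" means in detail, so the following is only a generic sketch of the idea under stated assumptions, not the pipeline's actual implementation. It assumes that "iterative" means: run plain DBSCAN with a small radius (eps), keep the clusters it finds, then re-cluster only the points left as noise with progressively larger radii. The function names dbscan and iterative_dbscan, the eps schedule, and min_pts are all hypothetical; the real procedure may choose and accept clusters differently.

```python
import math
from typing import List, Optional, Sequence

Point = Sequence[float]
NOISE = -1


def dbscan(points: List[Point], eps: float, min_pts: int) -> List[int]:
    """Plain DBSCAN: one cluster label per point, NOISE (-1) for outliers.

    A point is a core point if at least min_pts points (itself included)
    lie within distance eps of it.
    """
    labels: List[Optional[int]] = [None] * len(points)

    def neighbours(i: int) -> List[int]:
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_pts:  # only core points grow the cluster
                queue.extend(more)
        cluster += 1
    return labels  # type: ignore[return-value]


def iterative_dbscan(points: List[Point], eps_values: List[float], min_pts: int) -> List[int]:
    """Hypothetical iterative scheme: keep clusters found at each eps and
    re-run DBSCAN only on the points that are still unassigned."""
    final = [NOISE] * len(points)
    remaining = list(range(len(points)))
    next_id = 0
    for eps in eps_values:  # assumed: increasing radii
        if not remaining:
            break
        labels = dbscan([points[i] for i in remaining], eps, min_pts)
        id_map = {}
        still_noise = []
        for idx, lab in zip(remaining, labels):
            if lab == NOISE:
                still_noise.append(idx)
                continue
            if lab not in id_map:  # give each new cluster a global id
                id_map[lab] = next_id
                next_id += 1
            final[idx] = id_map[lab]
        remaining = still_noise
    return final
```

For example, with two tight groups of points and one looser group, a first pass with eps=0.2 only recovers the tight groups; a second pass with eps=1.0 on the leftovers recovers the loose one. Starting small keeps dense clusters from being merged, while later passes still pick up sparser ones.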

Other user-visible improvements

  • Better error checking at multiple steps of the pipeline, so that runs that would crash are caught as early as possible
  • Add --allow-missing-mmseqs2 flag to check_install subcommand (eventually, self-supervision will be the default and mmseqs2 will be an optional dependency)

Command line parameter deprecations

The previous arguments should continue to work, but going forward, the newer arguments are probably a better API.

  • Selecting self-supervised learning is now done with the --self-supervised flag (instead of --training-type=self)
  • Training from multiple samples is now enabled with the --train-from-many flag (instead of --mode=several)
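The old and new spellings are meant to be equivalent. As a hypothetical illustration only (not the project's actual argument-parsing code), one common way to keep deprecated options working is to accept both forms with argparse and fold the old form into the new flag after parsing. The program name "binner" and the function parse_args are made up for this sketch; only the four option spellings come from the notes above.

```python
import argparse
from typing import List


def parse_args(argv: List[str]) -> argparse.Namespace:
    """Hypothetical sketch: accept both the old and the new spellings."""
    parser = argparse.ArgumentParser(prog="binner")  # hypothetical program name
    # New-style flags
    parser.add_argument("--self-supervised", action="store_true")
    parser.add_argument("--train-from-many", action="store_true")
    # Deprecated spellings, still accepted for backwards compatibility
    parser.add_argument("--training-type", default=None)
    parser.add_argument("--mode", default=None)
    args = parser.parse_args(argv)
    # Map the deprecated forms onto the new flags
    if args.training_type == "self":
        args.self_supervised = True
    if args.mode == "several":
        args.train_from_many = True
    return args
```

With this approach, `--training-type=self --mode=several` and `--self-supervised --train-from-many` produce the same parsed flags, so the rest of the program only needs to check the new names.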

Bugfixes

  • In v1.3, the output table sometimes contained the wrong path. This has been fixed
  • Prodigal is now run in a more robust manner when using multiple threads (#106)