Releases: BigDataBiology/SemiBin
Version 2.1.0
The main new feature is support for using the output of strobealign-aemb. The SemiBin command (instead of SemiBin2) will continue to work, but it prints a warning and adds a delay to prompt users to upgrade.
Full ChangeLog
- SemiBin: Support running SemiBin with strobealign-aemb (--abundance/-a)
- citation: Add citation subcommand
- SemiBin1: Introduce separate SemiBin1 command
- internal: Code simplification and refactor
- deprecation: Deprecate --orf-finder=fraggenescan option
- Update abundance normalization
- SemiBin: do not use more processes than can be taken advantage of (#155)
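The last item concerns capping parallelism. As a minimal sketch of the general idea (this is not SemiBin's actual code, and `effective_processes` is a hypothetical helper), a program can avoid starting more workers than there are independent tasks to hand out:

```python
def effective_processes(requested, n_tasks):
    """Return how many worker processes are worth starting.

    Hypothetical helper for illustration only; not part of SemiBin.
    """
    # Extra workers beyond the number of tasks would sit idle,
    # so cap the request at n_tasks (but always keep at least one).
    return max(1, min(requested, n_tasks))
```

With 16 requested processes and only 3 tasks, for example, this would start 3 workers.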
Version 2.0.2
Minor bugfix release (#128)
Version 2.0.1
Fix bugs in v2.0.0, mainly with the SemiBin2 command and argument passing.
Full ChangeLog:
- train_self: Fix bug with --mode
- concatenate_fasta: Fix bug with compression
- bin_short: Make alias work
Version 2.0.0
Effectively a minor release, turning the SemiBin2 beta into a full SemiBin2 release and soft-deprecating SemiBin1.
Full ChangeLog:
- SemiBin: Better error checking throughout
- SemiBin: Write a log file
- concatenate_fasta: support compression
- concatenate_fasta: slightly better error message when contig ID already contains separator
- SemiBin: add bin_short as alias for bin
Version 1.5.1
Fix use of --no-recluster with multi_easy_bin, see #128
Version 1.5.0: SemiBin2 beta
The big change is the addition of a SemiBin2 script, which is still experimental but should provide a slightly nicer interface.
User-visible improvements since v1.4.0
- Added a new option for ORF finding, called fast-naive, which is a very fast internal implementation.
- Added the possibility of bypassing ORF finding altogether by providing prodigal outputs directly (or any other gene prediction in the right format)
- Command line argument checking is more exhaustive instead of exiting at first error
- Added --quiet flag to reduce the amount of output printed
- Better --help (group required arguments separately)
- Add --output-compression option to compress outputs
- Add --tag-output option which allows for control of the output filenames (and also makes the outputs anvi'o compatible; see discussion at #123)
- Add contig->bin mapping table (#123)
- SemiBin.main.main1 and SemiBin.main.main2 can now be called as functions with command line arguments (main1 corresponds to SemiBin1 and main2 corresponds to SemiBin2)
import SemiBin.main
...
SemiBin.main.main2(['single_easy_bin', '--input-fasta', ...])
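A slightly fuller sketch of the same call, with the argument list built separately so it can be inspected before running. The helper names (`single_easy_bin_args`, `run_semibin2`) and the file names are hypothetical, and the options other than `--input-fasta` are assumed to match the `single_easy_bin` command line:

```python
def single_easy_bin_args(contigs, bam, output, environment=None):
    """Build the argument list for the single_easy_bin subcommand.

    Hypothetical helper; paths passed in are placeholders chosen by the caller.
    """
    args = [
        "single_easy_bin",
        "--input-fasta", contigs,
        "--input-bam", bam,      # assumed option name
        "--output", output,      # assumed option name
    ]
    if environment is not None:
        args += ["--environment", environment]  # assumed option name
    return args


def run_semibin2(args):
    """Run SemiBin2 in-process (requires SemiBin to be installed)."""
    # Imported lazily so that building the argument list needs no install.
    import SemiBin.main
    SemiBin.main.main2(args)


# Example (not executed here):
# run_semibin2(single_easy_bin_args("contigs.fa", "mapped.sorted.bam", "out"))
```

Note that, as in the snippet above, the list starts with the subcommand rather than the program name.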
Version 1.4.0: long read binning
Big change is the added binning algorithm for assemblies from long-read datasets.
The overall structure of the pipeline is still similar to what was described in the manuscript, but the clustering step does not use infomap. It uses another procedure instead (an iterative version of DBSCAN).
Use the flag --sequencing-type=long_read to enable this alternative clustering, which works better with long reads.
Other user-visible improvements
- Better error checking at multiple steps in the pipeline so that processes that will crash are caught as early as possible
- Add --allow-missing-mmseqs2 flag to check_install subcommand (eventually, self-supervision will be the default and mmseqs2 will be an optional dependency)
Command line parameter deprecations
The previous arguments should continue to work, but going forward, the newer arguments are probably a better API.
- Selecting self-supervised learning is now done with the --self-supervised flag (instead of --training-type=self)
- Training from multiple samples is now enabled with the --train-from-many flag (instead of --mode=several)
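For scripts that build SemiBin command lines programmatically, the two renames above can be applied mechanically. The following is a minimal sketch under that assumption; `modernize_args` is a hypothetical helper, not part of SemiBin, and since the old spellings should continue to work, it is only a convenience for migrating:

```python
# Old (option, value) pair -> new flag, as listed in the notes above.
LEGACY_TO_NEW = {
    ("--training-type", "self"): "--self-supervised",
    ("--mode", "several"): "--train-from-many",
}


def modernize_args(args):
    """Rewrite the deprecated spellings in an argument list.

    Handles both `--opt=value` and `--opt value`; everything else
    is passed through unchanged.
    """
    out = []
    i = 0
    while i < len(args):
        arg = args[i]
        if "=" in arg:
            key = tuple(arg.split("=", 1))  # single token: --opt=value
            consumed = 1
        elif i + 1 < len(args):
            key = (arg, args[i + 1])        # two tokens: --opt value
            consumed = 2
        else:
            key = None
            consumed = 1
        if key in LEGACY_TO_NEW:
            out.append(LEGACY_TO_NEW[key])
            i += consumed
        else:
            out.append(arg)
            i += 1
    return out
```

Other values of the old options (for example `--mode single`) are left untouched.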
Bugfixes
- The output table sometimes had the wrong path in v1.3. This has been fixed
- Prodigal is now run in a more robust manner when using multiple threads (#106)
Version 1.3.1
Version 1.3.0 erroneously made --training-type mandatory.
We had intended to keep backwards compatibility with previous versions and v1.3.1 fixes that.
Version 1.3.0
Introduces self-supervised learning! This is optional (for now; it will become the default in SemiBin2), but can achieve better results. See the docs on training SemiBin models for more information.
Also fixes a few minor bugs, namely bin names in the output table, and renames one command line argument to --epochs (the previous name was a misspelling).
Version 1.2.0
Big change is adding a new chicken caecum prebuilt model (courtesy of Florian Plaza Oñate), but also better outputs.
Full ChangeLog
- Pretrained model from chicken caecum
- Output table with basic information on bins (including N50 & L50)
- When reclustering is used (default), output the unreclustered bins into a directory called output_prerecluster_bins
- Added --verbose flag and silenced some of the output when it is not used
- Use coloredlogs (if package is available)
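For readers unfamiliar with the N50 and L50 columns in the output table: N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total length, and L50 is the number of contigs in that set. A minimal sketch of these standard definitions follows (`n50_l50` is a hypothetical name, and this is not necessarily how SemiBin computes the values):

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            # `length` is the contig that crosses the halfway point,
            # `count` is how many contigs were needed to get there.
            return length, count
    raise ValueError("need at least one contig length")
```

For a bin with contig lengths 10, 8, 6, 4 and 2 (total 30), the two longest contigs already cover 18 >= 15, so N50 is 8 and L50 is 2.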