Releases: sourmash-bio/sourmash
v4.0.0
Major changes for 4.0
4.0 is a major new version of sourmash, and it contains a number of new and breaking features.
Please see our migration guide for more information on how to migrate from v3.x to version 4.0!
Numerical output and search results are unchanged
There are no changes to numerical output or search results in this release; you should get the same results with v4 as you get with v3, except where command-line parameters need to be adjusted as noted below (see: protein ksize #1277, lca summarize changes #1175, sourmash gather on signatures without abundance #1328). Please file an issue if your results change!
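As a rough illustration of the protein ksize adjustment (#1277), the sketch below shows the arithmetic involved when translating an old parameter. It assumes the v3-style value is exactly three times the v4 value, as the "divided by 3" note suggests. The helper name `v4_protein_ksize` is hypothetical and not part of sourmash.

```python
def v4_protein_ksize(v3_ksize: int) -> int:
    """Translate a v3-style protein ksize into the v4 value.

    Assumption: the v3 value is three times the v4 value, per the
    "divided by 3" note for #1277. Hypothetical helper, not sourmash API.
    """
    if v3_ksize <= 0:
        raise ValueError("ksize must be positive")
    quotient, remainder = divmod(v3_ksize, 3)
    if remainder:
        # A value that is not a multiple of 3 cannot be translated cleanly.
        raise ValueError(f"ksize {v3_ksize} is not a multiple of 3")
    return quotient


if __name__ == "__main__":
    for k in (21, 30, 57):
        print(k, "->", v4_protein_ksize(k))
```

Raising on values that are not multiples of 3 is a design choice for this sketch: it surfaces a suspicious parameter instead of silently rounding it.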
New or changed behavior
- default SBT storage is now .sbt.zip (#1174, #1170)
- add sourmash sketch command for creating signatures (#1159)
- protein ksizes in MinHash are now divided by 3, except in sourmash compute (#1277)
- refactor MinHash API and implementation: add, iadd, merge, hashes, and max_hash (#1282, #1154, #1139, #1301)
- add HyperLogLog implementation (#1223)
- SourmashSignature.name is now a property (not a method): use str(sig) instead of name() (#1179, #1232)
- lca summarize no longer merges all signatures, and uses hash abundance by default (#1175)
- index and lca index (#1186, #1222) now support --from-file and no longer require signature files on command line
- --traverse-directory is now on by default for signature loading behavior (#1178)
- sourmash sketch and sourmash compute no longer create empty signatures from empty files and stdin (#1347)
- sourmash sketch and sourmash compute set sig.filename to empty string when filename is - (#1347)
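The change of SourmashSignature.name from a method to a property (#1179, #1232) affects any code that calls name(). The sketch below shows one way to tolerate both calling conventions while migrating. The function `signature_name` and the two stand-in classes are hypothetical and exist only for illustration; they are not sourmash code, and they only cover the calling convention, not any other difference between versions.

```python
def signature_name(sig) -> str:
    """Return a signature's name under either API style.

    v3.x exposed name as a method (sig.name()); v4.0 makes it a
    property (sig.name). Hypothetical shim, not part of sourmash.
    """
    name = sig.name
    return name() if callable(name) else name


# Stand-in classes (assumptions for illustration only; not sourmash code).
class V3StyleSignature:
    def __init__(self, name):
        self._name = name

    def name(self):  # method, as in v3.x
        return self._name


class V4StyleSignature:
    def __init__(self, name):
        self._name = name

    @property
    def name(self):  # property, as in v4.0
        return self._name


if __name__ == "__main__":
    print(signature_name(V3StyleSignature("genome A")))
    print(signature_name(V4StyleSignature("genome A")))
```

For code that only needs to run on v4, the release notes recommend str(sig) instead.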
Feature removal
- remove Python 2.7 support (& end Python 2 compatibility) (#1145, #1144)
- remove lca gather (#1307)
- remove 10x support from sourmash compute (#1229)
- remove 'dump' command (#1157)
Feature/function deprecations
- deprecate sourmash compute (#1159)
- deprecate load_signatures, sourmash.load_one_signature, create_sbt_index, and load_sbt_index (#1279, #1304)
- deprecate import_csv in favor of new sourmash sig import --csv (#1281)
Refactoring, improvements, and minor bug fixes:
- accept file list in sourmash sig cat (#1236)
- add unique_intersect_bp and gather_result_rank to gather CSV output (#1219)
- remove deprecated minhash functions (#1149)
- fix Rust panic error in signature creation (#1172)
- cache nodes in SBT during search (#1161)
- fix two bugs in gather --output-unassigned (#1156)
- Refactor the gather code so that it uses 'hashes' instead of 'mins' (#1329)
- Update output from gather w/o abundances, so that abund output is empty instead of 0 (#1328)
Documentation updates
- substantial revisions and updates to the documentation (#1283)
- add information about versioning, migrations, etc to the docs (#1153)
Infrastructure and CI changes:
- update finch requirement from 0.3.0 to 0.4.1 (#1290)
- update rand for test, and activate "js" feature for getrandom (#1275)
- dev updates (configs and doc) (#1298)
- move wheel building from Travis to GitHub Actions (#1295)
- fix new clippy warnings from Rust 1.49 (#1267)
- use tox for running tests locally (#696)
- CI: small build fixes (#1252)
- CI: Fix releases in GitHub Actions (#1250)
- update build_wheel action paths
- CI: moving python tests from travis to GH actions (#1249)
- CI: move wheel building to GitHub actions (#1244)
- remove last .rst file from docs (#1185)
- update CI for latest branch name change (#1150)
v3.5.1
Feature deprecations
- add deprecation warning for sourmash compute --input-is-10x (#1326)
- add warnings about new sourmash lca summarize behavior (#1326)
- add warning for new behavior of MinHash.merge(...) (#1326)
- add deprecation warning for TarStorage (#1165)
Infrastructure and CI changes:
- Backport github actions to stable branch (3.5.x) (#1317)
v3.5.0
This is the first of several minor releases (v3.5.x) from the new stable branch. These releases focus on preparing for sourmash v4.0 by introducing deprecations and warnings for features that will be removed in v4.0.
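One way to take advantage of such warnings before upgrading is to check whether a code path triggers them. The sketch below uses only Python's standard warnings module. It assumes the warnings of interest use the DeprecationWarning category (or a subclass); `old_api` is a hypothetical stand-in, not a sourmash function.

```python
import warnings


def old_api():
    """Hypothetical stand-in for a function slated for removal."""
    warnings.warn("old_api is deprecated", DeprecationWarning, stacklevel=2)
    return 42


def calls_deprecated(func) -> bool:
    """Return True if calling func emits a DeprecationWarning."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        func()
    return any(issubclass(w.category, DeprecationWarning) for w in caught)


if __name__ == "__main__":
    print(calls_deprecated(old_api))       # deprecated call
    print(calls_deprecated(lambda: None))  # clean call
```

Running a test suite with `python -W error::DeprecationWarning` is a blunter alternative that turns every such warning into an exception.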
Refactoring and deprecations:
- MinHash class refactoring (#1128, #1129); many deprecations for 4.0 and 5.0
- sourmash dump deprecated, for removal in 4.0 (#1147)
- import sourmash_lib deprecated, for removal in 4.0 (#1143)
Cleanup:
- remove mentions of ijson and khmer (no longer needed dependencies) (#1140)
Documentation:
- Simplify and clean up README (#1124)
- Add sourmash logo to docs and README (#1127)
- update release process and release notes (#1125)
Rust:
- Update typed-builder requirement from 0.6.0 to 0.7.0 (#1121)
v3.4.1
Major new features:
- Document sourmash.fig usage and behavior; enable output of compare clustering with labels (#859)
- Adds --majority option to lca classify using majority vote algorithm (#1113)
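To illustrate the general difference between a lowest-common-ancestor summary and a majority vote, here is a minimal sketch. It is not sourmash's implementation: the lineage representation (tuples of rank names), both function names, and the tie-breaking rule are assumptions made for this example.

```python
from collections import Counter


def lowest_common_ancestor(lineages):
    """Longest prefix shared by all lineages (each a tuple of rank names)."""
    if not lineages:
        return ()
    prefix = []
    for names in zip(*lineages):  # walk the ranks in parallel
        if len(set(names)) != 1:
            break  # lineages disagree at this rank
        prefix.append(names[0])
    return tuple(prefix)


def majority_vote(lineages):
    """Most frequent lineage; ties resolve to the one seen first."""
    if not lineages:
        return ()
    return Counter(lineages).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical lineages, one per matching hash.
    hits = [
        ("Bacteria", "Proteobacteria", "Escherichia"),
        ("Bacteria", "Proteobacteria", "Escherichia"),
        ("Bacteria", "Firmicutes", "Bacillus"),
    ]
    print(lowest_common_ancestor(hits))
    print(majority_vote(hits))
```

With the sample data, a single disagreeing lineage pulls the common-ancestor result up to the shared top rank, while the vote keeps the full lineage that most hits agree on.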
Minor improvements:
- MinHash compatibility check to sourmash sig intersect (#1116)
Bugs fixed:
- add ksize selectors back into sourmash sig functions (#1105)
Documentation updates:
v3.4.0
Major new features:
- enable seamless loading of signatures from indexed databases (#1059, #1083, #1090)
- add signature cat and signature split commands to combine/split signature files (#1044, #1074)
- add compute-optimized MinHash (for small scaled or large cardinalities) in Rust (#1045)
- optionally weight lca summarize output by hashval abundance. (#1022)
- enable moltypes other than DNA in LCA databases (#1013)
Minor improvements:
- add --num-results/-n to gather (#1047)
- improve lca index error message when inserting num signature (#1076)
- autodetect FASTA/FASTQ files if given as signatures (#1078)
- add is_lineage_match, pop_to_rank, make_lineage to lca_utils (#1081)
- use stricter niffler versions and add new gz feature to it (#1070)
- added MinHash.clear() and MinHash.add_hash_with_abundance to Python API (#1046)
Bugs fixed:
- investigations and fixes around new gather behavior. (#1001)
Refactoring:
- move tests from test_lca into test_lca_functions (#1035)
- remove unused run_shell_cmd function (#1032)
- refactor some tests in test_sourmash.py to use @utils.in_tempdir decorators (#1020)
- use install scripts from py-ipfs-http-client (#1068)
Documentation:
- Improve documentation around abundance projection (#1073)
- Replace recommonmark with myst (docs) (#1021)
- Fix doctest filename error (#1040)
Thanks to @luizirber @ctb @bluegenes @erikyoung85 for their contributions!
3.3.1
version 3.3.0
Improvements:
- add ZipStorage, support loading SBT databases from storage; .sbt.zip extensions. (#648)
- Replace khmer.Nodegraph with rust nodegraph; ~5x speedup of SBT search & gather. (#799)
Bugs:
- Document and (lightly) fix the LCA_Database API. (#966)
- Fix bug when using Python 3.5 and before; refactor LCA_Database tests (#962)
Documentation:
version 3.2.3
Incompatibilities with previous versions due to bugs:
- sourmash gather on SBT databases was setting --threshold-bp=0 in all cases. This was fixed in #942, and output may change. Specify --threshold-bp=0 to recover old behavior.
Improvements:
- refactor LCA_Database class to support programmatic creation. (#946)
- add --singleton option to lca summarize (#922)
- update gather to calculate fraction of match that was in original query (#938)
- add compare --containment (#937)
- add --outdir argument to sourmash compute (#935)
- improvements to sourmash argparse output for compute. (#931)
Bugs:
- fix lca classify bug with -o (#902)
- set_abundances now works with large signatures (#911)
- test & fix LinearIndex, SBT, and LCA gather thresholding. (#942)
Build, CI and docs:
v3.2.2: 3.2.2
Improvements:
- more refactoring of MinHash API (#889)
- add_hash_with_abundance method in core library (#892)
- Replace mins_push and abunds_push with set_abundances (#887)
- More refactoring of MinHash comparison code (#882)
- better sourmash compare error handling (#876)
Bugs:
- add_hash with num doesn't set abundances properly (#891)
- name signatures based on md5sum, not on name() (#884)
Build, CI and docs:
- update docs for how to run Rust tests (#888)
v3.2.1: 3.2.1
Bugs:
- re-add 'signature' as alias for 'sig' (#881)