WIP: add sig cat utility #394

Open: wants to merge 15 commits into base: main

bluegenes commented Jul 18, 2024

A version of sourmash sig cat. At the moment, I don't think this is any faster than the existing command. It would probably need multithreaded reading with piz to get faster. Is there any way to avoid loading each sig unless we need to downsample (e.g., just copy the file)?
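
To make the "just copy the file" idea concrete, here is a minimal sketch (not what this PR does) that copies zip members as opaque bytes with Python's standard `zipfile`, without ever parsing a signature. It rests on assumptions: the manifest member is named `SOURMASH-MANIFEST.csv`, member names do not collide across inputs (duplicates are simply skipped here), and the merged manifest would be rebuilt separately. `copy_members` is a hypothetical helper name.

```python
import io
import zipfile

# Assumption: sourmash zip collections store their manifest under this name.
MANIFEST_NAME = "SOURMASH-MANIFEST.csv"


def copy_members(src_zips, dest):
    """Copy every non-manifest member from each source zip into dest.

    Members are treated as opaque bytes: nothing is parsed as a signature.
    Returns the list of member names that were copied, in order.
    """
    seen = set()
    copied = []
    with zipfile.ZipFile(dest, "w") as zout:
        for src in src_zips:
            with zipfile.ZipFile(src) as zin:
                for info in zin.infolist():
                    # Skip per-input manifests and duplicate member names;
                    # a real tool would need to write a merged manifest.
                    if info.filename == MANIFEST_NAME or info.filename in seen:
                        continue
                    zout.writestr(info, zin.read(info))
                    seen.add(info.filename)
                    copied.append(info.filename)
    return copied
```

Note that `zipfile` still decompresses and recompresses each member, so this only avoids the signature parsing cost, not the I/O. It also cannot apply --ksize, --scaled, or --moltype selection, which would still require reading the manifest or the sketches.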

To avoid failing when a collection contains no signatures compatible with the desired parameters, I added a --force parameter that stops load_collection from automatically exiting when there are no signatures to load. I added a corresponding check to sigcat to ensure we have signatures to concatenate before starting the output zipfile (and bail if not).
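
The control flow described above can be sketched roughly as follows. This is a toy model in Python, not the plugin's actual code: sketches are plain dicts, and `load_collection_sketch` and `sigcat_sketch` are hypothetical stand-ins for load_collection and sigcat.

```python
import sys


def load_collection_sketch(sketches, ksize=None, moltype=None, force=False):
    """Select compatible sketches; exit on an empty result unless force is set."""
    selected = [
        s for s in sketches
        if (ksize is None or s["ksize"] == ksize)
        and (moltype is None or s["moltype"] == moltype)
    ]
    if not selected and not force:
        # Without --force, an empty selection terminates the program.
        sys.exit("no compatible signatures found")
    return selected


def sigcat_sketch(collections, ksize=None, moltype=None, force=False):
    """Gather compatible sketches from all inputs before any output is written."""
    combined = []
    for sketches in collections:
        combined.extend(load_collection_sketch(sketches, ksize, moltype, force))
    if not combined:
        # Even with --force, bail before creating the output zipfile.
        raise ValueError("no signatures to concatenate; not writing output")
    return combined
```

With force=True, an input that contributes nothing is skipped, but the run still fails if every input is empty or incompatible.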

If --scaled is specified, we downsample where possible to produce sketches at the desired scaled value.
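
For context, "where possible" means a sketch can only move to a larger (coarser) scaled value, since that only discards hashes. A minimal pure-Python illustration, assuming a 64-bit hash space and a cutoff of roughly 2^64 / scaled (the exact rounding in sourmash may differ; the function names here are hypothetical):

```python
HASH_SPACE = 2**64  # assumption: 64-bit hash values


def max_hash_for_scaled(scaled):
    """Approximate upper bound on hashes retained at a given scaled value."""
    return HASH_SPACE // scaled


def downsample(hashes, current_scaled, target_scaled):
    """Return the subset of hashes that would be kept at target_scaled."""
    if target_scaled < current_scaled:
        # Going to a finer resolution would need hashes that were never kept.
        raise ValueError("cannot downsample to a smaller scaled value")
    cutoff = max_hash_for_scaled(target_scaled)
    return sorted(h for h in hashes if h <= cutoff)
```

A sketch already at or above the requested scaled value cannot be refined, so such inputs are the ones that count as incompatible.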

usage:

sourmash scripts sigcat <sigfile_1> <sigfile_2> ... <sigfile_n> -o <output>.zip
usage:  sigcat [-h] [-q] [-d] -o OUTPUT [-k KSIZE] [-s SCALED] [-m {DNA,protein,dayhoff,hp}] [-f] signatures [signatures ...]

concatenate signatures into a single sourmash zip file

positional arguments:
  signatures            sourmash signature files

options:
  -h, --help            show this help message and exit
  -q, --quiet           suppress non-error output
  -d, --debug           provide debugging output
  -o OUTPUT, --output OUTPUT
                        output zip file for final signatures
  -k KSIZE, --ksize KSIZE
                        k-mer size at which to select sketches; no default
  -s SCALED, --scaled SCALED
                        scaled factor at which to do comparisons; no default
  -m {DNA,protein,dayhoff,hp}, --moltype {DNA,protein,dayhoff,hp}
                        molecule type (DNA, protein, dayhoff, or hp; no default)
  -f, --force           force: allow input sig files to contain no signatures or only incompatible signatures
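
For reference, the options above correspond to something like the following standalone argparse definition. This is a reconstruction from the help text only, not the plugin's actual registration code (in particular, how -q and -d are wired up in the real command is an assumption), and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Rebuild the sigcat command line from its help text (illustrative only)."""
    p = argparse.ArgumentParser(
        prog="sigcat",
        description="concatenate signatures into a single sourmash zip file",
    )
    p.add_argument("signatures", nargs="+", help="sourmash signature files")
    p.add_argument("-q", "--quiet", action="store_true",
                   help="suppress non-error output")
    p.add_argument("-d", "--debug", action="store_true",
                   help="provide debugging output")
    p.add_argument("-o", "--output", required=True,
                   help="output zip file for final signatures")
    p.add_argument("-k", "--ksize", type=int, default=None,
                   help="k-mer size at which to select sketches; no default")
    p.add_argument("-s", "--scaled", type=int, default=None,
                   help="scaled factor at which to do comparisons; no default")
    p.add_argument("-m", "--moltype", default=None,
                   choices=["DNA", "protein", "dayhoff", "hp"],
                   help="molecule type (DNA, protein, dayhoff, or hp; no default)")
    p.add_argument("-f", "--force", action="store_true",
                   help="allow input sig files to contain no signatures "
                        "or only incompatible signatures")
    return p
```

Leaving -k, -s, and -m at None means no selection is applied on that axis, which matches the "no default" wording in the help text.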

bluegenes changed the title from "EXP: add sig cat utility" to "WIP: add sig cat utility" on Jul 19, 2024