Home
Welcome to the sra-tools wiki!
2020-01-15 2.10.2 Release
The 2.10.2 release allows access to dbGaP controlled-access human data in AWS and GCP buckets if you have approval from dbGaP.
- Prefetch now accepts a JWT, which acts as both authorization and selection of data to download, using the "--perm" command line argument
- Prefetch allows users to download original data files submitted to SRA along with SRA computed data files using "prefetch --type all"
- Prefetch retains the ability to accept an old-style kart file, but the file is now specified with the "--cart" command line argument
- Prefetch downloads are now limited to https, and the eliminate-quals option has been temporarily disabled
- Added command line options for cloud configurations for vdb-config
- Random error at startup of fasterq-dump has been fixed
- "-Z" option is not accepted for fasterq-dump
- A GUID is shown in vdb-config or created if not yet present
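
The prefetch options named in the list above might be used roughly as in the following sketch. It is a dry run by default: each command is only printed unless DRY_RUN=0 is set and sra-tools is installed. The accession and both file paths are placeholders, not values from this release, and the assumption that "--perm" and "--cart" can each be passed on their own should be checked against `prefetch --help` for your installed version.

```shell
#!/bin/sh
# Dry-run wrapper: prints the command line; executes it only when DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

ACCESSION="SRR000001"       # placeholder accession, substitute your own
JWT_FILE="./my-perm.jwt"    # hypothetical path to a dbGaP-issued JWT
CART_FILE="./my-cart.krt"   # hypothetical path to a kart file

# Download the SRA computed files together with the original submitted files.
run prefetch --type all "$ACCESSION"

# Pass a JWT that both authorizes and selects the data to download
# (assumption: no separate accession is needed alongside it).
run prefetch --perm "$JWT_FILE"

# Kart files are now given through an explicit argument.
run prefetch --cart "$CART_FILE"
```

Run it as-is to inspect the command lines, then rerun with DRY_RUN=0 once the paths point at real files.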
2019-08-19
We have released version 2.10.0 of sra-tools, which operates natively within AWS and GCP cloud environments. Most of the functionality you are accustomed to has been preserved, although there are a few changes.
- This release allows access to public SRA data stored within cloud buckets, now including the ability to retrieve original submission files (raw, unharmonized, no error correction) with prefetch.
- The local caching model has changed to support original submission files: we have introduced the accession directory for prefetch that will contain any files you have requested related to a particular accession.
- Contrary to prior behavior, if you have not specifically established a designated cache area, prefetch will use the accession directory.
- Similarly, the converter (dumper) tools will make use of a process-local temporary cache area unless you have configured the toolkit for a specific cache. NB: this behavior will temporarily use more local space, but is preferred for cluster operation.
- Access to data within the cloud will generally require setting up cloud-specific account credentials and making them known to the toolkit via vdb-config. The tools will not send out any credentials until you have agreed to accept charges within vdb-config. Your account information is required so that the cloud provider may assess egress charges and is not used in any way by NCBI or transmitted for any other purpose.
- Access to cloud data from within a region that would not incur egress charges may be allowed without account credentials, as a special exception. In this case, you may configure the toolkit (using vdb-config) to send a cloud service provided environment credential as proof of your execution environment.
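
The credential setup described in the last two items could look something like the sketch below. The flag names are recalled from the vdb-config help text rather than taken from this page, so treat them as assumptions and confirm them with `vdb-config --help`; the credentials path is a placeholder. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run wrapper: prints the command line; executes it only when DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Hypothetical location of an AWS credentials file; substitute your own.
AWS_CREDENTIALS="./aws-credentials"

# Make the account credentials known to the toolkit
# (assumed flag name, verify with vdb-config --help).
run vdb-config --set-aws-credentials "$AWS_CREDENTIALS"

# Agree to accept egress charges; no credentials are sent before this.
run vdb-config --accept-aws-charges yes

# Alternative when running inside the provider's own region:
# send the environment credential as proof of execution environment.
run vdb-config --report-cloud-identity yes
```

The same settings can also be reviewed in the interactive screen that vdb-config provides.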
With release 2.9.1 of sra-tools we have finally made available the tool fasterq-dump, a replacement for the much older fastq-dump tool. As its name implies, it runs faster, and is better suited for large-scale conversion of SRA objects into FASTQ files, which is common on sites with enough disk space for temporary files. fasterq-dump is multi-threaded and performs bulk joins in a way that improves performance as compared to fastq-dump, which performs joins on a per-record basis (and is single-threaded).
fastq-dump is still supported as it handles more corner cases than fasterq-dump, but it is likely to be deprecated in the future.
You can get more information about fasterq-dump in this Wiki at https://github.com/ncbi/sra-tools/wiki/HowTo:-fasterq-dump.
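
Because fasterq-dump relies on multiple threads and on scratch space for temporary files, a typical invocation might set both explicitly, as in this sketch. The accession, directories, and thread count are illustrative assumptions, not recommendations from this page; the command is only printed unless DRY_RUN=0 is set and sra-tools is installed.

```shell
#!/bin/sh
# Dry-run wrapper: prints the command line; executes it only when DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

ACCESSION="SRR000001"      # placeholder accession, substitute your own
OUT_DIR="./fastq"          # hypothetical directory for the FASTQ output
TMP_DIR="./fasterq-tmp"    # hypothetical scratch directory for temporary files
THREADS=6                  # illustrative thread count

# Convert one accession to FASTQ with an explicit thread count,
# scratch directory, and output directory.
run fasterq-dump --threads "$THREADS" --temp "$TMP_DIR" --outdir "$OUT_DIR" "$ACCESSION"
```

Pointing the temporary directory at fast local storage with plenty of free space is the main design choice here, since the bulk joins trade disk space for speed.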