Home
Welcome to the sra-tools wiki!
2020-02-18 The 2.10.3 release fixed a problem that caused a segmentation fault in the following tools: sraxf, fasterq-dump, fastq-dump, sam-dump
2020-01-15 2.10.2 Release
The 2.10.2 release provides access to dbGaP controlled human data in AWS and GCP buckets if you have approval from dbGaP. Public SRA data has been available since the 2.10.0 release.
Original submission format and all SRA-formatted data can be accessed and computed on these clouds, eliminating the need to download from NCBI FTP as well as improving performance.
- Prefetch now accepts a JWT that acts as both authorization and selection of the data to download, passed with the "--perm" command line argument
- Prefetch allows users to download original data files submitted to SRA along with SRA computed data files using "prefetch --type all"
- Prefetch retains the ability to accept old-style kart files, but the file is now specified with the command line argument "--cart"
- Prefetch downloads have been limited to https, and the eliminate-quals option has been temporarily disabled
- Added command line options for cloud configurations for vdb-config
- Random error at startup of fasterq-dump has been fixed
- "-Z" option is not accepted for fasterq-dump
- A GUID is shown in vdb-config or created if not yet present
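As a rough illustration of the prefetch options listed above, the sketch below prints the corresponding command lines. The accession SRR000001 and the file names are placeholders, and the assumption that "--perm" and "--cart" each take a file path should be checked against `prefetch --help` for your installed version. The `run` helper is a hypothetical wrapper that only prints commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the command unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

ACCESSION="SRR000001"   # placeholder accession
CART_FILE="cart.krt"    # placeholder kart file name
PERM_FILE="perm.jwt"    # placeholder JWT file name

# Original submission files together with SRA-computed files.
run prefetch --type all "$ACCESSION"

# Old-style kart file, now passed explicitly (assumed to take a path).
run prefetch --cart "$CART_FILE"

# JWT acting as both authorization and data selection (assumed to take a path).
run prefetch --perm "$PERM_FILE"
```

Running the script as written only echoes the three commands; set DRY_RUN=0 to execute them once the toolkit is installed and the placeholder values are replaced.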
2019-08-19
We have released version 2.10.0 of sra-tools, which operate natively within AWS and GCP cloud environments. Most of the functionality you are accustomed to has been preserved, although there are a few changes.
- This release allows access to public SRA data stored within cloud buckets, now including the ability to retrieve original submission files (raw, unharmonized, no error correction) with prefetch.
- The local caching model has changed to support original submission files: we have introduced the accession directory for prefetch that will contain any files you have requested related to a particular accession.
- Contrary to prior behavior, if you have not specifically established a designated cache area, prefetch will use the accession-directory.
- Similarly, the converter (dumper) tools will make use of a process-local temporary cache area unless you have configured the toolkit for a specific cache. NB - this behavior will temporarily use more local space, but is preferred for cluster operation.
- Access to data within the cloud will generally require setting up cloud-specific account credentials and making them known to the toolkit via vdb-config. The tools will not send out any credentials until you have agreed to accept charges within vdb-config. Your account information is required so that the cloud provider may assess egress charges and is not used in any way by NCBI or transmitted for any other purpose.
- Access to cloud data from within a region that would not incur egress charges may be allowed without account credentials - as a special exception. In this case, you may configure the toolkit (using vdb-config) to send a cloud service provided environment credential as proof of your execution environment.
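The cloud setup described in the list above might look roughly like the following on AWS. This is a sketch, not taken from the release notes: the option names (`--set-aws-credentials`, `--accept-aws-charges`, `--report-cloud-identity`) are given as they appear in the help output of later vdb-config versions and should be treated as assumptions to verify with `vdb-config --help`. The credentials path is a placeholder, and `run` is a hypothetical helper that only prints commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the command unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

AWS_CREDS="${HOME}/.aws/credentials"   # placeholder path to your credentials file

# Make the account credentials known to the toolkit (assumed option name).
run vdb-config --set-aws-credentials "$AWS_CREDS"

# Credentials are not sent until charges have been accepted (assumed option name).
run vdb-config --accept-aws-charges yes

# Inside a region without egress charges: report the instance identity
# instead of account credentials (assumed option name).
run vdb-config --report-cloud-identity yes
```

The same settings can also be reviewed in the interactive mode of vdb-config; GCP has analogous options.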
With release 2.9.1 of sra-tools we have finally made available the tool fasterq-dump, a replacement for the much older fastq-dump tool. As its name implies, it runs faster, and it is better suited to large-scale conversion of SRA objects into FASTQ files, which is common at sites with enough disk space for temporary files. fasterq-dump is multi-threaded and performs bulk joins, which improves performance compared to fastq-dump, which is single-threaded and performs joins on a per-record basis.
fastq-dump is still supported because it handles more corner cases than fasterq-dump, but it is likely to be deprecated in the future.
You can get more information about fasterq-dump in this Wiki at https://github.com/ncbi/sra-tools/wiki/HowTo:-fasterq-dump.
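To make the comparison concrete, the sketch below prints a fasterq-dump invocation next to a fastq-dump one. The accession, thread count, and directories are placeholders chosen for illustration, and the options shown (`--threads`, `--temp`, `--outdir`, `--split-3`) should be confirmed against each tool's `--help` output. The `run` helper is a hypothetical wrapper that only prints commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the command unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

ACCESSION="SRR000001"            # placeholder accession
SCRATCH="/tmp/fasterq-scratch"   # placeholder directory for temporary files
OUTDIR="./fastq"                 # placeholder output directory

# Multi-threaded conversion; temporary files go to the scratch directory.
run fasterq-dump "$ACCESSION" --threads 4 --temp "$SCRATCH" --outdir "$OUTDIR"

# Single-threaded fallback for cases fasterq-dump does not handle.
run fastq-dump --split-3 --outdir "$OUTDIR" "$ACCESSION"
```

Pointing `--temp` at a disk with plenty of free space matters here, since the bulk joins rely on temporary files.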