Link up to AWS S3 buckets #17

ewels · 2020-01-27T09:21:21Z

The NIH has recently started to make SRA data available directly on AWS S3. It would be cool if SRA-Explorer could also link to these.

The complication is that not all datasets are available, and they are spread across more than one S3 bucket. I think the only way to get the URLs is to take the accession number, build a "guessed" S3 URI from it, and then test whether it exists.
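The guess-and-test loop could look something like this. It's a minimal sketch, assuming each accession lives in a top-level "directory" named after it (as in the example URL below). Only `sra-pub-src-1` is actually named in this issue, so the bucket list and the `exists` check are placeholders to be filled in.

```python
from typing import Callable, Iterable, Optional


def guess_s3_prefix(accession: str, bucket: str) -> str:
    """Build the guessed S3 URI for an accession's directory in one bucket.

    Assumption: objects are stored under "<accession>/" at the top level.
    """
    return f"s3://{bucket}/{accession}/"


def find_accession(
    accession: str,
    buckets: Iterable[str],
    exists: Callable[[str], bool],
) -> Optional[str]:
    """Try each bucket in turn and return the first guessed URI that exists.

    `exists` is a placeholder for whatever check we end up using
    (e.g. an anonymous S3 listing); it receives the guessed URI.
    """
    for bucket in buckets:
        uri = guess_s3_prefix(accession, bucket)
        if exists(uri):
            return uri
    return None  # not available in any of the known buckets
```

For example, `find_accession("DRZ000036", ["sra-pub-src-1"], exists=my_check)` returns `"s3://sra-pub-src-1/DRZ000036/"` if `my_check` (hypothetical) finds it, or `None` otherwise.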

The current buckets are:

An example URL for a specific BAM file (it can be downloaded directly, without authentication): http://sra-pub-src-1.s3.us-east-1.amazonaws.com/DRZ000036/F10-DA.bam.1
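The example URL follows the `<bucket>.s3.<region>.amazonaws.com/<key>` pattern, with the accession as the first part of the key. A small sketch for pulling those pieces apart, assuming all links use this virtual-hosted style (other S3 URL styles aren't handled):

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_url(url: str) -> Tuple[str, str, str]:
    """Split a virtual-hosted-style S3 URL into (bucket, region, key).

    Assumes the host looks like <bucket>.s3.<region>.amazonaws.com.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    bucket, sep, rest = host.partition(".s3.")
    if not sep or not rest.endswith(".amazonaws.com"):
        raise ValueError(f"not a virtual-hosted-style S3 URL: {url}")
    region = rest[: -len(".amazonaws.com")]
    key = parsed.path.lstrip("/")  # e.g. "DRZ000036/F10-DA.bam.1"
    return bucket, region, key
```

For the BAM link above this gives bucket `sra-pub-src-1`, region `us-east-1` and key `DRZ000036/F10-DA.bam.1`; `key.split("/", 1)[0]` recovers the accession.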

The buckets should allow public, anonymous access, so we should be able to use an AWS SDK to check whether the expected files exist. @wleepang gave a nice example in Python:

```python
>>> import boto3
>>> from botocore import UNSIGNED
>>> from botocore.client import Config
>>> s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED))
>>> s3.head_bucket(Bucket='sra-pub-src-1')
{'ResponseMetadata': {'HTTPStatusCode': 200, 'RetryAttempts': 0, 'HostId': 'sE4i9sSiQmHwuBeBAKp8JUOsDq09BIoX/WtNQlmO+7qvmTe9/bwJfBqkCdAE0cdDg8Fspcbmddc=', 'RequestId': '931ABD9E2B59BA63', 'HTTPHeaders': {'date': 'Wed, 22 Jan 2020 20:38:56 GMT', 'x-amz-id-2': 'sE4i9sSiQmHwuBeBAKp8JUOsDq09BIoX/WtNQlmO+7qvmTe9/bwJfBqkCdAE0cdDg8Fspcbmddc=', 'server': 'AmazonS3', 'transfer-encoding': 'chunked', 'x-amz-request-id': '931ABD9E2B59BA63', 'x-amz-bucket-region': 'us-east-1', 'content-type': 'application/xml'}}}
```
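`head_bucket` only tells us the bucket is reachable. Since the file names inside each accession directory aren't predictable, one option is to list everything under the accession prefix with the same unsigned client. This is a sketch, assuming the buckets permit anonymous listing (not just anonymous downloads); the function names are hypothetical.

```python
from typing import Any, List


def make_anonymous_s3_client() -> Any:
    """Create an unsigned (anonymous) S3 client, as in the snippet above."""
    import boto3  # imported lazily so the listing logic can be tested without boto3
    from botocore import UNSIGNED
    from botocore.client import Config

    return boto3.client("s3", config=Config(signature_version=UNSIGNED))


def list_accession_files(s3: Any, bucket: str, accession: str) -> List[str]:
    """Return every object key under "<accession>/" in the bucket.

    Uses the list_objects_v2 paginator so directories with more than
    1000 objects are handled. An empty list means nothing was found.
    """
    keys: List[str] = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=f"{accession}/"):
        # Pages with no matching objects have no "Contents" entry at all.
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys
```

Usage would be along the lines of `list_accession_files(make_anonymous_s3_client(), "sra-pub-src-1", "DRZ000036")`. Listing costs one request per accession per bucket, which is the same as a HEAD, but doesn't require us to guess file names.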

Note that the files within each accession directory seem to be randomly named and quite variable: there are BAM files, FastQ files, Fasta files, all sorts. So we need a big warning notice that (a) lets the user know it's up to them to curate the file list they're getting and (b) counts and reports how many datasets we were unable to find.
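A rough sketch of the warning text, assuming we end up with a mapping from each requested accession to the object keys found for it (empty list = not found anywhere). The wording and function name are placeholders.

```python
from typing import Dict, List


def summarise_results(found: Dict[str, List[str]]) -> str:
    """Build the notice shown to the user above the S3 file list.

    `found` maps each requested accession to the object keys we located;
    an empty list means nothing was found in any bucket.
    """
    missing = sorted(acc for acc, keys in found.items() if not keys)
    n_files = sum(len(keys) for keys in found.values())
    lines = [
        f"Found {n_files} file(s) for {len(found) - len(missing)} of "
        f"{len(found)} accession(s).",
        # (a) the files are arbitrary, so the user must curate the list
        "File names and types (BAM, FastQ, Fasta, ...) vary between "
        "accessions: please curate this list before downloading.",
    ]
    if missing:
        # (b) count and name the accessions we could not find
        lines.append(
            f"Warning: {len(missing)} accession(s) not found on S3: "
            + ", ".join(missing)
        )
    return "\n".join(lines)
```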

ewels · 2020-01-27T18:24:26Z

The Registry of Open Data on AWS page for this is now up at https://registry.opendata.aws/ncbi-sra/

ewels added the enhancement label Jan 27, 2020

ewels mentioned this issue Aug 11, 2020

Aspera FastQ file link not appearing #24

Closed
