Support streaming from cloud storage #270

SHuang-Broad · 2021-09-13T14:36:00Z

Hi,

We routinely use Nanoplot in our cloud-native pipelines and would love to see Nanoplot support streaming from cloud storage.

Based on a quick look at the code, it seems this would require support from at least one dependency, namely pysam.
Are there any other "patches" needed to support streaming?

Thanks,
Steve


wdecoster · 2021-09-13T17:09:43Z

Hi Steve,

Interesting suggestion! I have to admit I don't immediately know how to adapt the code for this. Since you mention pysam, are you mainly interested in BAM/CRAM files as input? Which you would then specify using a URL?

Cheers,
Wouter

SHuang-Broad · 2021-09-13T17:16:00Z

Our current pipeline uses Google Cloud Storage (gs://...), but I could see users benefiting from support for all major cloud service providers, e.g. AWS and Azure.
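One likely "patch" beyond pysam itself: if input paths are validated as local files (e.g. with something like `os.path.isfile`), a gs://... URL would be rejected before it ever reaches pysam. A minimal sketch of validation that lets URLs through, assuming hypothetical helper names `is_remote` and `validate_input` (not Nanoplot's actual code) and assuming the remote schemes are whatever the installed htslib can open:

```python
import os
from urllib.parse import urlparse

# Hypothetical set of URL schemes passed straight through to pysam/htslib.
# Assumption: gs:// and s3:// only work if htslib was built with libcurl support.
REMOTE_SCHEMES = {"gs", "s3", "http", "https", "ftp"}


def is_remote(path: str) -> bool:
    """Return True if `path` is a URL with one of the remote schemes above."""
    return urlparse(path).scheme.lower() in REMOTE_SCHEMES


def validate_input(path: str) -> str:
    """Check that a local file exists; pass remote URLs through unchecked.

    Remote objects cannot be checked with os.path, so any access error
    (missing object, bad credentials) surfaces when pysam opens the URL.
    """
    if is_remote(path):
        return path
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Input file not found: {path}")
    return path
```

Only known schemes count as remote, so a plain relative path like `reads.bam` is still checked locally.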

If Nanoplot only accesses the BAM through pysam, then that is probably the dependency that needs to support streaming, and the change should be minimal.
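To illustrate why the change could be small: `pysam.AlignmentFile` hands the path to htslib, which can open URLs when built with remote-file support, and iterating the file streams records sequentially without needing an index. A minimal sketch, assuming hypothetical helpers `alignment_mode` and `read_lengths` (not Nanoplot's API), and assuming that for gs:// URLs htslib picks up an access token from the `GCS_OAUTH_TOKEN` environment variable (for example set from `gcloud auth application-default print-access-token`):

```python
from typing import Iterator, Optional


def alignment_mode(path: str) -> str:
    """Pick a pysam read mode from the file extension (CRAM, BAM or SAM)."""
    # Drop any query string, e.g. from a signed HTTPS URL.
    name = path.split("?", 1)[0].lower()
    if name.endswith(".cram"):
        return "rc"
    if name.endswith(".bam"):
        return "rb"
    return "r"


def read_lengths(path: str, reference: Optional[str] = None) -> Iterator[int]:
    """Stream primary read lengths from a local path or URL via pysam/htslib.

    Assumption: for gs:// URLs, htslib reads the GCS_OAUTH_TOKEN environment
    variable for authentication; this must be set before calling.
    """
    import pysam  # imported lazily so alignment_mode works without pysam

    mode = alignment_mode(path)
    # check_sq=False tolerates unaligned BAMs without @SQ header lines;
    # reference_filename is only needed to decode CRAM.
    with pysam.AlignmentFile(path, mode, check_sq=False,
                             reference_filename=reference) as bam:
        # Plain iteration reads records in file order, so no .bai/.crai is needed.
        for read in bam:
            if read.is_secondary or read.is_supplementary:
                continue
            yield read.query_length
```

Whether this works out of the box depends on how the installed pysam/htslib was built, which is what the pysam issue linked below discusses.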

This is definitely an optimization, so it's not an urgent need.

SHuang-Broad · 2021-09-13T17:21:01Z

Regarding support for gs://... paths, I think the following link might be useful:
pysam-developers/pysam#592

wdecoster · 2021-09-13T18:49:33Z

Do you have such a (public?) gs://... path for me to test things on? All our data is processed locally.

SHuang-Broad · 2021-09-13T19:38:24Z

We don't have any public data to share (mainly because downloading data from cloud storage incurs costs for the data owner unless something like requester pays is enabled, so public data could easily be abused by malicious actors).

I think the test data from the DeepVariant team might work, though accessing it may require you to set up a Google Cloud account:
https://console.cloud.google.com/storage/browser/deepvariant/pacbio-case-study-testdata?pageState=(%22StorageObjectListTable%22:(%22f%22:%22%255B%255D%22))&prefix=&forceOnObjectsSortingFiltering=false

I'm sorry if this is too much trouble.
Thanks for getting on top of this!
Steve

wdecoster added the enhancement label Jan 17, 2023

