Add support for Data Explorer functionality (list, add) #381

robnewman · 2024-01-11T16:04:52Z · edited by weronikasosnowskaseqera

Add a new command:

tw data-links

that interacts with the new Data Explorer data-links API endpoint in the Seqera Platform. Suggested functionality is below (all the auth syntactic sugar still needs to be added):

tw data-links list --workspace=<workspaceId>                             # list data links in a workspace
tw data-links list --workspace=<workspaceId> --provider=<cloudProvider>  # subset to a specific cloud provider
tw data-links list --workspace=<workspaceId> --type=<cloud|custom>       # subset to auto (cloud) or custom data links
tw data-links add --workspace=<workspaceId> --name=<dataLinkName> --credentials=<credentials> --description=<description> --provider=<cloudProvider> # add a custom data link to a workspace
tw data-links delete --datalink=<datalinkId> --workspace=<workspaceId>   # delete a data link in a workspace
tw data-links cp --datalink=<datalinkId>:/path/to/object.txt object.txt  # copy/download a single object (defined by prefix path) from the data link to your local machine
tw data-links cp /path/to/samplesheet.csv --datalink=<datalinkId> --workspace=<workspaceId> # upload a file from your local machine to the data link
tw data-links cp /path/to/folder --datalink=<datalinkId> --workspace=<workspaceId> --recursive # upload all files in a folder from your local machine to the data link
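The two `cp` forms proposed above differ in where the data link appears: a `<datalinkId>:/path` value on `--datalink` names a remote source (download), while a bare `<datalinkId>` next to local positional paths names an upload target. Below is a minimal Python sketch of how a client might tell them apart. It is illustrative only: `plan_cp` and `Transfer` are hypothetical names, and the fallback to the object's file name when no local destination is given, uploads landing at the link root, and IDs never containing `:` are all assumptions, not tw's actual behavior.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import List


@dataclass
class Transfer:
    """Hypothetical description of one cp request (not a tw data structure)."""
    direction: str                 # "download" or "upload"
    link_id: str
    remote_path: str
    local_paths: List[str] = field(default_factory=list)
    recursive: bool = False


def plan_cp(datalink: str, positional: List[str], recursive: bool = False) -> Transfer:
    """Interpret the two proposed cp forms.

    --datalink=<id>:/remote/path [local_dest]  -> download
    <local_path...> --datalink=<id>            -> upload
    Assumption: data link IDs never contain ':'.
    """
    if ":" in datalink:
        link_id, remote = datalink.split(":", 1)
        if len(positional) > 1:
            raise ValueError("a download takes at most one local destination")
        # Assumption: with no destination, save under the object's name in the cwd.
        local = positional[0] if positional else PurePosixPath(remote).name
        return Transfer("download", link_id, remote, [local], recursive)
    if not positional:
        raise ValueError("an upload needs at least one local path")
    # Assumption: uploads go to the root of the data link.
    return Transfer("upload", datalink, "/", list(positional), recursive)
```

For example, `plan_cp("dl-1:/path/to/object.txt", [])` would plan a download of `object.txt` into the current directory, while `plan_cp("dl-1", ["/path/to/folder"], recursive=True)` would plan a recursive upload.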

Tasks


[Data explorer] Add list data links support to CLI #405
[Data explorer] Add create custom data links support to CLI #406
[Data explorer] Add delete custom data links support to CLI #407
[Data explorer] Add content listing functionality to CLI #413
https://github.com/seqeralabs/platform/issues/6457


ewels · 2024-01-11T16:21:33Z

Presumably tw data-explorer cp downloads to the current working directory? It could be nice to support alternative destinations too 🤔, potentially as a separate tw data-explorer sync command, a flag, or just a second positional argument.

pditommaso · 2024-01-11T17:08:25Z

For me it's a -1. Why bloat the CLI with this?

ewels · 2024-01-11T19:16:11Z

You could argue that it's not worth having a CLI at all, there's a perfectly good API!

Having it in the CLI makes it faster and easier to work with datasets from the terminal. It improves developer / user experience.

ewels · 2024-01-11T19:19:29Z

There is also a technical reason for downloading files via the CLI:

Presigned URLs expire after a short-ish window (@swampie thought it was 1 hour). When downloading a large dataset, the download could easily run for many hours, so a generated bash script would fail partway through. The CLI, however, could request the presigned URLs one at a time, in series, so that each one is still fresh when it is used.
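The per-object approach described above can be sketched as follows. This is illustrative Python with stand-in callables: `presign`, `fetch`, and `download_in_series` are hypothetical names rather than Seqera Platform or tw APIs, and the roughly one-hour URL lifetime mentioned above is unconfirmed.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical hooks, not real Seqera Platform or tw APIs:
#   presign(path) -> (url, expires_at)  asks the server for a short-lived download URL
#   fetch(url, dest)                    streams the object behind `url` into local file `dest`
Presign = Callable[[str], Tuple[str, float]]
Fetch = Callable[[str, str], None]


def download_in_series(paths: Iterable[str], presign: Presign, fetch: Fetch) -> List[str]:
    """Download objects one at a time, requesting each presigned URL right before use.

    Generating every URL up front (e.g. into a bash script) fails once the total
    download time exceeds the URL lifetime; asking per object keeps each URL fresh.
    """
    saved: List[str] = []
    for path in paths:
        url, _expires_at = presign(path)   # fresh URL for this object only
        dest = path.rsplit("/", 1)[-1]     # keep just the object name, saved in the cwd
        fetch(url, dest)                   # finish this download before asking for the next URL
        saved.append(dest)
    return saved
```

With a one-hour URL lifetime and downloads that each take 40 minutes, URLs generated all at once would already be stale by the third file, whereas every URL requested by this loop is used immediately after it is issued.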

evanfloden · 2024-01-11T20:07:30Z

Adding a use case: download/list a dataset, with a flag to also download/list the files referenced in the dataset CSV/TSV table. For example:

tw dataset cp <dataset_id> --files

This downloads the dataset table (CSV/TSV) plus the files. In this way the user only has to be concerned with passing around the dataset object, and they can download/list the files at any time. I think a dataset can also be an output, so it becomes a packaging mechanism.

Note: today, access to the files in a dataset (to list or download them) is not guaranteed, because users can put whatever s3:// paths they want in a CSV. The same issue exists when launching a pipeline.
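The `--files` idea and the access caveat can both be sketched in a few lines: collect every cloud URI referenced in a dataset table, then separate paths that fall under a prefix the workspace is known to be able to read from those it may not. Illustrative Python only; `remote_paths`, `split_by_access`, the set of URI schemes, and checking against known prefixes (for example, data link roots) are assumptions, not existing Platform behavior.

```python
import csv
import io
from typing import Iterable, List, Tuple

# URI schemes treated as remote files; this list is an assumption for illustration.
REMOTE_SCHEMES = ("s3://", "gs://", "az://")


def remote_paths(table_text: str, delimiter: str = ",") -> List[str]:
    """Return each distinct cell of a dataset CSV/TSV that looks like a cloud URI, in order."""
    found: List[str] = []
    for row in csv.reader(io.StringIO(table_text), delimiter=delimiter):
        found.extend(c.strip() for c in row if c.strip().startswith(REMOTE_SCHEMES))
    return list(dict.fromkeys(found))  # drop duplicates, keep first-seen order


def split_by_access(paths: Iterable[str],
                    allowed_prefixes: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Partition paths into those under a known-readable prefix and all others."""
    # Normalise to a trailing '/' so "s3://bucket/run1" does not match "s3://bucket/run10/...".
    prefixes = tuple(p.rstrip("/") + "/" for p in allowed_prefixes)
    readable: List[str] = []
    unknown: List[str] = []
    for p in paths:
        (readable if p.startswith(prefixes) else unknown).append(p)
    return readable, unknown
```

Paths in the second list are exactly the ones the note warns about: nothing guarantees the workspace credentials can read them. For a TSV dataset, call `remote_paths(text, "\t")`.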

pditommaso · 2024-01-15T10:27:15Z

Fair enough

swampie · 2024-01-18T11:13:50Z

For upload and download, why use the Seqera CLI when you can use the standard cloud tooling?

ewels · 2024-01-18T22:01:53Z

No need to maintain cloud credentials locally
Support for multiple compute environment types (clouds) with a consistent command and a single CLI tool
Downloads via consistent Seqera identifiers, so less risk of sample or file mix-ups
Better user experience if we add nice things as suggested by Evan, e.g. downloading all data paths within the CSV

mbosio85 · 2024-01-24T09:04:19Z

Considering the ongoing work to extend Data Explorer availability to personal workspaces, these new CLI capabilities should be implemented for those as well.

swampie · 2024-01-25T11:35:02Z

I agree with Paolo that the complexity is not justified for the time being, but I'm open to discussing it.

evanfloden · 2024-01-26T12:34:13Z

Adding a key point that is being lost here.

Our end users shouldn't need cloud console or cloud provider CLI access. They likely don't have cloud credentials. This is the point of having different roles, with workspace (WS) admins adding credentials and compute environments (CEs).

End users want to upload data, run pipelines, and download results.

pditommaso · 2024-01-26T13:25:29Z

I agree 💯 that the CLI should have first-class support. However, my understanding is that the feature highlighted here does not come for free; it may require some specific endpoints.

robnewman · 2024-04-01T22:17:55Z

Updated the original request to match the Data Explorer data-links API endpoint name.

robnewman · 2024-04-17T13:24:07Z

TBD: the API always returns paginated results, so the CLI commands need to account for this.
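One way a CLI command can hide pagination is to keep requesting pages until the reported total is reached. A minimal sketch, assuming an offset/max style of paging where each call returns the page's items plus a total count; `list_all` and `fetch_page` are hypothetical names, and the actual shape of the data-links response is not specified in this thread.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical page fetcher: (offset, max) -> (items on this page, total item count).
# The offset/max/total shape is an assumption, not the documented data-links response.
PageFetcher = Callable[[int, int], Tuple[List[Dict], int]]


def list_all(fetch_page: PageFetcher, page_size: int = 100) -> List[Dict]:
    """Collect every item from a paginated list endpoint by walking offsets."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    items: List[Dict] = []
    offset = 0
    while True:
        page, total = fetch_page(offset, page_size)
        items.extend(page)
        offset += len(page)
        # Stop once the reported total is reached, or on an empty page
        # (guards against looping forever if `total` is wrong).
        if not page or offset >= total:
            return items
```

An alternative design is to expose page controls to the user rather than fetching everything up front; walking all pages is simpler for scripting but slower for very large workspaces.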

jordeu · 2024-05-10T14:13:31Z

I feel a bit weird about naming this subcommand data-links. I've checked the Data Explorer UI, and there you can upload files without any mention of the "data link" concept. You can also create new "data links" without any mention of that concept. Why should we use this name in the CLI?

The subtitle of the page where you list your "data links" says, "Browse remote data repositories and data for use in Seqera Cloud," with no reference to this "data link" concept. Overall, this "data link" concept is misleading.

I'd call it "data source"; the command line could then be tw data-source... with a tw ds... alias. The tw data-source add ... subcommand would also be more meaningful.

But because naming is difficult, and what sounds good to me may sound terrible to others, I suggest reviewing this naming before hardcoding it into the command line interface. At the very least, if "data link" is chosen as the best name, the web UI should be consistent: call that section "data links" instead of "data explorer", and refer explicitly to the "data link" concept when you add a new one.

robnewman · 2024-05-14T14:51:26Z

@jordeu Thanks for the feedback! The Data Explorer API endpoint is called data-links, and we were being consistent with that. I think it would be more confusing to have the API endpoint named differently from the CLI command (when both are publicly accessible). I agree that the term "data-link" is widely used internally but not directly surfaced externally. I would be in favor of explicitly referencing that term in our docs, but I'm open to feedback.

weronikasosnowskaseqera · 2024-05-23T10:52:18Z

@robnewman we are missing the method to list content here.

robnewman · 2024-05-23T13:13:57Z

@weronikasosnowskaseqera Please add it. I wasn't trying to be comprehensive; the point is just that the functionality needs to exist and reflect what the API provides.

canny (bot) · 2024-06-13T15:45:44Z

This issue has been unlinked from a Canny post: Add datasets directly from s3 / data explorer to the platform 😢

robnewman · 2024-08-20T19:40:22Z · edited

This is now done except for the tw data-links cp command. The other commands are part of the v0.9.4 release.

weronikasosnowskaseqera · 2024-08-21T06:46:25Z

tw data-links cp (download/upload) will be handled in a separate task: https://seqera.atlassian.net/browse/PLAT-289

mbosio85 added this to the v1.0.0 milestone Apr 10, 2024

robnewman added the API label (New things that have the API that are not yet supported by the CLI) Apr 23, 2024

jimmypoms assigned weronikasosnowskaseqera May 13, 2024

weronikasosnowskaseqera changed the title from "Add support for Data Explorer functionality (list, add, cp)" to "Add support for Data Explorer functionality (list, add)" May 29, 2024

weronikasosnowskaseqera mentioned this issue Jul 15, 2024 in [Data explorer] basic data links operations support #411 (Merged)

weronikasosnowskaseqera added the enhancement label (New feature or request) Jul 17, 2024

jimmypoms added the sync-jira label Jul 24, 2024

robnewman removed the sync-jira label Aug 20, 2024

robnewman closed this as completed Aug 21, 2024
