Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add studio flag for rm-dataset and edit-dataset #572

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

amritghimire
Copy link
Contributor

@amritghimire amritghimire commented Nov 7, 2024

This adds a support for --studio flag for edit-dataset and rm-dataset
command. If the --studio flag is passed, it will use the studio client
to process the operation.

Some example are as:

  • datachain rm-dataset "new_test_dataset" --studio --version 1
  • datachain edit-dataset png_files --studio --new-name new_dataset_name

TODO:

  • Add test

Studio PR: https://github.com/iterative/studio/pull/10890

This adds a support for `--studio` flag for edit-dataset and rm-dataset
command. If the --studio flag is passed, it will use the studio client
to process the operation.

Some example are as:
- `datachain rm-dataset "new_test_dataset" --studio --version 1`
- `datachain edit-dataset png_files  --studio --new-name new_dataset_name`

TODO:
- Add test

Studio PR: iterative/studio#10890
@amritghimire amritghimire self-assigned this Nov 7, 2024
Copy link

cloudflare-workers-and-pages bot commented Nov 7, 2024

Deploying datachain-documentation with  Cloudflare Pages  Cloudflare Pages

Latest commit: a1610f5
Status: ✅  Deploy successful!
Preview URL: https://340837da.datachain-documentation.pages.dev
Branch Preview URL: https://amrit-temp.datachain-documentation.pages.dev

View logs

Copy link

codecov bot commented Nov 7, 2024

Codecov Report

Attention: Patch coverage is 86.48649% with 5 lines in your changes missing coverage. Please review.

Project coverage is 87.84%. Comparing base (e455180) to head (d50cfea).

Files with missing lines Patch % Lines
src/datachain/studio.py 66.66% 2 Missing and 2 partials ⚠️
src/datachain/remote/studio.py 90.90% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #572      +/-   ##
==========================================
+ Coverage   87.83%   87.84%   +0.01%     
==========================================
  Files         100      100              
  Lines        9993    10024      +31     
  Branches     1356     1363       +7     
==========================================
+ Hits         8777     8806      +29     
+ Misses        873      872       -1     
- Partials      343      346       +3     
Flag Coverage Δ
datachain 87.78% <86.48%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@amritghimire amritghimire requested a review from a team November 7, 2024 17:04
@shcheklein
Copy link
Member

Can be a follow up, but let's rename all these command (similar to ls-dataset that we had). Can we make it all part of the datachain datasets umbrella for example - at least for now. (make datasets alias with ds to make it simpler)

datachain ds ls
datachain ds rm
datachain ds rename (better to have an explicit command vs one edit that is not very clear in terms of its semantics).

@shcheklein
Copy link
Member

We need to figure out docs for this as well.

src/datachain/cli.py Outdated Show resolved Hide resolved
studio: bool = False,
team: Optional[str] = None,
):
if studio:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Q] Is there a reason for deviating from the behaviour implemented in #561? Why can't a user edit/remove a dataset from both local and remote in the same call?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For list operation, having the option to list the datasets from both local and remote seems like a frequent operation.

But I am not sure how helpful it will edit or remove dataset from both local and remote in the same call.

I was also looking at the implementation of pull where ds:// prefix is used to identify if it is remote or if it is local. I don't know if it will be confusing to user though to implement it in edit and remove as well. cc. @shcheklein

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I am not sure how helpful it will edit or remove dataset from both local and remote in the same call.

If you think about this in terms of #578 users would definitely want to be able to rename both at the same time. We want to avoid users ending up with datasets with the same identifier being named differently between local and remote as much as possible.

@amritghimire amritghimire requested review from shcheklein, skshetry and a team November 8, 2024 15:14
body["labels"] = labels # type: ignore[assignment]

return self._send_request(
"datachain/edit-dataset",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

related to the groups of commands, etc - let's please make API also clean

datachain/datasets/ls
datachain/datasets/rename

-etc

rm-dataset and similar look very weird for REST api.

Copy link
Member

@skshetry skshetry Nov 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could even be a single endpoint that lists on GET, renames via PUT (assuming dataset name is not part of a resource URI as PUT has to be idempotent), removes via DELETE.

Since dataset is part of a team, ideally the URI should be /datachain/<team>/datasets.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this comment to the #576 as well. We can take it as a part of refactor.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants