[DEVX:123] Support Dataset Upload #140

Merged
sainivedh merged 5 commits into clarifai-sdk-dev from DEVX-123-add-dataupload
Aug 10, 2023

Contributor

@sainivedh sainivedh commented Aug 8, 2023

Dataset Upload Functionality in SDK

What

This enhancement introduces dataset upload functionality to the Clarifai SDK, leveraging the existing Python utility data upload approach.

How

To utilize this new functionality, users can follow these steps:

from clarifai.client.app import App

app = App(app_id="", user_id="")
# List all datasets
all_datasets = app.list_datasets()
# Create a dataset in the Clarifai App
dataset = app.create_dataset(dataset_id="")
# Execute data upload to the Clarifai App dataset
dataset.upload_dataset(task="visual_segmentation", split="train", zoo_dataset="coco_segmentation")

Improvements

  • The dataloader now extracts protos at run time instead of building them all in memory (more context)
  • Added a base ClarifaiDataLoader class
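
To illustrate the first improvement, here is a minimal sketch of run-time (lazy) extraction. It does not reproduce the ClarifaiDataLoader class from this PR: `Example`, `LazyDataLoader`, and `iter_batches` are hypothetical names, and `Example` is a plain dataclass standing in for an input proto. The loader keeps only lightweight records and builds each example when it is indexed, so only the batch in flight is materialized.

```python
from dataclasses import dataclass
from typing import Iterator, List, Sequence, Tuple


@dataclass
class Example:
    """Hypothetical stand-in for one input proto sent to the API."""
    input_id: str
    label: str


class LazyDataLoader:
    """Hypothetical loader: stores lightweight records and builds each
    Example on access instead of materializing all of them up front."""

    def __init__(self, records: Sequence[Tuple[str, str]]):
        self.records = records
        self.built = 0  # counts how many examples were actually built

    def __len__(self) -> int:
        return len(self.records)

    def __getitem__(self, index: int) -> Example:
        input_id, label = self.records[index]
        self.built += 1
        return Example(input_id=input_id, label=label)


def iter_batches(loader: LazyDataLoader, batch_size: int) -> Iterator[List[Example]]:
    """Yield batches, building examples only for the batch in flight."""
    for start in range(0, len(loader), batch_size):
        stop = min(start + batch_size, len(loader))
        yield [loader[i] for i in range(start, stop)]


loader = LazyDataLoader([("img-0", "cat"), ("img-1", "dog"), ("img-2", "cat")])
first_batch = next(iter_batches(loader, batch_size=2))
```

After the first batch is pulled, only two of the three records have been turned into examples. An eager loader would have built all three before the upload started, which is the memory cost this approach is meant to avoid.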

Tests

  • Manual data upload tests for:

    1. COCO detection and segmentation
    2. xView detection
    3. CIFAR-10 and Food-101 image classification
    4. IMDB reviews text classification

@sainivedh sainivedh marked this pull request as ready for review August 9, 2023 07:32
Contributor

@sanjaychelliah sanjaychelliah left a comment

LGTM! Just minor comments.

Contributor

@stmugisha stmugisha left a comment

Looking good.

Is the upload subdirectory under datasets (clarifai.datasets.upload) necessary?
If not, I think we can reduce the nesting and move the files directly under datasets.

clarifai/client/dataset.py (outdated, resolved)
Contributor

@ackizilkale ackizilkale left a comment

LGTM as long as the outstanding conversations are resolved.

@zeiler
Member

zeiler commented Aug 9, 2023

Only the one comment: change this notion of a zoo to reflect what that folder seems to be, which is just different types of dataset loaders. I didn't do a thorough code review, as there are many eyes on it overall.

Contributor

@stmugisha stmugisha left a comment

LGTM 👍

@sainivedh sainivedh merged commit 30cbc1f into clarifai-sdk-dev Aug 10, 2023
4 checks passed
sainivedh added a commit that referenced this pull request Aug 17, 2023
@sainivedh sainivedh deleted the DEVX-123-add-dataupload branch August 18, 2023 05:28
5 participants