Add option to run the datachain query to Studio #579

Draft: 1 commit to be merged into main

amritghimire (Contributor) commented Nov 9, 2024

Companion PR: https://github.com/iterative/studio/pull/10897

The options are:
positional arguments:
  query_file            The query file to run.

options:
  --team TEAM           The team to run a job for. By default, it will use team from config.
  --env-file ENV_FILE   File containing environment variables to set for the job.
  --envs ENVS [ENVS ...]
                        Environment variables to set for the job.
  --workers WORKERS     Number of workers to use for the job.
  --files FILES [FILES ...]
                        Files to include in the job.
  --python-version PYTHON_VERSION
                        Python version to use for the job (e.g. '3.9', '3.10', '3.11').
  --req-file REQ_FILE   File containing Python package requirements.
  --reqs REQS [REQS ...]
                        Python package requirements.
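
A minimal sketch of how the options listed above could be declared with `argparse`. This is not the code from this PR: the function name `build_run_parser`, the `int` type for `--workers`, and the `None` defaults are assumptions made for illustration only.

```python
import argparse


def build_run_parser() -> argparse.ArgumentParser:
    """Hypothetical parser that mirrors the help text above."""
    parser = argparse.ArgumentParser(prog="datachain studio run")
    parser.add_argument("query_file", help="The query file to run.")
    parser.add_argument("--team", default=None, help="The team to run a job for.")
    parser.add_argument("--env-file", default=None,
                        help="File containing environment variables to set for the job.")
    # "ENVS [ENVS ...]" in the help text corresponds to nargs="+".
    parser.add_argument("--envs", nargs="+", default=None,
                        help="Environment variables to set for the job.")
    # Assumption: the worker count is parsed as an integer.
    parser.add_argument("--workers", type=int, default=None,
                        help="Number of workers to use for the job.")
    parser.add_argument("--files", nargs="+", default=None,
                        help="Files to include in the job.")
    parser.add_argument("--python-version", default=None,
                        help="Python version to use for the job.")
    parser.add_argument("--req-file", default=None,
                        help="File containing Python package requirements.")
    parser.add_argument("--reqs", nargs="+", default=None,
                        help="Python package requirements.")
    return parser


# The same arguments as in the example run, without the shell quoting.
example_argv = [
    "example_query.py",
    "--env-file=env_file.txt",
    "--envs=ENV_FROM_ARGS=1",
    "--workers=2",
    "--files", "file.txt",
    "--python-version=3.12",
    "--req-file=reqs.txt",
    "--reqs=oneliners",
]
args = build_run_parser().parse_args(example_argv)
print(args)
```

Options declared with `nargs="+"` come back as lists, so several `--envs`, `--files`, or `--reqs` values can be passed after a single flag.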

Example run:
------------
Example command:
```sh
$ datachain studio run example_query.py --env-file=env_file.txt --envs="ENV_FROM_ARGS=1" --workers=2 --files file.txt --python-version=3.12 --req-file=reqs.txt --reqs="oneliners"
```

Files:
------
`run/env_file.txt`:

```
ENV_FROM_FILE = 'environments.txt'

```
`run/file.txt`:

```
content from file

```

`run/reqs.txt`:

```
pyjokes
```

`run/example_query.py`:

```py
from os import environ

import pyjokes
from datachain import DataChain
from oneliners import get_random


# UDF: return the length of the file path, or -1 for JSON files.
def path_len(path):
    if path.endswith(".json"):
        return (-1,)
    return (len(path),)


if __name__ == "__main__":
    # Values passed via --env-file and --envs.
    print("Environment set from file:", environ["ENV_FROM_FILE"])
    print("Environment set from args:", environ["ENV_FROM_ARGS"])
    # Packages installed via --reqs and --req-file.
    print("Oneliners from reqs (args):", get_random())
    print("Joke from pyjokes (from reqs file):", pyjokes.get_joke())

    # File included in the job via --files.
    with open("file.txt") as f:
        print("Content from files (args):", f.read())

    # Compute path_len for every file in the demo bucket and show the result.
    DataChain.from_storage(
        uri="gs://datachain-demo/dogs-and-cats/",
    ).map(
        path_len,
        params=["file.path"],
        output={"path_len": int},
    ).show()
```
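
The example script expects both `ENV_FROM_FILE` and `ENV_FROM_ARGS` to be present in the job's environment. A rough sketch of how the `--env-file` contents and the `--envs` values might be combined into one mapping is shown here. The helper names `parse_env_line` and `merge_environment`, the quote stripping, and the rule that `--envs` overrides the file are all assumptions, not the behaviour implemented in this PR.

```python
from typing import Dict, List, Optional, Tuple


def parse_env_line(line: str) -> Optional[Tuple[str, str]]:
    """Parse one KEY=VALUE line; return None for blanks, comments, or lines without '='."""
    line = line.strip()
    if not line or line.startswith("#") or "=" not in line:
        return None
    key, _, value = line.partition("=")
    # Assumption: surrounding whitespace and quotes are not part of the value.
    return key.strip(), value.strip().strip("'\"")


def merge_environment(env_file_text: str, env_args: List[str]) -> Dict[str, str]:
    """Combine env-file lines with KEY=VALUE arguments (arguments win on conflict)."""
    env: Dict[str, str] = {}
    for line in list(env_file_text.splitlines()) + list(env_args):
        parsed = parse_env_line(line)
        if parsed is not None:
            key, value = parsed
            env[key] = value
    return env


# Inputs taken from the example files and command above.
env = merge_environment("ENV_FROM_FILE = 'environments.txt'\n", ["ENV_FROM_ARGS=1"])
print(env)
```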

TODO:

- Give the arguments clearer names
- Add tests

cloudflare-workers-and-pages bot commented Nov 9, 2024

Deploying datachain-documentation with Cloudflare Pages

Latest commit: 56fb352
Status: ✅  Deploy successful!
Preview URL: https://74c7c4d3.datachain-documentation.pages.dev
Branch Preview URL: https://amrit-create-job.datachain-documentation.pages.dev


codecov bot commented Nov 9, 2024

Codecov Report

Attention: Patch coverage is 32.75862% with 39 lines in your changes missing coverage. Please review.

Project coverage is 87.51%. Comparing base (e455180) to head (56fb352).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/datachain/studio.py | 5.40% | 34 Missing and 1 partial ⚠️ |
| src/datachain/remote/studio.py | 55.55% | 4 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #579      +/-   ##
==========================================
- Coverage   87.83%   87.51%   -0.32%     
==========================================
  Files         100      100              
  Lines        9993    10051      +58     
  Branches     1356     1365       +9     
==========================================
+ Hits         8777     8796      +19     
- Misses        873      911      +38     
- Partials      343      344       +1     
| Flag | Coverage Δ |
|---|---|
| datachain | 87.45% <32.75%> (-0.32%) ⬇️ |

Flags with carried forward coverage won't be shown.
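
The percentages in the report can be reproduced from the line counts in the coverage diff. The quick check below assumes that coverage is hits divided by lines and that the patch consists of the 58 added lines, 39 of which are reported as missing coverage; the variable names are illustrative.

```python
def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    return 100 * part / whole


# Figures copied from the coverage diff.
base_hits, base_lines = 8777, 9993
head_hits, head_lines = 8796, 10051
patch_lines, patch_missing = 58, 39

base_cov = pct(base_hits, base_lines)
head_cov = pct(head_hits, head_lines)
patch_cov = pct(patch_lines - patch_missing, patch_lines)

print(f"base {base_cov:.2f}%  head {head_cov:.2f}%  delta {head_cov - base_cov:+.2f}%")
print(f"patch {patch_cov:.5f}%")
```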
