Remove terminal #150

Merged · 2 commits · May 13, 2024
515 changes: 57 additions & 458 deletions docs/concepts/catalog.md

Large diffs are not rendered by default.

53 changes: 53 additions & 0 deletions docs/concepts/index.md
@@ -0,0 +1,53 @@
Without any orchestrator, the simplest pipeline could be written as the functions below:


```python linenums="1"
def generate():
    ...
    # write some files, data.csv
    ...
    # return objects or simple python data types.
    return x, y

def consume(x, y):
    ...
    # read from data.csv
    # do some computation with x and y
    ...


# Stitch the functions together.
# This is the driver pattern.
x, y = generate()
consume(x, y)
```

## Runnable representation

The same workflow in ```runnable``` would be:

```python linenums="1"
from runnable import Catalog, Pipeline, PythonTask, pickled

generate_task = PythonTask(name="generate", function=generate,
                           returns=[pickled("x"), "y"],
                           catalog=Catalog(put=["data.csv"]))

consume_task = PythonTask(name="consume", function=consume,
                          catalog=Catalog(get=["data.csv"]))

pipeline = Pipeline(steps=[generate_task, consume_task])
pipeline.execute()
```


- ```runnable``` exposes the functions ```generate``` and ```consume``` as [tasks](task.md).
- Tasks can [access and return](parameters.md/#access_returns) parameters.
- Tasks can also share files between them using [catalog](catalog.md).
- Tasks are stitched together as a [pipeline](pipeline.md).
- The execution environment is configured separately from the pipeline definition, via a configuration file provided at execution time.


## Examples

All the concepts are accompanied by [examples](https://github.com/AstraZeneca/runnable/tree/main/examples).
247 changes: 226 additions & 21 deletions docs/concepts/parameters.md
@@ -1,48 +1,253 @@
```parameters``` are data that can be passed from one ```task``` to another. They are
python data types that a ```task``` can access either as environment variables, as
arguments of the ```python function```, or via the [API](../interactions.md).

For example, in the below snippet, the parameters ```x``` and ```y``` are passed from
```generate``` to ```consume```.

```python
x, y = generate() # returns x and y as output
consume(x, y) # consumes x, y as input arguments.
```

The data types of ```x``` and ```y``` can be:

- JSON serializable: int, string, float, list, dict, including pydantic models.
- Objects: any [dill](https://dill.readthedocs.io/en/latest/) friendly objects.
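As an illustrative sketch (the class and values here are made up for the example), ```x``` could be a JSON serializable dictionary while ```y``` is an arbitrary object:

```python
class Model:
    """Stand-in for any object that is not JSON serializable."""


def generate():
    x = {"epochs": 10, "lr": 0.001}  # JSON serializable dict
    y = Model()                      # arbitrary object, handled via dill
    return x, y


def consume(x: dict, y: Model):
    print(x["epochs"], type(y).__name__)
```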
## Compatibility

The table below summarizes the input and output types of the different task types.
For example, notebooks can only take JSON serializable parameters as input
but can return json, pydantic, or object parameters.

| Task type | Input | Output |
| -------- | :---------------------: | :----------------------: |
| python | json, pydantic, object via function arguments | json, pydantic, object as ```returns``` |
| notebook | json via cell tagged with ```parameters``` | json, pydantic, object as ```returns``` |
| shell | json via environment variables | json via environment variables as ```returns``` |



## Project parameters

Project parameters are the initial parameters of the pipeline. They can be defined using a
```yaml``` file and supplied at execution time, either via ```--parameters-file, -parameters```
on the [runnable CLI](../usage.md/#usage) or via ```parameters_file``` with
[the sdk](../sdk.md/#runnable.Pipeline.execute). These parameters can then be
over-ridden by tasks of the pipeline.

They can also be provided by environment variables prefixed by ```RUNNABLE_PRM_```.
Environment variables override ```yaml``` parameters.
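As a minimal sketch of supplying the parameters file through the SDK (the task functions here are placeholders, and the path assumes the yaml file shown in the tab below exists):

```python
from runnable import Pipeline, PythonTask


def generate():
    ...


def consume():
    ...


pipeline = Pipeline(steps=[
    PythonTask(name="generate", function=generate),
    PythonTask(name="consume", function=consume),
])

# The SDK counterpart of passing --parameters-file on the runnable CLI.
pipeline.execute(parameters_file="examples/common/initial_parameters.yaml")
```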


!!! warning inline end "Type casting"

Annotating the arguments of the python function ensures the arguments have the right data types.

It is advised to explicitly ```cast``` the parameters in notebook and shell tasks.

=== "yaml"

Deeply nested yaml objects are supported.

```yaml
--8<-- "examples/concepts/parameters.yaml"
--8<-- "examples/common/initial_parameters.yaml"
```


=== "environment variables"

Any environment variable prefixed with ```RUNNABLE_PRM_``` is interpreted as a
parameter by the ```tasks```.

The yaml formatted parameters can equivalently be defined as:

```shell
export RUNNABLE_PRM_spam="hello"
export RUNNABLE_PRM_eggs='{"ham": "Yes, please!!"}'
export RUNNABLE_PRM_integer="1"
export RUNNABLE_PRM_floater="3.14"
export RUNNABLE_PRM_stringer="hello"
export RUNNABLE_PRM_pydantic_param='{"x": 10, "foo": "bar"}'
export RUNNABLE_PRM_chunks="[1, 2, 3]"
```

Parameters defined by environment variables override parameters defined by
```yaml```. This can be useful for quick experimentation without changing code.


### Accessing parameters

=== "python"

The functions have arguments that correspond to the project parameters.

Without type annotations, nested parameters are sent in as a dictionary.

```python
--8<-- "examples/03-parameters/static_parameters_python.py"
```

=== "notebook & shell"

The notebook has a cell tagged with ```parameters```, whose values are substituted at run time.

The shell script has access to the parameters as environment variables.

```python
--8<-- "examples/03-parameters/static_parameters_non_python.py"
```
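For instance, a python function could declare the project parameters above as annotated arguments. A sketch, where the function name and pydantic model are illustrative and the values follow the environment-variable example:

```python
from pydantic import BaseModel


class EggsModel(BaseModel):
    ham: str


def read_initial_params(
    spam: str,
    eggs: EggsModel,
    integer: int,
    floater: float,
    chunks: list,
):
    # Annotated arguments are cast to the declared types; without annotations,
    # nested parameters such as `eggs` would arrive as plain dictionaries.
    assert spam == "hello"
    assert eggs.ham == "Yes, please!!"
    assert integer == 1
    assert chunks == [1, 2, 3]
```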



## Access & returns

### access

Parameters returned by upstream tasks are accessed in the same way as [project parameters](#project-parameters).


### returns

Tasks can return parameters which can then be accessed by downstream tasks.

The syntax is inspired by:

```python
def generate():
    ...
    return x, y

def consume(x, y):
    ...

x, y = generate() # returns x and y as output
consume(x, y) # consumes x, y as input arguments.
```

and implemented in ```runnable``` as:

=== "sdk"

```python
from runnable import PythonTask
# The returns syntax can be used for notebook and shell scripts too.
generate_task = PythonTask(name="generate", function=generate, returns=["x", "y"])
consume_task = PythonTask(name="consume", function=consume)
```
=== "yaml"

```yaml
generate:
  type: task
  command: generate
  next: consume
  returns:
    - name: x
    - name: y
consume:
  ...
```

!!! warning "order of returns"

The order of ```returns``` should match the order of the values returned by the python function.
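Putting the two together, a minimal end-to-end sketch (the function bodies and values are illustrative):

```python
from runnable import Pipeline, PythonTask


def generate():
    x, y = 1, "hello"
    return x, y  # the order here matches `returns` below


def consume(x: int, y: str):
    print(x, y)


generate_task = PythonTask(name="generate", function=generate, returns=["x", "y"])
consume_task = PythonTask(name="consume", function=consume)

pipeline = Pipeline(steps=[generate_task, consume_task])
pipeline.execute()
```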


### marking returns as ```metric``` or ```object```

JSON style parameters can be marked as a ```metric``` in
[python functions](task.md/#python-functions), [notebook](task.md/#notebook), [shell](task.md/#shell). Metric parameters can be accessed as normal parameters in downstream steps.

Returns marked as ```pickled``` in [python functions](task.md/#python-functions), [notebook](task.md/#notebook) are serialized using ```dill```.

### Example

```python
import pandas as pd

# Assuming a function returns a pandas dataframe and a score
def generate():
    ...
    return df, score

# Downstream step consuming the df and score
def consume(df: pd.DataFrame, score: float):
    ...
```

=== "sdk"

```python
from runnable import PythonTask, metric, pickled

generate_task = PythonTask(name="generate", function=generate,
                           returns=[pickled("df"),    # pickle df
                                    metric("score")]) # mark score as metric

consume_task = PythonTask(name="consume", function=consume)
```

=== "yaml"

```yaml
generate:
  type: task
  command: generate
  next: consume
  returns:
    - name: df
      kind: object
    - name: score
      kind: metric
consume:
  ...
```
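Wiring these tasks into a pipeline follows the same pattern as before; a sketch, assuming ```generate``` and ```consume``` are defined as in the example above:

```python
from runnable import Pipeline, PythonTask, metric, pickled

generate_task = PythonTask(name="generate", function=generate,
                           returns=[pickled("df"),    # serialize df with dill
                                    metric("score")]) # record score as a metric

consume_task = PythonTask(name="consume", function=consume)

pipeline = Pipeline(steps=[generate_task, consume_task])
pipeline.execute()
```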


## Complete Example

=== "python"

=== "python"

```python linenums="1" hl_lines="28-34"
--8<-- "examples/03-parameters/passing_parameters_python.py"
```

=== "yaml"

```yaml linenums="1" hl_lines="25-32"
--8<-- "examples/03-parameters/passing_parameters_python.yaml"
```

=== "notebook"

To access parameters, the cell should be tagged with ```parameters```. Only
JSON style parameters can be injected.

Any python variable defined during the execution of the notebook that matches a
name in ```returns``` is inferred as a parameter. The variable can be either a
JSON-serializable type or an object.

=== "python"

```python linenums="1" hl_lines="24-29"
--8<-- "examples/03-parameters/passing_parameters_notebook.py"
```

=== "yaml"

```yaml linenums="1" hl_lines="21-28"
--8<-- "examples/03-parameters/passing_parameters_notebook.yaml"
```

=== "shell"

Shell tasks can only access and return JSON style parameters.

=== "python"

```python linenums="1" hl_lines="30-36"
--8<-- "examples/03-parameters/passing_parameters_shell.py"
```

=== "yaml"

The patterns for accessing and returning parameters are specific to the
```command_type``` of the task node; please refer to [tasks](task.md)
for more information.
```yaml linenums="1" hl_lines="26-31"
--8<-- "examples/03-parameters/passing_parameters_shell.yaml"
```