diff --git a/docs/concepts/catalog.md b/docs/concepts/catalog.md
index 384eb196..7b68d962 100644
--- a/docs/concepts/catalog.md
+++ b/docs/concepts/catalog.md
@@ -1,486 +1,85 @@
-!!! note "Opt out"
+Apart from [parameters](parameters.md), [tasks](task.md) might also need to pass ```files``` between them.
- Pipelines need not use the ```catalog``` if they prefer other ways to transfer
- data between tasks. The default configuration of ```do-nothing``` is no-op by design.
- We kindly request to raise a feature request to make us aware of the eco-system.
+For example:
-# TODO: Simplify this
+```python linenums="1"
-Catalog provides a way to store and retrieve data generated by the individual steps of the dag to downstream
-steps of the dag. It can be any storage system that indexes its data by a unique identifier.
+def generate():
+ with open("data.csv", "w"):
+ # write content
+ ...
-For example, a local directory structure partitioned by a ```run_id``` or S3 bucket prefixed by ```run_id```.
+def consume():
+ with open("data.csv", "r"):
+ # read content
+ ...
-!!! tip inline end "Checkpoint"
+generate()
+consume()
- Cataloging happens even if the step execution eventually fails. This behavior
- can be used to recover from a failed run from a checkpoint.
-
-
-
-The directory structure within a partition is the same as the project directory structure. This enables you to
-get/put data in the catalog as if you are working with local directory structure. Every interaction with the catalog
-(either by API or configuration) results in an entry in the [```run log```](../concepts/run-log.md/#step_log)
-
-Internally, runnable also uses the catalog to store execution logs of tasks i.e stdout and stderr from
-[python](../concepts/task.md/#python) or [shell](../concepts/task.md/#shell) and executed notebook
-from [notebook tasks](../concepts/task.md/#notebook).
-
-Since the catalog captures the data files flowing through the pipeline and the execution logs, it enables you
-to debug failed pipelines or keep track of data lineage.
-
-
-
-
-!!! warning "Storage considerations"
-
- Since the data is stored per-run, it might cause the catalog to inflate.
-
- Please consider some clean up
- mechanisms to regularly prune catalog for executions that are not relevant.
-
-
-
-
-## Example
-
-
-
-=== "Configuration"
+```
- Below is a sample configuration that uses the local file system as a catalog store.
- The default location of the catalog is ```.catalog``` and is configurable.
- Every execution of the pipeline will create a sub-directory of name ```run_id``` to store the artifacts
- generated from the execution of the pipeline.
+## Runnable representation
- ```yaml
- --8<-- "examples/configs/fs-catalog.yaml"
- ```
+The same can be represented in ```runnable``` as [catalog](../reference.md/#catalog).
- 1. Use local file system as a central catalog, defaults to ```.catalog```
+For example, the above snippet would be:
-=== "python sdk"
+=== "sdk"
- In the below example, the steps ```create_content_in_data_folder``` and ```create_content_in_another_folder```
- create content for downstream steps, i.e ```retrieve_content_from_both``` to consume.
+ ```python linenums="1"
- !!! note "Delete?"
+ from runnable import PythonTask, Pipeline, Catalog
- Since we are executing in local compute and creating sub-directory ```another```, it might be mistaken that
- we are not cataloging anything. We delete ```another``` directory between steps
- to demonstrate that we indeed move files in and out of the catalog.
+ write_catalog = Catalog(put=["data.csv"])
+ read_catalog = Catalog(get=["data.csv"])
- The highlighted lines in the below example show how to specify the files to get/put from the catalog using python SDK.
+ generate_task = PythonTask(name="generate", function=generate, catalog=write_catalog)
+ consume_task = PythonTask(name="consume", function=consume, catalog=read_catalog)
- ```python linenums="1" hl_lines="44 52 68"
- --8<-- "examples/concepts/catalog.py"
+ pipeline = Pipeline(steps=[generate_task, consume_task])
+ pipeline.execute()
```
=== "yaml"
- In the below example, the steps ```data_create``` and ```another_create``` create content for
- downstream steps, i.e ```retrieve``` to consume.
-
- !!! note "Delete?"
-
- Since we are executing in local compute and creating sub-directory ```another```, it might be mistaken that
- we are not cataloging anything. We delete ```another``` directory between steps
- to demonstrate that we indeed move files in and out of the catalog.
-
- The highlighted lines in the below example show how to specify the files to get/put from the catalog using
- yaml.
-
-
- ```yaml linenums="1" hl_lines="19-21 26-28 38-40"
- --8<-- "examples/concepts/catalog.yaml"
+ ```yaml linenums="1"
+ dag:
+ start_at: generate
+ steps:
+ generate:
+ type: task
+ command: examples.common.functions.write_files
+ catalog:
+ put:
+ - data.csv
+ next: consume
+ consume:
+ type: task
+ command_type: python
+ command: examples.common.functions.read_files
+ catalog:
+ get:
+ - data.csv
+ next: success
+ success:
+ type: success
+ fail:
+ type: fail
```
-!!! note "glob pattern"
-
- We use [glob pattern](https://docs.python.org/3/library/glob.html) to search for files.
-
- Note that, the pattern to recursively match all directories is ```**/*```
-
-
-The execution results in the ```catalog``` populated with the artifacts and the execution logs of the tasks.
-
-
-=== "Directory structure"
-
- The directory structure within the ```catalog``` for the execution, i.e meek-stonebraker-0626, resembles
- the project directory structure.
-
- The execution logs of all the tasks are also present in the ```catalog```.
+## Example
- ```
- >>> tree .catalog
- .catalog
- └── meek-stonebraker-0626
- ├── another
- │ └── world.txt
- ├── create_content_in_another_folder.execution.log
- ├── create_content_in_data_folder.execution.log
- ├── data
- │ └── hello.txt
- ├── delete_another_folder.execution.log
- └── retrieve_content_from_both.execution.log
+=== "sdk"
- 4 directories, 6 files
+ ```python linenums="1"
+ --8<-- "examples/04-catalog/catalog.py"
```
-=== "Run log"
-
- The run log captures the data identities of the data flowing through the catalog.
-
+=== "yaml"
- ```json linenums="1" hl_lines="38-53 84-99 169-191"
- {
- "run_id": "meek-stonebraker-0626",
- "dag_hash": "",
- "use_cached": false,
- "tag": "",
- "original_run_id": "",
- "status": "SUCCESS",
- "steps": {
- "create_content_in_data_folder": {
- "name": "create_content_in_data_folder",
- "internal_name": "create_content_in_data_folder",
- "status": "SUCCESS",
- "step_type": "task",
- "message": "",
- "mock": false,
- "code_identities": [
- {
- "code_identifier": "6029841c3737fe1163e700b4324d22a469993bb0",
- "code_identifier_type": "git",
- "code_identifier_dependable": true,
- "code_identifier_url": "https://github.com/AstraZeneca/runnable-core.git",
- "code_identifier_message": ""
- }
- ],
- "attempts": [
- {
- "attempt_number": 1,
- "start_time": "2024-01-06 06:26:56.279278",
- "end_time": "2024-01-06 06:26:56.284564",
- "duration": "0:00:00.005286",
- "status": "SUCCESS",
- "message": "",
- "parameters": {}
- }
- ],
- "user_defined_metrics": {},
- "branches": {},
- "data_catalog": [
- {
- "name": "create_content_in_data_folder.execution.log",
- "data_hash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
- "catalog_relative_path": "meek-stonebraker-0626/create_content_in_data_folder.execution.log",
- "catalog_handler_location": ".catalog",
- "stage": "put"
- },
- {
- "name": "data/hello.txt",
- "data_hash": "6ccad99847c78bfdc7a459399c9957893675d4fec2d675cec750b50ab4842542",
- "catalog_relative_path": "meek-stonebraker-0626/data/hello.txt",
- "catalog_handler_location": ".catalog",
- "stage": "put"
- }
- ]
- },
- "create_content_in_another_folder": {
- "name": "create_content_in_another_folder",
- "internal_name": "create_content_in_another_folder",
- "status": "SUCCESS",
- "step_type": "task",
- "message": "",
- "mock": false,
- "code_identities": [
- {
- "code_identifier": "6029841c3737fe1163e700b4324d22a469993bb0",
- "code_identifier_type": "git",
- "code_identifier_dependable": true,
- "code_identifier_url": "https://github.com/AstraZeneca/runnable-core.git",
- "code_identifier_message": ""
- }
- ],
- "attempts": [
- {
- "attempt_number": 1,
- "start_time": "2024-01-06 06:26:56.353734",
- "end_time": "2024-01-06 06:26:56.357519",
- "duration": "0:00:00.003785",
- "status": "SUCCESS",
- "message": "",
- "parameters": {}
- }
- ],
- "user_defined_metrics": {},
- "branches": {},
- "data_catalog": [
- {
- "name": "create_content_in_another_folder.execution.log",
- "data_hash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
- "catalog_relative_path": "meek-stonebraker-0626/create_content_in_another_folder.execution.log",
- "catalog_handler_location": ".catalog",
- "stage": "put"
- },
- {
- "name": "another/world.txt",
- "data_hash": "869ae2ac8365d5353250fc502b084a28b2029f951ea7da0a6948f82172accdfd",
- "catalog_relative_path": "meek-stonebraker-0626/another/world.txt",
- "catalog_handler_location": ".catalog",
- "stage": "put"
- }
- ]
- },
- "delete_another_folder": {
- "name": "delete_another_folder",
- "internal_name": "delete_another_folder",
- "status": "SUCCESS",
- "step_type": "task",
- "message": "",
- "mock": false,
- "code_identities": [
- {
- "code_identifier": "6029841c3737fe1163e700b4324d22a469993bb0",
- "code_identifier_type": "git",
- "code_identifier_dependable": true,
- "code_identifier_url": "https://github.com/AstraZeneca/runnable-core.git",
- "code_identifier_message": ""
- }
- ],
- "attempts": [
- {
- "attempt_number": 1,
- "start_time": "2024-01-06 06:26:56.428437",
- "end_time": "2024-01-06 06:26:56.450148",
- "duration": "0:00:00.021711",
- "status": "SUCCESS",
- "message": "",
- "parameters": {}
- }
- ],
- "user_defined_metrics": {},
- "branches": {},
- "data_catalog": [
- {
- "name": "delete_another_folder.execution.log",
- "data_hash": "a9b49c92ed63cb54a8b02c0271a925d9fac254034ed45df83f3ff24c0bd53ef6",
- "catalog_relative_path": "meek-stonebraker-0626/delete_another_folder.execution.log",
- "catalog_handler_location": ".catalog",
- "stage": "put"
- }
- ]
- },
- "retrieve_content_from_both": {
- "name": "retrieve_content_from_both",
- "internal_name": "retrieve_content_from_both",
- "status": "SUCCESS",
- "step_type": "task",
- "message": "",
- "mock": false,
- "code_identities": [
- {
- "code_identifier": "6029841c3737fe1163e700b4324d22a469993bb0",
- "code_identifier_type": "git",
- "code_identifier_dependable": true,
- "code_identifier_url": "https://github.com/AstraZeneca/runnable-core.git",
- "code_identifier_message": ""
- }
- ],
- "attempts": [
- {
- "attempt_number": 1,
- "start_time": "2024-01-06 06:26:56.520948",
- "end_time": "2024-01-06 06:26:56.530135",
- "duration": "0:00:00.009187",
- "status": "SUCCESS",
- "message": "",
- "parameters": {}
- }
- ],
- "user_defined_metrics": {},
- "branches": {},
- "data_catalog": [
- {
- "name": "data/hello.txt",
- "data_hash": "6ccad99847c78bfdc7a459399c9957893675d4fec2d675cec750b50ab4842542",
- "catalog_relative_path": "data/hello.txt",
- "catalog_handler_location": ".catalog",
- "stage": "get"
- },
- {
- "name": "another/world.txt",
- "data_hash": "869ae2ac8365d5353250fc502b084a28b2029f951ea7da0a6948f82172accdfd",
- "catalog_relative_path": "another/world.txt",
- "catalog_handler_location": ".catalog",
- "stage": "get"
- },
- {
- "name": "retrieve_content_from_both.execution.log",
- "data_hash": "0a085cb15df6c70c5859b44cc62bfdc98383600ba4f2983124375a4f64f1ae83",
- "catalog_relative_path": "meek-stonebraker-0626/retrieve_content_from_both.execution.log",
- "catalog_handler_location": ".catalog",
- "stage": "put"
- }
- ]
- },
- "success": {
- "name": "success",
- "internal_name": "success",
- "status": "SUCCESS",
- "step_type": "success",
- "message": "",
- "mock": false,
- "code_identities": [
- {
- "code_identifier": "6029841c3737fe1163e700b4324d22a469993bb0",
- "code_identifier_type": "git",
- "code_identifier_dependable": true,
- "code_identifier_url": "https://github.com/AstraZeneca/runnable-core.git",
- "code_identifier_message": ""
- }
- ],
- "attempts": [
- {
- "attempt_number": 1,
- "start_time": "2024-01-06 06:26:56.591948",
- "end_time": "2024-01-06 06:26:56.592032",
- "duration": "0:00:00.000084",
- "status": "SUCCESS",
- "message": "",
- "parameters": {}
- }
- ],
- "user_defined_metrics": {},
- "branches": {},
- "data_catalog": []
- }
- },
- "parameters": {},
- "run_config": {
- "executor": {
- "service_name": "local",
- "service_type": "executor",
- "enable_parallel": false,
- "placeholders": {}
- },
- "run_log_store": {
- "service_name": "buffered",
- "service_type": "run_log_store"
- },
- "secrets_handler": {
- "service_name": "do-nothing",
- "service_type": "secrets"
- },
- "catalog_handler": {
- "service_name": "file-system",
- "service_type": "catalog"
- },
- "experiment_tracker": {
- "service_name": "do-nothing",
- "service_type": "experiment_tracker"
- },
- "pipeline_file": "",
- "parameters_file": "",
- "configuration_file": "examples/configs/fs-catalog.yaml",
- "tag": "",
- "run_id": "meek-stonebraker-0626",
- "variables": {},
- "use_cached": false,
- "original_run_id": "",
- "dag": {
- "start_at": "create_content_in_data_folder",
- "name": "",
- "description": "",
- "internal_branch_name": "",
- "steps": {
- "create_content_in_data_folder": {
- "type": "task",
- "name": "create_content_in_data_folder",
- "internal_name": "create_content_in_data_folder",
- "internal_branch_name": "",
- "is_composite": false
- },
- "create_content_in_another_folder": {
- "type": "task",
- "name": "create_content_in_another_folder",
- "internal_name": "create_content_in_another_folder",
- "internal_branch_name": "",
- "is_composite": false
- },
- "retrieve_content_from_both": {
- "type": "task",
- "name": "retrieve_content_from_both",
- "internal_name": "retrieve_content_from_both",
- "internal_branch_name": "",
- "is_composite": false
- },
- "delete_another_folder": {
- "type": "task",
- "name": "delete_another_folder",
- "internal_name": "delete_another_folder",
- "internal_branch_name": "",
- "is_composite": false
- },
- "success": {
- "type": "success",
- "name": "success",
- "internal_name": "success",
- "internal_branch_name": "",
- "is_composite": false
- },
- "fail": {
- "type": "fail",
- "name": "fail",
- "internal_name": "fail",
- "internal_branch_name": "",
- "is_composite": false
- }
- }
- },
- "dag_hash": "",
- "execution_plan": "chained"
- }
- }
+ ```yaml linenums="1"
+ --8<-- "examples/04-catalog/catalog.yaml"
```
-
-
-
-## Using python API
-
-Files could also be cataloged using [python API](../interactions.md)
-
-
-This functionality is possible in [python](../concepts/task.md/#python_functions)
-and [notebook](../concepts/task.md/#notebook) tasks.
-
-```python linenums="1" hl_lines="11 23 35 45"
---8<-- "examples/concepts/catalog_api.py"
-```
-
-
-
-
-## Passing Data Objects
-
-Data objects can be shared between [python](../concepts/task.md/#python_functions) or
-[notebook](../concepts/task.md/#notebook) tasks,
-instead of serializing data and deserializing to file structure, using
-[get_object](../interactions.md/#runnable.get_object) and [put_object](../interactions.md/#runnable.put_object).
-
-Internally, we use [pickle](https:/docs.python.org/3/library/pickle.html) to serialize and
-deserialize python objects. Please ensure that the object can be serialized via pickle.
-
-### Example
-
-In the below example, the step ```put_data_object``` puts a pydantic object into the catalog while the step
-```retrieve_object``` retrieves the pydantic object from the catalog and prints it.
-
-You can run this example by ```python run examples/concepts/catalog_object.py```
-
-```python linenums="1" hl_lines="10 30 38"
---8<-- "examples/concepts/catalog_object.py"
-```
diff --git a/docs/concepts/index.md b/docs/concepts/index.md
new file mode 100644
index 00000000..18f9eeb0
--- /dev/null
+++ b/docs/concepts/index.md
@@ -0,0 +1,53 @@
+Without any orchestrator, the simplest pipeline could be the below functions:
+
+
+```python linenums="1"
+def generate():
+ ...
+ # write some files, data.csv
+ ...
+ # return objects or simple python data types.
+ return x, y
+
+def consume(x, y):
+ ...
+ # read from data.csv
+ # do some computation with x and y
+
+
+# Stitch the functions together
+# This is the driver pattern.
+x, y = generate()
+consume(x, y)
+```
+
+## Runnable representation
+
+The same workflow in ```runnable``` would be:
+
+```python linenums="1"
+from runnable import Catalog, Pipeline, PythonTask, pickled
+
+generate_task = PythonTask(name="generate", function=generate,
+                           returns=[pickled("x"), "y"],
+                           catalog=Catalog(put=["data.csv"]))
+
+consume_task = PythonTask(name="consume", function=consume,
+                          catalog=Catalog(get=["data.csv"]))
+
+pipeline = Pipeline(steps=[generate_task, consume_task])
+pipeline.execute()
+
+```
+
+
+- ```runnable``` exposes the functions ```generate``` and ```consume``` as [tasks](task.md).
+- Tasks can [access and return](parameters.md/#access_returns) parameters.
+- Tasks can also share files between them using [catalog](catalog.md).
+- Tasks are stitched together as a [pipeline](pipeline.md).
+- The execution environment (executor, catalog, run log store, etc.) is configured separately from the pipeline definition, keeping the pipeline independent of where it runs; a sketch is below.
+
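+A hypothetical configuration file might look like the below sketch; the keys shown are illustrative
+and the exact options are described in the configuration documentation.
+
+```yaml
+# illustrative configuration, passed at execution time instead of being
+# baked into the pipeline definition
+catalog:
+  type: file-system   # store the artifacts of every run under a local .catalog folder
+```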
+
+## Examples
+
+All the concepts are accompanied by [examples](https://github.com/AstraZeneca/runnable/tree/main/examples).
diff --git a/docs/concepts/parameters.md b/docs/concepts/parameters.md
index 1de3b83e..36af9ec3 100644
--- a/docs/concepts/parameters.md
+++ b/docs/concepts/parameters.md
@@ -1,48 +1,253 @@
-## TODO: Concretly show an example!
+```parameters``` are data that can be passed from one ```task``` to another.
-In runnable, ```parameters``` are python data types that can be passed from one ```task```
-to the next ```task```. These parameters can be accessed by the ```task``` either as
-environment variables, arguments of the ```python function``` or using the
-[API](../interactions.md).
+For example, in the below snippet, the parameters ```x``` and ```y``` are passed from
+```generate``` to ```consume```.
-## Initial parameters
+```python
+x, y = generate() # returns x and y as output
+consume(x, y) # consumes x, y as input arguments.
+```
-The initial parameters of the pipeline can set by using a ```yaml``` file and presented
-during execution
+The data types of ```x``` and ```y``` can be:
-```--parameters-file, -parameters``` while using the [runnable CLI](../usage.md/#usage)
+- JSON serializable: int, string, float, list, dict including pydantic models.
+- Objects: Any [dill](https://dill.readthedocs.io/en/latest/) friendly objects.
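+
+As a rough sketch (the names are illustrative), a single task could return both kinds:
+
+```python
+from pydantic import BaseModel
+
+
+class Metrics(BaseModel):  # JSON serializable via pydantic
+    accuracy: float
+
+
+class TrainedModel:  # an arbitrary object; passed around by serializing with dill
+    ...
+
+
+def generate():
+    # an int, a string, a pydantic model and an object as return values
+    return 1, "hello", Metrics(accuracy=0.9), TrainedModel()
+```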
-or by using ```parameters_file``` with [the sdk](..//sdk.md/#runnable.Pipeline.execute).
-They can also be set using environment variables which override the parameters defined by the file.
+## Compatibility
+
+The below table summarizes the input/output types of the different task types.
+For example, notebooks can only take JSON serializable parameters as input
+but can return json, pydantic or objects.
+
+| Task type | Input | Output |
+| -------- | :---------------------: | :----------------------: |
+| python | json, pydantic, object via function arguments | json, pydantic, object as ```returns``` |
+| notebook | json via cell tagged with ```parameters``` | json, pydantic, object as ```returns``` |
+| shell | json via environment variables | json via environment variables as ```returns``` |
+
+
+
+## Project parameters
+
+Project parameters can be defined using a ```yaml``` file. These parameters can then be
+overridden by tasks of the pipeline.
+
+They can also be provided by environment variables prefixed by ```RUNNABLE_PRM_```.
+Environment variables override ```yaml``` parameters.
+
+
+!!! warning inline end "Type casting"
+
+ Annotating the arguments of the python function ensures the right data type of the arguments.
+
+ It is advised to explicitly ```cast``` the parameters in notebook and shell tasks.
=== "yaml"
Deeply nested yaml objects are supported.
```yaml
- --8<-- "examples/concepts/parameters.yaml"
+ --8<-- "examples/common/initial_parameters.yaml"
```
=== "environment variables"
- Any environment variables prefixed with ```runnable_PRM_ ``` are interpreted as
- parameters by the ```tasks```.
-
The yaml formatted parameters can also be defined as:
```shell
- export runnable_PRM_spam="hello"
- export runnable_PRM_eggs='{"ham": "Yes, please!!"}'
+ export RUNNABLE_PRM_integer="1"
+ export RUNNABLE_PRM_floater="3.14"
+ export RUNNABLE_PRM_stringer="hello"
+ export RUNNABLE_PRM_pydantic_param="{'x': 10, 'foo': 'bar'}"
+ export RUNNABLE_PRM_chunks="[1, 2, 3]"
```
Parameters defined by environment variables override parameters defined by
```yaml```. This can be useful to do a quick experimentation without changing code.
-## Parameters flow
+### Accessing parameters
+
+=== "python"
+
+ The functions have arguments that correspond to the project parameters.
+
+ Without type annotations, nested parameters are sent in as a dictionary.
+
+ ```python
+ --8<-- "examples/03-parameters/static_parameters_python.py"
+ ```
+
+=== "notebook & shell"
+
+ The notebook has a cell tagged with ```parameters```; the parameter values are substituted at run time.
+
+ The shell script has access to them as environment variables.
+
+ ```python
+ --8<-- "examples/03-parameters/static_parameters_non_python.py"
+ ```
+
+
+
+## Access & returns
+
+### access
+
+Parameters returned by upstream tasks are accessed the same way as [project parameters](#project-parameters).
+
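+For instance, a minimal sketch (assuming an upstream task wrapping ```generate``` with ```returns=["x", "y"]```):
+the downstream function simply declares the returned names as arguments.
+
+```python
+def generate():
+    # 1 and 2.0 become the pipeline parameters x and y via returns=["x", "y"]
+    return 1, 2.0
+
+
+def consume(x: int, y: float):
+    # x and y are injected from the upstream returns, cast per the annotations
+    print(x + y)
+```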
+
+### returns
+
+Tasks can return parameters which can then be accessed by downstream tasks.
+
+The syntax is inspired by:
+
+```python
+def generate():
+ ...
+ return x, y
+
+def consume(x, y):
+ ...
+
+x, y = generate() # returns x and y as output
+consume(x, y) # consumes x, y as input arguments.
+```
+
+and implemented in ```runnable``` as:
+
+=== "sdk"
+
+ ```python
+ from runnable import PythonTask
+ # The returns syntax can be used for notebook and shell scripts too.
+ generate_task = PythonTask(name="generate", function=generate, returns=["x", "y"])
+ consume_task = PythonTask(name="consume", function=consume)
+
+ ```
+=== "yaml"
+
+ ```yaml
+ generate:
+ type: task
+ command: generate
+ next: consume
+ returns:
+ - name: x
+ - name: y
+ consume:
+ ...
+ ```
+
+!!! warning "order of returns"
+
+ The order of ```returns``` should match the order of the values returned by the python function, as sketched below.
+
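+A small sketch of the positional mapping:
+
+```python
+def generate():
+    ...
+    return max_epochs, learning_rate
+
+# returns=["max_epochs", "learning_rate"] follows the return order;
+# reversing the list would assign the values to the wrong names.
+```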
+
+### marking returns as ```metric``` or ```object```
+
+JSON style parameters can be marked as a ```metric``` in
+[python functions](task.md/#python-functions), [notebook](task.md/#notebook), [shell](task.md/#shell). Metric parameters can be accessed as normal parameters in downstream steps.
+
+Returns marked as ```pickled``` in [python functions](task.md/#python-functions), [notebook](task.md/#notebook) are serialized using ```dill```.
+
+### Example
+
+```python
+import pandas as pd
+
+# Assuming a function returns a pandas dataframe and a score
+def generate():
+ ...
+ return df, score
+
+# Downstream step consuming the df and score
+def consume(df: pd.DataFrame, score: float):
+ ...
+```
+
+=== "sdk"
+
+ ```python
+ from runnable import metric, pickled, PythonTask
+
+ generate_task = PythonTask(function="generate",
+ returns=[pickled("df"), # pickle df
+ metric("score")]) # mark score as metric
+
+ consume_task = PythonTask(function="consume")
+
+ ```
+
+=== "yaml"
+
+ ```yaml
+ generate:
+ type: task
+ command: generate
+ next: consume
+ returns:
+ - name: df
+ kind: object
+ - name: score
+ kind: metric
+ consume:
+ ...
+ ```
+
+
+## Complete Example
+
+=== "python"
+
+ === "python"
+
+ ```python linenums="1" hl_lines="28-34"
+ --8<-- "examples/03-parameters/passing_parameters_python.py"
+ ```
+
+ === "yaml"
+
+ ```yaml linenums="1" hl_lines="25-32"
+ --8<-- "examples/03-parameters/passing_parameters_python.yaml"
+ ```
+
+=== "notebook"
+
+ To access parameters, the cell should be tagged with ```parameters```. Only
+ JSON style parameters can be injected in this way.
+
+ Any python variable defined during the execution of the notebook matching a
+ name in ```returns``` is inferred as a parameter. The variable can either be
+ a JSON type or an object.
+
+ === "python"
+
+ ```python linenums="1" hl_lines="24-29"
+ --8<-- "examples/03-parameters/passing_parameters_notebook.py"
+ ```
+
+ === "yaml"
+
+ ```yaml linenums="1" hl_lines="21-28"
+ --8<-- "examples/03-parameters/passing_parameters_notebook.yaml"
+ ```
+
+=== "shell"
+
+ Shell tasks can only access/return JSON style parameters.
+
+ === "python"
+
+ ```python linenums="1" hl_lines="30-36"
+ --8<-- "examples/03-parameters/passing_parameters_shell.py"
+ ```
+
+ === "yaml"
-Tasks can access and return parameters and the patterns are specific to the
-```command_type``` of the task nodes. Please refer to [tasks](../concepts/task.md)
-for more information.
+ ```yaml linenums="1" hl_lines="26-31"
+ --8<-- "examples/03-parameters/passing_parameters_shell.yaml"
+ ```
diff --git a/docs/concepts/pipeline.md b/docs/concepts/pipeline.md
index 5398eaaa..685582cc 100644
--- a/docs/concepts/pipeline.md
+++ b/docs/concepts/pipeline.md
@@ -1,231 +1,171 @@
-???+ tip inline end "Steps"
-
- In runnable, a step can be a simple ```task``` or ```stub``` or complex nested pipelines like
- ```parallel``` branches, embedded ```dags``` or dynamic workflows.
-
- In this section, we use ```stub``` for convenience. For more in depth information about other types,
- please see the relevant section.
-
-
In **runnable**, we use the words
-- ```dag```, ```workflows``` and ```pipeline``` interchangeably.
+- ```workflows``` and ```pipeline``` interchangeably.
- ```nodes```, ```steps``` interchangeably.
+A ```workflow``` is a sequence of ```steps``` to perform.
-Dag or directed acyclic graphs are a way to define your pipelines.
-Its a graph representation of the list of tasks you want to perform and the order of it.
-
-
-
-
-
-## Example
-Below is an example pipeline.
+!!! info "Composite pipelines"
-
-=== "yaml"
-
- ``` yaml linenums="1"
- --8<-- "examples/concepts/traversal.yaml"
- ```
-
-
-=== "python"
-
- ``` python linenums="1"
- --8<-- "examples/concepts/traversal.py"
- ```
+ ```runnable``` pipelines are composable. For example, a pipeline can have
+ a parallel node which in itself has many pipelines running in parallel.
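+
+A rough sketch of such a composition (assuming the ```Parallel``` node exposed by the SDK; refer to the reference for the exact signature):
+
+```python
+from runnable import Parallel, Pipeline, Stub
+
+# two placeholder branch pipelines
+branch_a = Pipeline(steps=[Stub(name="train model a")])
+branch_b = Pipeline(steps=[Stub(name="train model b")])
+
+# a single step of the outer pipeline that runs both branches
+parallel_step = Parallel(name="train in parallel", branches={"a": branch_a, "b": branch_b})
+
+pipeline = Pipeline(steps=[parallel_step])
+pipeline.execute()
+```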
-A closer look at the example:
+A visual example of a workflow:
-## start_at
-
-- [x] start_at step is the starting node of the traversal.
+```mermaid
+stateDiagram-v2
+ direction lr
+ state "hello stub" as start_at
+ state "hello python" as step_2
+ state "hello notebook" as step_3
+ state "hello shell" as step_4
+ state "Success" as success
-=== "yaml"
+ [*] --> start_at
+ start_at --> step_2 : #9989;
+ step_2 --> step_3 : #9989;
+ step_3 --> step_4 : #9989;
+ step_4 --> success : #9989;
+ success --> [*]
+```
- The value should be valid key in ```steps```
+???+ abstract "Traversal"
- ```yaml linenums="10" hl_lines="1"
- --8<-- "examples/concepts/traversal.yaml:10:12"
- ```
+ Start at ```hello stub```.
-=== "python"
+ If it is successful, go to the ```next``` step of the pipeline until we reach the success state.
- The node should be part of ```steps```
+ Any failure in the execution of a step would, by default, go to the ```fail``` state.
- ```python linenums="32" hl_lines="3"
- --8<-- "examples/concepts/traversal.py:32:36"
- ```
-By using a ```parallel``` node as starting node, you can get the behavior of multi-root graph.
-
+## Syntax
-## Steps
+The above pipeline can be written in runnable as below. It is a mixed bag of
+[python functions](task.md/#python-functions), [notebook](task.md/#notebook), [shell](task.md/#shell)
+and [stub](task.md/#stub).
-- [x] Apart from the terminal nodes (```success``` and ```fail```), the pipeline should have at least
-one more node.
+[API Documentation](../reference.md/#pipeline)
+=== "sdk"
-???+ warning inline end "Step names"
+ ```python linenums="1"
+ --8<-- "examples/02-sequential/traversal.py"
+ ```
- In runnable, the names of steps should not have ```%``` or ```.``` in them.
+ 1. Start the pipeline.
+ 2. The order of the steps is the execution order.
- You can name them as descriptive as you want.
+ - [x] The first step of the ```steps``` is the start of the workflow.
+ - [x] The order of execution follows the order of the tasks in the list.
+ - [x] The terminal nodes ```success``` and ```fail``` are added automatically.
=== "yaml"
- ```yaml linenums="12"
- --8<-- "examples/concepts/traversal.yaml:12:21"
+ ```yaml linenums="1"
+ --8<-- "examples/02-sequential/traversal.yaml"
```
-=== "python"
+ 1. Start the pipeline at this step.
+ 2. State the ```next``` node, if it succeeds.
+ 3. Add the success and fail nodes.
- ```python linenums="14" hl_lines="1-6 19-23"
- --8<-- "examples/concepts/traversal.py:14:36"
- ```
+ - [x] The first step is the step corresponding to ```start_at```.
+ - [x] The steps are defined as a mapping under ```steps```.
+ - [x] ```next``` defines the step to traverse to after a successful execution of a ```step```.
+ - [x] The terminal ```success``` and ```fail``` nodes need to be defined explicitly.
-## Linking
-
-- [x] All nodes except for ```success``` and ```fail``` nodes need to have a ```next```
-step to execute upon successful execution.
+## on failure
+By default, any failure during the execution of a step will traverse to the ```fail``` node,
+marking the execution as failed.
+The ```fail``` node is implicitly added to the pipeline in the python SDK, while it
+has to be stated explicitly in yaml.
-Visually, the above pipeline can be seen as:
-???+ abstract inline end "Traversal"
+This behavior can be overridden to follow a different path based on expected failures.
- Start at step1.
+### on failure success
- If it is successful, go to ```next``` step of the pipeline until we reach the success state.
- Any failure in execution of step would, by default, go to the fail state.
+```step 1``` fails as the function raises an exception.
+```step 4``` is executed only upon the failure of ```step 1``` and, since it terminates with
+success, the pipeline is still marked as successful.
+=== "pseudo code"
-```mermaid
-stateDiagram-v2
- state "Start at step 1" as start_at
- state "step 2" as step_2
- state "step 3" as step_3
- state "Success" as success
- state "Fail" as fail
-
-
- [*] --> start_at
- start_at --> step_2 : #9989;
- step_2 --> step_3 : #9989;
- step_3 --> success : #9989;
- start_at --> fail: #10060;
- step_2--> fail: #10060;
- step_3--> fail: #10060;
- success --> [*]
- fail --> [*]
-```
-
+ ```python
-=== "yaml"
+ try:
+ raise_exception()
+ except:
+ # suppress exception
+ do_something()
- ```yaml linenums="15" hl_lines="4 7 10"
- --8<-- "examples/concepts/traversal.yaml:12:21"
```
-=== "python"
+=== "sdk"
-
- ```python linenums="14" hl_lines="7-17"
- --8<-- "examples/concepts/traversal.py:14:36"
+ ```python linenums="1" hl_lines="24 29 34 31"
+ --8<-- "examples/02-sequential/on_failure_succeed.py"
```
+ 1. Since ```terminate_with_success``` is ```true```, this step traverses to the success node.
-### on failure
-
-By default, any failure during the execution of step will traverse to ```fail``` node
-marking the execution as failed. You can override this behavior by using ```on_failure```
=== "yaml"
- ```yaml hl_lines="21"
- --8<-- "examples/on-failure.yaml"
+ ```yaml linenums="1" hl_lines="23 25 32-34"
+ --8<-- "examples/02-sequential/on_failure_succeed.yaml"
```
-=== "python"
-
- ```python hl_lines="10"
- --8<-- "examples/on_failure.py"
- ```
-
-=== "traversal"
-
- ```mermaid
- stateDiagram-v2
- state "Start at step 1" as start_at
- state "step 2" as step_2
- state "step 3" as step_3
- state "Success" as success
+### on failure fail
- [*] --> start_at
- start_at --> step_2 : #10060;
- start_at --> step_3 : #9989;
- step_3 --> success : #9989;
- success --> [*]
- ```
-
-
-
+```step 1``` fails as the function raises an exception.
+```step 4``` is executed only upon the failure of ```step 1```.
-## Terminating
-- [x] All pipelines should have one and only one Success and Fail state
+Since ```step 4``` terminates with failure, the pipeline is still marked as failed.
-Reaching one of these states as part of traversal indicates the status of the pipeline.
+=== "pseudo code"
-=== "yaml"
-
- The type determines the node to be a ```success``` or ``fail`` state.
+ ```python
- The name can be anything that you prefer.
+ try:
+ raise_exception()
+ except:
+ # raise exception after doing something.
+ do_something()
+ raise
- ``` yaml linenums="1"
- --8<-- "examples/concepts/traversal.yaml:22:25"
```
-=== "python"
-
- Setting ```add_terminal_nodes``` to be ```true``` during pipeline creation adds
- ```success``` and ```fail``` states with the names success and fail.
+=== "sdk"
- ``` python linenums="1" hl_lines="4"
- --8<-- "examples/concepts/traversal.py:31:35"
+ ```python linenums="1" hl_lines="24 29 34 31"
+ --8<-- "examples/02-sequential/on_failure_fail.py"
```
- Individual steps can link
+ 1. Since ```terminate_with_failure``` is ```true```, this step traverses to the fail node.
- - success state by setting ```terminate_with_success``` to ```True```
- - fail state by setting ```terminate_with_fail``` to ```True```
- You can, alternatively, create a ```success``` and ```fail``` state and link them together.
-
- ```python
- from runnable import Success, Fail
-
- success = Success(name="Custom Success")
- fail = Fail(name="Custom Failure")
+=== "yaml"
+ ```yaml linenums="1" hl_lines="23 25 32-34"
+ --8<-- "examples/02-sequential/on_failure_fail.yaml"
```
diff --git a/docs/concepts/stub.md b/docs/concepts/stub.md
deleted file mode 100644
index f7e83b30..00000000
--- a/docs/concepts/stub.md
+++ /dev/null
@@ -1,41 +0,0 @@
-Stub nodes in runnable are just like
-[```Pass``` state](https://docs.aws.amazon.com/step-functions/latest/dg/amazon-states-language-pass-state.html)
-in AWS Step Functions or ```pass``` in python code. It is a placeholder and useful when you want to debug or
-design your pipeline.
-
-Stub nodes can take arbitrary number of parameters and is always a success.
-
-## Example
-
-!!! note annotate inline end "Intuition"
-
- Designing a pipeline is similar to writing a modular program. Stub nodes are handy to create a placeholder
- for some step that will be implemented in the future.
-
- During debugging, changing a node to ```stub``` will let you focus on the actual bug without having to
- execute the additional steps.
-
-
-=== "yaml"
-
- In the below example, all the steps are ```stub``` nodes. The only required field is
- the ```next``` which is needed for graph traversal. As seen in ```step 2``` definition,
- they can have arbitrary fields.
-
-
- ``` yaml hl_lines="20-24"
- --8<-- "examples/mocking.yaml"
- ```
-
-=== "python"
-
- In the below example, all the steps are ```stub``` nodes.
-
- ``` python hl_lines="21-24"
- --8<-- "examples/mocking.py"
- ```
-
-The only required field is the ```name```, ```next``` which is needed for graph traversal.
-
-- yaml definition needs ```next``` to be defined as part of the step definition.
-- python SDK can define the ```next``` when linking the nodes as part of the pipeline.
diff --git a/docs/concepts/task.md b/docs/concepts/task.md
index ced25339..6a64511b 100644
--- a/docs/concepts/task.md
+++ b/docs/concepts/task.md
@@ -1,522 +1,144 @@
Task nodes are the execution units of the pipeline.
-In runnable, a ```command``` in a task node can be [python functions](#python_functions),
-[Jupyter notebooks](#notebook) or a [shell scripts](#shell).
-All task nodes can take arguments, retrieve and create files/objects and return
-arguments, though their access patterns are different.
+They can be [python functions](#python-functions), [notebooks](#notebook),
+[shell scripts](#shell) or [stubs](#stub).
-
-In the below examples, we define a pipeline either using python SDK or yaml format but both are equivalent
-and all the pipelines can be expressed in either formats.
+In the below examples, the highlighted lines of the code are the relevant bits while
+the rest of the python code (or yaml) defines and executes a pipeline that executes
+the python function/notebook/shell script/stub.
---
## Python functions
-Python is the default ```command type``` of a task node. The ```command```
-should be the dotted path to the python function.
-
-!!! example "Dotted path"
-
- Assuming the below project structure:
-
- - The ```command``` for the ```outer_function``` should be ```outer_functions.outer_function```
-
- - The ```command``` for ```inner_function``` should be ```module_inner.inner_functions.inner_function```
-
-
- ```
- ..
- ├── outer_functions.py
- │ ├── outer_function()
- ├── module_inner
- │ ├── inner_functions.py
- │ | ├── inner_function()
- ..
-
- ```
+[API Documentation](../reference.md/#pythontask)
### Example
-
-=== "python"
+=== "sdk"
!!! tip inline end "Structuring"
It is best to keep the application specific functions in a different module
than the pipeline definition, if you are using Python SDK.
- In this example, we combined them as one module for convenience.
-
- You can execute this pipeline using ```examples/concepts/simple.py```
-
- ```python linenums="1" hl_lines="4-8"
- --8<-- "examples/concepts/simple.py"
- ```
-
-=== "yaml"
-
- You can execute this by runnable execute -f examples/concepts/simple.yaml
-
- ```yaml linenums="1"
- --8<-- "examples/concepts/simple.yaml"
- ```
-
-
-### Closer look
-
-
-Lines 4-8 in the python code defines the function that we want to execute as
- part of the pipeline. They are *plain old python functions*.
-
-The rest of the python code (or yaml) defines and executes a pipeline that executes a task whose ```command```
-is to execute this function.
-
-
-### Fields
-
-- ```command``` : Should refer to the function in [dotted path notation](#python_functions).
-- ```command_type```: Defaults to python and not needed for python task types.
-- [next](pipeline.md/#linking): is required for any step of the pipeline except for success and fail steps.
-- [on_failure](pipeline.md/#on_failure): Name of the step to execute if the step fails.
-- catalog: Optional required for data access patterns from/to the central storage.
-
-
-### Accessing parameters
-
-!!! tip "Mutability"
-
- Functions mutating the input parameters is idiomatic is python. However, functions as part of runnable
- pipeline should return the mutated parameters for downstream steps to have access to them.
-
- For example, unless the function ```mutating_function``` returns the updated parameters, runnable will
- not know about the change.
-
-
- ```python
- d = {"name": "monty"}
- print(d)
- ">>> {'name': 'monty'}"
-
- def mutating_function(input_dict):
- input_dict["name"] = "python"
-
-
- mutating_function(d)
- print(d)
- ">>>{'name': 'python'}"
+ ```python linenums="1" hl_lines="29-33"
+ --8<-- "examples/01-tasks/python_tasks.py"
```
+
+=== "yaml"
-Please refer to [Initial Parameters](parameters.md/#initial_parameters) for more information about setting
-initial parameters.
-
-Lets assume that the initial parameters are:
-
-```yaml
---8<-- "examples/concepts/parameters.yaml"
-```
-
-- [x] Passing parameters between steps
-
-
-=== "Natively"
-
- Internally, runnable stores the parameters in serialised json format.
-
- ### ^^Input arguments to the function^^
-
- Any arguments passed into the function should be at the root level of the json object.
- Arguments with type annotations will be casted appropriately.
- Arguments with no type annotation will be sent in as ```dict```.
-
- In the below example, in line 13 and 28, arguments ```spam``` and ```eggs``` are at the root level in
- the yaml representation and also are annotated in the function signature. They are sent in to the function
- as arguments with proper type conversion.
-
- !!! warning "Annotation"
-
- Without annotations, runnable cannot determine the type and can cause unexpected behavior.
-
- This is especially true in distributed executors (eg: argo workflows).
-
-
- ### ^^Output arguments of function^^
-
- Only pydantic models are allowed to be return types of a function. There is no need
- for any type annotation for return type but is advised for a cleaner code.
-
- Output arguments are stored in json format by
- [model_dump](https://docs.pydantic.dev/latest/concepts/serialization/#modelmodel_dump),
- respecting the alias.
-
- The model structure of the pydantic model would be added to the root structure. This is
- useful when you want to add or modify parameters at the root level. For example, line 25
- would update all the initial parameters.
-
- To update a subset of existing parameters at the root level, you can either create a new model or
- use [DynamicModel](https://docs.pydantic.dev/latest/concepts/models/#dynamic-model-creation).
- For example, lines 42-45 create a dynamic model to update the ```eggs``` parameter.
-
-
- !!! warning "caution"
-
- Returning "eggs" in line 42 would result in a new parameter "ham" at the root level
- as it looses the nested structure.
-
-
- You can run this example using: ```python run examples/concepts/task_native_parameters.py```
-
- ```python linenums="1"
- --8<-- "examples/concepts/task_native_parameters.py"
- ```
-
-
-=== "Using the API"
-
- runnable also has [python API](../interactions.md) to access parameters.
-
- Use [get_parameter](../interactions.md/#runnable.get_parameter) to access a parameter at the root level.
- You can optionally specify the ```type``` by using ```cast_as``` argument to the API.
- For example, line 19 would cast ```eggs```parameter into ```EggsModel```.
- Native python types do not need any explicit ```cast_as``` argument.
-
- Use [set_parameter](../interactions.md/#runnable.set_parameter) to set parameters at the root level.
- Multiple parameters can be set at the same time, for example, line 26 would set both the ```spam```
- and ```eggs``` in a single call.
-
- The pydantic models would be serialised to json format using
- [model_dump](https://docs.pydantic.dev/latest/concepts/serialization/#modelmodel_dump), respecting the alias.
-
-
- You can run this example by: ```python run examples/concepts/task_api_parameters.py```
-
- ```python linenums="1"
- --8<-- "examples/concepts/task_api_parameters.py"
- ```
+ !!! example "Dotted path"
-=== "Using environment variables"
+ Assuming the below project structure:
- Any environment variable with ```runnable_PRM_``` is understood to be a parameter in runnable.
+ - The ```command``` for the ```outer_function``` should be ```outer_functions.outer_function```
- Before the execution of the ```command```, all the parameters at the root level are set as environment variables
- with the key prefixed by ```runnable_PRM_```. Python functions that are called during the execution of the command
- can also access them as environment variables.
+ - The ```command``` for ```inner_function``` should be ```module_inner.inner_functions.inner_function```
- After the execution of the ```command```, the environment is "scanned" again to identify changes to the existing
- variables prefixed by ```runnable_PRM_```. All updated variables are stored at the root level.
- Parameters set by environment variables over-ride the parameters defined by the initial parameters which can be
- handy to quickly experiment without modifying code or to dynamically adjust behavior when running in
- orchestrators like Argo or AWS step functions.
+ ```
+ ..
+ ├── outer_functions.py
+ │ ├── outer_function()
+ ├── module_inner
+ │ ├── inner_functions.py
+ │ | ├── inner_function()
+ ..
- You can run this example by: ```python run examples/concepts/task_env_parameters.py```
+ ```
- ```python linenums="1"
- --8<-- "examples/concepts/task_env_parameters.py"
+ ```yaml linenums="1" hl_lines="20-23"
+ --8<-- "examples/01-tasks/python_tasks.yaml"
```
-!!! abstract "Verbose?"
-
- We acknowledge that using pydantic models as our
- [Data transfer objects](https://stackoverflow.com/questions/1051182/what-is-a-data-transfer-object-dto) is verbose in comparison to using
- ```dict```.
-
- The advantages of using strongly typed DTO has long term advantages of implicit validation, typing hints
- in editors. This choice is inspired from [FastAPI's](https://fastapi.tiangolo.com/features/#pydantic-features)
- ways of working.
-
-
-### Passing data and execution logs
-
-Please refer to [catalog](../concepts/catalog.md) for more details and examples on passing
-data between tasks and the storage of execution logs.
-
----
+