diff --git a/docs/concepts/dag.md b/docs/concepts/dag.md
deleted file mode 100644
index e69de29b..00000000
diff --git a/docs/concepts/executor.md b/docs/concepts/executor.md
deleted file mode 100644
index d9dc3208..00000000
--- a/docs/concepts/executor.md
+++ /dev/null
@@ -1,311 +0,0 @@
-
-
-Executors are the heart of runnable: they traverse the workflow and execute the tasks within it
-while coordinating with different services
-(e.g. [run log](../concepts/run-log.md), [catalog](../concepts/catalog.md), [secrets](../concepts/secrets.md)).
-
-To enable workflows to run in varied computational environments, we distinguish between two core functions of
-any workflow engine.
-
-
-`Graph Traversal`
-
-: Following the user-defined workflow graph to its conclusion.
-  Traversal encompasses the sequential execution of simple tasks as well as composite constructs
-  such as parallel branches. It also includes deciding which pathway to
-  follow when a task fails and maintaining the
-  overall status of the graph execution.
-
-`Executing Individual Steps`
-
-: The concrete execution of the task as specified by the user,
-  along with enabling data flow between tasks.
-  This could involve activities such as launching a container or initiating a SQL query,
-  among others.
-
-## Graph Traversal
-
-In runnable, the graph traversal can be performed by runnable itself or handed over to other
-orchestration frameworks (e.g. Argo workflows, AWS step functions).
-
-### Example
-
-Below is a simple pipeline definition with a single task that prints "Hello World".
-
-```yaml linenums="1"
---8<-- "examples/concepts/task_shell_simple.yaml"
-```
-
-The above pipeline can be executed locally using the *default* configuration, or
-translated to the argo specification just by changing the configuration.
-
-=== "Default Configuration"
-
-    The configuration defines local compute as the execution environment, with the ```run log```
-    kept entirely in memory (buffered) and no other services active.
-
- You can execute the pipeline in default configuration by:
-
- ```runnable execute -f examples/concepts/task_shell_simple.yaml```
-
- ``` yaml linenums="1"
- --8<-- "examples/configs/default.yaml"
- ```
-
-    1. Run the pipeline in the local environment.
-    2. Use the buffer as the run log; this will not persist the run log to disk.
-    3. Do not move any files to central storage.
-    4. Do not use any secrets manager.
-    5. Do not integrate with any experiment tracking tools.
-
-=== "Argo Configuration"
-
-    In this configuration, we are using [argo workflows](https://argoproj.github.io/argo-workflows/)
-    as our workflow engine. We are also instructing the workflow engine to use the docker image
-    ```runnable:demo``` (defined in line #4) as our execution environment. Please read
-    [containerised environments](../configurations/executors/container-environments.md) for more information.
-
-    Since runnable needs to track the execution status of the workflow, we are using a ```run log```
-    which is persistent and available to jobs in the kubernetes environment.
-
-
- You can execute the pipeline in argo configuration by:
-
- ```runnable execute -f examples/concepts/task_shell_simple.yaml -c examples/configs/argo-config.yaml```
-
- ``` yaml linenums="1"
- --8<-- "examples/configs/argo-config.yaml"
- ```
-
- 1. Use argo workflows as the execution engine to run the pipeline.
- 2. Run this docker image for every step of the pipeline. The docker image should have the same directory structure
- as the project directory.
- 3. Mount the volume from Kubernetes persistent volumes (runnable-volume) to /mnt directory.
- 4. Resource constraints for the container runtime.
- 5. Since every step runs in a container, the run log should be persisted. Here we are using the file-system as our
- run log store.
- 6. Kubernetes PVC is mounted to every container as ```/mnt```, use that to surface the run log to every step.
-
-
-=== "Transpiled Workflow"
-
- In the below generated argo workflow template:
-
-    - Lines 23-30 define a ```dag``` with tasks that correspond to the tasks in
-      the example workflow.
-    - The graph traversal follows the same rules as our workflow. The
-      step ```success-success-djhm6j``` in line #28 only happens if the step ```shell-task-4jy8pl```
-      defined in line #25 succeeds.
-    - The execution fails if any of the tasks fail. Both argo workflows and the runnable ```run log```
-      mark the execution as failed.
-
-
- ```yaml linenums="1"
- apiVersion: argoproj.io/v1alpha1
- kind: Workflow
- metadata:
- generateName: runnable-dag-
- annotations: {}
- labels: {}
- spec:
- activeDeadlineSeconds: 172800
- entrypoint: runnable-dag
- podGC:
- strategy: OnPodCompletion
- retryStrategy:
- limit: '0'
- retryPolicy: Always
- backoff:
- duration: '120'
- factor: 2
- maxDuration: '3600'
- serviceAccountName: default-editor
- templates:
- - name: runnable-dag
- failFast: true
- dag:
- tasks:
- - name: shell-task-4jy8pl
- template: shell-task-4jy8pl
- depends: ''
- - name: success-success-djhm6j
- template: success-success-djhm6j
- depends: shell-task-4jy8pl.Succeeded
- - name: shell-task-4jy8pl
- container:
- image: runnable:demo
- command:
- - runnable
- - execute_single_node
- - '{{workflow.parameters.run_id}}'
- - shell
- - --log-level
- - WARNING
- - --file
- - examples/concepts/task_shell_simple.yaml
- - --config-file
- - examples/configs/argo-config.yaml
- volumeMounts:
- - name: executor-0
- mountPath: /mnt
- imagePullPolicy: ''
- resources:
- limits:
- memory: 1Gi
- cpu: 250m
- requests:
- memory: 1Gi
- cpu: 250m
- - name: success-success-djhm6j
- container:
- image: runnable:demo
- command:
- - runnable
- - execute_single_node
- - '{{workflow.parameters.run_id}}'
- - success
- - --log-level
- - WARNING
- - --file
- - examples/concepts/task_shell_simple.yaml
- - --config-file
- - examples/configs/argo-config.yaml
- volumeMounts:
- - name: executor-0
- mountPath: /mnt
- imagePullPolicy: ''
- resources:
- limits:
- memory: 1Gi
- cpu: 250m
- requests:
- memory: 1Gi
- cpu: 250m
- templateDefaults:
- activeDeadlineSeconds: 7200
- timeout: 10800s
- arguments:
- parameters:
- - name: run_id
- value: '{{workflow.uid}}'
- volumes:
- - name: executor-0
- persistentVolumeClaim:
- claimName: runnable-volume
-
-
- ```
-
-
-As seen from the above example, once a [pipeline is defined in runnable](../concepts/pipeline.md) either via yaml or SDK, we can
-run the pipeline in different environments just by providing a different configuration. Most often, there is
-no need to change the code or deviate from standard best practices while coding.
-
-
-## Step Execution
-
-!!! note
-
-    This section explains the internal mechanics of runnable and is not required if you just want to
-    use different executors.
-
-
-Independent of traversal, all the tasks are executed within the ```context``` of runnable.
-
-A closer look at the actual task template in the transpiled argo
-specification details the inner workings. Below is the snippet of the argo specification
-that defines the shell task template.
-
-```yaml linenums="31"
-- name: shell-task-4jy8pl
-  container:
-    image: runnable:demo
- command:
- - runnable
- - execute_single_node
- - '{{workflow.parameters.run_id}}'
- - shell
- - --log-level
- - WARNING
- - --file
- - examples/concepts/task_shell_simple.yaml
- - --config-file
- - examples/configs/argo-config.yaml
- volumeMounts:
- - name: executor-0
- mountPath: /mnt
-```
-
-The actual ```command``` run is not the ```command``` defined in the workflow,
-i.e. ```echo hello world```, but a command in the CLI of runnable which specifies the workflow file,
-the step name and the configuration file.
-
-### Context of runnable
-
-Any ```task``` defined by the user as part of the workflow always runs as a *sub-command* of
-runnable. In that sense, runnable follows the
-[decorator pattern](https://en.wikipedia.org/wiki/Decorator_pattern) without being part of the
-application codebase.
-
-In a very simplistic sense, the below stubbed code explains the context of runnable during the
-execution of a task.
-
-```python linenums="1"
-
-def execute_single_node(workflow, step_name, configuration):
-
- ##### PRE EXECUTION #####
- # Instantiate the service providers of run_log and catalog
- # These are provided as part of the configuration.
- run_log = configuration.get_run_log() # (1)
- catalog = configuration.get_catalog() # (2)
-
- step = workflow.get_step(step_name) # (3)
-
- # Get the current parameters set by the initial parameters
- # or by previous steps.
- existing_parameters = run_log.get_parameters()
- # Get the data requested by the step and populate
- # the data folder defined in the catalog configuration
- catalog.get_data(step.get_from_catalog) # (4)
-
- # Choose the parameters to pass into the function and
- # the right data type.
- task_parameters = filter_and_cast_parameters(existing_parameters, step.task) # (5)
-
- ##### END PRE EXECUTION #####
- try:
- # We call the actual task here!!
- updated_parameters = step.task(**task_parameters) # (6)
-    except Exception:
-        update_status_in_run_log(step, FAIL)
-        send_error_response() # (7)
-        raise  # ensure the post-execution block does not run on failure
-
- ##### POST EXECUTION #####
- run_log.update_parameters(updated_parameters) # (8)
- catalog.put_data(step.put_into_catalog) # (9)
- update_status_in_run_log(step, SUCCESS)
- send_success_response() # (10)
- ##### END POST EXECUTION #####
-```
-
-1. The [run log](../concepts/run-log.md) maintains the state of the execution of the tasks and subsequently the pipeline. It also
-holds the latest state of parameters along with captured metrics.
-2. The [catalog](../concepts/catalog.md) contains the information about the data flowing through the pipeline. You can get/put
-artifacts generated during the current execution of the pipeline to a central storage.
-3. Read the workflow and get the [step definition](../concepts/task.md) which holds the ```command``` or ```function``` to
-execute along with the other optional information.
-4. Any artifacts from previous steps that are needed to execute the current step can be
-[retrieved from the catalog](../concepts/catalog.md).
-5. The current function or step might need only some of the
-[parameters cast as pydantic models](../concepts/task.md/#accessing_parameters); filter and cast them appropriately (a sketch follows this list).
-6. At this point, we have the required parameters and data to execute the actual command. The command can
-internally request more data using the [python API](../interactions.md) or record
-[experiment tracking metrics](../concepts/experiment-tracking.md).
-7. If the task failed, we update the run log with that information and also raise an exception for the
-workflow engine to handle. Any [on-failure](../concepts/pipeline.md/#on_failure) traversals are already handled
-as part of the workflow definition.
-8. Upon successful execution, we update the run log with the current state of parameters for downstream steps.
-9. Any artifacts generated from this step are [put into the central storage](../concepts/catalog.md) for downstream steps.
-10. We send a success message to the workflow engine and mark the step as completed.
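-
-A minimal sketch of what ```filter_and_cast_parameters``` above could look like; the function is only
-stubbed there, so this is an assumption rather than runnable's actual implementation:
-
-```python
-import inspect
-
-from pydantic import BaseModel
-
-
-def filter_and_cast_parameters(existing: dict, task) -> dict:
-    """Keep only the parameters the task's signature asks for, cast to pydantic models if hinted."""
-    kwargs = {}
-    for name, param in inspect.signature(task).parameters.items():
-        if name not in existing:
-            continue
-        value = existing[name]
-        annotation = param.annotation
-        # Build a pydantic model from the stored dict when the annotation asks for one.
-        if isinstance(annotation, type) and issubclass(annotation, BaseModel):
-            value = annotation(**value)
-        kwargs[name] = value
-    return kwargs
-```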
diff --git a/docs/concepts/experiment-tracking.md b/docs/concepts/experiment-tracking.md
deleted file mode 100644
index 9c47ff93..00000000
--- a/docs/concepts/experiment-tracking.md
+++ /dev/null
@@ -1,468 +0,0 @@
-# Overview
-
-The [run log](../concepts/run-log.md) stores a wealth of information about the execution, along with the metrics
-captured during the execution of the pipeline.
-
-
-## Example
-
-
-=== "Using the API"
-
- The highlighted lines in the below example show how to [use the API](../interactions.md/#runnable.track_this)
-
-    Any pydantic model passed as a value is dumped as a dict, respecting the alias, before being tracked.
-
-    You can run this example by ```python examples/concepts/experiment_tracking_api.py```
-
- ```python linenums="1" hl_lines="10 24-26"
- --8<-- "examples/concepts/experiment_tracking_api.py"
- ```
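-
-    A minimal sketch of the idea; importing ```track_this``` from the package root and the keyword
-    call style are assumptions based on the API reference linked above:
-
-    ```python
-    from pydantic import BaseModel
-
-    from runnable import track_this  # assumption: exposed at the package root
-
-
-    class EggsModel(BaseModel):
-        ham: str
-
-
-    # Stored under user_defined_metrics as {"eggs": {"ham": "world"}, "answer": 42.0}.
-    track_this(eggs=EggsModel(ham="world"), answer=42.0)
-    ```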
-
-
-=== "Using environment variables"
-
- The highlighted lines in the below example show how to use environment variables to track metrics.
-
-    Environment variables can only hold string values. Numeric values sent in as strings are converted
-    to int/float before being stored as metrics.
-
- There is no support for boolean values in environment variables.
-
- ```yaml linenums="1" hl_lines="16-18"
- --8<-- "examples/concepts/experiment_tracking_env.yaml"
- ```
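-
-    A minimal sketch of the mechanism; the ```runnable_TRACK_``` prefix is taken from the example
-    pipeline's description:
-
-    ```python
-    import os
-
-    # Any environment variable prefixed with runnable_TRACK_ is recorded as a metric.
-    os.environ["runnable_TRACK_spam"] = "hello"
-    os.environ["runnable_TRACK_answer"] = "42"  # numeric strings are converted to int/float
-    ```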
-
-=== "Run log entry"
-
- Any experiment tracking metrics found during the execution of the task are stored in
- ```user_defined_metrics``` field of the step log.
-
- For example, below is the content for the shell execution.
-
- ```json linenums="1" hl_lines="36-42"
- {
- "run_id": "blazing-colden-0544",
- "dag_hash": "4494aeb907ef950934fbcc34b226f72134d06687",
- "use_cached": false,
- "tag": "",
- "original_run_id": "",
- "status": "SUCCESS",
- "steps": {
- "shell": {
- "name": "shell",
- "internal_name": "shell",
- "status": "SUCCESS",
- "step_type": "task",
- "message": "",
- "mock": false,
- "code_identities": [
- {
- "code_identifier": "793b052b8b603760ff1eb843597361219832b61c",
- "code_identifier_type": "git",
- "code_identifier_dependable": true,
- "code_identifier_url": "https://github.com/AstraZeneca/runnable-core.git",
- "code_identifier_message": ""
- }
- ],
- "attempts": [
- {
- "attempt_number": 1,
- "start_time": "2024-01-09 05:44:42.841295",
- "end_time": "2024-01-09 05:44:42.849938",
- "duration": "0:00:00.008643",
- "status": "SUCCESS",
- "message": "",
- "parameters": {}
- }
- ],
- "user_defined_metrics": {
- "eggs": {
- "ham": "world"
- },
- "answer": 42.0,
- "spam": "hello"
- },
- "branches": {},
- "data_catalog": [
- {
- "name": "shell.execution.log",
- "data_hash": "07723e6188e7893ac79e8f07b7cc15dd1a62d2974335f173a0b5a6e58a3735d6",
- "catalog_relative_path": "blazing-colden-0544/shell.execution.log",
- "catalog_handler_location": ".catalog",
- "stage": "put"
- }
- ]
- },
- "success": {
- "name": "success",
- "internal_name": "success",
- "status": "SUCCESS",
- "step_type": "success",
- "message": "",
- "mock": false,
- "code_identities": [
- {
- "code_identifier": "793b052b8b603760ff1eb843597361219832b61c",
- "code_identifier_type": "git",
- "code_identifier_dependable": true,
- "code_identifier_url": "https://github.com/AstraZeneca/runnable-core.git",
- "code_identifier_message": ""
- }
- ],
- "attempts": [
- {
- "attempt_number": 1,
- "start_time": "2024-01-09 05:44:42.913905",
- "end_time": "2024-01-09 05:44:42.913963",
- "duration": "0:00:00.000058",
- "status": "SUCCESS",
- "message": "",
- "parameters": {}
- }
- ],
- "user_defined_metrics": {},
- "branches": {},
- "data_catalog": []
- }
- },
- "parameters": {},
- "run_config": {
- "executor": {
- "service_name": "local",
- "service_type": "executor",
- "enable_parallel": false,
- "placeholders": {}
- },
- "run_log_store": {
- "service_name": "buffered",
- "service_type": "run_log_store"
- },
- "secrets_handler": {
- "service_name": "do-nothing",
- "service_type": "secrets"
- },
- "catalog_handler": {
- "service_name": "file-system",
- "service_type": "catalog"
- },
- "experiment_tracker": {
- "service_name": "do-nothing",
- "service_type": "experiment_tracker"
- },
- "pipeline_file": "examples/concepts/experiment_tracking_env.yaml",
- "parameters_file": null,
- "configuration_file": null,
- "tag": "",
- "run_id": "blazing-colden-0544",
- "variables": {},
- "use_cached": false,
- "original_run_id": "",
- "dag": {
- "start_at": "shell",
- "name": "",
- "description": "An example pipeline to demonstrate setting experiment tracking metrics\nusing environment variables. Any environment variable with
- prefix\n'runnable_TRACK_' will be recorded as a metric captured during the step.\n\nYou can run this pipeline as:\n runnable execute -f
- examples/concepts/experiment_tracking_env.yaml\n",
- "internal_branch_name": "",
- "steps": {
- "shell": {
- "type": "task",
- "name": "shell",
- "internal_name": "shell",
- "internal_branch_name": "",
- "is_composite": false
- },
- "success": {
- "type": "success",
- "name": "success",
- "internal_name": "success",
- "internal_branch_name": "",
- "is_composite": false
- },
- "fail": {
- "type": "fail",
- "name": "fail",
- "internal_name": "fail",
- "internal_branch_name": "",
- "is_composite": false
- }
- }
- },
- "dag_hash": "4494aeb907ef950934fbcc34b226f72134d06687",
- "execution_plan": "chained"
- }
- }
- ```
-
-
-## Incremental tracking
-
-It is possible to track metrics over time within a task. To do so, use the ```step``` parameter in the API
-or, when using environment variables, postfix the variable name with ```_STEP_``` followed by the increment.
-
-The step defaults to 0.
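-
-A minimal sketch of both mechanisms; importing ```track_this``` from the package root and the keyword
-call style are assumptions, and the values mirror the run log shown below:
-
-```python
-import os
-
-from runnable import track_this  # assumption: exposed at the package root
-
-# Using the API: the same key tracked at two different steps.
-track_this(step=0, spam="hello")
-track_this(step=1, spam="hey")
-
-# Using environment variables: postfix the key with _STEP_ and the increment.
-os.environ["runnable_TRACK_spam_STEP_0"] = "hello"
-os.environ["runnable_TRACK_spam_STEP_1"] = "hey"
-```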
-
-### Example
-
-=== "Using the API"
-
- The highlighted lines in the below example show how to [use the API](../interactions.md/#runnable.track_this) with
- the step parameter.
-
-    You can run this example by ```python examples/concepts/experiment_tracking_step.py```
-
- ```python linenums="1" hl_lines="11 25-28"
- --8<-- "examples/concepts/experiment_tracking_step.py"
- ```
-
-=== "Using environment variables"
-
- The highlighted lines in the below example show how to use environment variables to track metrics.
-
- ```yaml linenums="1" hl_lines="16-20"
- --8<-- "examples/concepts/experiment_tracking_env_step.yaml"
- ```
-
-=== "Run log entry"
-
- ```json linenums="1" hl_lines="36-51"
- {
- "run_id": "blocking-stonebraker-1545",
- "dag_hash": "",
- "use_cached": false,
- "tag": "",
- "original_run_id": "",
- "status": "SUCCESS",
- "steps": {
- "Emit Metrics": {
- "name": "Emit Metrics",
- "internal_name": "Emit Metrics",
- "status": "SUCCESS",
- "step_type": "task",
- "message": "",
- "mock": false,
- "code_identities": [
- {
- "code_identifier": "858c4df44f15d81139341641c63ead45042e0d89",
- "code_identifier_type": "git",
- "code_identifier_dependable": true,
- "code_identifier_url": "https://github.com/AstraZeneca/runnable-core.git",
- "code_identifier_message": ""
- }
- ],
- "attempts": [
- {
- "attempt_number": 1,
- "start_time": "2024-01-09 15:45:34.940999",
- "end_time": "2024-01-09 15:45:34.943648",
- "duration": "0:00:00.002649",
- "status": "SUCCESS",
- "message": "",
- "parameters": {}
- }
- ],
- "user_defined_metrics": {
- "spam": {
- "0": "hello",
- "1": "hey"
- },
- "eggs": {
- "0": {
- "ham": "world"
- },
- "1": {
- "ham": "universe"
- }
- },
- "answer": 42.0,
- "is_it_true": false
- },
- "branches": {},
- "data_catalog": [
- {
- "name": "Emit_Metrics.execution.log",
- "data_hash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
- "catalog_relative_path": "blocking-stonebraker-1545/Emit_Metrics.execution.log",
- "catalog_handler_location": ".catalog",
- "stage": "put"
- }
- ]
- },
- "success": {
- "name": "success",
- "internal_name": "success",
- "status": "SUCCESS",
- "step_type": "success",
- "message": "",
- "mock": false,
- "code_identities": [
- {
- "code_identifier": "858c4df44f15d81139341641c63ead45042e0d89",
- "code_identifier_type": "git",
- "code_identifier_dependable": true,
- "code_identifier_url": "https://github.com/AstraZeneca/runnable-core.git",
- "code_identifier_message": ""
- }
- ],
- "attempts": [
- {
- "attempt_number": 1,
- "start_time": "2024-01-09 15:45:35.126659",
- "end_time": "2024-01-09 15:45:35.126745",
- "duration": "0:00:00.000086",
- "status": "SUCCESS",
- "message": "",
- "parameters": {}
- }
- ],
- "user_defined_metrics": {},
- "branches": {},
- "data_catalog": []
- }
- },
- "parameters": {},
- "run_config": {
- "executor": {
- "service_name": "local",
- "service_type": "executor",
- "enable_parallel": false,
- "placeholders": {}
- },
- "run_log_store": {
- "service_name": "buffered",
- "service_type": "run_log_store"
- },
- "secrets_handler": {
- "service_name": "do-nothing",
- "service_type": "secrets"
- },
- "catalog_handler": {
- "service_name": "file-system",
- "service_type": "catalog"
- },
- "experiment_tracker": {
- "service_name": "do-nothing",
- "service_type": "experiment_tracker"
- },
- "pipeline_file": "",
- "parameters_file": "",
- "configuration_file": "",
- "tag": "",
- "run_id": "blocking-stonebraker-1545",
- "variables": {},
- "use_cached": false,
- "original_run_id": "",
- "dag": {
- "start_at": "Emit Metrics",
- "name": "",
- "description": "",
- "internal_branch_name": "",
- "steps": {
- "Emit Metrics": {
- "type": "task",
- "name": "Emit Metrics",
- "internal_name": "Emit Metrics",
- "internal_branch_name": "",
- "is_composite": false
- },
- "success": {
- "type": "success",
- "name": "success",
- "internal_name": "success",
- "internal_branch_name": "",
- "is_composite": false
- },
- "fail": {
- "type": "fail",
- "name": "fail",
- "internal_name": "fail",
- "internal_branch_name": "",
- "is_composite": false
- }
- }
- },
- "dag_hash": "",
- "execution_plan": "chained"
- }
- }
- ```
-
-## Experiment tracking tools
-
-!!! note "Opt out"
-
-    Pipelines need not use ```experiment-tracking``` if the preferred tool of choice is
-    not implemented in runnable. The default configuration of ```do-nothing``` is a no-op by design.
-    Please raise a feature request to make us aware of your ecosystem.
-
-
-The default experiment tracking tool of runnable is a no-op as the ```run log``` captures all the
-required details. To make it compatible with other experiment tracking tools like
-[mlflow](https://mlflow.org/docs/latest/tracking.html) or
-[Weights and Biases](https://wandb.ai/site/experiment-tracking), we map attributes of runnable
-to the underlying tool.
-
-For example, for mlflow (a sketch follows this list):
-
-- Any numeric (int/float) observation is logged as
-[a metric](https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.log_metric)
-with a step.
-
-- Any non-numeric observation is logged as
-[a parameter](https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.log_param).
-Since mlflow does not support step-wise logging of parameters, the key name is formatted as
-```key_step```.
-
-- The tag associated with an execution is used as the
-[experiment name](https://mlflow.org/docs/latest/tracking/tracking-api.html#organizing-runs-in-experiments).
-
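-A minimal sketch of this mapping; it illustrates the rules above and is not runnable's
-actual implementation:
-
-```python
-import mlflow
-
-
-def log_observation(key: str, value, step: int = 0):
-    """Map a runnable observation onto mlflow logging calls."""
-    if isinstance(value, (int, float)) and not isinstance(value, bool):
-        # Numeric observations become metrics, logged with a step.
-        mlflow.log_metric(key, value, step=step)
-    else:
-        # Everything else becomes a parameter; the step is encoded in the
-        # key name since mlflow has no step-wise parameters.
-        mlflow.log_param(f"{key}_{step}", value)
-```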
-
-!!! note inline end "Shortcomings"
-
-    The experiment tracking capabilities of runnable integrate less deeply with
-    popular python frameworks like pytorch and tensorflow than dedicated
-    experiment tracking tools do.
-
-    We strongly advise using those tools if you need advanced capabilities.
-
-
-=== "Example configuration"
-
- In the below configuration, the mlflow tracking server is a local instance listening on port 8080.
-
- ```yaml linenums="1" hl_lines="13-16"
- --8<-- "examples/configs/mlflow-config.yaml"
- ```
-
-=== "Pipeline"
-
-    As with other examples, we are using the ```track_this``` python API to capture metrics. During the pipeline
-    execution in line #39, we use ```mlflow``` as the experiment tracking tool.
-
-    The tag provided during the execution is used as an experiment name in mlflow.
-
-    You can run this example by ```python examples/concepts/experiment_tracking_integration.py```
-
- ```python linenums="1" hl_lines="13 27-33 49"
- --8<-- "examples/concepts/experiment_tracking_integration.py"
- ```
-
-
-=== "In mlflow UI"
-
-    (Screenshots of the tracked metrics in the mlflow UI.)
-
-
-To provide implementation-specific capabilities, we also provide a
-[python API](../interactions.md/#runnable.get_experiment_tracker_context) to obtain the client context. The default
-client context is a [null context manager](https://docs.python.org/3/library/contextlib.html#contextlib.nullcontext).
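-
-A minimal sketch of using that context; ```get_experiment_tracker_context``` is the API named above,
-and importing it from the package root is an assumption:
-
-```python
-from runnable import get_experiment_tracker_context  # assumption: exposed at the package root
-
-with get_experiment_tracker_context() as client:
-    # With the default do-nothing tracker this is a null context manager;
-    # with mlflow configured, implementation-specific client calls can go here.
-    ...
-```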
diff --git a/docs/concepts/index.md b/docs/concepts/index.md
index c44aeaba..a5b7162d 100644
--- a/docs/concepts/index.md
+++ b/docs/concepts/index.md
@@ -23,7 +23,7 @@ consume(x, y)
## Runnable representation
-The same workflow in ```runnable``` would be:
+The workflow in ```runnable``` would be:
```python linenums="1"
from runnable import PythonTask, pickled, catalog, Pipeline
@@ -41,6 +41,8 @@ pipeline.execute()
```
+
+
- ```runnable``` wraps the functions ```generate``` and ```consume``` as [tasks](task.md).
- Tasks can [access and return](parameters.md/#access_returns) parameters.
- Tasks can also share files between them using [catalog](catalog.md).
diff --git a/docs/configurations/executors/local.md b/docs/configurations/executors/local.md
index 01252a50..48adbeb6 100644
--- a/docs/configurations/executors/local.md
+++ b/docs/configurations/executors/local.md
@@ -4,14 +4,7 @@ as it was triggered.
- [x] Provides the most comfortable environment for experimentation and development.
- [ ] The scalability is constrained by the local compute environment.
- [ ] Not possible to provide specialized compute environments for different steps of the pipeline.
-
-
-!!! warning inline end "parallel executions"
-
- Run logs that use a single json (eg. file-system) are not compatible with parallel
- executions due to race conditions to write the same file by different processes.
-
- Use ```chunked``` run log stores (eg. chunked-fs).
+- [ ] All the steps within ```parallel``` or ```map``` nodes are executed sequentially.
@@ -19,13 +12,6 @@ as it was triggered.
```yaml
executor: local
-config:
- enable_parallel: false # (1)
```
-1. By default, all tasks are sequentially executed. Provide ```true``` to enable tasks within
-[parallel](../..//concepts/parallel.md) or [map](../../concepts/map.md) to be executed in parallel.
-
-
-
All the examples in the concepts section are executed using ```local``` executors.
diff --git a/docs/configurations/experiment-tracking.md b/docs/configurations/experiment-tracking.md
deleted file mode 100644
index e69de29b..00000000
diff --git a/docs/configurations/overview.md b/docs/configurations/overview.md
index fed9ee7f..f34a5c96 100644
--- a/docs/configurations/overview.md
+++ b/docs/configurations/overview.md
@@ -1,2 +1,80 @@
-**runnable** is designed to enable the pipeline execution in varied computational environments without changing the
-infrastructure patterns.
+Once a [pipeline is defined](../concepts/index.md), ```runnable``` can execute the pipeline in different environments
+by changing a configuration. Neither the pipeline definition nor the data science code needs to change at all.
+
+
+## Concept
+
+Consider the example:
+
+```python linenums="1"
+import os
+
+def generate():
+ ...
+ # write some files, data.csv
+ secret = os.environ["secret_key"]
+ ...
+ # return objects or simple python data types.
+ return x, y
+
+def consume(x, y):
+ ...
+ # read from data.csv
+ # do some computation with x and y
+
+
+# Stitch the functions together
+# This is the driver pattern.
+x, y = generate()
+consume(x, y)
+```
+
+To execute the functions, we need:
+
+- Compute environment with defined resources (CPU, memory, GPU): configured by ```executor```.
+- Mechanism to make variables, ```x``` and ```y```, available to functions: achieved by ```run_log_store```.
+- Mechanism to recreate the file system structure for accessing ```data```: achieved by ```catalog```.
+- Populate secrets as environment variables: configured by ```secrets```.
+
+
+
+By default, ```runnable``` uses:
+
+- local compute to run the pipeline.
+- local file system for storing the run log.
+- local file system for cataloging data flowing through the pipeline.
+- wrapper around system environment variables for accessing secrets.
+
+This can be overridden by ```configuration```. For example, the below configuration uses:
+
+- argo workflows as execution engine.
+- mounted pvc for storing the run log.
+- mounted pvc for storing the catalog.
+- kubernetes secrets exposed to the container as secrets provider.
+
+```yaml
+executor:
+ type: argo
+ config:
+ image: image_to_use
+ persistent_volumes: # mount a pvc to every container as /mnt
+ - name: runnable-volume
+ mount_path: /mnt
+ secrets_from_k8s: # expose postgres/connection string to container.
+ - environment_variable: connection_string
+ secret_name: postgres
+ secret_key: connection_string
+
+run_log_store:
+ type: file-system
+ config:
+ log_folder: /mnt/run_log_store # /mnt is a pvc
+
+catalog:
+ type: file-system
+ config:
+ catalog_location: /mnt/catalog # /mnt is a pvc
+
+secrets: # Kubernetes exposes secrets as environment variables
+ type: env-secrets-manager
+```
diff --git a/docs/example/dataflow.md b/docs/example/dataflow.md
deleted file mode 100644
index 9a24aadf..00000000
--- a/docs/example/dataflow.md
+++ /dev/null
@@ -1,223 +0,0 @@
-In **runnable**, we distinguish between two types of data that steps can use to communicate with each other.
-
-[`Parameters`](#flow_of_parameters)
-
-: Parameters can be thought of as the input and output arguments of functions. runnable supports
-pydantic models both as input and return types of functions.
-
-[`Files`](#flow_of_files)
-
-: Data files or objects created by individual tasks of the pipeline can be passed to downstream tasks
-using the catalog. This can be controlled either by the configuration or by the python API.
-
-
-## Flow of Parameters
-
-The [initial parameters](../concepts/parameters.md) of the pipeline can be set by using a ```yaml``` file and provided
-during execution
-
-```--parameters-file, -parameters``` while using the [runnable CLI](../usage.md/#usage)
-
-or by using ```parameters_file``` with [the sdk](../sdk.md/#runnable.Pipeline.execute).
-
-=== "Initial Parameters"
-
- ```yaml title="Defining initial parameters"
- # The below is assumed to be examples/parameters_initial.yaml # (2)
- simple: 1
- inner: # (1)
- x: 3
- y: "hello"
- ```
-
- 1. You can create deeply nested parameter structures.
- 2. You can name it as you want.
-
-=== "Pydantic model representation"
-
- The parameter structure can be represented as a pydantic model within your code.
-
- ```python title="Pydantic model representation"
-
- from pydantic import BaseModel
-
- class InnerModel(BaseModel): # (1)
- x: int
- y: str
-
- class NestedModel(BaseModel): # (2)
- simple: int
- inner: InnerModel
-
- ```
-
- 1. Represents the ```inner``` nested model of parameters.
- 2. Represents all parameters defined in initial parameters.
-
-
-### Accessing parameters
-
-
-=== "Application native way"
-
- !!! info annotate inline end "No ```import runnable``` !!!"
-
-        Much of the design emphasis is on avoiding "import runnable" and keeping the function signatures native to the application.
-        runnable also has the APIs ```get_parameter``` and ```set_parameter``` if they are handy.
-
-
-
- ```python linenums="1" hl_lines="34-53"
- --8<-- "examples/parameters.py"
- ```
-
- 1. Create a pydantic model to represent the parameters.
- 2. Access those parameters by name. The annotations are used to cast to correct models.
- 3. Return the modified parameters for downstream steps. The return type should be always a pydantic model.
-
-
-=== "Using the python API"
-
- !!! info annotate inline end "Using API"
-
-        Using the python API gives you access to the parameters without changing the
-        signature of the functions. This is also the preferred way to access parameters in
-        notebooks. (1)
-
- 1. We use parameters in notebooks but they can only support simple types while the
- API supports rich pydantic models.
-
-
- ```python linenums="1" hl_lines="45-72"
- --8<-- "examples/parameters_api.py"
- ```
-
- 1. To get the parameters as pydantic models, you can hint the type using ```cast_as```
- 2. Downstream steps could access the modified parameters.
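-
-    A minimal sketch under stated assumptions: ```get_parameter``` and ```set_parameter``` are the
-    APIs named earlier, ```cast_as``` is the type hint from annotation (1), and the exact call
-    signatures are assumptions:
-
-    ```python
-    from pydantic import BaseModel
-
-    from runnable import get_parameter, set_parameter  # assumption: exposed at the package root
-
-
-    class InnerModel(BaseModel):
-        x: int
-        y: str
-
-
-    def consume():
-        inner = get_parameter("inner", cast_as=InnerModel)  # assumption: key plus cast_as hint
-        print(inner.x, inner.y)
-        set_parameter(simple=2)  # assumption: keyword style to set parameters
-    ```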
-
-
-=== "Using environment variables"
-
- !!! info annotate inline end "Using Env"
-
- Tasks of type shell use this mechanism to access parameters.
-
- There are richer ways to pass parameters in runnable if you are using only
- python in your application. This mechanism helps when you have non-python code
- as part of your application.
-
-
- ```yaml title="Using shell to access parameters" linenums="1"
- --8<-- "examples/parameters_env.yaml"
- ```
-
- 1. Show all the parameters prefixed by runnable_PRM_
- 2. Set new values of the parameters as environment variables prefixed by runnable_PRM_
- 3. Consume the parameters like you would using python.
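-
-    A minimal sketch of the mechanism from python, mirroring what the shell steps do; the
-    ```runnable_PRM_``` prefix comes from the annotations above:
-
-    ```python
-    import os
-
-    # Read a parameter exposed by runnable to the task.
-    simple = int(os.environ["runnable_PRM_simple"])  # values arrive as strings
-
-    # Set a new value for downstream steps.
-    os.environ["runnable_PRM_simple"] = str(simple + 1)
-    ```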
-
-
-
-## Flow of Files
-
-
-**runnable** stores all the artifacts/files/logs generated by ```task``` nodes in a central storage called
-[catalog](../concepts/catalog.md).
-The catalog is indexed by the ```run_id``` of the pipeline and is unique for every execution of the pipeline.
-
-Any ```task``` of the pipeline can interact with the ```catalog``` to get and put artifacts/files
-as part of the execution.
-
-Conceptually, the flow is:
-
-
-```mermaid
-flowchart LR
- subgraph Task
- direction LR
- get("get
- 📁 data folder")
- exe(Execute code)
- put("put
- 📁 data folder")
- end
-
- subgraph Catalog
- direction BT
- Data[📁 run id]
- end
-Data --> get
-put --> Data
-get --> exe
-exe --> put
-```
-
-
-The ```catalog``` for an execution has the same structure as the ```root``` of the project.
-You can access content as if you are accessing files relative to the project root.
-
-=== "Example Configuration"
-
- ``` yaml
- --8<-- "examples/configs/fs-catalog.yaml"
- ```
-
- 1. Use local file system as a central catalog, defaults to ```.catalog```
- 2. By default, runnable uses ```data``` folder as the directory containing the user data.
-
-=== "pipeline in yaml"
-
- !!! info annotate "Python functions"
-
- We have used shell for these operations for convenience but you can use python functions to
- create content and retrieve content.
-
- For example, the below functions can be used in steps Create Content and Retrieve Content.
- ```python
- def create_content():
-            with open("data/hello.txt", "w") as f:
- f.write("hello from runnable")
-
- def retrieve_content():
- with open("data/hello.txt") as f:
- print(f.read())
- ```
-
-
- ``` yaml linenums="1"
- --8<-- "examples/catalog.yaml"
- ```
-
- 1. Make a ```data``` folder if it does not already exist.
-    2. As the ```compute_data_folder``` is defined as ```.```, all paths should be relative to ```.```. Put the file ```hello.txt``` in the ```data``` folder into the catalog.
- 3. We have intentionally made this ```stub``` node to prevent accidentally deleting your content. Please make it a ```task``` to actually delete the ```data``` folder.
- 4. Should print "Hello from runnable" as the content of the ```hello.txt```.
- 5. Override the default ```.``` as ```compute_data_folder``` to ```data```. All interactions should then be relative to ```data``` folder.
- 6. Same as above, make it a ```task``` to actually delete the ```data``` folder
-
-=== "python sdk"
-
- !!! info annotate "Python functions"
-
- We have used shell for these operations for convenience but you can use python functions to
- create content and retrieve content.
-
- For example, the below functions can be used in steps create and retrieve.
- ```python
- def create_content():
-            with open("data/hello.txt", "w") as f:
- f.write("hello from runnable")
-
- def retrieve_content():
- with open("data/hello.txt") as f:
- print(f.read())
- ```
-
- ```python linenums="1"
- --8<-- "examples/catalog.py"
- ```
-
-=== "python API"
-
- ```python linenums="1"
- --8<-- "examples/catalog_api.py"
- ```
diff --git a/docs/example/example.md b/docs/example/example.md
deleted file mode 100644
index 585b4b4a..00000000
--- a/docs/example/example.md
+++ /dev/null
@@ -1,397 +0,0 @@
-
-
-runnable revolves around the concept of [pipelines or workflows](../concepts/pipeline.md).
-Pipelines defined in runnable are translated into
-other workflow engine definitions like [Argo workflows](https://argoproj.github.io/workflows/) or
-[AWS step functions](https://aws.amazon.com/step-functions/).
-
-## Example Pipeline definition
-
-A contrived example of data science workflow without any implementation.
-
-!!! info annotate inline end "Simple pipeline"
-
- In this extremely reduced example, we acquire data from different sources, clean it and shape it for analysis.
- Features are then engineered from the clean data to run data science modelling.
-
-
-``` mermaid
-%%{ init: { 'flowchart': { 'curve': 'linear' } } }%%
-flowchart TD
-
- step1:::green
- step1([Acquire data]) --> step2:::green
- step2([Prepare data]) --> step3:::green
- step3([Extract features]) --> step4:::green
- step4([Model]) --> suc([success]):::green
-
- classDef green stroke:#0f0
-
-```
-
-
-This pipeline can be represented in **runnable** as below:
-
-
-=== "yaml"
-
- ``` yaml linenums="1"
- --8<-- "examples/contrived.yaml"
- ```
-
- 1. ```stub``` nodes are mock nodes and always succeed.
- 2. Execute the ```next``` node if it succeeds.
-    3. This marks the pipeline as successfully completed.
- 4. Any failure in the execution of the node will, by default, reach this step.
-
-=== "python"
-
- ``` python linenums="1"
- --8<-- "examples/contrived.py"
- ```
-
-    1. You can specify dependencies by using ```next``` while creating the node, or defer it for later.
-    2. ```terminate_with_success``` marks the pipeline as successfully completed.
- 3. Alternative ways to define dependencies, ```>>```, ```<<```, ```depends_on```. Choose the style that you
- prefer.
- 4. ```add_terminal_nodes``` adds success and fail states to the pipeline.
- 5. A very rich run log that captures different properties of the run for maximum reproducibility.
-
-
-=== "Run log"
-
- Please see [Run log](../concepts/run-log.md) for more detailed information about the structure.
-
- ```json linenums="1"
- {
- "run_id": "vain-hopper-0731", // (1)
- "dag_hash": "",
- "use_cached": false,
- "tag": "",
- "original_run_id": "",
- "status": "SUCCESS", / (2)
- "steps": {
- "Acquire Data": {
- "name": "Acquire Data", // (3)
- "internal_name": "Acquire Data",
- "status": "SUCCESS",
- "step_type": "stub",
- "message": "",
- "mock": false,
- "code_identities": [
- {
- "code_identifier": "399b0d42f4f28aaeeb2e062bb0b938d50ff1595c", // (4)
- "code_identifier_type": "git",
- "code_identifier_dependable": true,
- "code_identifier_url": "https://github.com/AstraZeneca/runnable-core.git",
- "code_identifier_message": ""
- }
- ],
- "attempts": [
- {
- "attempt_number": 1,
- "start_time": "2023-11-16 07:31:39.929797",
- "end_time": "2023-11-16 07:31:39.929815",
- "duration": "0:00:00.000018",
- "status": "SUCCESS",
- "message": "",
- "parameters": {}
- }
- ],
- "user_defined_metrics": {}, // (5)
- "branches": {},
- "data_catalog": [] // (6)
- },
- "Prepare Data": {
- "name": "Prepare Data",
- "internal_name": "Prepare Data",
- "status": "SUCCESS",
- "step_type": "stub",
- "message": "",
- "mock": false,
- "code_identities": [
- {
- "code_identifier": "399b0d42f4f28aaeeb2e062bb0b938d50ff1595c",
- "code_identifier_type": "git",
- "code_identifier_dependable": true,
- "code_identifier_url": "https://github.com/AstraZeneca/runnable-core.git",
- "code_identifier_message": ""
- }
- ],
- "attempts": [
- {
- "attempt_number": 1,
- "start_time": "2023-11-16 07:31:39.993807",
- "end_time": "2023-11-16 07:31:39.993828",
- "duration": "0:00:00.000021",
- "status": "SUCCESS",
- "message": "",
- "parameters": {}
- }
- ],
- "user_defined_metrics": {},
- "branches": {},
- "data_catalog": []
- },
- "Extract Features": {
- "name": "Extract Features",
- "internal_name": "Extract Features",
- "status": "SUCCESS",
- "step_type": "stub",
- "message": "",
- "mock": false,
- "code_identities": [
- {
- "code_identifier": "399b0d42f4f28aaeeb2e062bb0b938d50ff1595c",
- "code_identifier_type": "git",
- "code_identifier_dependable": true,
- "code_identifier_url": "https://github.com/AstraZeneca/runnable-core.git",
- "code_identifier_message": ""
- }
- ],
- "attempts": [
- {
- "attempt_number": 1,
- "start_time": "2023-11-16 07:31:40.056403",
- "end_time": "2023-11-16 07:31:40.056420",
- "duration": "0:00:00.000017",
- "status": "SUCCESS",
- "message": "",
- "parameters": {}
- }
- ],
- "user_defined_metrics": {},
- "branches": {},
- "data_catalog": []
- },
- "Model": {
- "name": "Model",
- "internal_name": "Model",
- "status": "SUCCESS",
- "step_type": "stub",
- "message": "",
- "mock": false,
- "code_identities": [
- {
- "code_identifier": "399b0d42f4f28aaeeb2e062bb0b938d50ff1595c",
- "code_identifier_type": "git",
- "code_identifier_dependable": true,
- "code_identifier_url": "https://github.com/AstraZeneca/runnable-core.git",
- "code_identifier_message": ""
- }
- ],
- "attempts": [
- {
- "attempt_number": 1,
- "start_time": "2023-11-16 07:31:40.118268",
- "end_time": "2023-11-16 07:31:40.118285",
- "duration": "0:00:00.000017",
- "status": "SUCCESS",
- "message": "",
- "parameters": {}
- }
- ],
- "user_defined_metrics": {},
- "branches": {},
- "data_catalog": []
- },
- "success": {
- "name": "success",
- "internal_name": "success",
- "status": "SUCCESS",
- "step_type": "success",
- "message": "",
- "mock": false,
- "code_identities": [
- {
- "code_identifier": "399b0d42f4f28aaeeb2e062bb0b938d50ff1595c",
- "code_identifier_type": "git",
- "code_identifier_dependable": true,
- "code_identifier_url": "https://github.com/AstraZeneca/runnable-core.git",
- "code_identifier_message": ""
- }
- ],
- "attempts": [
- {
- "attempt_number": 1,
- "start_time": "2023-11-16 07:31:40.176718",
- "end_time": "2023-11-16 07:31:40.176774",
- "duration": "0:00:00.000056",
- "status": "SUCCESS",
- "message": "",
- "parameters": {}
- }
- ],
- "user_defined_metrics": {},
- "branches": {},
- "data_catalog": []
- }
- },
- "parameters": {},
- "run_config": { // (7)
- "executor": {
- "service_name": "local",
- "service_type": "executor",
- "enable_parallel": false,
- "placeholders": {}
- },
- "run_log_store": {
- "service_name": "buffered",
- "service_type": "run_log_store"
- },
- "secrets_handler": {
- "service_name": "do-nothing",
- "service_type": "secrets"
- },
- "catalog_handler": {
- "service_name": "file-system",
- "service_type": "catalog",
- "compute_data_folder": "data"
- },
- "experiment_tracker": {
- "service_name": "do-nothing",
- "service_type": "experiment_tracker"
- },
- "pipeline_file": "",
- "parameters_file": "",
- "configuration_file": "",
- "tag": "",
- "run_id": "vain-hopper-0731",
- "variables": {},
- "use_cached": false,
- "original_run_id": "",
- "dag": { // (8)
- "start_at": "Acquire Data",
- "name": "",
- "description": "",
- "max_time": 86400,
- "internal_branch_name": "",
- "steps": {
- "Acquire Data": {
- "type": "stub",
- "name": "Acquire Data",
- "internal_name": "Acquire Data",
- "internal_branch_name": "",
- "is_composite": false
- },
- "Prepare Data": {
- "type": "stub",
- "name": "Prepare Data",
- "internal_name": "Prepare Data",
- "internal_branch_name": "",
- "is_composite": false
- },
- "Extract Features": {
- "type": "stub",
- "name": "Extract Features",
- "internal_name": "Extract Features",
- "internal_branch_name": "",
- "is_composite": false
- },
- "Model": {
- "type": "stub",
- "name": "Model",
- "internal_name": "Model",
- "internal_branch_name": "",
- "is_composite": false
- },
- "success": {
- "type": "success",
- "name": "success",
- "internal_name": "success",
- "internal_branch_name": "",
- "is_composite": false
- },
- "fail": {
- "type": "fail",
- "name": "fail",
- "internal_name": "fail",
- "internal_branch_name": "",
- "is_composite": false
- }
- }
- },
- "dag_hash": "",
- "execution_plan": "chained"
- }
- }
- ```
-
- 1. Unique execution id or run id for every run of the pipeline.
- 2. The status of the execution, one of success, fail or processing.
- 3. Steps as defined in the pipeline configuration.
- 4. git hash of the code that was used to run the pipeline.
- 5. Optional user defined metrics during the step execution. These are also made available to the experiment tracking
- tool, if they are configured.
- 6. Data files that are ```get``` or ```put``` into a central storage during execution of the step.
- 7. The configuration used to run the pipeline.
- 8. The pipeline definition.
-
-
-Independent of the platform it is run on,
-
-
-- [x] The [pipeline definition](../concepts/pipeline.md) remains the same from an author's point of view.
-The data scientists are always part of the process and contribute to the development even in production environments.
-
-- [x] The [run log](../concepts/run-log.md) remains the same except for the execution configuration, enabling users
-to debug failed pipeline executions in lower environments or to validate the
-expectations of the execution.
-
-
-
-
-## Example configuration
-
-To run the pipeline in different environments, we just provide the
-[required configuration](../configurations/overview.md).
-
-=== "Default Configuration"
-
- ``` yaml linenums="1"
- --8<-- "examples/configs/default.yaml"
- ```
-
-    1. Run the pipeline in the local environment.
-    2. Use the buffer as the run log; this will not persist the run log to disk.
-    3. Do not move any files to central storage.
-    4. Do not use any secrets manager.
-    5. Do not integrate with any experiment tracking tools.
-
-=== "Argo Configuration"
-
-    To render the pipeline in the [argo specification](../configurations/executors/argo.md), provide the
-    configuration during execution.
-
- yaml:
-
- ```runnable execute -f examples/contrived.yaml -c examples/configs/argo-config.yaml```
-
-
- python:
-
- Please refer to [containerised environments](../configurations/executors/container-environments.md) for more information.
-
-    ```runnable_CONFIGURATION_FILE=examples/configs/argo-config.yaml python examples/contrived.py && runnable execute -f runnable-pipeline.yaml -c examples/configs/argo-config.yaml```
-
- ``` yaml linenums="1" title="Argo Configuration"
- --8<-- "examples/configs/argo-config.yaml"
- ```
-
- 1. Use argo workflows as the execution engine to run the pipeline.
- 2. Run this docker image for every step of the pipeline. Please refer to
- [containerised environments](../configurations/executors/container-environments.md) for more details.
- 3. Mount the volume from Kubernetes persistent volumes (runnable-volume) to /mnt directory.
- 4. Resource constraints for the container runtime.
- 5. Since every step runs in a container, the run log should be persisted. Here we are using the file-system as our
- run log store.
- 6. Kubernetes PVC is mounted to every container as ```/mnt```, use that to surface the run log to every step.
-
-
-=== "Transpiled Workflow"
-
-    Below is the same workflow definition in the argo specification.
-
- ```yaml linenums="1"
- --8<-- "examples/generated-argo-pipeline.yaml"
- ```
diff --git a/docs/example/experiment-tracking.md b/docs/example/experiment-tracking.md
deleted file mode 100644
index 70ae1be9..00000000
--- a/docs/example/experiment-tracking.md
+++ /dev/null
@@ -1,202 +0,0 @@
-Metrics in data science projects summarize important information about the execution and performance of the
-experiment.
-
-runnable captures [this information as part of the run log](../concepts/experiment-tracking.md) and also provides
-an [interface to experiment tracking tools](../concepts/experiment-tracking.md/#experiment_tracking_tools)
-like [mlflow](https://mlflow.org/docs/latest/tracking.html) or
-[Weights and Biases](https://wandb.ai/site/experiment-tracking).
-
-
-### Example
-
-
-=== "python"
-
- ```python linenums="1"
- --8<-- "examples/experiment_tracking_api.py"
- ```
-
- 1. Nested metrics are possible as pydantic models.
- 2. Using mlflow as experiment tracking tool.
-
-=== "yaml"
-
- ```yaml linenums="1"
- --8<-- "examples/experiment_tracking_env.yaml"
- ```
-
-=== "configuration"
-
- Assumed to be present in ```examples/configs/mlflow-config.yaml```
-
- ```yaml linenums="1"
- --8<-- "examples/configs/mlflow-config.yaml"
- ```
-
-=== "Run log"
-
- The captured metrics as part of the run log are highlighted.
-
- ```json linenums="1" hl_lines="36-43"
- {
- "run_id": "clean-ride-1048",
- "dag_hash": "",
- "use_cached": false,
- "tag": "",
- "original_run_id": "",
- "status": "SUCCESS",
- "steps": {
- "Emit Metrics": {
- "name": "Emit Metrics",
- "internal_name": "Emit Metrics",
- "status": "SUCCESS",
- "step_type": "task",
- "message": "",
- "mock": false,
- "code_identities": [
- {
- "code_identifier": "0b62e4c661a4b4a2187afdf44a7c64520374202d",
- "code_identifier_type": "git",
- "code_identifier_dependable": true,
- "code_identifier_url": "https://github.com/AstraZeneca/runnable-core.git",
- "code_identifier_message": ""
- }
- ],
- "attempts": [
- {
- "attempt_number": 1,
- "start_time": "2024-01-10 10:48:10.089266",
- "end_time": "2024-01-10 10:48:10.092541",
- "duration": "0:00:00.003275",
- "status": "SUCCESS",
- "message": "",
- "parameters": {}
- }
- ],
- "user_defined_metrics": {
- "spam": "hello",
- "eggs": {
- "ham": "world"
- },
- "answer": 42.0,
- "is_it_true": false
- },
- "branches": {},
- "data_catalog": [
- {
- "name": "Emit_Metrics.execution.log",
- "data_hash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
- "catalog_relative_path": "clean-ride-1048/Emit_Metrics.execution.log",
- "catalog_handler_location": ".catalog",
- "stage": "put"
- }
- ]
- },
- "success": {
- "name": "success",
- "internal_name": "success",
- "status": "SUCCESS",
- "step_type": "success",
- "message": "",
- "mock": false,
- "code_identities": [
- {
- "code_identifier": "0b62e4c661a4b4a2187afdf44a7c64520374202d",
- "code_identifier_type": "git",
- "code_identifier_dependable": true,
- "code_identifier_url": "https://github.com/AstraZeneca/runnable-core.git",
- "code_identifier_message": ""
- }
- ],
- "attempts": [
- {
- "attempt_number": 1,
- "start_time": "2024-01-10 10:48:10.585832",
- "end_time": "2024-01-10 10:48:10.585937",
- "duration": "0:00:00.000105",
- "status": "SUCCESS",
- "message": "",
- "parameters": {}
- }
- ],
- "user_defined_metrics": {},
- "branches": {},
- "data_catalog": []
- }
- },
- "parameters": {},
- "run_config": {
- "executor": {
- "service_name": "local",
- "service_type": "executor",
- "enable_parallel": false,
- "placeholders": {}
- },
- "run_log_store": {
- "service_name": "buffered",
- "service_type": "run_log_store"
- },
- "secrets_handler": {
- "service_name": "do-nothing",
- "service_type": "secrets"
- },
- "catalog_handler": {
- "service_name": "file-system",
- "service_type": "catalog"
- },
- "experiment_tracker": {
- "service_name": "mlflow",
- "service_type": "experiment_tracker"
- },
- "pipeline_file": "",
- "parameters_file": "",
- "configuration_file": "examples/configs/mlflow-config.yaml",
- "tag": "",
- "run_id": "clean-ride-1048",
- "variables": {},
- "use_cached": false,
- "original_run_id": "",
- "dag": {
- "start_at": "Emit Metrics",
- "name": "",
- "description": "",
- "internal_branch_name": "",
- "steps": {
- "Emit Metrics": {
- "type": "task",
- "name": "Emit Metrics",
- "internal_name": "Emit Metrics",
- "internal_branch_name": "",
- "is_composite": false
- },
- "success": {
- "type": "success",
- "name": "success",
- "internal_name": "success",
- "internal_branch_name": "",
- "is_composite": false
- },
- "fail": {
- "type": "fail",
- "name": "fail",
- "internal_name": "fail",
- "internal_branch_name": "",
- "is_composite": false
- }
- }
- },
- "dag_hash": "",
- "execution_plan": "chained"
- }
- }
- ```
-
-
-=== "mlflow"
-
- The metrics are also sent to mlflow.
-
-
diff --git a/docs/example/reproducibility.md b/docs/example/reproducibility.md
deleted file mode 100644
index 39bdfd31..00000000
--- a/docs/example/reproducibility.md
+++ /dev/null
@@ -1,231 +0,0 @@
-runnable stores a variety of information about the current execution in the [run log](../concepts/run-log.md).
-The run log is internally used
-for keeping track of the execution (status of different steps, parameters, etc.) but also holds rich information
-for reproducing the state at the time of pipeline execution.
-
-
-The following are "invisibly" captured as part of the run log:
-
-- Code: The ```git``` commit hash of the code used to run a pipeline is stored as part of the run log against
-every step.
-- Data hash: The hash of any file passing through the catalog is stored as part of the run log
-(see the sketch after this list). Since the catalog itself is indexed against the execution id, it is easy
-to recreate the exact state of the data used in the pipeline execution.
-- Configuration: The configuration of the pipeline (dag definition, execution configuration) is also stored
-as part of the run log.
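-
-The 64-character ```data_hash``` values in the run log are consistent with sha256 digests. Below is a
-minimal sketch of how such a content hash can be computed; the exact scheme runnable uses is an
-assumption:
-
-```python
-import hashlib
-
-
-def file_hash(path: str) -> str:
-    """Stream a file through sha256 so large datasets need not fit in memory."""
-    digest = hashlib.sha256()
-    with open(path, "rb") as f:
-        for chunk in iter(lambda: f.read(8192), b""):
-            digest.update(chunk)
-    return digest.hexdigest()
-```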
-
-
-
-!!! info annotate "Invisible?"
-
-    Reproducibility should not be a "nice to have" but a must in data science projects. We believe that
-    it should not be left to the data scientist to be conscious of it, but should happen without any active
-    intervention.
-
-
-Below we show an example pipeline and the different layers of the run log.
-
-
-=== "Example pipeline"
-
- !!! info annotate "Example"
-
-        This example pipeline is the same as the data flow pipeline showcasing the flow of files.
-        The create content step writes a new file which is stored in the catalog, and the retrieve content
-        step gets it from the catalog.
-
-
- ```python title="simple data passing pipeline" linenums="1"
- --8<-- "examples/catalog_api.py"
- ```
-=== "General run log attributes"
-
- !!! info annotate
-
-        This section of the run log is about the overall status of the execution. It has information
-        about the run_id, the execution status, re-run indicators and the final state of the parameters.
-
-
- ```json linenums="1"
- {
- "run_id": "greedy-yonath-1608", // (1)
- "dag_hash": "",
- "use_cached": false,
- "tag": "",
- "original_run_id": "",
- "status": "SUCCESS",
- ...
- "parameters": {}, // (2)
- }
- ```
-
- 1. The unique run_id of the execution.
- 2. The parameters at the end of the pipeline.
-
-
-=== "Logs captured against a step"
-
- !!! info annotate
-
-        The information stored against an execution of a step. We capture the git commit ids, data hashes,
-        and parameters at the point of execution. The execution logs are also stored in the catalog against the
-        run id.
-
-
- ```json linenums="1"
- "create_content": { // (1)
- "name": "create_content",
- "internal_name": "create_content",
- "status": "SUCCESS", // (2)
- "step_type": "task",
- "message": "",
- "mock": false,
- "code_identities": [
- {
- "code_identifier": "ff60e7fa379c38adaa03755977057cd10acc4baa", // (3)
- "code_identifier_type": "git",
- "code_identifier_dependable": true, // (4)
- "code_identifier_url": "https://github.com/AstraZeneca/runnable-core.git",
- "code_identifier_message": ""
- }
- ],
- "attempts": [
- {
- "attempt_number": 1,
- "start_time": "2023-12-15 16:08:51.869129",
- "end_time": "2023-12-15 16:08:51.878428",
- "duration": "0:00:00.009299",
- "status": "SUCCESS",
- "message": "",
- "parameters": {} // (5)
- }
- ],
- "user_defined_metrics": {},
- "branches": {},
- "data_catalog": [
- {
- "name": "data/hello.txt", // (6)
- "data_hash": "c2e6b3d23c045731bf40a036aa6f558c9448da247e0cbb4ee3fcf10d3660ef18", // (7)
- "catalog_relative_path": "greedy-yonath-1608/data/hello.txt",
- "catalog_handler_location": ".catalog",
- "stage": "put"
- },
- {
- "name": "create_content", // (8)
- "data_hash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
- "catalog_relative_path": "greedy-yonath-1608/create_content",
- "catalog_handler_location": ".catalog",
- "stage": "put"
- }
- ]
- },
- ```
-
- 1. The name of step.
- 2. The status of the execution of the step.
- 3. The git sha of the code at the point of execution of the pipeline.
-    4. True if the git branch was clean at the time of execution, false otherwise.
- 5. The parameters at the point of execution of the step.
- 6. The name of the file that was "put" in the catalog by the step.
- 7. The hash of the dataset put in the catalog.
- 8. The execution logs of the step put in the catalog.
-
-
-=== "Captured configuration"
-
- !!! info annotate
-
- The information about the configuration used to run the pipeline. It includes the configuration of the
- different ```services``` used, the pipeline definition and state of variables used at the time of
- execution of the pipeline.
-
-
- ```json linenums="1"
- "run_config": {
- "executor": { // (1)
- "service_name": "local",
- "service_type": "executor",
- "enable_parallel": false,
- "placeholders": {}
- },
- "run_log_store": { // (2)
- "service_name": "buffered",
- "service_type": "run_log_store"
- },
- "secrets_handler": { // (3)
- "service_name": "do-nothing",
- "service_type": "secrets"
- },
- "catalog_handler": { // (4)
- "service_name": "file-system",
- "service_type": "catalog",
- "compute_data_folder": "."
- },
- "experiment_tracker": { // (5)
- "service_name": "do-nothing",
- "service_type": "experiment_tracker"
- },
- "pipeline_file": "", // (6
- "parameters_file": "", // (7)
- "configuration_file": "examples/configs/fs-catalog.yaml", // (8)
- "tag": "",
- "run_id": "greedy-yonath-1608",
- "variables": {},
- "use_cached": false,
- "original_run_id": "",
- "dag": { // (9)
- "start_at": "create_content",
- "name": "",
- "description": "",
- "max_time": 86400,
- "internal_branch_name": "",
- "steps": {
- "create_content": {
- "type": "task",
- "name": "create_content",
- "internal_name": "create_content",
- "internal_branch_name": "",
- "is_composite": false
- },
- "retrieve_content": {
- "type": "task",
- "name": "retrieve_content",
- "internal_name": "retrieve_content",
- "internal_branch_name": "",
- "is_composite": false
- },
- "success": {
- "type": "success",
- "name": "success",
- "internal_name": "success",
- "internal_branch_name": "",
- "is_composite": false
- },
- "fail": {
- "type": "fail",
- "name": "fail",
- "internal_name": "fail",
- "internal_branch_name": "",
- "is_composite": false
- }
- }
- },
- "dag_hash": "",
- "execution_plan": "chained"
- }
- ```
-
-    1. The configuration of the ```executor```.
-    2. The configuration of the ```run log store```, i.e. the location where these logs are stored.
-    3. The configuration of the secrets manager.
-    4. The configuration of the catalog manager.
-    5. The configuration of the experiment tracker.
- 6. The pipeline definition file, empty in this case as we use the SDK.
- 7. The initial parameters file used for the execution.
- 8. The configuration file used for the execution.
- 9. The definition of the DAG being executed.
-
-
-
-This [structure of the run log](../concepts/run-log.md) is identical regardless of where the pipeline was executed.
-This enables you to reproduce a failed execution from a complex environment on your local machine for easier debugging.
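-
-Since the run log is plain JSON, it can also be inspected programmatically. Below is a minimal
-sketch, assuming a ```file-system``` run log store writing to the default ```.run_log_store```
-folder and a run log file named after the run id (the file name pattern is an assumption):
-
-```python
-import json
-from pathlib import Path
-
-run_id = "greedy-yonath-1608"  # the run id of the execution shown above
-run_log = json.loads(Path(".run_log_store", f"{run_id}.json").read_text())
-
-# Print the status of every step of the execution.
-for name, step in run_log["steps"].items():
-    print(name, step["status"])
-```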
diff --git a/docs/example/retry-after-failure.md b/docs/example/retry-after-failure.md
deleted file mode 100644
index 7f83bff3..00000000
--- a/docs/example/retry-after-failure.md
+++ /dev/null
@@ -1,593 +0,0 @@
-runnable allows you to [debug and recover](../concepts/run-log.md/#retrying_failures) from a
-failure during the execution of a pipeline. The pipeline can be
-restarted in any environment suitable for debugging.
-
-
-!!! example annotate
-
-    A pipeline that is transpiled to argo workflows can be re-run on your local compute
-    for debugging purposes. The only caveat is that your local compute should have access to the run log
-    of the failed execution (1) and the catalog artifacts (2) generated by the failed execution.
-
-1. Access to the run log can be as simple as copying the json file to your local compute; see the sketch below.
-2. Generated catalog artifacts can be sourced from ```file-system``` which is your local folder.
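-
-Below is a minimal sketch of making those artifacts available locally, assuming ```file-system```
-based run log and catalog stores; the remote location is illustrative:
-
-```python
-import shutil
-
-remote = "/mnt/remote"  # illustrative location of the failed execution's artifacts
-run_id = "wrong-file-name"
-
-# Copy the run log json and the catalog artifacts of the failed run.
-shutil.copy(f"{remote}/.run_log_store/{run_id}.json", ".run_log_store/")
-shutil.copytree(f"{remote}/.catalog/{run_id}", f".catalog/{run_id}")
-```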
-
-
-
-Below is an example of retrying a pipeline that failed.
-
-
-=== "Failed pipeline"
-
- !!! note
-
- You can run this pipeline on your local machine by
-
- ```runnable execute -f examples/retry-fail.yaml -c examples/configs/fs-catalog-run_log.yaml --run-id wrong-file-name```
-
-        Note that we have specified the ```run_id``` to be something we can use later.
-        The execution logs of the steps in the catalog will show the reason for the failure.
-
- ```yaml title="Pipeline that fails"
- --8<-- "examples/retry-fail.yaml"
- ```
-
- 1. We make a data folder to store content.
- 2. Puts a file in the data folder and catalogs it for downstream steps.
- 3. It will fail here as there is no file called ```hello1.txt``` in the data folder.
-    4. Get the file ```hello.txt```, generated by previous steps, into the data folder.
-
-
-=== "Failed run log"
-
-    Note that the overall status of the pipeline in line #7 is ```FAIL```.
-    The step log of the failed step is also marked with the status ```FAIL```.
-
- ```json linenums="1" hl_lines="7 94-139"
- {
- "run_id": "wrong-file-name",
- "dag_hash": "13f7c1b29ebb07ce058305253171ceae504e1683",
- "use_cached": false,
- "tag": "",
- "original_run_id": "",
- "status": "FAIL",
- "steps": {
- "Setup": {
- "name": "Setup",
- "internal_name": "Setup",
- "status": "SUCCESS",
- "step_type": "task",
- "message": "",
- "mock": false,
- "code_identities": [
- {
- "code_identifier": "f94e49a4fcecebac4d5eecbb5b691561b08e45c0",
- "code_identifier_type": "git",
- "code_identifier_dependable": true,
- "code_identifier_url": "https://github.com/AstraZeneca/runnable-core.git",
- "code_identifier_message": ""
- }
- ],
- "attempts": [
- {
- "attempt_number": 1,
- "start_time": "2024-02-07 06:08:45.330918",
- "end_time": "2024-02-07 06:08:45.348227",
- "duration": "0:00:00.017309",
- "status": "SUCCESS",
- "message": "",
- "parameters": {}
- }
- ],
- "user_defined_metrics": {},
- "branches": {},
- "data_catalog": [
- {
- "name": "Setup.execution.log",
- "data_hash": "e1f8eaa5d49d88fae21fd8a34ff9774bcd4136bdbc3aa613f88a986261bac694",
- "catalog_relative_path": "wrong-file-name/Setup.execution.log",
- "catalog_handler_location": ".catalog",
- "stage": "put"
- }
- ]
- },
- "Create Content": {
- "name": "Create Content",
- "internal_name": "Create Content",
- "status": "SUCCESS",
- "step_type": "task",
- "message": "",
- "mock": false,
- "code_identities": [
- {
- "code_identifier": "f94e49a4fcecebac4d5eecbb5b691561b08e45c0",
- "code_identifier_type": "git",
- "code_identifier_dependable": true,
- "code_identifier_url": "https://github.com/AstraZeneca/runnable-core.git",
- "code_identifier_message": ""
- }
- ],
- "attempts": [
- {
- "attempt_number": 1,
- "start_time": "2024-02-07 06:08:45.422420",
- "end_time": "2024-02-07 06:08:45.438199",
- "duration": "0:00:00.015779",
- "status": "SUCCESS",
- "message": "",
- "parameters": {}
- }
- ],
- "user_defined_metrics": {},
- "branches": {},
- "data_catalog": [
- {
- "name": "Create_Content.execution.log",
- "data_hash": "e1f8eaa5d49d88fae21fd8a34ff9774bcd4136bdbc3aa613f88a986261bac694",
- "catalog_relative_path": "wrong-file-name/Create_Content.execution.log",
- "catalog_handler_location": ".catalog",
- "stage": "put"
- },
- {
- "name": "data/hello.txt",
- "data_hash": "108ecead366a67c2bb17e223032e12629bcc21b4ab0fff77cf48a5b784f208c7",
- "catalog_relative_path": "wrong-file-name/data/hello.txt",
- "catalog_handler_location": ".catalog",
- "stage": "put"
- }
- ]
- },
- "Retrieve Content": {
- "name": "Retrieve Content",
- "internal_name": "Retrieve Content",
- "status": "FAIL",
- "step_type": "task",
- "message": "",
- "mock": false,
- "code_identities": [
- {
- "code_identifier": "f94e49a4fcecebac4d5eecbb5b691561b08e45c0",
- "code_identifier_type": "git",
- "code_identifier_dependable": true,
- "code_identifier_url": "https://github.com/AstraZeneca/runnable-core.git",
- "code_identifier_message": ""
- }
- ],
- "attempts": [
- {
- "attempt_number": 1,
- "start_time": "2024-02-07 06:08:45.525924",
- "end_time": "2024-02-07 06:08:45.605381",
- "duration": "0:00:00.079457",
- "status": "FAIL",
- "message": "Command failed",
- "parameters": {}
- }
- ],
- "user_defined_metrics": {},
- "branches": {},
- "data_catalog": [
- {
- "name": "data/hello.txt",
- "data_hash": "108ecead366a67c2bb17e223032e12629bcc21b4ab0fff77cf48a5b784f208c7",
- "catalog_relative_path": "data/hello.txt",
- "catalog_handler_location": ".catalog",
- "stage": "get"
- },
- {
- "name": "Retrieve_Content.execution.log",
- "data_hash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
- "catalog_relative_path": "wrong-file-name/Retrieve_Content.execution.log",
- "catalog_handler_location": ".catalog",
- "stage": "put"
- }
- ]
- },
- "fail": {
- "name": "fail",
- "internal_name": "fail",
- "status": "SUCCESS",
- "step_type": "fail",
- "message": "",
- "mock": false,
- "code_identities": [
- {
- "code_identifier": "f94e49a4fcecebac4d5eecbb5b691561b08e45c0",
- "code_identifier_type": "git",
- "code_identifier_dependable": true,
- "code_identifier_url": "https://github.com/AstraZeneca/runnable-core.git",
- "code_identifier_message": ""
- }
- ],
- "attempts": [
- {
- "attempt_number": 1,
- "start_time": "2024-02-07 06:08:45.701371",
- "end_time": "2024-02-07 06:08:45.701954",
- "duration": "0:00:00.000583",
- "status": "SUCCESS",
- "message": "",
- "parameters": {}
- }
- ],
- "user_defined_metrics": {},
- "branches": {},
- "data_catalog": []
- }
- },
- "parameters": {},
- "run_config": {
- "executor": {
- "service_name": "local",
- "service_type": "executor",
- "enable_parallel": false,
- "overrides": {}
- },
- "run_log_store": {
- "service_name": "file-system",
- "service_type": "run_log_store",
- "log_folder": ".run_log_store"
- },
- "secrets_handler": {
- "service_name": "do-nothing",
- "service_type": "secrets"
- },
- "catalog_handler": {
- "service_name": "file-system",
- "service_type": "catalog",
- "catalog_location": ".catalog"
- },
- "experiment_tracker": {
- "service_name": "do-nothing",
- "service_type": "experiment_tracker"
- },
- "pipeline_file": "examples/retry-fail.yaml",
- "parameters_file": null,
- "configuration_file": "examples/configs/fs-catalog-run_log.yaml",
- "tag": "",
- "run_id": "wrong-file-name",
- "variables": {
- "argo_docker_image": "harbor.csis.astrazeneca.net/mlops/runnable:latest"
- },
- "use_cached": false,
- "original_run_id": "",
- "dag": {
- "start_at": "Setup",
- "name": "",
- "description": "This is a simple pipeline that demonstrates retrying failures.\n\n1. Setup: We setup a data folder, we ignore if it is already present\n2. Create Content: We create a \"hello.txt\" and \"put\" the file in catalog\n3. Retrieve Content: We \"get\" the file \"hello.txt\" from the catalog and show the contents\n5. Cleanup: We remove the data folder. Note that this is stubbed to prevent accidental deletion.\n\n\nYou can run this pipeline by:\n runnable execute -f examples/catalog.yaml -c examples/configs/fs-catalog.yaml\n",
- "steps": {
- "Setup": {
- "type": "task",
- "name": "Setup",
- "next": "Create Content",
- "on_failure": "",
- "overrides": {},
- "catalog": null,
- "max_attempts": 1,
- "command_type": "shell",
- "command": "mkdir -p data",
- "node_name": "Setup"
- },
- "Create Content": {
- "type": "task",
- "name": "Create Content",
- "next": "Retrieve Content",
- "on_failure": "",
- "overrides": {},
- "catalog": {
- "get": [],
- "put": [
- "data/hello.txt"
- ]
- },
- "max_attempts": 1,
- "command_type": "shell",
- "command": "echo \"Hello from runnable\" >> data/hello.txt\n",
- "node_name": "Create Content"
- },
- "Retrieve Content": {
- "type": "task",
- "name": "Retrieve Content",
- "next": "success",
- "on_failure": "",
- "overrides": {},
- "catalog": {
- "get": [
- "data/hello.txt"
- ],
- "put": []
- },
- "max_attempts": 1,
- "command_type": "shell",
- "command": "cat data/hello1.txt",
- "node_name": "Retrieve Content"
- },
- "success": {
- "type": "success",
- "name": "success"
- },
- "fail": {
- "type": "fail",
- "name": "fail"
- }
- }
- },
- "dag_hash": "13f7c1b29ebb07ce058305253171ceae504e1683",
- "execution_plan": "chained"
- }
- }
- ```
-
-
-=== "Fixed pipeline"
-
- !!! note
-
- You can run this pipeline on your local machine by
-
- ```runnable execute -f examples/retry-fixed.yaml -c examples/configs/fs-catalog-run_log.yaml --use-cached wrong-file-name```
-
-        Note that we have passed the ```run_id``` of the failed execution as ```--use-cached``` for the new execution.
-
-
- ```yaml title="Pipeline that restarts"
- --8<-- "examples/retry-fixed.yaml"
- ```
-
-    1. Though this step is identical to the one in the failed pipeline, it does not execute in the retry.
-    2. We mark this step as ```stub``` to demonstrate that a re-run using the cached execution does not
-    re-execute the previously successful task.
-
-
-
-=== "Fixed Run log"
-
-    The retry pipeline executes to a successful state.
-
-    Note that the step ```Setup``` has been marked ```mock: true```; it was not
-    executed but passed through.
-
-    The step ```Create Content``` has been changed to ```stub``` to prevent execution in the
-    fixed pipeline.
-
- ```json linenums="1" hl_lines="15 34 51-96"
- {
- "run_id": "naive-wilson-0625",
- "dag_hash": "148de99f96565bb1b276db2baf23eba682615c76",
- "use_cached": true,
- "tag": "",
- "original_run_id": "wrong-file-name",
- "status": "SUCCESS",
- "steps": {
- "Setup": {
- "name": "Setup",
- "internal_name": "Setup",
- "status": "SUCCESS",
- "step_type": "stub",
- "message": "",
- "mock": true,
- "code_identities": [
- {
- "code_identifier": "f94e49a4fcecebac4d5eecbb5b691561b08e45c0",
- "code_identifier_type": "git",
- "code_identifier_dependable": true,
- "code_identifier_url": "https://github.com/AstraZeneca/runnable-core.git",
- "code_identifier_message": ""
- }
- ],
- "attempts": [],
- "user_defined_metrics": {},
- "branches": {},
- "data_catalog": []
- },
- "Create Content": {
- "name": "Create Content",
- "internal_name": "Create Content",
- "status": "SUCCESS",
- "step_type": "stub",
- "message": "",
- "mock": true,
- "code_identities": [
- {
- "code_identifier": "f94e49a4fcecebac4d5eecbb5b691561b08e45c0",
- "code_identifier_type": "git",
- "code_identifier_dependable": true,
- "code_identifier_url": "https://github.com/AstraZeneca/runnable-core.git",
- "code_identifier_message": ""
- }
- ],
- "attempts": [],
- "user_defined_metrics": {},
- "branches": {},
- "data_catalog": []
- },
- "Retrieve Content": {
- "name": "Retrieve Content",
- "internal_name": "Retrieve Content",
- "status": "SUCCESS",
- "step_type": "task",
- "message": "",
- "mock": false,
- "code_identities": [
- {
- "code_identifier": "f94e49a4fcecebac4d5eecbb5b691561b08e45c0",
- "code_identifier_type": "git",
- "code_identifier_dependable": true,
- "code_identifier_url": "https://github.com/AstraZeneca/runnable-core.git",
- "code_identifier_message": ""
- }
- ],
- "attempts": [
- {
- "attempt_number": 1,
- "start_time": "2024-02-07 06:25:13.506657",
- "end_time": "2024-02-07 06:25:13.527603",
- "duration": "0:00:00.020946",
- "status": "SUCCESS",
- "message": "",
- "parameters": {}
- }
- ],
- "user_defined_metrics": {},
- "branches": {},
- "data_catalog": [
- {
- "name": "data/hello.txt",
- "data_hash": "108ecead366a67c2bb17e223032e12629bcc21b4ab0fff77cf48a5b784f208c7",
- "catalog_relative_path": "data/hello.txt",
- "catalog_handler_location": ".catalog",
- "stage": "get"
- },
- {
- "name": "Retrieve_Content.execution.log",
- "data_hash": "bd8e06cb7432666dc3b1b0db8034966c034397863c7ff629c98ffd13966681d7",
- "catalog_relative_path": "naive-wilson-0625/Retrieve_Content.execution.log",
- "catalog_handler_location": ".catalog",
- "stage": "put"
- }
- ]
- },
- "success": {
- "name": "success",
- "internal_name": "success",
- "status": "SUCCESS",
- "step_type": "success",
- "message": "",
- "mock": false,
- "code_identities": [
- {
- "code_identifier": "f94e49a4fcecebac4d5eecbb5b691561b08e45c0",
- "code_identifier_type": "git",
- "code_identifier_dependable": true,
- "code_identifier_url": "https://github.com/AstraZeneca/runnable-core.git",
- "code_identifier_message": ""
- }
- ],
- "attempts": [
- {
- "attempt_number": 1,
- "start_time": "2024-02-07 06:25:13.597125",
- "end_time": "2024-02-07 06:25:13.597694",
- "duration": "0:00:00.000569",
- "status": "SUCCESS",
- "message": "",
- "parameters": {}
- }
- ],
- "user_defined_metrics": {},
- "branches": {},
- "data_catalog": []
- }
- },
- "parameters": {},
- "run_config": {
- "executor": {
- "service_name": "local",
- "service_type": "executor",
- "enable_parallel": false,
- "overrides": {}
- },
- "run_log_store": {
- "service_name": "file-system",
- "service_type": "run_log_store",
- "log_folder": ".run_log_store"
- },
- "secrets_handler": {
- "service_name": "do-nothing",
- "service_type": "secrets"
- },
- "catalog_handler": {
- "service_name": "file-system",
- "service_type": "catalog",
- "catalog_location": ".catalog"
- },
- "experiment_tracker": {
- "service_name": "do-nothing",
- "service_type": "experiment_tracker"
- },
- "pipeline_file": "examples/retry-fixed.yaml",
- "parameters_file": null,
- "configuration_file": "examples/configs/fs-catalog-run_log.yaml",
- "tag": "",
- "run_id": "naive-wilson-0625",
- "variables": {
- "argo_docker_image": "harbor.csis.astrazeneca.net/mlops/runnable:latest"
- },
- "use_cached": true,
- "original_run_id": "wrong-file-name",
- "dag": {
- "start_at": "Setup",
- "name": "",
- "description": "This is a simple pipeline that demonstrates passing data between steps.\n\n1. Setup: We setup a data folder, we ignore if it is already
- present\n2. Create Content: We create a \"hello.txt\" and \"put\" the file in catalog\n3. Clean up to get again: We remove the data folder. Note that this is stubbed
- to prevent\n accidental deletion of your contents. You can change type to task to make really run.\n4. Retrieve Content: We \"get\" the file \"hello.txt\" from the
- catalog and show the contents\n5. Cleanup: We remove the data folder. Note that this is stubbed to prevent accidental deletion.\n\n\nYou can run this pipeline by:\n
- runnable execute -f examples/catalog.yaml -c examples/configs/fs-catalog.yaml\n",
- "steps": {
- "Setup": {
- "type": "stub",
- "name": "Setup",
- "next": "Create Content",
- "on_failure": "",
- "overrides": {},
- "catalog": null,
- "max_attempts": 1,
- "command_type": "shell",
- "command": "mkdir -p data"
- },
- "Create Content": {
- "type": "stub",
- "name": "Create Content",
- "next": "Retrieve Content",
- "on_failure": "",
- "overrides": {},
- "catalog": {
- "get": [],
- "put": [
- "data/hello.txt"
- ]
- },
- "max_attempts": 1,
- "command_type": "shell",
- "command": "echo \"Hello from runnable\" >> data/hello.txt\n"
- },
- "Retrieve Content": {
- "type": "task",
- "name": "Retrieve Content",
- "next": "success",
- "on_failure": "",
- "overrides": {},
- "catalog": {
- "get": [
- "data/hello.txt"
- ],
- "put": []
- },
- "max_attempts": 1,
- "command_type": "shell",
- "command": "cat data/hello.txt",
- "node_name": "Retrieve Content"
- },
- "success": {
- "type": "success",
- "name": "success"
- },
- "fail": {
- "type": "fail",
- "name": "fail"
- }
- }
- },
- "dag_hash": "148de99f96565bb1b276db2baf23eba682615c76",
- "execution_plan": "chained"
- }
- }
- ```
-
-
-runnable also supports a [```mocked``` executor](../configurations/executors/mocked.md) which can
-patch and mock tasks to isolate and focus on the failed task. Since python functions and notebooks
-run in the same shell, it is possible to use the
-[python debugger](https://docs.python.org/3/library/pdb.html) and the
-[ploomber debugger](https://engine.ploomber.io/en/docs/user-guide/debugging/debuglater.html)
-to debug failed tasks.
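-
-For example, a failing task function can be inspected post-mortem with the standard library
-debugger; ```retrieve_content``` below is an illustrative stand-in for the failing task:
-
-```python
-import pdb
-
-def retrieve_content():  # illustrative stand-in for the failing task
-    raise FileNotFoundError("data/hello1.txt")
-
-try:
-    retrieve_content()
-except Exception:
-    pdb.post_mortem()  # drop into the debugger at the point of failure
-```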
diff --git a/docs/example/secrets.md b/docs/example/secrets.md
deleted file mode 100644
index 0a651870..00000000
--- a/docs/example/secrets.md
+++ /dev/null
@@ -1,46 +0,0 @@
-Secrets become essential assets as the complexity of an application increases. runnable provides a
-[python API](../interactions.md/#runnable.get_secret) to retrieve secrets from various sources.
-
-!!! info annotate inline end "from runnable import get_secret"
-
-    Secrets is the only interface for which you are required to "import runnable" in your python application.
-
-    Native python functions and Jupyter notebooks can use this API. We currently do not support shell tasks
-    with secrets from this interface. (1)
-
-1. Using environment variables to access secrets is one pattern that works in all environments.
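-
-A minimal sketch of the API; ```secret_key``` is an illustrative key assumed to be defined in the
-configured secrets manager:
-
-```python
-from runnable import get_secret
-
-# Returns the value of the secret identified by the key.
-token = get_secret("secret_key")
-```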
-
-=== "dotenv format"
-
-    The dotenv format for providing secrets. Ideally, this file should not be part of
-    version control but should be present during the development phase.
-
- The file is assumed to be present in ```examples/secrets.env``` for this example.
-
- ```shell linenums="1"
- --8<-- "examples/secrets.env"
- ```
-
-    1. Shell script style is supported.
- 2. Key value based format is also supported.
-
-
-=== "Example configuration"
-
- Configuration to use the dotenv format file.
-
- ```yaml linenums="1"
- --8<-- "examples/configs/dotenv.yaml"
- ```
-
- 1. Use dotenv secrets manager.
- 2. Location of the dotenv file, defaults to ```.env``` in project root.
-
-
-=== "Pipeline in python"
-
- ```python linenums="1" hl_lines="12-13"
- --8<-- "examples/secrets.py"
- ```
-
- 1. The key of the secret that you want to retrieve.
diff --git a/docs/example/steps.md b/docs/example/steps.md
deleted file mode 100644
index fd1d6175..00000000
--- a/docs/example/steps.md
+++ /dev/null
@@ -1,78 +0,0 @@
-runnable provides a rich set of step types.
-
-
-
-- [stub](../concepts/stub.md): A mock step which is handy while designing and debugging pipelines.
-- [task](../concepts/task.md): To execute python functions, jupyter notebooks, or shell scripts.
-- [parallel](../concepts/parallel.md): To execute many tasks in parallel.
-- [map](../concepts/map.md): To execute the same task over a list of parameters. (1)
-
-
-
-1. Similar to ```map``` state in AWS step functions or ```loops``` in Argo workflows.
-
-
-## stub
-
-Used as a mock node or a placeholder before the actual implementation (1).
-{ .annotate }
-
-1. :raised_hand: Equivalent to ```pass``` or ```...``` in python.
-
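-A minimal sketch of a stubbed pipeline with the python SDK, assuming the SDK surface shown in
-```examples/mocking.py```; the step names are illustrative:
-
-```python
-from runnable import Pipeline, Stub
-
-step1 = Stub(name="step 1")  # a placeholder for a step yet to be implemented
-step2 = Stub(name="step 2", terminate_with_success=True)
-
-step1 >> step2  # step2 executes after step1
-
-pipeline = Pipeline(steps=[step1, step2], start_at=step1, add_terminal_nodes=True)
-pipeline.execute()
-```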
-
-=== "yaml"
-
- ``` yaml
- --8<-- "examples/mocking.yaml"
- ```
-
-=== "python"
-
- ```python
- --8<-- "examples/mocking.py"
- ```
-
- 1. The name of the node can be as descriptive as you want. Only ```.``` or ```%``` are not allowed.
-    2. Stub nodes can take arbitrary parameters, which is useful to temporarily mock a node. You can define the dependency on step1 using ```depends_on```.
- 3. ```terminate_with_success``` indicates that the dag is completed successfully. You can also use ```terminate_with_failure``` to indicate the dag failed.
- 4. Add ```success``` and ```fail``` nodes to the dag.
-
-
-## task
-
-Used to execute a single unit of work. You can use [python](../concepts/task.md/#python_functions),
-[shell](../concepts/task.md/#shell), or [notebook](../concepts/task.md/#notebook) as the command type.
-
-!!! note annotate "Execution logs"
-
- You can view the execution logs of the tasks in the [catalog](../concepts/catalog.md) without digging through the
- logs from the underlying executor.
-
-
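-A minimal sketch with the python SDK, assuming a function ```add``` defined in
-```examples/functions.py```; the names are illustrative:
-
-```python
-from runnable import Pipeline, PythonTask
-
-# command is the dotted path to the function to execute.
-task = PythonTask(name="add", command="examples.functions.add", terminate_with_success=True)
-
-pipeline = Pipeline(steps=[task], start_at=task, add_terminal_nodes=True)
-pipeline.execute()
-```
-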
-=== "Example functions"
-
- The below content is assumed to be ```examples/functions.py```
-
- ```python
- --8<-- "examples/functions.py"
- ```
-
-=== "yaml"
-
- ``` yaml
- --8<-- "examples/python-tasks.yaml"
- ```
-
- 1. Note that the ```command``` is the [path to the python function](../concepts/task.md/#python_functions).
-    2. ```python``` is the default command type; you can also use ```shell``` or ```notebook```.
-
-=== "python"
-
- ```python
- --8<-- "examples/python-tasks.py"
- ```
-
- 1. Note that the command is the [path to the function](../concepts/task.md/#python_functions).
-    2. There are many ways to define dependencies between nodes: ```step1 >> step2```, ```step1 << step2```, or by defining the next step while defining step1.
- 3. ```terminate_with_success``` indicates that the dag is completed successfully. You can also use ```terminate_with_failure``` to indicate the dag failed.
- 4. Add ```success``` and ```fail``` nodes to the dag.
diff --git a/docs/image.png b/docs/image.png
deleted file mode 100644
index 61b12597..00000000
Binary files a/docs/image.png and /dev/null differ
diff --git a/docs/index.md b/docs/index.md
index 228fd719..fe3abbe3 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -116,6 +116,7 @@ or [metaflow](https://metaflow.org/).
```runnable``` could also function as an SDK for _native_ orchestrators as it always compiles pipeline definitions
to _native_ orchestrators.
+
@@ -164,6 +165,9 @@ to _native_ orchestrators.
Unit test your code and pipelines.
+ - mock/patch the steps of the pipeline
+ - test your functions as you normally do.
+
[:octicons-arrow-right-24: Test](#)
diff --git a/docs/reference.md b/docs/reference.md
index 6d162633..c7313158 100644
--- a/docs/reference.md
+++ b/docs/reference.md
@@ -1,5 +1,5 @@
Please accompany the reference with ```examples``` from
-[the repo](https://github.com/AstraZeneca/runnable-core).
+[the repo](https://github.com/AstraZeneca/runnable).
@@ -16,27 +16,7 @@ Please accompany the reference with ```examples``` from
=== "yaml"
- Attributes:
-
- - ```name```: the name of the task
- - ```command```: the dotted path reference to the function.
- - ```next```: the next node to call if the function succeeds. Use ```success``` to terminate
- the pipeline successfully or ```fail``` to terminate with fail.
- - ```on_failure```: The next node in case of failure.
- - ```catalog```: mapping of cataloging items
- - ```overrides```: mapping of step overrides from global configuration.
-
- ```yaml
- dag:
- steps:
- name: <>
- type: task
- command: <>
- next: <>
- on_failure: <>
- catalog: # Any cataloging to be done.
- overrides: # mapping of overrides of global configuration
- ```
+ --8<-- "runnable/tasks.py:python_reference"
@@ -54,27 +34,7 @@ Please accompany the reference with ```examples``` from
=== "yaml"
- Attributes:
-
- - ```name```: the name of the task
- - ```command```: the path to the notebook relative to the project root.
- - ```next```: the next node to call if the function succeeds. Use ```success``` to terminate
- the pipeline successfully or ```fail``` to terminate with fail.
- - ```on_failure```: The next node in case of failure.
- - ```catalog```: mapping of cataloging items
- - ```overrides```: mapping of step overrides from global configuration.
-
- ```yaml
- dag:
- steps:
- name: <>
- type: task
- command: <>
- next: <>
- on_failure: <>
- catalog: # Any cataloging to be done.
- overrides: # mapping of overrides of global configuration
- ```
+ --8<-- "runnable/tasks.py:notebook_reference"
@@ -93,27 +53,7 @@ Please accompany the reference with ```examples``` from
=== "yaml"
- Attributes:
-
- - ```name```: the name of the task
- - ```command```: the path to the notebook relative to the project root.
- - ```next```: the next node to call if the function succeeds. Use ```success``` to terminate
- the pipeline successfully or ```fail``` to terminate with fail.
- - ```on_failure```: The next node in case of failure.
- - ```catalog```: mapping of cataloging items
- - ```overrides```: mapping of step overrides from global configuration.
-
- ```yaml
- dag:
- steps:
- name: <>
- type: task
- command: <>
- next: <>
- on_failure: <>
- catalog: # Any cataloging to be done.
- overrides: # mapping of overrides of global configuration
- ```
+ --8<-- "runnable/tasks.py:notebook_reference"
@@ -130,6 +70,8 @@ Please accompany the reference with ```examples``` from
=== "yaml"
+ --8<-- "runnable/extensions/nodes.py:stub_reference"
+
diff --git a/docs/roadmap.md b/docs/roadmap.md
deleted file mode 100644
index 25a85f72..00000000
--- a/docs/roadmap.md
+++ /dev/null
@@ -1,25 +0,0 @@
-## AWS environments
-
-Bring in native AWS services to orchestrate workflows. The stack should be:
-
-- AWS step functions.
-- Sagemaker jobs - since they can take a dynamic image name, unlike AWS batch which needs a job definition and can be tricky.
-- S3 for Run log and Catalog: Already tested and working prototype.
-- AWS secrets manager: Access to AWS secrets manager via the RBAC of the execution role.
-
-
-## HPC environment using SLURM executor
-
-- Without native orchestration tools, the preferred way is to run locally but use SLURM to schedule jobs.
-
-## Database based Run log store
-
-## Better integrations with experiment tracking tools
-
-Currently, the implementation of experiment tracking tools within runnable is limited. It might be better to
-choose a good open source implementation and stick with it.
-
-
-## Model registry service
-
-Could be interesting to bring in a model registry to catalog models.
diff --git a/docs/sdk.md b/docs/sdk.md
deleted file mode 100644
index e989fc11..00000000
--- a/docs/sdk.md
+++ /dev/null
@@ -1,75 +0,0 @@
-::: runnable.Catalog
- options:
- show_root_heading: true
- show_bases: false
-
-
-
-::: runnable.Stub
- options:
- show_root_heading: true
- show_bases: false
-
-
-
-::: runnable.PythonTask
- options:
- show_root_heading: true
- show_bases: false
- show_docstring_description: true
-
-
-
-::: runnable.ShellTask
- options:
- show_root_heading: true
- show_bases: false
- show_docstring_description: true
-
-
-
-::: runnable.NotebookTask
- options:
- show_root_heading: true
- show_bases: false
- show_docstring_description: true
-
-
-
-::: runnable.Parallel
- options:
- show_root_heading: true
- show_bases: false
- show_docstring_description: true
-
-
-
-::: runnable.Map
- options:
- show_root_heading: true
- show_bases: false
- show_docstring_description: true
-
-
-
-::: runnable.Success
- options:
- show_root_heading: true
- show_bases: false
- show_docstring_description: true
-
-
-
-::: runnable.Fail
- options:
- show_root_heading: true
- show_bases: false
- show_docstring_description: true
-
-
-
-::: runnable.Pipeline
- options:
- show_root_heading: true
- show_bases: false
- show_docstring_description: true
diff --git a/docs/why-runnable.md b/docs/why-runnable.md
index 1ec6e91a..0b2ad4e1 100644
--- a/docs/why-runnable.md
+++ b/docs/why-runnable.md
@@ -1,11 +1,36 @@
-# Why runnable
+There are a lot of orchestration tools; a well maintained and curated [list is
+available here](https://github.com/EthicalML/awesome-production-machine-learning/).
-**runnable** allows the data scientists/engineers to hook into production stack without
-knowledge of them. It offers a simpler abstraction of the concepts found in
-production stack thereby aligning to the production standards even during development.
+Broadly, they can be classified as ```native``` or ```meta``` orchestrators.
-**runnable** is not a end to end deployment platform but limited to be an aid during
-the development phase without modifying the production stack or application code.
+
+
+
+### __native orchestrators__
+
+- Focus on resource management, job scheduling, robustness and scalability.
+- Have less features on domain (data engineering, data science) activities.
+- Difficult to run locally.
+- Not ideal for quick experimentation or research activities.
+
+### __meta orchestrators__
+
+- An abstraction over native orchestrators.
+- Oriented towards domain (data engineering, data science) features.
+- Easy to get started and run locally.
+- Ideal for quick experimentation or research activities.
+
+```runnable``` is a _meta_ orchestrator with a simple API, geared towards data engineering and data science projects.
+It works in conjunction with _native_ orchestrators and is an alternative to [kedro](https://docs.kedro.org/en/stable/index.html)
+or [metaflow](https://metaflow.org/) in design philosophy.
+
+```runnable``` could also function as an SDK for _native_ orchestrators as it always compiles pipeline definitions
+to _native_ orchestrators.
+
+
@@ -15,13 +40,17 @@ the development phase without modifying the production stack or application code
Your application code remains as it is. Runnable exists outside of it.
- [:octicons-arrow-right-24: Getting started](concepts/the-big-picture.md)
+    - No APIs, decorators, or any imposed structure.
+
+ [:octicons-arrow-right-24: Getting started](concepts/index.md)
- :building_construction:{ .lg .middle } __Bring your infrastructure__
---
- Runnable can be adapted to your infrastructure stack instead of dictating it.
+ ```runnable``` is not a platform. It works with your platforms.
+
+ - ```runnable``` composes pipeline definitions suited to your infrastructure.
[:octicons-arrow-right-24: Infrastructure](configurations/overview.md)
@@ -29,7 +58,8 @@ the development phase without modifying the production stack or application code
---
- Runnable tracks key information to reproduce the execution.
+ Runnable tracks key information to reproduce the execution. All this happens without
+ any additional code.
[:octicons-arrow-right-24: Run Log](concepts/run-log.md)
@@ -49,6 +79,9 @@ the development phase without modifying the production stack or application code
Unit test your code and pipelines.
+ - mock/patch the steps of the pipeline
+ - test your functions as you normally do.
+
[:octicons-arrow-right-24: Test](#)
@@ -59,79 +92,13 @@ the development phase without modifying the production stack or application code
Moving away from runnable is as simple as deleting relevant files.
+ - Your application code remains as it is.
-
-
-
-## Alternatives
-
-**runnable** as an SDK competes with
-[Kedro](https://github.com/kedro-org/kedro) and [metaflow](https://metaflow.org/), which are
-based on similar ideas and have an established presence in this field. We took a lot of
-inspiration from these excellent projects when writing runnable.
-
-!!! note "Caveat"
-
- The scope of runnable is limited in comparison to metaflow. The below points are on
-    the design philosophy rather than implementation specifics.
-
- The highlighted differences are subjective opinions and should be taken as preferences
- rather than criticisms.
+
+
+## Comparisons
-### Infrastructure
-
-Metaflow stipulates [infrastructure prerequisites](https://docs.metaflow.org/getting-started/infrastructure) that are established and validated across numerous scenarios.
-
-In contrast, runnable empowers engineering teams to define infrastructure specifications through a configuration file tailored to the stack they maintain. This versatility enables specialized teams to leverage their domain expertise, thereby enhancing the project's overall efficacy.
-
-As runnable is mostly responsible for translating workflows to infrastructure patterns, it can
-adapt to different environments.
-
-### Project structure
-
-Kedro and metaflow come with their own predefined project structures, which might be
-appealing to some users while others might find them restrictive.
-
-runnable, on the other hand, offers a more flexible approach. It doesn't impose a specific
-structure on your project. Whether you're working with Python functions, Jupyter notebooks,
-or shell scripts, runnable allows you to organize your work as you see fit. Even the location
-of the data folder can be tailored for each step, avoiding a one-size-fits-all design and
-providing the freedom to structure your project in a way that suits your preferences and
-requirements.
-
-
-### Notebook support
-
-Neither metaflow nor kedro supports notebooks as tasks. Notebooks are great during the iterative
-phase of the project allowing for interactive development.
-
-runnable supports notebooks as tasks and has the ability to pass data/parameters between them
-to allow orchestrating notebooks.
-
-### Testing pipelines
-
-runnable supports patching and mocking tasks to test the end to end execution of the
-pipeline. It is not clear how to achieve the same in kedro or metaflow.
-
-### Learning curve
-
-runnable allows tasks to stand on their own, separate from the orchestration system. Explaining and
-understanding these tasks is made easy through the use of simple "driver" functions. This approach
-makes it easier for anyone working on the project to get up to speed and maintain it, as the
-orchestration part of runnable remains distinct and straightforward.
-
-In contrast, learning to use Kedro and Metaflow can take more time because they have their own
-specific ways of structuring projects and code that users need to learn.
-
-### Language support
-
-Kedro and metaflow only support python based pipeline definitions. It is possible to
-run non-python tasks as ```subprocesses``` within the pipeline tasks, but the definition
-itself is only possible using the python API.
-
-runnable supports ```yaml``` based pipeline definitions and has ```shell``` tasks which
-can be used for non-python tasks.
+--8<-- "examples/comparisons/README.md"
diff --git a/examples/comparisons/README.md b/examples/comparisons/README.md
index 4e466ecf..a6f5dfb6 100644
--- a/examples/comparisons/README.md
+++ b/examples/comparisons/README.md
@@ -37,8 +37,11 @@ the below are the best of our understanding of the frameworks, please let us
know if there are better implementations.
-Along with the observations, we have implemented [MNIST example in pytorch](https://github.com/pytorch/examples/blob/main/mnist/main.py)
-in multiple frameworks for comparing actual implementations against popular examples.
+Along with the observations,
+
+- We have implemented [MNIST example in pytorch](https://github.com/pytorch/examples/blob/main/mnist/main.py)
+in multiple frameworks for easier practical comparison.
+- The tutorials are inspired by those of popular frameworks, to give a flavor of ```runnable```.
diff --git a/mkdocs.yml b/mkdocs.yml
index ab5ba67b..19a3c587 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -67,7 +67,7 @@ markdown_extensions:
alternate_style: true
- pymdownx.snippets:
base_path: "."
- dedent_subsections: true
+ # dedent_subsections: true
- pymdownx.inlinehilite
- pymdownx.highlight:
anchor_linenums: true
diff --git a/pyproject.toml b/pyproject.toml
index a068cdb8..22c8e770 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -82,7 +82,7 @@ pyflame = "^0.3.1"
# Plugins for Executors
[tool.poetry.plugins."executor"]
-"local" = "runnable.extensions.executor.local.implementation:LocalExecutor"
+"local" = "runnable.extensions.executor.local:LocalExecutor"
"local-container" = "runnable.extensions.executor.local_container.implementation:LocalContainerExecutor"
"argo" = "runnable.extensions.executor.argo.implementation:ArgoExecutor"
"mocked" = "runnable.extensions.executor.mocked.implementation:MockedExecutor"
diff --git a/runnable/extensions/executor/local/implementation.py b/runnable/extensions/executor/local.py
similarity index 97%
rename from runnable/extensions/executor/local/implementation.py
rename to runnable/extensions/executor/local.py
index c13ffec3..2b8b565c 100644
--- a/runnable/extensions/executor/local/implementation.py
+++ b/runnable/extensions/executor/local.py
@@ -19,8 +19,6 @@ class LocalExecutor(GenericExecutor):
Example config:
execution:
type: local
- config:
- enable_parallel: True or False to enable parallel.
"""
diff --git a/runnable/extensions/executor/local/__init__.py b/runnable/extensions/executor/local/__init__.py
deleted file mode 100644
index e69de29b..00000000
diff --git a/runnable/extensions/nodes.py b/runnable/extensions/nodes.py
index 45850311..a3b32d21 100644
--- a/runnable/extensions/nodes.py
+++ b/runnable/extensions/nodes.py
@@ -797,12 +797,27 @@ def fan_in(self, map_variable: TypeMapVariable = None, **kwargs):
class StubNode(ExecutableNode):
"""
Stub is a convenience design node.
-
It always returns success in the attempt log and does nothing.
This node is very similar to pass state in Step functions.
This node type could be handy when designing the pipeline and stubbing functions
+ --8<-- [start:stub_reference]
+    A stub execution node of the pipeline.
+ Please refer to define pipeline/tasks/stub for more information.
+
+ As part of the dag definition, a stub task is defined as follows:
+
+ dag:
+ steps:
+ stub_task: # The name of the node
+ type: stub
+ on_failure: The name of the step to traverse in case of failure
+ next: The next node to execute after this task, use "success" to terminate the pipeline successfully
+ or "fail" to terminate the pipeline with an error.
+
+    It can take an arbitrary number of parameters, which is handy to temporarily silence a task node.
+ --8<-- [end:stub_reference]
"""
node_type: str = Field(default="stub", serialization_alias="type")
diff --git a/runnable/tasks.py b/runnable/tasks.py
index 6d877b2d..a821a579 100644
--- a/runnable/tasks.py
+++ b/runnable/tasks.py
@@ -188,7 +188,56 @@ def task_return_to_parameter(task_return: TaskReturns, value: Any) -> Parameter:
class PythonTaskType(BaseTaskType): # pylint: disable=too-few-public-methods
- """The task class for python command."""
+ """
+ --8<-- [start:python_reference]
+ An execution node of the pipeline of python functions.
+ Please refer to define pipeline/tasks/python for more information.
+
+ As part of the dag definition, a python task is defined as follows:
+
+ dag:
+ steps:
+ python_task: # The name of the node
+ type: task
+ command_type: python # this is default
+ command: my_module.my_function # the dotted path to the function. Please refer to the yaml section of
+ define pipeline/tasks/python for concrete details.
+ returns:
+ - name: # The name to assign the return value
+ kind: json # the default value is json,
+ can be object for python objects and metric for metrics
+ secrets:
+ - my_secret_key # A list of secrets to expose by secrets manager
+ catalog:
+ get:
+ - A list of glob patterns to get from the catalog to the local file system
+ put:
+ - A list of glob patterns to put to the catalog from the local file system
+ on_failure: The name of the step to traverse in case of failure
+ overrides:
+ Individual tasks can override the global configuration config by referring to the
+ specific override.
+
+ For example,
+ #Global configuration
+ executor:
+ type: local-container
+ config:
+ docker_image: "runnable/runnable:latest"
+ overrides:
+ custom_docker_image:
+ docker_image: "runnable/runnable:custom"
+
+ ## In the node definition
+ overrides:
+ local-container:
+ docker_image: "runnable/runnable:custom"
+
+ This instruction will override the docker image for the local-container executor.
+ next: The next node to execute after this task, use "success" to terminate the pipeline successfully
+ or "fail" to terminate the pipeline with an error.
+ --8<-- [end:python_reference]
+ """
task_type: str = Field(default="python", serialization_alias="command_type")
command: str
@@ -277,7 +326,56 @@ def execute_command(
class NotebookTaskType(BaseTaskType):
- """The task class for Notebook based execution."""
+ """
+ --8<-- [start:notebook_reference]
+ An execution node of the pipeline of notebook execution.
+ Please refer to define pipeline/tasks/notebook for more information.
+
+ As part of the dag definition, a notebook task is defined as follows:
+
+ dag:
+ steps:
+ notebook_task: # The name of the node
+ type: task
+ command_type: notebook
+ command: the path to the notebook relative to project root.
+ optional_ploomber_args: a dictionary of arguments to be passed to ploomber engine
+ returns:
+ - name: # The name to assign the return value
+ kind: json # the default value is json,
+ can be object for python objects and metric for metrics
+ secrets:
+ - my_secret_key # A list of secrets to expose by secrets manager
+ catalog:
+ get:
+ - A list of glob patterns to get from the catalog to the local file system
+ put:
+ - A list of glob patterns to put to the catalog from the local file system
+ on_failure: The name of the step to traverse in case of failure
+ overrides:
+ Individual tasks can override the global configuration config by referring to the
+ specific override.
+
+ For example,
+ #Global configuration
+ executor:
+ type: local-container
+ config:
+ docker_image: "runnable/runnable:latest"
+ overrides:
+ custom_docker_image:
+ docker_image: "runnable/runnable:custom"
+
+ ## In the node definition
+ overrides:
+ local-container:
+ docker_image: "runnable/runnable:custom"
+
+ This instruction will override the docker image for the local-container executor.
+ next: The next node to execute after this task, use "success" to terminate the pipeline successfully
+ or "fail" to terminate the pipeline with an error.
+ --8<-- [end:notebook_reference]
+ """
task_type: str = Field(default="notebook", serialization_alias="command_type")
command: str
@@ -410,7 +508,54 @@ def execute_command(
class ShellTaskType(BaseTaskType):
"""
- The task class for shell based commands.
+ --8<-- [start:shell_reference]
+ An execution node of the pipeline of shell execution.
+ Please refer to define pipeline/tasks/shell for more information.
+
+ As part of the dag definition, a shell task is defined as follows:
+
+ dag:
+ steps:
+ shell_task: # The name of the node
+ type: task
+ command_type: shell
+ command: The command to execute, it could be multiline
+ optional_ploomber_args: a dictionary of arguments to be passed to ploomber engine
+ returns:
+ - name: # The name to assign the return value
+ kind: json # the default value is json,
+ can be object for python objects and metric for metrics
+ secrets:
+ - my_secret_key # A list of secrets to expose by secrets manager
+ catalog:
+ get:
+ - A list of glob patterns to get from the catalog to the local file system
+ put:
+ - A list of glob patterns to put to the catalog from the local file system
+ on_failure: The name of the step to traverse in case of failure
+ overrides:
+ Individual tasks can override the global configuration config by referring to the
+ specific override.
+
+ For example,
+ #Global configuration
+ executor:
+ type: local-container
+ config:
+ docker_image: "runnable/runnable:latest"
+ overrides:
+ custom_docker_image:
+ docker_image: "runnable/runnable:custom"
+
+ ## In the node definition
+ overrides:
+ local-container:
+ docker_image: "runnable/runnable:custom"
+
+ This instruction will override the docker image for the local-container executor.
+ next: The next node to execute after this task, use "success" to terminate the pipeline successfully
+ or "fail" to terminate the pipeline with an error.
+ --8<-- [end:shell_reference]
"""
task_type: str = Field(default="shell", serialization_alias="command_type")
diff --git a/tests/runnable/extensions/executor/test_local_executor.py b/tests/runnable/extensions/executor/test_local_executor.py
index b45df8bb..1f0b6121 100644
--- a/tests/runnable/extensions/executor/test_local_executor.py
+++ b/tests/runnable/extensions/executor/test_local_executor.py
@@ -1,4 +1,4 @@
-from runnable.extensions.executor.local.implementation import LocalExecutor
+from runnable.extensions.executor.local import LocalExecutor
def test_local_executor_execute_node_just_calls___execute_node(mocker, monkeypatch):