Logging (#134)

* feat: moving to runnable
* feat: Bringing in docs
* feat: Bring in more renaming
* feat: SDK execute will work
* feat: simplified data passing
* feat: making local executor only serial
* fix: bug with parallel execution
* feat: SDK function can be a function instead of dotted path
* docs: fixing docs
* feat: removing re-run from entrypoints
* feat: retry executor
* docs: updating docs
* ci: no PR check on docs
* docs: updating readme
* docs: updating readme
* docs: updating readme
* docs: still working through it
* fix: getting the json parameter working
* feat: Tasks can send back objects now
* feat: returns of all tasks is complete
* feat: tasks can return objects
* docs: fixing docs
* fix: fixing parameters
* feat: working parameters
* feat: working parameters
* fix: removing tutorial
* fix: removing tutorial
* fix: map can return params
* fix: map can return params
* feat: Notebooks can pass objects between themselves
* fix: notebooks can return object parameters but cannot consume
* feat: map nodes and reducer functionality
* chore: isort
* chore: isort
* chore: isort
* chore: isort
* fix: examples, sdk execute
* docs: working on improving the docs
* feat: simpler sdk traversals
* docs: still working on it
* fix: removing tracker and fixing some bugs
* feat: metrics added
* feat: Improved logging and presentation
AstraZeneca · Apr 9, 2024 · 94e3839
1 parent 9215f88

Showing 262 changed files with 7,993 additions and 7,832 deletions.
diff --git a/.dockerignore b/.dockerignore
@@ -0,0 +1,8 @@
+docs/
+.github/
+.mypy_cache/
+.pytest_cache/
+.ruff_cache/
+.tox/
+.scripts/
+.tests/
diff --git a/.github/workflows/pr.yaml b/.github/workflows/pr.yaml
@@ -1,5 +1,10 @@
 on:
   pull_request:
+    paths-ignore:
+      - "docs/**"
+      - "**.md"
+      - "examples/**"
+      - "mkdocs.yml"
     branches:
       - "main"
 

diff --git a/.github/workflows/release.yaml b/.github/workflows/release.yaml
@@ -1,5 +1,9 @@
 on:
   push:
+    paths-ignore:
+      - "docs/**"
+      - "**.md"
+      - "examples/**"
     branches:
       - "main"
       - "rc"

diff --git a/.gitignore b/.gitignore
@@ -152,5 +152,3 @@ cov.xml
 .DS_Store
 
 data/
-
-example_bak/
diff --git a/.pylintrc b/.pylintrc
diff --git a/.python-version b/.python-version
@@ -0,0 +1 @@
+3.9
diff --git a/README.md b/README.md
diff --git a/assets/favicon.png b/assets/favicon.png
diff --git a/assets/logo-readme.png b/assets/logo-readme.png
diff --git a/assets/logo.png b/assets/logo.png
diff --git a/assets/work.png b/assets/work.png
diff --git a/docs/.DS_Store b/docs/.DS_Store
diff --git a/docs/assets/cropped.png b/docs/assets/cropped.png
diff --git a/docs/assets/favicon.png b/docs/assets/favicon.png
diff --git a/docs/assets/logo.png b/docs/assets/logo.png
diff --git a/docs/assets/logo1.png b/docs/assets/logo1.png
diff --git a/docs/assets/speed.png b/docs/assets/speed.png
diff --git a/docs/assets/sport.png b/docs/assets/sport.png
diff --git a/docs/assets/whatdo.png b/docs/assets/whatdo.png
diff --git a/docs/assets/work.png b/docs/assets/work.png
diff --git a/docs/concepts/catalog.md b/docs/concepts/catalog.md
@@ -4,6 +4,8 @@
     data between tasks. The default configuration of ```do-nothing``` is no-op by design.
     We kindly request to raise a feature request to make us aware of the eco-system.
 
+# TODO: Simplify this
+
 Catalog provides a way to store and retrieve data generated by the individual steps of the dag to downstream
 steps of the dag. It can be any storage system that indexes its data by a unique identifier.
 
@@ -20,7 +22,7 @@ The directory structure within a partition is the same as the project directory
 get/put data in the catalog as if you are working with local directory structure. Every interaction with the catalog
 (either by API or configuration) results in an entry in the [```run log```](../concepts/run-log.md/#step_log)
 
-Internally, magnus also uses the catalog to store execution logs of tasks i.e stdout and stderr from
+Internally, runnable also uses the catalog to store execution logs of tasks i.e stdout and stderr from
 [python](../concepts/task.md/#python) or [shell](../concepts/task.md/#shell) and executed notebook
 from [notebook tasks](../concepts/task.md/#notebook).
 
@@ -153,7 +155,7 @@ The execution results in the ```catalog``` populated with the artifacts and the
                     "code_identifier": "6029841c3737fe1163e700b4324d22a469993bb0",
                     "code_identifier_type": "git",
                     "code_identifier_dependable": true,
-                    "code_identifier_url": "https://github.com/AstraZeneca/magnus-core.git",
+                    "code_identifier_url": "https://github.com/AstraZeneca/runnable-core.git",
                     "code_identifier_message": ""
                 }
             ],
@@ -199,7 +201,7 @@ The execution results in the ```catalog``` populated with the artifacts and the
                     "code_identifier": "6029841c3737fe1163e700b4324d22a469993bb0",
                     "code_identifier_type": "git",
                     "code_identifier_dependable": true,
-                    "code_identifier_url": "https://github.com/AstraZeneca/magnus-core.git",
+                    "code_identifier_url": "https://github.com/AstraZeneca/runnable-core.git",
                     "code_identifier_message": ""
                 }
             ],
@@ -245,7 +247,7 @@ The execution results in the ```catalog``` populated with the artifacts and the
                     "code_identifier": "6029841c3737fe1163e700b4324d22a469993bb0",
                     "code_identifier_type": "git",
                     "code_identifier_dependable": true,
-                    "code_identifier_url": "https://github.com/AstraZeneca/magnus-core.git",
+                    "code_identifier_url": "https://github.com/AstraZeneca/runnable-core.git",
                     "code_identifier_message": ""
                 }
             ],
@@ -284,7 +286,7 @@ The execution results in the ```catalog``` populated with the artifacts and the
                     "code_identifier": "6029841c3737fe1163e700b4324d22a469993bb0",
                     "code_identifier_type": "git",
                     "code_identifier_dependable": true,
-                    "code_identifier_url": "https://github.com/AstraZeneca/magnus-core.git",
+                    "code_identifier_url": "https://github.com/AstraZeneca/runnable-core.git",
                     "code_identifier_message": ""
                 }
             ],
@@ -337,7 +339,7 @@ The execution results in the ```catalog``` populated with the artifacts and the
                     "code_identifier": "6029841c3737fe1163e700b4324d22a469993bb0",
                     "code_identifier_type": "git",
                     "code_identifier_dependable": true,
-                    "code_identifier_url": "https://github.com/AstraZeneca/magnus-core.git",
+                    "code_identifier_url": "https://github.com/AstraZeneca/runnable-core.git",
                     "code_identifier_message": ""
                 }
             ],
@@ -467,7 +469,7 @@ and [notebook](../concepts/task.md/#notebook) tasks.
 Data objects can be shared between [python](../concepts/task.md/#python_functions) or
 [notebook](../concepts/task.md/#notebook) tasks,
 instead of serializing data and deserializing to file structure, using
-[get_object](../interactions.md/#magnus.get_object) and [put_object](../interactions.md/#magnus.put_object).
+[get_object](../interactions.md/#runnable.get_object) and [put_object](../interactions.md/#runnable.put_object).
 
 Internally, we use [pickle](https:/docs.python.org/3/library/pickle.html) to serialize and
 deserialize python objects. Please ensure that the object can be serialized via pickle.

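Note on the `docs/concepts/catalog.md` hunk above: the docs ask that shared objects be serializable via pickle. A minimal sketch of how a caller might check this before handing an object over, using only the standard library. The helper `is_picklable` is hypothetical and not part of runnable; it only illustrates the round trip the docs describe.

```python
import pickle


def is_picklable(obj) -> bool:
    """Hypothetical helper: True if obj can be pickled and unpickled."""
    try:
        pickle.loads(pickle.dumps(obj))
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# Plain containers of built-in types round-trip without loss.
payload = {"rows": [1, 2, 3], "label": "train"}
restored = pickle.loads(pickle.dumps(payload))

# Generators hold live execution state and cannot be pickled.
stream = (i for i in range(3))
stream_ok = is_picklable(stream)
```

Objects that fail this check (generators, open file handles, locks) would need to be converted to a plain data structure before being shared between tasks.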
diff --git a/docs/concepts/executor.md b/docs/concepts/executor.md
@@ -1,4 +1,7 @@
-Executors are the heart of magnus, they traverse the workflow and execute the tasks within the
+
+## TODO: Simplify
+
+Executors are the heart of runnable, they traverse the workflow and execute the tasks within the
 workflow while coordinating with different services
 (eg. [run log](../concepts/run-log.md), [catalog](../concepts/catalog.md), [secrets](../concepts/secrets.md) etc)
 
@@ -23,7 +26,7 @@ any workflow engine.
 
 ## Graph Traversal
 
-In magnus, the graph traversal can be performed by magnus itself or can be handed over to other
+In runnable, the graph traversal can be performed by runnable itself or can be handed over to other
 orchestration frameworks (e.g Argo workflows, AWS step functions).
 
 ### Example
@@ -44,7 +47,7 @@ translated to argo specification just by changing the configuration.
 
     You can execute the pipeline in default configuration by:
 
-    ```magnus execute -f examples/concepts/task_shell_simple.yaml```
+    ```runnable execute -f examples/concepts/task_shell_simple.yaml```
 
     ``` yaml linenums="1"
     --8<-- "examples/configs/default.yaml"
@@ -60,16 +63,16 @@ translated to argo specification just by changing the configuration.
 
     In this configuration, we are using [argo workflows](https://argoproj.github.io/argo-workflows/)
     as our workflow engine. We are also instructing the workflow engine to use a docker image,
-    ```magnus:demo``` defined in line #4, as our execution environment. Please read
+    ```runnable:demo``` defined in line #4, as our execution environment. Please read
     [containerised environments](../configurations/executors/container-environments.md) for more information.
 
-    Since magnus needs to track the execution status of the workflow, we are using a ```run log```
+    Since runnable needs to track the execution status of the workflow, we are using a ```run log```
     which is persistent and available in for jobs in kubernetes environment.
 
 
     You can execute the pipeline in argo configuration by:
 
-    ```magnus execute -f examples/concepts/task_shell_simple.yaml -c examples/configs/argo-config.yaml```
+    ```runnable execute -f examples/concepts/task_shell_simple.yaml -c examples/configs/argo-config.yaml```
 
     ``` yaml linenums="1"
     --8<-- "examples/configs/argo-config.yaml"
@@ -78,7 +81,7 @@ translated to argo specification just by changing the configuration.
     1. Use argo workflows as the execution engine to run the pipeline.
     2. Run this docker image for every step of the pipeline. The docker image should have the same directory structure
     as the project directory.
-    3. Mount the volume from Kubernetes persistent volumes (magnus-volume) to /mnt directory.
+    3. Mount the volume from Kubernetes persistent volumes (runnable-volume) to /mnt directory.
     4. Resource constraints for the container runtime.
     5. Since every step runs in a container, the run log should be persisted. Here we are using the file-system as our
     run log store.
@@ -94,20 +97,20 @@ translated to argo specification just by changing the configuration.
     - The graph traversal rules follow the the same rules as our workflow. The
     step ```success-success-ou7qlf``` in line #15 only happens if the step ```shell-task-dz3l3t```
     defined in line #12 succeeds.
-    - The execution fails if any of the tasks fail. Both argo workflows and magnus ```run log```
+    - The execution fails if any of the tasks fail. Both argo workflows and runnable ```run log```
     mark the execution as failed.
 
 
     ```yaml linenums="1"
     apiVersion: argoproj.io/v1alpha1
     kind: Workflow
     metadata:
-      generateName: magnus-dag-
+      generateName: runnable-dag-
       annotations: {}
       labels: {}
     spec:
       activeDeadlineSeconds: 172800
-      entrypoint: magnus-dag
+      entrypoint: runnable-dag
       podGC:
         strategy: OnPodCompletion
       retryStrategy:
@@ -119,7 +122,7 @@ translated to argo specification just by changing the configuration.
           maxDuration: '3600'
       serviceAccountName: default-editor
       templates:
-        - name: magnus-dag
+        - name: runnable-dag
           failFast: true
           dag:
             tasks:
@@ -131,9 +134,9 @@ translated to argo specification just by changing the configuration.
                 depends: shell-task-4jy8pl.Succeeded
         - name: shell-task-4jy8pl
           container:
-            image: magnus:demo
+            image: runnable:demo
             command:
-              - magnus
+              - runnable
               - execute_single_node
               - '{{workflow.parameters.run_id}}'
               - shell
@@ -156,9 +159,9 @@ translated to argo specification just by changing the configuration.
                 cpu: 250m
         - name: success-success-djhm6j
           container:
-            image: magnus:demo
+            image: runnable:demo
             command:
-              - magnus
+              - runnable
               - execute_single_node
               - '{{workflow.parameters.run_id}}'
               - success
@@ -189,13 +192,13 @@ translated to argo specification just by changing the configuration.
       volumes:
         - name: executor-0
           persistentVolumeClaim:
-            claimName: magnus-volume
+            claimName: runnable-volume
 
 
     ```
 
 
-As seen from the above example, once a [pipeline is defined in magnus](../concepts/pipeline.md) either via yaml or SDK, we can
+As seen from the above example, once a [pipeline is defined in runnable](../concepts/pipeline.md) either via yaml or SDK, we can
 run the pipeline in different environments just by providing a different configuration. Most often, there is
 no need to change the code or deviate from standard best practices while coding.
 
@@ -204,11 +207,11 @@ no need to change the code or deviate from standard best practices while coding.
 
 !!! note
 
-    This section is to understand the internal mechanism of magnus and not required if you just want to
+    This section is to understand the internal mechanism of runnable and not required if you just want to
     use different executors.
 
 
-Independent of traversal, all the tasks are executed within the ```context``` of magnus.
+Independent of traversal, all the tasks are executed within the ```context``` of runnable.
 
 A closer look at the actual task implemented as part of transpiled workflow in argo
 specification details the inner workings. Below is a snippet of the argo specification from
@@ -217,9 +220,9 @@ lines 18 to 34.
 ```yaml linenums="18"
 - name: shell-task-dz3l3t
   container:
-    image: magnus-example:latest
+    image: runnable-example:latest
     command:
-    - magnus
+    - runnable
     - execute_single_node
     - '{{workflow.parameters.run_id}}'
     - shell
@@ -235,17 +238,17 @@ lines 18 to 34.
 ```
 
 The actual ```command``` to run is not the ```command``` defined in the workflow,
-i.e ```echo hello world```, but a command in the CLI of magnus which specifies the workflow file,
+i.e ```echo hello world```, but a command in the CLI of runnable which specifies the workflow file,
 the step name and the configuration file.
 
-### Context of magnus
+### Context of runnable
 
 Any ```task``` defined by the user as part of the workflow always runs as a *sub-command* of
-magnus. In that sense, magnus follows the
+runnable. In that sense, runnable follows the
 [decorator pattern](https://en.wikipedia.org/wiki/Decorator_pattern) without being part of the
 application codebase.
 
-In a very simplistic sense, the below stubbed-code explains the context of magnus during
+In a very simplistic sense, the below stubbed-code explains the context of runnable during
 execution of a task.
 
 ```python linenums="1"

diff --git a/docs/concepts/experiment-tracking.md b/docs/concepts/experiment-tracking.md
@@ -9,7 +9,7 @@ during the execution of the pipeline.
 
 === "Using the API"
 
-    The highlighted lines in the below example show how to [use the API](../interactions.md/#magnus.track_this)
+    The highlighted lines in the below example show how to [use the API](../interactions.md/#runnable.track_this)
 
     Any pydantic model as a value would be dumped as a dict, respecting the alias, before tracking it.
 
@@ -61,7 +61,7 @@ during the execution of the pipeline.
                         "code_identifier": "793b052b8b603760ff1eb843597361219832b61c",
                         "code_identifier_type": "git",
                         "code_identifier_dependable": true,
-                        "code_identifier_url": "https://github.com/AstraZeneca/magnus-core.git",
+                        "code_identifier_url": "https://github.com/AstraZeneca/runnable-core.git",
                         "code_identifier_message": ""
                     }
                 ],
@@ -106,7 +106,7 @@ during the execution of the pipeline.
                         "code_identifier": "793b052b8b603760ff1eb843597361219832b61c",
                         "code_identifier_type": "git",
                         "code_identifier_dependable": true,
-                        "code_identifier_url": "https://github.com/AstraZeneca/magnus-core.git",
+                        "code_identifier_url": "https://github.com/AstraZeneca/runnable-core.git",
                         "code_identifier_message": ""
                     }
                 ],
@@ -162,7 +162,7 @@ during the execution of the pipeline.
                 "start_at": "shell",
                 "name": "",
                 "description": "An example pipeline to demonstrate setting experiment tracking metrics\nusing environment variables. Any environment variable with
-                prefix\n'MAGNUS_TRACK_' will be recorded as a metric captured during the step.\n\nYou can run this pipeline as:\n  magnus execute -f
+                prefix\n'runnable_TRACK_' will be recorded as a metric captured during the step.\n\nYou can run this pipeline as:\n  runnable execute -f
                 examples/concepts/experiment_tracking_env.yaml\n",
                 "internal_branch_name": "",
                 "steps": {
@@ -207,7 +207,7 @@ The step is defaulted to be 0.
 
 === "Using the API"
 
-    The highlighted lines in the below example show how to [use the API](../interactions.md/#magnus.track_this) with
+    The highlighted lines in the below example show how to [use the API](../interactions.md/#runnable.track_this) with
     the step parameter.
 
     You can run this example by ```python run examples/concepts/experiment_tracking_step.py```
@@ -247,7 +247,7 @@ The step is defaulted to be 0.
                         "code_identifier": "858c4df44f15d81139341641c63ead45042e0d89",
                         "code_identifier_type": "git",
                         "code_identifier_dependable": true,
-                        "code_identifier_url": "https://github.com/AstraZeneca/magnus-core.git",
+                        "code_identifier_url": "https://github.com/AstraZeneca/runnable-core.git",
                         "code_identifier_message": ""
                     }
                 ],
@@ -301,7 +301,7 @@ The step is defaulted to be 0.
                         "code_identifier": "858c4df44f15d81139341641c63ead45042e0d89",
                         "code_identifier_type": "git",
                         "code_identifier_dependable": true,
-                        "code_identifier_url": "https://github.com/AstraZeneca/magnus-core.git",
+                        "code_identifier_url": "https://github.com/AstraZeneca/runnable-core.git",
                         "code_identifier_message": ""
                     }
                 ],
@@ -393,14 +393,14 @@ The step is defaulted to be 0.
 !!! note "Opt out"
 
     Pipelines need not use the ```experiment-tracking``` if the preferred tools of choice is
-    not implemented in magnus. The default configuration of ```do-nothing``` is no-op by design.
+    not implemented in runnable. The default configuration of ```do-nothing``` is no-op by design.
     We kindly request to raise a feature request to make us aware of the eco-system.
 
 
-The default experiment tracking tool of magnus is a no-op as the ```run log``` captures all the
+The default experiment tracking tool of runnable is a no-op as the ```run log``` captures all the
 required details. To make it compatible with other experiment tracking tools like
 [mlflow](https://mlflow.org/docs/latest/tracking.html) or
-[Weights and Biases](https://wandb.ai/site/experiment-tracking), we map attributes of magnus
+[Weights and Biases](https://wandb.ai/site/experiment-tracking), we map attributes of runnable
 to the underlying tool.
 
 For example, for mlflow:
@@ -420,7 +420,7 @@ Since mlflow does not support step wise logging of parameters, the key name is f
 
 !!! note inline end "Shortcomings"
 
-    Experiment tracking capabilities of magnus are inferior in integration with
+    Experiment tracking capabilities of runnable are inferior in integration with
     popular python frameworks like pytorch and tensorflow as compared to other
     experiment tracking tools.
 
@@ -453,7 +453,7 @@ Since mlflow does not support step wise logging of parameters, the key name is f
 
     <figure markdown>
         ![Image](../assets/screenshots/mlflow.png){ width="800" height="600"}
-        <figcaption>mlflow UI for the execution. The run_id remains the same as the run_id of magnus</figcaption>
+        <figcaption>mlflow UI for the execution. The run_id remains the same as the run_id of runnable</figcaption>
     </figure>
 
     <figure markdown>
@@ -464,5 +464,5 @@ Since mlflow does not support step wise logging of parameters, the key name is f
 
 
 To provide implementation specific capabilities, we also provide a
-[python API](../interactions.md/#magnus.get_experiment_tracker_context) to obtain the client context. The default
+[python API](../interactions.md/#runnable.get_experiment_tracker_context) to obtain the client context. The default
 client context is a [null context manager](https://docs.python.org/3/library/contextlib.html#contextlib.nullcontext).
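Note on the last hunk above: the default client context is described as a null context manager. A minimal sketch of what that means for calling code, using `contextlib.nullcontext` from the standard library. The function `get_tracker_context` is a hypothetical stand-in and not the runnable API; it only shows that a `with` block behaves the same whether or not a real tracking client is configured.

```python
from contextlib import nullcontext


def get_tracker_context(client=None):
    """Hypothetical stand-in: return the client, or a no-op context."""
    return client if client is not None else nullcontext()


# With no client configured, the block still runs; nullcontext() yields None.
with get_tracker_context() as ctx:
    result = ctx
    metric = 0.93
```

Because the no-op context yields `None`, code that calls tool-specific methods on the context should guard against that case.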