fix: docs and code #132

Merged on Feb 14, 2024 (1 commit)
1 change: 1 addition & 0 deletions README.md
@@ -397,3 +397,4 @@ Execute a pipeline over an iterable parameter.
[![](https://mermaid.ink/img/pako:eNqVlF1rwjAUhv9KyG4qKNR-3AS2m8nuBgN3Z0Sy5tQG20SSdE7E_76kVVEr2CY3Ied9Tx6Sk3PAmeKACc5LtcsKpi36nlGZFbXciHwfLN79CuWiBLMcEULWGkBSaeosA2OCxbxdXMd89Get2bZASsLiSyuvQE2mJZXIjW27t2rOmQZ3Gp9rD6UjatWnwy7q6zPPukd50WTydmemEiS_QbQ79RwxGoQY9UaMuojRA8TCXexzyHgQZNwbMu5Cxl3IXNX6OWMyiDHpzZh0GZMHjOK3xz2mgxjT3oxplzG9MPp5_nVOhwJjteDwOg3HyFj3L1dCcvh7DUc-iftX18n6Waet1xX8cG908vpKHO6OW7cvkeHm5GR2b3drdvaSGTODHLW37mxabYC8fLgRhlfxpjNdwmEets-Dx7gCXTHBXQc8-D2KbQEVUEzckjO9oZjKo9Ox2qr5XmaYWF3DGNdbzizMBHOVVWGSs9K4XeDCKv3ZttSmsx7_AYa341E?type=png)](https://mermaid.live/edit#pako:eNqVlF1rwjAUhv9KyG4qKNR-3AS2m8nuBgN3Z0Sy5tQG20SSdE7E_76kVVEr2CY3Ied9Tx6Sk3PAmeKACc5LtcsKpi36nlGZFbXciHwfLN79CuWiBLMcEULWGkBSaeosA2OCxbxdXMd89Get2bZASsLiSyuvQE2mJZXIjW27t2rOmQZ3Gp9rD6UjatWnwy7q6zPPukd50WTydmemEiS_QbQ79RwxGoQY9UaMuojRA8TCXexzyHgQZNwbMu5Cxl3IXNX6OWMyiDHpzZh0GZMHjOK3xz2mgxjT3oxplzG9MPp5_nVOhwJjteDwOg3HyFj3L1dCcvh7DUc-iftX18n6Waet1xX8cG908vpKHO6OW7cvkeHm5GR2b3drdvaSGTODHLW37mxabYC8fLgRhlfxpjNdwmEets-Dx7gCXTHBXQc8-D2KbQEVUEzckjO9oZjKo9Ox2qr5XmaYWF3DGNdbzizMBHOVVWGSs9K4XeDCKv3ZttSmsx7_AYa341E)

### [Arbitrary nesting](https://astrazeneca.github.io/magnus-core/concepts/nesting/)
Any nesting of parallel within map and so on.
16 changes: 9 additions & 7 deletions docs/concepts/catalog.md
@@ -18,10 +18,11 @@ For example, a local directory structure partitioned by a ```run_id``` or S3 buc

The directory structure within a partition is the same as the project directory structure. This enables you to
get/put data in the catalog as if you are working with local directory structure. Every interaction with the catalog
(either by API or configuration) results in an entry in the [```run log```](/concepts/run-log/#step_log)
(either by API or configuration) results in an entry in the [```run log```](../concepts/run-log.md/#step_log)

Internally, magnus also uses the catalog to store execution logs of tasks, i.e. stdout and stderr from
[python](/concepts/task/#python) or [shell](/concepts/task/#shell) and executed notebook from [notebook tasks](/concepts/task/#notebook).
[python](../concepts/task.md/#python) or [shell](../concepts/task.md/#shell) and executed notebook
from [notebook tasks](../concepts/task.md/#notebook).

Since the catalog captures the data files flowing through the pipeline and the execution logs, it enables you
to debug failed pipelines or keep track of data lineage.
@@ -448,11 +449,11 @@ The execution results in the ```catalog``` populated with the artifacts and the

## Using python API

Files could also be cataloged using [python API](/interactions)
Files could also be cataloged using [python API](../interactions.md)


This functionality is possible in [python](/concepts/task/#python_functions)
and [notebook](/concepts/task/#notebook) tasks.
This functionality is possible in [python](../concepts/task.md/#python_functions)
and [notebook](../concepts/task.md/#notebook) tasks.

```python linenums="1" hl_lines="11 23 35 45"
--8<-- "examples/concepts/catalog_api.py"
@@ -463,9 +464,10 @@ and [notebook](/concepts/task/#notebook) tasks.

## Passing Data Objects

Data objects can be shared between [python](/concepts/task/#python_functions) or [notebook](/concepts/task/#notebook) tasks,
Data objects can be shared between [python](../concepts/task.md/#python_functions) or
[notebook](../concepts/task.md/#notebook) tasks,
instead of serializing data and deserializing to file structure, using
[get_object](/interactions/#magnus.get_object) and [put_object](/interactions/#magnus.put_object).
[get_object](../interactions.md/#magnus.get_object) and [put_object](../interactions.md/#magnus.put_object).

Internally, we use [pickle](https://docs.python.org/3/library/pickle.html) to serialize and
deserialize python objects. Please ensure that the object can be serialized via pickle.
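
One way to check the pickle requirement before sharing an object between tasks is to round-trip it locally. This is a minimal sketch using only the standard library; the helper name `is_picklable` is hypothetical and not part of magnus.

```python
import pickle


def is_picklable(obj) -> bool:
    """Return True if obj survives a pickle round trip (hypothetical helper)."""
    try:
        pickle.loads(pickle.dumps(obj))
    except Exception:
        # pickle raises different exception types depending on the object
        return False
    return True


print(is_picklable({"x": 1}))     # plain containers of builtins can be pickled
print(is_picklable(lambda x: x))  # lambdas cannot be pickled by the stdlib pickler
```

Objects that fail this check would need to be written to a file and moved through the catalog instead.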
24 changes: 12 additions & 12 deletions docs/concepts/executor.md
@@ -1,6 +1,6 @@
Executors are the heart of magnus, they traverse the workflow and execute the tasks within the
workflow while coordinating with different services
(eg. [run log](/concepts/run-log), [catalog](/concepts/catalog), [secrets](/concepts/secrets) etc)
(eg. [run log](../concepts/run-log.md), [catalog](../concepts/catalog.md), [secrets](../concepts/secrets.md) etc)

To enable workflows run in varied computational environments, we distinguish between two core functions of
any workflow engine.
@@ -61,7 +61,7 @@ translated to argo specification just by changing the configuration.
In this configuration, we are using [argo workflows](https://argoproj.github.io/argo-workflows/)
as our workflow engine. We are also instructing the workflow engine to use a docker image,
```magnus:demo``` defined in line #4, as our execution environment. Please read
[containerised environments](/configurations/executors/container-environments) for more information.
[containerised environments](../configurations/executors/container-environments.md) for more information.

Since magnus needs to track the execution status of the workflow, we are using a ```run log```
which is persistent and available for jobs in a kubernetes environment.
@@ -195,7 +195,7 @@ translated to argo specification just by changing the configuration.
```


As seen from the above example, once a [pipeline is defined in magnus](/concepts/pipeline) either via yaml or SDK, we can
As seen from the above example, once a [pipeline is defined in magnus](../concepts/pipeline.md) either via yaml or SDK, we can
run the pipeline in different environments just by providing a different configuration. Most often, there is
no need to change the code or deviate from standard best practices while coding.

@@ -287,22 +287,22 @@ def execute_single_node(workflow, step_name, configuration):
##### END POST EXECUTION #####
```

1. The [run log](/concepts/run-log) maintains the state of the execution of the tasks and subsequently the pipeline. It also
1. The [run log](../concepts/run-log.md) maintains the state of the execution of the tasks and subsequently the pipeline. It also
holds the latest state of parameters along with captured metrics.
2. The [catalog](/concepts/catalog) contains the information about the data flowing through the pipeline. You can get/put
2. The [catalog](../concepts/catalog.md) contains the information about the data flowing through the pipeline. You can get/put
artifacts generated during the current execution of the pipeline to a central storage.
3. Read the workflow and get the [step definition](/concepts/task) which holds the ```command``` or ```function``` to
3. Read the workflow and get the [step definition](../concepts/task.md) which holds the ```command``` or ```function``` to
execute along with the other optional information.
4. Any artifacts from previous steps that are needed to execute the current step can be
[retrieved from the catalog](/concepts/catalog).
[retrieved from the catalog](../concepts/catalog.md).
5. The current function or step might need only some of the
[parameters casted as pydantic models](/concepts/task/#accessing_parameters), filter and cast them appropriately.
[parameters casted as pydantic models](../concepts/task.md/#accessing_parameters), filter and cast them appropriately.
6. At this point in time, we have the required parameters and data to execute the actual command. The command can
internally request for more data using the [python API](/interactions) or record
[experiment tracking metrics](/concepts/experiment-tracking).
internally request for more data using the [python API](../interactions.md) or record
[experiment tracking metrics](../concepts/experiment-tracking.md).
7. If the task failed, we update the run log with that information and also raise an exception for the
workflow engine to handle. Any [on-failure](/concepts/pipeline/#on_failure) traversals are already handled
workflow engine to handle. Any [on-failure](../concepts/pipeline.md/#on_failure) traversals are already handled
as part of the workflow definition.
8. Upon successful execution, we update the run log with current state of parameters for downstream steps.
9. Any artifacts generated from this step are [put into the central storage](/concepts/catalog) for downstream steps.
9. Any artifacts generated from this step are [put into the central storage](../concepts/catalog.md) for downstream steps.
10. We send a success message to the workflow engine and mark the step as completed.
12 changes: 6 additions & 6 deletions docs/concepts/experiment-tracking.md
@@ -1,6 +1,6 @@
# Overview

[Run log](/concepts/run-log) stores a lot of information about the execution along with the metrics captured
[Run log](../concepts/run-log.md) stores a lot of information about the execution along with the metrics captured
during the execution of the pipeline.


@@ -9,7 +9,7 @@ during the execution of the pipeline.

=== "Using the API"

The highlighted lines in the below example show how to [use the API](/interactions/#magnus.track_this)
The highlighted lines in the below example show how to [use the API](../interactions.md/#magnus.track_this)

Any pydantic model as a value would be dumped as a dict, respecting the alias, before tracking it.

@@ -207,7 +207,7 @@ The step is defaulted to be 0.

=== "Using the API"

The highlighted lines in the below example show how to [use the API](/interactions/#magnus.track_this) with
The highlighted lines in the below example show how to [use the API](../interactions.md/#magnus.track_this) with
the step parameter.

You can run this example by ```python run examples/concepts/experiment_tracking_step.py```
@@ -452,17 +452,17 @@ Since mlflow does not support step wise logging of parameters, the key name is f
=== "In mlflow UI"

<figure markdown>
![Image](/assets/screenshots/mlflow.png){ width="800" height="600"}
![Image](../assets/screenshots/mlflow.png){ width="800" height="600"}
<figcaption>mlflow UI for the execution. The run_id remains the same as the run_id of magnus</figcaption>
</figure>

<figure markdown>
![Image title](/assets/screenshots/mlflow_step.png){ width="800" height="600"}
![Image title](../assets/screenshots/mlflow_step.png){ width="800" height="600"}
<figcaption>The step wise metric plotted as a graph in mlflow</figcaption>
</figure>



To provide implementation specific capabilities, we also provide a
[python API](/interactions/#magnus.get_experiment_tracker_context) to obtain the client context. The default
[python API](../interactions.md/#magnus.get_experiment_tracker_context) to obtain the client context. The default
client context is a [null context manager](https://docs.python.org/3/library/contextlib.html#contextlib.nullcontext).
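
Because the default client context is a null context manager, code written around a `with` statement behaves the same whether or not a tracker is configured. The sketch below uses only the standard library; `get_context` is a hypothetical stand-in for illustration, not the magnus API.

```python
from contextlib import nullcontext


def get_context(client=None):
    """Hypothetical stand-in: return the client if one exists, else a no-op context."""
    return client if client is not None else nullcontext()


with get_context() as ctx:
    # nullcontext() yields None on entry, so guard tracker-specific calls.
    tracker_available = ctx is not None

print(tracker_available)
```

Guarding on `None` keeps task code portable between runs with and without an experiment tracker.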
4 changes: 2 additions & 2 deletions docs/concepts/map.md
@@ -829,7 +829,7 @@ of the files to process.
## Traversal

A branch of a map step is considered success only if the ```success``` step is reached at the end.
The steps of the pipeline can fail and be handled by [on failure](/concepts/pipeline/#on_failure) and
The steps of the pipeline can fail and be handled by [on failure](../concepts/pipeline.md/#on_failure) and
redirected to ```success``` if that is the desired behavior.

The map step is considered successful only if all the branches of the step have terminated successfully.
@@ -838,7 +838,7 @@ The map step is considered successful only if all the branches of the step have
## Parameters

All the tasks defined in the branches of the map pipeline can
[access to parameters and data as usual](/concepts/task).
[access to parameters and data as usual](../concepts/task.md).


!!! warning
3 changes: 2 additions & 1 deletion docs/concepts/nesting.md
@@ -1,4 +1,5 @@
As seen from the definitions of [parallel](/concepts/parallel) or [map](/concepts/map), the branches are pipelines
As seen from the definitions of [parallel](../concepts/parallel.md) or
[map](../concepts/map.md), the branches are pipelines
themselves. This allows for deeply nested workflows in **magnus**.

Technically there is no limit in the depth of nesting but there are some practical considerations.
8 changes: 4 additions & 4 deletions docs/concepts/parallel.md
@@ -7,7 +7,7 @@ Parallel nodes in magnus allows you to run multiple pipelines in parallel and us
All the steps in the below example are ```stubbed``` for convenience. The functionality is similar
even if the steps are execution units like ```tasks``` or any other nodes.

We support deeply [nested steps](/concepts/nesting). For example, a step in the parallel branch can be a ```map``` which internally
We support deeply [nested steps](../concepts/nesting.md). For example, a step in the parallel branch can be a ```map``` which internally
loops over a ```dag``` and so on. Though this functionality is useful, it can be difficult to debug and
understand in large code bases.

@@ -549,15 +549,15 @@ ensemble model happens only after both models are (successfully) trained.


All pipelines, nested or parent, have the same structure as defined in
[pipeline definition](/concepts/pipeline).
[pipeline definition](../concepts/pipeline.md).

The parent pipeline defines a step ```Train models``` which is a parallel step.
The branches, XGBoost and RF model, are pipelines themselves.

## Traversal

A branch of a parallel step is considered success only if the ```success``` step is reached at the end.
The steps of the pipeline can fail and be handled by [on failure](/concepts/pipeline/#on_failure) and
The steps of the pipeline can fail and be handled by [on failure](../concepts/pipeline.md/#on_failure) and
redirected to ```success``` if that is the desired behavior.

The parallel step is considered successful only if all the branches of the step have terminated successfully.
@@ -566,7 +566,7 @@ The parallel step is considered successful only if all the branches of the step
## Parameters

All the tasks defined in the branches of the parallel pipeline can
[access to parameters and data as usual](/concepts/task).
[access to parameters and data as usual](../concepts/task.md).


!!! warning
8 changes: 4 additions & 4 deletions docs/concepts/parameters.md
@@ -1,16 +1,16 @@
In magnus, ```parameters``` are python data types that can be passed from one ```task```
to the next ```task```. These parameters can be accessed by the ```task``` either as
environment variables, arguments of the ```python function``` or using the
[API](/interactions).
[API](../interactions.md).

## Initial parameters

The initial parameters of the pipeline can be set by using a ```yaml``` file and presented
during execution

```--parameters-file, -parameters``` while using the [magnus CLI](/usage/#usage)
```--parameters-file, -parameters``` while using the [magnus CLI](../usage.md/#usage)

or by using ```parameters_file``` with [the sdk](/sdk/#magnus.Pipeline.execute).
or by using ```parameters_file``` with [the sdk](../sdk.md/#magnus.Pipeline.execute).

They can also be set using environment variables which override the parameters defined by the file.
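
The precedence described here (file values first, environment variables layered on top) can be sketched as follows. This is an illustration only: the `MAGNUS_PRM_` prefix, the JSON decoding of values, and the function name `resolve_parameters` are assumptions made for the example and are not confirmed by this diff.

```python
import json
import os


def resolve_parameters(file_params: dict, prefix: str = "MAGNUS_PRM_", environ=None) -> dict:
    """Overlay environment-variable parameters on top of file parameters.

    The prefix and the JSON decoding are assumptions for this sketch.
    """
    environ = os.environ if environ is None else environ
    resolved = dict(file_params)  # never mutate the caller's mapping
    for key, raw in environ.items():
        if not key.startswith(prefix):
            continue
        name = key[len(prefix):]
        try:
            resolved[name] = json.loads(raw)  # numbers, booleans, lists, ...
        except json.JSONDecodeError:
            resolved[name] = raw  # fall back to the raw string
    return resolved


params = resolve_parameters(
    {"spam": "hello", "eggs": 1},
    environ={"MAGNUS_PRM_eggs": "42"},
)
print(params)
```

Passing `environ` explicitly keeps the sketch testable without touching the real process environment.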

@@ -42,5 +42,5 @@ They can also be set using environment variables which override the parameters d
## Parameters flow

Tasks can access and return parameters and the patterns are specific to the
```command_type``` of the task nodes. Please refer to [tasks](/concepts/task)
```command_type``` of the task nodes. Please refer to [tasks](../concepts/task.md)
for more information.
14 changes: 7 additions & 7 deletions docs/concepts/run-log.md
@@ -10,7 +10,7 @@ when running the ```command``` of a task.

=== "pipeline"

This is the same example [described in tasks](/concepts/task/#shell).
This is the same example [described in tasks](../concepts/task.md/#shell).

tl;dr a pipeline that consumes some initial parameters and passes them
to the next step. Both the steps are ```shell``` based tasks.
@@ -389,7 +389,7 @@ A snippet from the above example:
- For non-nested steps, the key is the name of the step. For example, the first entry
in the steps mapping is "access initial" which corresponds to the name of the task in
the pipeline. For nested steps, the step log is also nested and shown in more detail for
[parallel](/concepts/parallel), [map](/concepts/map).
[parallel](../concepts/parallel.md), [map](../concepts/map.md).

- ```status```: In line #5 is the status of the step with three possible states,
```SUCCESS```, ```PROCESSING``` or ```FAILED```
@@ -426,12 +426,12 @@ end time, duration of the execution and the parameters at the time of execution
}
```

- ```user_defined_metrics```: are any [experiment tracking metrics](/concepts/task/#experiment_tracking)
- ```user_defined_metrics```: are any [experiment tracking metrics](../concepts/task.md/#experiment_tracking)
captured during the execution of the step.

- ```branches```: This only applies to parallel, map or dag steps and shows the logs captured during the
execution of the branch.
- ```data_catalog```: Captures any data flowing through the tasks by the [catalog](/concepts/catalog).
- ```data_catalog```: Captures any data flowing through the tasks by the [catalog](../concepts/catalog.md).
By default, the execution logs of the task are put in the catalog for easier debugging purposes.

For example, the below lines from the snippet specifies one entry into the catalog which is the execution log
@@ -463,7 +463,7 @@ reproduced in local environments and fixed.
- non-nested, linear pipelines
- non-chunked run log store

[mocked executor](/configurations/executors/mocked) provides better support in debugging failures.
[mocked executor](../configurations/executors/mocked.md) provides better support in debugging failures.


### Example
@@ -1237,10 +1237,10 @@ reproduced in local environments and fixed.
## API

Tasks can access the ```run log``` during the execution of the step
[using the API](/interactions/#magnus.get_run_log). The run log returned by this method is a deep copy
[using the API](../interactions.md/#magnus.get_run_log). The run log returned by this method is a deep copy
to prevent any modifications.


Tasks can also access the ```run_id``` of the current execution either by
[using the API](/interactions/#magnus.get_run_id) or by the environment
[using the API](../interactions.md/#magnus.get_run_id) or by the environment
variable ```MAGNUS_RUN_ID```.
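
Reading the run id from the environment needs nothing beyond the standard library. A minimal sketch, where the helper name `current_run_id` and the fallback value are hypothetical:

```python
import os


def current_run_id(default=None):
    """Read the run_id from the MAGNUS_RUN_ID environment variable."""
    return os.environ.get("MAGNUS_RUN_ID", default)


# Outside a magnus execution the variable is normally unset, so supply a fallback.
print(current_run_id(default="not-in-a-magnus-run"))
```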
4 changes: 2 additions & 2 deletions docs/concepts/secrets.md
@@ -11,7 +11,7 @@ Most complex pipelines require secrets to hold sensitive information during task
They could be database credentials, API keys or any information that needs to be present at
run-time but invisible at all other times.

Magnus provides a [clean API](/interactions/#magnus.get_secret) to access secrets
Magnus provides a [clean API](../interactions.md/#magnus.get_secret) to access secrets
and independent of the actual secret provider, the interface remains the same.

A typical example would be a task requiring the database connection string to connect
@@ -29,7 +29,7 @@ class CustomObject:
# Do something with the secrets
```

Please refer to [configurations](/configurations/secrets) for available implementations.
Please refer to [configurations](../configurations/secrets.md) for available implementations.

## Example
