Added docs for trigger interface #2806

Merged · 22 commits · Jun 27, 2024
Changes from 20 commits
1 change: 1 addition & 0 deletions .typos.toml
@@ -34,6 +34,7 @@ ret = "ret"
daa = "daa"
arange = "arange"
cachable = "cachable"
OT = "OT"

[default]
locale = "en-us"
Binary file added docs/book/.gitbook/assets/rest_api_step_1.png
Binary file added docs/book/.gitbook/assets/rest_api_step_2.png
2 changes: 1 addition & 1 deletion docs/book/component-guide/container-registries/aws.md
@@ -20,7 +20,7 @@ The ECR registry is automatically activated once you create an AWS account. Howe
* Go to the [ECR website](https://console.aws.amazon.com/ecr).
* Make sure the correct region is selected on the top right.
* Click on `Create repository`.
* Create a private repository. The name of the repository depends on the \[orchestrator] (../orchestrators/orchestrators.md or [step operator](../step-operators/step-operators.md) you're using in your stack.
* Create a private repository. The name of the repository depends on the [orchestrator](../orchestrators/orchestrators.md) or [step operator](../step-operators/step-operators.md) you're using in your stack.
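
As a minimal sketch, the same repository can also be created with the AWS CLI (the repository name and region below are placeholder values):

```shell
# Create a private ECR repository; use the name your orchestrator/step operator expects.
aws ecr create-repository --repository-name zenml --region eu-west-1
```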

### URI format

4 changes: 2 additions & 2 deletions docs/book/how-to/auth-management/docker-service-connector.md
@@ -25,8 +25,8 @@ The Docker Service Connector only supports authenticating to and granting access

The resource name identifies a Docker/OCI registry using one of the following formats (the repository name is optional and ignored).

* DockerHub: docker.io or \[https://]index.docker.io/v1/\[/\<repository-name>]
* generic OCI registry URI: http\[s]://host\[:port]\[/\<repository-name>]
* DockerHub: docker.io or `https://index.docker.io/v1/<repository-name>`
* generic OCI registry URI: `https://host:port/<repository-name>`
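
For example, all of the following would be valid resource names under these formats (the repository names and registry host are hypothetical placeholders):

```shell
docker.io
https://index.docker.io/v1/my-repository
https://ghcr.io:443/my-repository
```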

## Authentication Methods

39 changes: 39 additions & 0 deletions docs/book/how-to/build-pipelines/compose-pipelines.md
@@ -0,0 +1,39 @@
---
description: Reuse steps between pipelines.
---

# Compose pipelines

Sometimes it can be useful to extract some common functionality into separate functions
in order to avoid code duplication. To facilitate this, ZenML allows you to compose your pipelines:

```python
from zenml import pipeline

@pipeline
def data_loading_pipeline(mode: str):
if mode == "train":
data = training_data_loader_step()
else:
data = test_data_loader_step()

processed_data = preprocessing_step(data)
return processed_data


@pipeline
def training_pipeline():
training_data = data_loading_pipeline(mode="train")
model = training_step(data=training_data)
test_data = data_loading_pipeline(mode="test")
evaluation_step(model=model, data=test_data)
```

{% hint style="info" %}
Here we are calling one pipeline from within another, so the `data_loading_pipeline` functions as a step within the `training_pipeline`: the steps of the former are added to the latter. Only the parent pipeline will be visible in the dashboard. To actually trigger one pipeline from another, see [here](../trigger-pipelines/trigger-a-pipeline-from-another.md).
{% endhint %}
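
For completeness, here is a minimal, hypothetical set of step definitions that would make the example above runnable end to end (real steps would load and process your actual data):

```python
from zenml import step


@step
def training_data_loader_step() -> list:
    return [[1, 2], [3, 4]]  # toy training data


@step
def test_data_loader_step() -> list:
    return [[5, 6]]  # toy test data


@step
def preprocessing_step(data: list) -> list:
    return data  # identity preprocessing, purely for illustration


@step
def training_step(data: list) -> str:
    return "model"  # stand-in for a trained model artifact


@step
def evaluation_step(model: str, data: list) -> None:
    print(f"Evaluating {model} on {len(data)} samples")
```

With these in place, calling `training_pipeline()` runs `data_loading_pipeline` twice as part of a single DAG.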

<table data-view="cards"><thead><tr><th></th><th></th><th></th><th data-hidden data-card-target data-type="content-ref"></th></tr></thead><tbody><tr><td>Learn more about orchestrators here</td><td></td><td></td><td><a href="../../component-guide/orchestrators/orchestrators.md">orchestrators.md</a></td></tr></tbody></table>

<!-- For scarf -->
<figure><img alt="ZenML Scarf" referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" /></figure>
@@ -0,0 +1,22 @@
---
description: Configuring a pipeline at runtime.
---

# Runtime configuration of a pipeline run

You will often need to run a pipeline with a different configuration. In most cases, you should use the [`pipeline.with_options`](../use-configuration-files/README.md) method. You can do this:

1. Either by explicitly configuring options like `with_options(steps={"trainer": {"parameters": {"param1": 1}}})`
2. Or by passing a YAML file using `with_options(config_path="path_to_yaml_file")`.

You can learn more about these options [here](../use-configuration-files/README.md).
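
As a minimal sketch of both options (the `trainer` step and `training_pipeline` below are placeholders, not part of the original example):

```python
from zenml import pipeline, step


@step
def trainer(param1: int = 0) -> None:
    print(f"Training with param1={param1}")


@pipeline
def training_pipeline():
    trainer()


# Option 1: configure the step parameters explicitly ...
training_pipeline.with_options(
    steps={"trainer": {"parameters": {"param1": 1}}}
)()

# Option 2: ... or load the same configuration from a YAML file.
training_pipeline.with_options(config_path="path_to_yaml_file")()
```

Note that `with_options` returns a configured copy of the pipeline, which you then call to run it.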

However, there is one exception: if you would like to trigger a pipeline from the client
or another pipeline, you would need to pass the `PipelineRunConfiguration` object.
Learn more about this [here](../trigger-pipelines/trigger-a-pipeline-from-another.md).
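
As a rough sketch of what that looks like (assuming a pipeline named `training_pipeline` with a `trainer` step has already been built and registered; see the linked page for the full requirements):

```python
from zenml.client import Client
from zenml.config.pipeline_run_configuration import PipelineRunConfiguration

# Override a step parameter for this particular triggered run.
run_config = PipelineRunConfiguration(
    steps={"trainer": {"parameters": {"param1": 1}}}
)

Client().trigger_pipeline("training_pipeline", run_configuration=run_config)
```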

<table data-view="cards"><thead><tr><th></th><th></th><th></th><th data-hidden data-card-target data-type="content-ref"></th></tr></thead><tbody><tr><td>Using config files</td><td></td><td></td><td><a href="../use-configuration-files/README.md">../use-configuration-files/README.md</a></td></tr></tbody></table>

<!-- For scarf -->
<figure><img alt="ZenML Scarf" referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" /></figure>

This file was deleted.

@@ -302,82 +302,7 @@ If you would like to disable artifact metadata extraction altogether, you can se

## Skipping materialization

{% hint style="warning" %}
Skipping materialization might have unintended consequences for downstream tasks that rely on materialized artifacts. Only skip materialization if there is no other way to do what you want to do.
{% endhint %}

While materializers should in most cases be used to control how artifacts are returned and consumed from pipeline steps, you might sometimes need to have a completely unmaterialized artifact in a step, e.g., if you need to know the exact path to where your artifact is stored.

An unmaterialized artifact is a `zenml.materializers.UnmaterializedArtifact`. Among others, it has a property `uri` that points to the unique path in the artifact store where the artifact is persisted. One can use an unmaterialized artifact by specifying `UnmaterializedArtifact` as the type in the step:

```python
from zenml.artifacts.unmaterialized_artifact import UnmaterializedArtifact
from zenml import step


@step
def my_step(my_artifact: UnmaterializedArtifact): # rather than pd.DataFrame
pass
```

#### Example

The following shows an example of how unmaterialized artifacts can be used in the steps of a pipeline. The pipeline we define will look like this:

```shell
s1 -> s3
s2 -> s4
```

`s1` and `s2` produce identical artifacts; however, `s3` consumes materialized artifacts while `s4` consumes unmaterialized artifacts. `s4` can now use the `dict_.uri` and `list_.uri` paths directly rather than their materialized counterparts.

```python
from typing_extensions import Annotated  # or `from typing import Annotated` on Python 3.9+
from typing import Dict, List, Tuple

from zenml.artifacts.unmaterialized_artifact import UnmaterializedArtifact
from zenml import pipeline, step


@step
def step_1() -> Tuple[
Annotated[Dict[str, str], "dict_"],
Annotated[List[str], "list_"],
]:
return {"some": "data"}, []


@step
def step_2() -> Tuple[
Annotated[Dict[str, str], "dict_"],
Annotated[List[str], "list_"],
]:
return {"some": "data"}, []


@step
def step_3(dict_: Dict, list_: List) -> None:
assert isinstance(dict_, dict)
assert isinstance(list_, list)


@step
def step_4(
dict_: UnmaterializedArtifact,
list_: UnmaterializedArtifact,
) -> None:
print(dict_.uri)
print(list_.uri)


@pipeline
def example_pipeline():
step_3(*step_1())
step_4(*step_2())


example_pipeline()
```
You can learn more about skipping materialization [here](unmaterialized-artifacts.md).

## Interaction with custom artifact stores

94 changes: 94 additions & 0 deletions docs/book/how-to/handle-data-artifacts/unmaterialized-artifacts.md
@@ -0,0 +1,94 @@
---
description: Skip materialization of artifacts.
---

# Unmaterialized artifacts

A ZenML pipeline is built in a data-centric way. The outputs and inputs of steps define how steps are connected and the order in which they are executed. Each step should be considered as its very own process that reads and writes its inputs and outputs from and to the [artifact store](../../component-guide/artifact-stores/artifact-stores.md). This is where **materializers** come into play.

A materializer dictates how a given artifact can be written to and retrieved from the artifact store and also contains all serialization and deserialization logic. Whenever you pass artifacts as outputs from one pipeline step to other steps as inputs, the corresponding materializer for the respective data type defines how this artifact is first serialized and written to the artifact store, and then deserialized and read in the next step. Read more about this [here](handle-custom-data-types.md).
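
As a rough sketch of what such a materializer looks like (the `MyObj` type and the file layout inside the artifact store are illustrative assumptions; see the linked page for the full API):

```python
import os
from typing import Type

from zenml.enums import ArtifactType
from zenml.materializers.base_materializer import BaseMaterializer


class MyObj:
    def __init__(self, name: str):
        self.name = name


class MyMaterializer(BaseMaterializer):
    ASSOCIATED_TYPES = (MyObj,)
    ASSOCIATED_ARTIFACT_TYPE = ArtifactType.DATA

    def load(self, data_type: Type[MyObj]) -> MyObj:
        """Deserialize a `MyObj` from the artifact store."""
        with self.artifact_store.open(os.path.join(self.uri, "data.txt"), "r") as f:
            return MyObj(name=f.read())

    def save(self, my_obj: MyObj) -> None:
        """Serialize a `MyObj` to the artifact store."""
        with self.artifact_store.open(os.path.join(self.uri, "data.txt"), "w") as f:
            f.write(my_obj.name)
```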

However, there are instances where you might **not** want to materialize an artifact in a step, but rather use a reference to it instead.
This is where skipping materialization comes in.

{% hint style="warning" %}
Skipping materialization might have unintended consequences for downstream tasks that rely on materialized artifacts. Only skip materialization if there is no other way to do what you want to do.
{% endhint %}

## How to skip materialization

While materializers should in most cases be used to control how artifacts are returned and consumed from pipeline steps, you might sometimes need to have a completely unmaterialized artifact in a step, e.g., if you need to know the exact path to where your artifact is stored.

An unmaterialized artifact is a [`zenml.materializers.UnmaterializedArtifact`](https://sdkdocs.zenml.io/latest/core_code_docs/core-artifacts/#zenml.artifacts.unmaterialized_artifact). Among others, it has a property `uri` that points to the unique path in the artifact store where the artifact is persisted. One can use an unmaterialized artifact by specifying `UnmaterializedArtifact` as the type in the step:

```python
from zenml.artifacts.unmaterialized_artifact import UnmaterializedArtifact
from zenml import step

@step
def my_step(my_artifact: UnmaterializedArtifact): # rather than pd.DataFrame
pass
```

## Code Example

The following shows an example of how unmaterialized artifacts can be used in the steps of a pipeline. The pipeline we define will look like this:

```shell
s1 -> s3
s2 -> s4
```

`s1` and `s2` produce identical artifacts; however, `s3` consumes materialized artifacts while `s4` consumes unmaterialized artifacts. `s4` can now use the `dict_.uri` and `list_.uri` paths directly rather than their materialized counterparts.

```python
from typing_extensions import Annotated  # or `from typing import Annotated` on Python 3.9+
from typing import Dict, List, Tuple

from zenml.artifacts.unmaterialized_artifact import UnmaterializedArtifact
from zenml import pipeline, step


@step
def step_1() -> Tuple[
Annotated[Dict[str, str], "dict_"],
Annotated[List[str], "list_"],
]:
return {"some": "data"}, []


@step
def step_2() -> Tuple[
Annotated[Dict[str, str], "dict_"],
Annotated[List[str], "list_"],
]:
return {"some": "data"}, []


@step
def step_3(dict_: Dict, list_: List) -> None:
assert isinstance(dict_, dict)
assert isinstance(list_, list)


@step
def step_4(
dict_: UnmaterializedArtifact,
list_: UnmaterializedArtifact,
) -> None:
print(dict_.uri)
print(list_.uri)


@pipeline
def example_pipeline():
step_3(*step_1())
step_4(*step_2())


example_pipeline()
```

You can see another example of using an `UnmaterializedArtifact` when triggering a [pipeline from another](../trigger-pipelines/trigger-a-pipeline-from-another.md).

<figure><img src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" alt="ZenML Scarf"><figcaption></figcaption></figure>
47 changes: 47 additions & 0 deletions docs/book/how-to/trigger-pipelines/README.md
@@ -0,0 +1,47 @@
---
description: >-
There are numerous ways to trigger a pipeline, apart from
calling the runner script.
---

# 🚨 Trigger a pipeline

A pipeline can be run via Python like this:

```python
from zenml import pipeline, step


@step  # Just add this decorator
def load_data() -> dict:
training_data = [[1, 2], [3, 4], [5, 6]]
labels = [0, 1, 0]
return {'features': training_data, 'labels': labels}


@step
def train_model(data: dict) -> None:
total_features = sum(map(sum, data['features']))
total_labels = sum(data['labels'])

# Train some model here

print(f"Trained model using {len(data['features'])} data points. "
f"Feature sum is {total_features}, label sum is {total_labels}")


@pipeline # This function combines steps together
def simple_ml_pipeline():
dataset = load_data()
train_model(dataset)
```

You can now run this pipeline by simply calling the function:

```python
simple_ml_pipeline()
```

However, there are other ways to trigger a pipeline, specifically a pipeline with a remote stack (remote
orchestrator, artifact store, and container registry).

<table data-view="cards"><thead><tr><th></th><th></th><th></th><th data-hidden data-card-target data-type="content-ref"></th></tr></thead><tbody><tr><td>Trigger a pipeline from Python SDK</td><td></td><td></td><td><a href="trigger-a-pipeline-from-client.md">trigger-a-pipeline-from-client.md</a></td></tr><tr><td>Trigger a pipeline from another</td><td></td><td></td><td><a href="trigger-a-pipeline-from-another.md">trigger-a-pipeline-from-another.md</a></td></tr><tr><td>Trigger a pipeline from the REST API</td><td></td><td></td><td><a href="trigger-a-pipeline-from-rest-api.md">trigger-a-pipeline-from-rest-api.md</a></td></tr></tbody></table>

<figure><img src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" alt="ZenML Scarf"><figcaption></figcaption></figure>