Added docs for trigger interface #2806

Merged · 22 commits · Jun 27, 2024
Changes from 20 commits
1 change: 1 addition & 0 deletions .typos.toml
@@ -34,6 +34,7 @@ ret = "ret"
daa = "daa"
arange = "arange"
cachable = "cachable"
OT = "OT"

[default]
locale = "en-us"
Binary file added docs/book/.gitbook/assets/rest_api_step_1.png
Binary file added docs/book/.gitbook/assets/rest_api_step_2.png
2 changes: 1 addition & 1 deletion docs/book/component-guide/container-registries/aws.md
@@ -20,7 +20,7 @@ The ECR registry is automatically activated once you create an AWS account. Howe
* Go to the [ECR website](https://console.aws.amazon.com/ecr).
* Make sure the correct region is selected on the top right.
* Click on `Create repository`.
* Create a private repository. The name of the repository depends on the \[orchestrator] (../orchestrators/orchestrators.md or [step operator](../step-operators/step-operators.md) you're using in your stack.
* Create a private repository. The name of the repository depends on the [orchestrator](../orchestrators/orchestrators.md) or [step operator](../step-operators/step-operators.md) you're using in your stack.
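
As a minimal sketch, the same repository can also be created with the AWS CLI (the repository name and region below are placeholder values):

```shell
# Create a private ECR repository; use the name your orchestrator/step operator expects.
aws ecr create-repository --repository-name zenml --region eu-west-1
```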

### URI format

4 changes: 2 additions & 2 deletions docs/book/how-to/auth-management/docker-service-connector.md
@@ -25,8 +25,8 @@ The Docker Service Connector only supports authenticating to and granting access

The resource name identifies a Docker/OCI registry using one of the following formats (the repository name is optional and ignored).

* DockerHub: docker.io or \[https://]index.docker.io/v1/\[/\<repository-name>]
* generic OCI registry URI: http\[s]://host\[:port]\[/\<repository-name>]
* DockerHub: docker.io or `https://index.docker.io/v1/<repository-name>`
* generic OCI registry URI: `https://host:port/<repository-name>`
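
For example, all of the following would be valid resource names under these formats (the repository names and registry host are hypothetical placeholders):

```shell
docker.io
https://index.docker.io/v1/my-repository
https://ghcr.io:443/my-repository
```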

## Authentication Methods

39 changes: 39 additions & 0 deletions docs/book/how-to/build-pipelines/compose-pipelines.md
@@ -0,0 +1,39 @@
---
description: Reuse steps between pipelines.
---

# Compose pipelines

Sometimes it can be useful to extract some common functionality into separate functions
in order to avoid code duplication. To facilitate this, ZenML allows you to compose your pipelines:

```python
from zenml import pipeline

@pipeline
def data_loading_pipeline(mode: str):
if mode == "train":
data = training_data_loader_step()
else:
data = test_data_loader_step()

processed_data = preprocessing_step(data)
return processed_data


@pipeline
def training_pipeline():
training_data = data_loading_pipeline(mode="train")
model = training_step(data=training_data)
test_data = data_loading_pipeline(mode="test")
evaluation_step(model=model, data=test_data)
```

{% hint style="info" %}
Here we are calling one pipeline from within another, so the `data_loading_pipeline` functions as a step within the `training_pipeline`: the steps of the former are added to the latter. Only the parent pipeline will be visible in the dashboard. To actually trigger one pipeline from another, see [here](../trigger-pipelines/trigger-a-pipeline-from-another.md).
{% endhint %}
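
For completeness, here is a minimal, hypothetical set of step definitions that would make the example above runnable end to end (real steps would load and process your actual data):

```python
from zenml import step


@step
def training_data_loader_step() -> list:
    return [[1, 2], [3, 4]]  # toy training data


@step
def test_data_loader_step() -> list:
    return [[5, 6]]  # toy test data


@step
def preprocessing_step(data: list) -> list:
    return data  # identity preprocessing, purely for illustration


@step
def training_step(data: list) -> str:
    return "model"  # stand-in for a trained model artifact


@step
def evaluation_step(model: str, data: list) -> None:
    print(f"Evaluating {model} on {len(data)} samples")
```

With these in place, calling `training_pipeline()` runs `data_loading_pipeline` twice as part of a single DAG.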

<table data-view="cards"><thead><tr><th></th><th></th><th></th><th data-hidden data-card-target data-type="content-ref"></th></tr></thead><tbody><tr><td>Learn more about orchestrators here</td><td></td><td></td><td><a href="../../component-guide/orchestrators/orchestrators.md">orchestrators.md</a></td></tr></tbody></table>

<!-- For scarf -->
<figure><img alt="ZenML Scarf" referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" /></figure>
@@ -0,0 +1,22 @@
---
description: Configuring a pipeline at runtime.
---

# Runtime configuration of a pipeline run

You will often need to run a pipeline with a different configuration. In most cases, you should use the [`pipeline.with_options`](../use-configuration-files/README.md) method. You can do this:

1. Either by explicitly configuring options like `with_options(steps={"trainer": {"parameters": {"param1": 1}}})`
2. Or by passing a YAML file using `with_options(config_path="path_to_yaml_file")`.

You can learn more about these options [here](../use-configuration-files/README.md).
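
As a minimal sketch of both options (the `trainer` step and `training_pipeline` below are placeholders, not part of the original example):

```python
from zenml import pipeline, step


@step
def trainer(param1: int = 0) -> None:
    print(f"Training with param1={param1}")


@pipeline
def training_pipeline():
    trainer()


# Option 1: configure the step parameters explicitly ...
training_pipeline.with_options(
    steps={"trainer": {"parameters": {"param1": 1}}}
)()

# Option 2: ... or load the same configuration from a YAML file.
training_pipeline.with_options(config_path="path_to_yaml_file")()
```

Note that `with_options` returns a configured copy of the pipeline, which you then call to run it.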

However, there is one exception: if you would like to trigger a pipeline from the client
or another pipeline, you would need to pass the `PipelineRunConfiguration` object.
Learn more about this [here](../trigger-pipelines/trigger-a-pipeline-from-another.md).
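
As a rough sketch of what that looks like (assuming a pipeline named `training_pipeline` with a `trainer` step has already been built and registered; see the linked page for the full requirements):

```python
from zenml.client import Client
from zenml.config.pipeline_run_configuration import PipelineRunConfiguration

# Override a step parameter for this particular triggered run.
run_config = PipelineRunConfiguration(
    steps={"trainer": {"parameters": {"param1": 1}}}
)

Client().trigger_pipeline("training_pipeline", run_configuration=run_config)
```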

<table data-view="cards"><thead><tr><th></th><th></th><th></th><th data-hidden data-card-target data-type="content-ref"></th></tr></thead><tbody><tr><td>Using config files</td><td></td><td></td><td><a href="../use-configuration-files/README.md">../use-configuration-files/README.md</a></td></tr></tbody></table>

<!-- For scarf -->
<figure><img alt="ZenML Scarf" referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" /></figure>

This file was deleted.

@@ -302,82 +302,7 @@ If you would like to disable artifact metadata extraction altogether, you can se

## Skipping materialization

{% hint style="warning" %}
Skipping materialization might have unintended consequences for downstream tasks that rely on materialized artifacts. Only skip materialization if there is no other way to do what you want to do.
{% endhint %}

While materializers should in most cases be used to control how artifacts are returned and consumed from pipeline steps, you might sometimes need to have a completely unmaterialized artifact in a step, e.g., if you need to know the exact path to where your artifact is stored.

An unmaterialized artifact is a `zenml.materializers.UnmaterializedArtifact`. Among others, it has a property `uri` that points to the unique path in the artifact store where the artifact is persisted. One can use an unmaterialized artifact by specifying `UnmaterializedArtifact` as the type in the step:

```python
from zenml.artifacts.unmaterialized_artifact import UnmaterializedArtifact
from zenml import step


@step
def my_step(my_artifact: UnmaterializedArtifact): # rather than pd.DataFrame
pass
```

#### Example

The following shows an example of how unmaterialized artifacts can be used in the steps of a pipeline. The pipeline we define will look like this:

```shell
s1 -> s3
s2 -> s4
```

`s1` and `s2` produce identical artifacts; however, `s3` consumes materialized artifacts while `s4` consumes unmaterialized artifacts. `s4` can now use the `dict_.uri` and `list_.uri` paths directly rather than their materialized counterparts.

```python
from typing_extensions import Annotated  # or `from typing import Annotated` on Python 3.9+
from typing import Dict, List, Tuple

from zenml.artifacts.unmaterialized_artifact import UnmaterializedArtifact
from zenml import pipeline, step


@step
def step_1() -> Tuple[
Annotated[Dict[str, str], "dict_"],
Annotated[List[str], "list_"],
]:
return {"some": "data"}, []


@step
def step_2() -> Tuple[
Annotated[Dict[str, str], "dict_"],
Annotated[List[str], "list_"],
]:
return {"some": "data"}, []


@step
def step_3(dict_: Dict, list_: List) -> None:
assert isinstance(dict_, dict)
assert isinstance(list_, list)


@step
def step_4(
dict_: UnmaterializedArtifact,
list_: UnmaterializedArtifact,
) -> None:
print(dict_.uri)
print(list_.uri)


@pipeline
def example_pipeline():
step_3(*step_1())
step_4(*step_2())


example_pipeline()
```
You can learn more about skipping materialization [here](unmaterialized-artifacts.md).

## Interaction with custom artifact stores

94 changes: 94 additions & 0 deletions docs/book/how-to/handle-data-artifacts/unmaterialized-artifacts.md
@@ -0,0 +1,94 @@
---
description: Skip materialization of artifacts.
---

# Unmaterialized artifacts

A ZenML pipeline is built in a data-centric way. The outputs and inputs of steps define how steps are connected and the order in which they are executed. Each step should be considered as its very own process that reads and writes its inputs and outputs from and to the [artifact store](../../component-guide/artifact-stores/artifact-stores.md). This is where **materializers** come into play.

A materializer dictates how a given artifact can be written to and retrieved from the artifact store and also contains all serialization and deserialization logic. Whenever you pass artifacts as outputs from one pipeline step to other steps as inputs, the corresponding materializer for the respective data type defines how this artifact is first serialized and written to the artifact store, and then deserialized and read in the next step. Read more about this [here](handle-custom-data-types.md).
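
As a rough sketch of what such a materializer looks like (the `MyObj` type and the file layout inside the artifact store are illustrative assumptions; see the linked page for the full API):

```python
import os
from typing import Type

from zenml.enums import ArtifactType
from zenml.materializers.base_materializer import BaseMaterializer


class MyObj:
    def __init__(self, name: str):
        self.name = name


class MyMaterializer(BaseMaterializer):
    ASSOCIATED_TYPES = (MyObj,)
    ASSOCIATED_ARTIFACT_TYPE = ArtifactType.DATA

    def load(self, data_type: Type[MyObj]) -> MyObj:
        """Deserialize a `MyObj` from the artifact store."""
        with self.artifact_store.open(os.path.join(self.uri, "data.txt"), "r") as f:
            return MyObj(name=f.read())

    def save(self, my_obj: MyObj) -> None:
        """Serialize a `MyObj` to the artifact store."""
        with self.artifact_store.open(os.path.join(self.uri, "data.txt"), "w") as f:
            f.write(my_obj.name)
```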

However, there are instances where you might **not** want to materialize an artifact in a step, but rather use a reference to it instead.
This is where skipping materialization comes in.

{% hint style="warning" %}
Skipping materialization might have unintended consequences for downstream tasks that rely on materialized artifacts. Only skip materialization if there is no other way to do what you want to do.
{% endhint %}

## How to skip materialization

While materializers should in most cases be used to control how artifacts are returned and consumed from pipeline steps, you might sometimes need to have a completely unmaterialized artifact in a step, e.g., if you need to know the exact path to where your artifact is stored.

An unmaterialized artifact is a [`zenml.materializers.UnmaterializedArtifact`](https://sdkdocs.zenml.io/latest/core_code_docs/core-artifacts/#zenml.artifacts.unmaterialized_artifact). Among others, it has a property `uri` that points to the unique path in the artifact store where the artifact is persisted. One can use an unmaterialized artifact by specifying `UnmaterializedArtifact` as the type in the step:

```python
from zenml.artifacts.unmaterialized_artifact import UnmaterializedArtifact
from zenml import step

@step
def my_step(my_artifact: UnmaterializedArtifact): # rather than pd.DataFrame
pass
```

## Code Example

The following shows an example of how unmaterialized artifacts can be used in the steps of a pipeline. The pipeline we define will look like this:

```shell
s1 -> s3
s2 -> s4
```

`s1` and `s2` produce identical artifacts; however, `s3` consumes materialized artifacts while `s4` consumes unmaterialized artifacts. `s4` can now use the `dict_.uri` and `list_.uri` paths directly rather than their materialized counterparts.

```python
from typing_extensions import Annotated  # or `from typing import Annotated` on Python 3.9+
from typing import Dict, List, Tuple

from zenml.artifacts.unmaterialized_artifact import UnmaterializedArtifact
from zenml import pipeline, step


@step
def step_1() -> Tuple[
Annotated[Dict[str, str], "dict_"],
Annotated[List[str], "list_"],
]:
return {"some": "data"}, []


@step
def step_2() -> Tuple[
Annotated[Dict[str, str], "dict_"],
Annotated[List[str], "list_"],
]:
return {"some": "data"}, []


@step
def step_3(dict_: Dict, list_: List) -> None:
assert isinstance(dict_, dict)
assert isinstance(list_, list)


@step
def step_4(
dict_: UnmaterializedArtifact,
list_: UnmaterializedArtifact,
) -> None:
print(dict_.uri)
print(list_.uri)


@pipeline
def example_pipeline():
step_3(*step_1())
step_4(*step_2())


example_pipeline()
```

You can see another example of using an `UnmaterializedArtifact` when triggering a [pipeline from another](../trigger-pipelines/trigger-a-pipeline-from-another.md).

<figure><img src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" alt="ZenML Scarf"><figcaption></figcaption></figure>
47 changes: 47 additions & 0 deletions docs/book/how-to/trigger-pipelines/README.md
@@ -0,0 +1,47 @@
---
description: >-
There are numerous ways to trigger a pipeline, apart from
calling the runner script.
---

# 🚨 Trigger a pipeline

A pipeline can be run via Python like this:

```python
from zenml import pipeline, step


@step  # Just add this decorator
def load_data() -> dict:
training_data = [[1, 2], [3, 4], [5, 6]]
labels = [0, 1, 0]
return {'features': training_data, 'labels': labels}


@step
def train_model(data: dict) -> None:
total_features = sum(map(sum, data['features']))
total_labels = sum(data['labels'])

# Train some model here

print(f"Trained model using {len(data['features'])} data points. "
f"Feature sum is {total_features}, label sum is {total_labels}")


@pipeline # This function combines steps together
def simple_ml_pipeline():
dataset = load_data()
train_model(dataset)
```

You can now run this pipeline by simply calling the function:

```python
simple_ml_pipeline()
```

However, there are other ways to trigger a pipeline, specifically a pipeline with a remote stack (remote
orchestrator, artifact store, and container registry).

<table data-view="cards"><thead><tr><th></th><th></th><th></th><th data-hidden data-card-target data-type="content-ref"></th></tr></thead><tbody><tr><td>Trigger a pipeline from Python SDK</td><td></td><td></td><td><a href="trigger-a-pipeline-from-client.md">trigger-a-pipeline-from-client.md</a></td></tr><tr><td>Trigger a pipeline from another</td><td></td><td></td><td><a href="trigger-a-pipeline-from-another.md">trigger-a-pipeline-from-another.md</a></td></tr><tr><td>Trigger a pipeline from the REST API</td><td></td><td></td><td><a href="trigger-a-pipeline-from-rest-api.md">trigger-a-pipeline-from-rest-api.md</a></td></tr></tbody></table>

<figure><img src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" alt="ZenML Scarf"><figcaption></figcaption></figure>