Dynamic (templated) names for model versions #2909

Merged 35 commits on Aug 27, 2024
35 commits
737046b
util method
avishniakov Aug 5, 2024
0a868c7
avoid double dot
avishniakov Aug 5, 2024
cf71425
MV names with template support
avishniakov Aug 6, 2024
50d3da9
Merge branch 'develop' into feature/PRD-539-dynamic-names-for-model-v…
avishniakov Aug 6, 2024
ab98623
Merge branch 'develop' into feature/PRD-539-dynamic-names-for-model-v…
avishniakov Aug 8, 2024
1d5876f
templated model version with tracing
avishniakov Aug 8, 2024
63c0018
UTC -> timezone.utc
avishniakov Aug 8, 2024
d61c19c
fix ongoing issues + add test
avishniakov Aug 8, 2024
967c881
Merge branch 'develop' into feature/PRD-539-dynamic-names-for-model-v…
avishniakov Aug 8, 2024
e451477
Merge branch 'develop' into feature/PRD-539-dynamic-names-for-model-v…
htahir1 Aug 8, 2024
02a8f6c
resolve branching
avishniakov Aug 8, 2024
6a7b111
lint
avishniakov Aug 8, 2024
a7a507a
restore previous behavior and fix tests
avishniakov Aug 9, 2024
0383e34
Merge branch 'develop' into feature/PRD-539-dynamic-names-for-model-v…
avishniakov Aug 9, 2024
1ebd20a
fail on conflict of YAML and code pipe config
avishniakov Aug 9, 2024
ba50fdb
revert
avishniakov Aug 9, 2024
85ec98b
fix test
avishniakov Aug 9, 2024
88a8ccf
restore
avishniakov Aug 9, 2024
3b73506
[PRD-551] Fix for cached pipelines linking
avishniakov Aug 15, 2024
90849fc
Auto-update of Starter template
actions-user Aug 15, 2024
24b400f
force CI
avishniakov Aug 16, 2024
6f97758
remove redundant lc
avishniakov Aug 22, 2024
703ff39
remove redundant warm-ups
avishniakov Aug 26, 2024
d12a8a4
fix descs
avishniakov Aug 26, 2024
0ea1d27
fix for cached steps
avishniakov Aug 26, 2024
3abbf8c
Merge branch 'develop' into feature/PRD-539-dynamic-names-for-model-v…
avishniakov Aug 26, 2024
8e4a021
fix introduced caching issues
avishniakov Aug 26, 2024
1150f77
typos
avishniakov Aug 26, 2024
a44464a
`is_schedulable` prop
avishniakov Aug 26, 2024
28ddf94
move to `is_schedulable`
avishniakov Aug 26, 2024
f3c7443
fix artifact config linkage on cached
avishniakov Aug 26, 2024
cadf3ee
simplify
avishniakov Aug 27, 2024
914a776
rename
avishniakov Aug 27, 2024
0152dc4
bugfix
avishniakov Aug 27, 2024
feb6028
rename
avishniakov Aug 27, 2024
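
The PR title refers to templated model version names. As a rough illustration only (this is not the code from the PR), the sketch below expands `{date}` and `{time}` placeholders using a timezone-aware UTC timestamp, in the spirit of the `UTC -> timezone.utc` commit. The function name `format_version_name`, the placeholder names, and the timestamp formats are all assumptions made for this example.

```python
from datetime import datetime, timezone
from typing import Optional


def format_version_name(template: str, now: Optional[datetime] = None) -> str:
    """Expand {date} and {time} placeholders in a version-name template.

    Hypothetical helper: placeholder names and formats are assumptions.
    """
    # timezone.utc works on every Python 3 version, whereas the
    # datetime.UTC alias only exists from Python 3.11 onwards.
    now = now or datetime.now(timezone.utc)
    return template.format(
        date=now.strftime("%Y_%m_%d"),
        time=now.strftime("%H_%M_%S_%f"),
    )


# A fixed timestamp keeps the example deterministic.
fixed = datetime(2024, 8, 27, 12, 30, 5, tzinfo=timezone.utc)
print(format_version_name("experiment_{date}_{time}", fixed))
```

Passing the timestamp in as an argument, rather than reading the clock inside the function, makes the expansion easy to test and keeps every name expanded within one run consistent.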
1 change: 1 addition & 0 deletions .typos.toml
@@ -35,6 +35,7 @@ daa = "daa"
arange = "arange"
cachable = "cachable"
OT = "OT"
cll = "cll"

[default]
locale = "en-us"
8 changes: 4 additions & 4 deletions RELEASE_NOTES.md
@@ -731,7 +731,7 @@ by adding support for `Schedule.start_time` to the HyperAI orchestrator.
## What's Changed
* Really run migration testing by @avishniakov in https://github.com/zenml-io/zenml/pull/2562
* Interact with feature gate by @AlexejPenner in https://github.com/zenml-io/zenml/pull/2492
* Allow for logs to be unformatted / without colours by @strickvl in https://github.com/zenml-io/zenml/pull/2544
* Allow for logs to be unformatted / without colors by @strickvl in https://github.com/zenml-io/zenml/pull/2544
* Add VS Code extension to README / docs by @strickvl in https://github.com/zenml-io/zenml/pull/2568
* Allow loading of artifacts without needing to activate the artifact store (again) by @avishniakov in https://github.com/zenml-io/zenml/pull/2545
* Minor fix by @htahir1 in https://github.com/zenml-io/zenml/pull/2578
@@ -1302,7 +1302,7 @@ and some improvements to the Model Control Plane.
## What's Changed
* Bump aquasecurity/trivy-action from 0.16.0 to 0.16.1 by @dependabot in https://github.com/zenml-io/zenml/pull/2244
* Bump crate-ci/typos from 1.16.26 to 1.17.0 by @dependabot in https://github.com/zenml-io/zenml/pull/2245
* Add YAML formatting standardisation to formatting & linting scripts by @strickvl in https://github.com/zenml-io/zenml/pull/2224
* Add YAML formatting standardization to formatting & linting scripts by @strickvl in https://github.com/zenml-io/zenml/pull/2224
* Remove text annotation by @strickvl in https://github.com/zenml-io/zenml/pull/2246
* Add MariaDB migration testing by @strickvl in https://github.com/zenml-io/zenml/pull/2170
* Delete artifact links from model version via Client, ModelVersion and API by @avishniakov in https://github.com/zenml-io/zenml/pull/2191
@@ -1383,7 +1383,7 @@ which allows you to define custom blocks for the Slack message.
* Bump google-github-actions/auth from 1 to 2 by @dependabot in https://github.com/zenml-io/zenml/pull/2203
* Bump aws-actions/amazon-ecr-login from 1 to 2 by @dependabot in https://github.com/zenml-io/zenml/pull/2200
* Bump crate-ci/typos from 1.16.25 to 1.16.26 by @dependabot in https://github.com/zenml-io/zenml/pull/2207
* Fix unreliable test behaviour when using hypothesis by @strickvl in https://github.com/zenml-io/zenml/pull/2208
* Fix unreliable test behavior when using hypothesis by @strickvl in https://github.com/zenml-io/zenml/pull/2208
* Added more pod spec properties for k8s orchestrator by @htahir1 in https://github.com/zenml-io/zenml/pull/2097
* Fix API docs environment setup by @strickvl in https://github.com/zenml-io/zenml/pull/2190
* Use placeholder runs to show pipeline runs in the dashboard without delay by @schustmi in https://github.com/zenml-io/zenml/pull/2048
@@ -2602,7 +2602,7 @@ improvements and bug fixes.
* Delete extra word from `bentoml` docs by @strickvl in https://github.com/zenml-io/zenml/pull/1484
* Remove top-level config from recommended repo structure by @schustmi in https://github.com/zenml-io/zenml/pull/1485
* Bump `mypy` and `ruff` by @strickvl in https://github.com/zenml-io/zenml/pull/1481
* ZenML Version Downgrade - Silence Warnning by @safoinme in https://github.com/zenml-io/zenml/pull/1477
* ZenML Version Downgrade - Silence Warning by @safoinme in https://github.com/zenml-io/zenml/pull/1477
* Update ZenServer recipes to include secret stores by @wjayesh in https://github.com/zenml-io/zenml/pull/1483
* Fix alembic order by @schustmi in https://github.com/zenml-io/zenml/pull/1487
* Fix source resolving for classes in notebooks by @schustmi in https://github.com/zenml-io/zenml/pull/1486
2 changes: 1 addition & 1 deletion docs/book/component-guide/annotators/prodigy.md
@@ -76,7 +76,7 @@ workflow!
With Prodigy, there is no need to specially start the annotator ahead of time
like with [Label Studio](label-studio.md). Instead, just use Prodigy as per the
[Prodigy docs](https://prodi.gy) and then you can use the ZenML wrapper / API to
get your labelled data etc using our Python methods.
get your labeled data etc using our Python methods.

ZenML supports access to your data and annotations via the `zenml annotator ...`
CLI command.
2 changes: 1 addition & 1 deletion docs/book/component-guide/orchestrators/azureml.md
@@ -114,7 +114,7 @@ from zenml import step, pipeline
from zenml.integrations.azure.flavors import AzureMLOrchestratorSettings

azureml_settings = AzureMLOrchestratorSettings(
mode="serverless" # It's the default behaviour
mode="serverless" # It's the default behavior
)

@step
2 changes: 1 addition & 1 deletion docs/book/component-guide/orchestrators/databricks.md
@@ -124,7 +124,7 @@ The Databricks orchestrator only supports the `cron_expression`, in the `Schedul
{% endhint %}

{% hint style="warning" %}
The Databricks orchestrator requires Java Timezone IDs to be used in the `cron_expression`. You can find a list of supported timezones [here](https://docs.oracle.com/middleware/1221/wcs/tag-ref/MISC/TimeZones.html), the timezone ID must be set in the settings of the orchestrator (see below for more imformation how to set settings for the orchestrator).
The Databricks orchestrator requires Java Timezone IDs to be used in the `cron_expression`. You can find a list of supported timezones [here](https://docs.oracle.com/middleware/1221/wcs/tag-ref/MISC/TimeZones.html), the timezone ID must be set in the settings of the orchestrator (see below for more information how to set settings for the orchestrator).
{% endhint %}

**How to delete a scheduled pipeline**
2 changes: 1 addition & 1 deletion docs/book/component-guide/orchestrators/skypilot-vm.md
@@ -407,7 +407,7 @@ One of the key features of the SkyPilot VM Orchestrator is the ability to run ea

The SkyPilot VM Orchestrator allows you to configure resources for each step individually. This means you can specify different VM types, CPU and memory requirements, and even use spot instances for certain steps while using on-demand instances for others.

If no step-specific settings are specified, the orchestrator will use the resources specified in the orchestrator settings for each step and run the entire pipeline in one VM. If step-specific settings are specified, an orchestrator VM will be spun up first, which will subsequently spin out new VMs dependant on the step settings. You can disable this behavior by setting the `disable_step_based_settings` parameter to `True` in the orchestrator configuration, using the following command:
If no step-specific settings are specified, the orchestrator will use the resources specified in the orchestrator settings for each step and run the entire pipeline in one VM. If step-specific settings are specified, an orchestrator VM will be spun up first, which will subsequently spin out new VMs dependent on the step settings. You can disable this behavior by setting the `disable_step_based_settings` parameter to `True` in the orchestrator configuration, using the following command:

```shell
zenml orchestrator update <ORCHESTRATOR_NAME> --disable_step_based_settings=True
2 changes: 1 addition & 1 deletion docs/book/how-to/build-pipelines/README.md
@@ -41,7 +41,7 @@ When this pipeline is executed, the run of the pipeline gets logged to the ZenML
at its DAG and all the associated metadata. To access the dashboard you need to have a ZenML server either running
locally or remotely. See our documentation on this [here](../../getting-started/deploying-zenml/README.md).

<figure><img src="../../.gitbook/assets/SimplePipelineDag.png" alt=""><figcaption><p>DAG representation in the ZenML Dahboard.</p></figcaption></figure>
<figure><img src="../../.gitbook/assets/SimplePipelineDag.png" alt=""><figcaption><p>DAG representation in the ZenML Dashboard.</p></figcaption></figure>

Check below for more advanced ways to build and interact with your pipeline.

28 changes: 18 additions & 10 deletions docs/book/how-to/build-pipelines/schedule-a-pipeline.md
@@ -8,16 +8,24 @@ description: Learn how to set, pause and stop a schedule for pipelines.
Schedules don't work for all orchestrators. Here is a list of all supported orchestrators.
{% endhint %}

| Orchestrator | Scheduling Support |
|--------------------------------------------------------------------------------|--------------------|
| [LocalOrchestrator](../../component-guide/orchestrators/local.md) | ⛔️ |
| [LocalDockerOrchestrator](../../component-guide/orchestrators/local-docker.md) | ⛔️ |
| [KubernetesOrchestrator](../../component-guide/orchestrators/kubernetes.md) | ✅ |
| [KubeflowOrchestrator](../../component-guide/orchestrators/kubeflow.md) | ✅ |
| [VertexOrchestrator](../../component-guide/orchestrators/vertex.md) | ✅ |
| [TektonOrchestrator](../../component-guide/orchestrators/tekton.md) | ⛔️ |
| [AirflowOrchestrator](../../component-guide/orchestrators/airflow.md) | ✅ |
| [AzureMLOrchestrator](../../component-guide/orchestrators/azureml.md) | ✅ |
| Orchestrator | Scheduling Support |
|----------------------------------------------------------------------------------|--------------------|
| [AirflowOrchestrator](../../component-guide/orchestrators/airflow.md) | ✅ |
| [AzureMLOrchestrator](../../component-guide/orchestrators/azureml.md) | ✅ |
| [DatabricksOrchestrator](../../component-guide/orchestrators/databricks.md) | ✅ |
| [HyperAIOrchestrator](../../component-guide/orchestrators/hyperai.md) | ✅ |
| [KubeflowOrchestrator](../../component-guide/orchestrators/kubeflow.md) | ✅ |
| [KubernetesOrchestrator](../../component-guide/orchestrators/kubernetes.md) | ✅ |
| [LocalOrchestrator](../../component-guide/orchestrators/local.md) | ⛔️ |
| [LocalDockerOrchestrator](../../component-guide/orchestrators/local-docker.md) | ⛔️ |
| [SagemakerOrchestrator](../../component-guide/orchestrators/sagemaker.md) | ⛔️ |
| [SkypilotAWSOrchestrator](../../component-guide/orchestrators/skypilot-vm.md) | ⛔️ |
| [SkypilotAzureOrchestrator](../../component-guide/orchestrators/skypilot-vm.md) | ⛔️ |
| [SkypilotGCPOrchestrator](../../component-guide/orchestrators/skypilot-vm.md) | ⛔️ |
| [SkypilotLambdaOrchestrator](../../component-guide/orchestrators/skypilot-vm.md) | ⛔️ |
| [TektonOrchestrator](../../component-guide/orchestrators/tekton.md) | ⛔️ |
| [VertexOrchestrator](../../component-guide/orchestrators/vertex.md) | ✅ |


### Set a schedule

4 changes: 2 additions & 2 deletions docs/book/reference/environment-variables.md
@@ -72,9 +72,9 @@ Set to `false` to disable the [`rich` traceback](https://rich.readthedocs.io/en/
export ZENML_ENABLE_RICH_TRACEBACK=true
```

## Disable colourful logging
## Disable colorful logging

If you wish to disable colourful logging, set the following environment variable:
If you wish to disable colorful logging, set the following environment variable:

```bash
ZENML_LOGGING_COLORS_DISABLED=true
@@ -16,7 +16,7 @@ following:
- finetune our model using the [Sentence
Transformers](https://www.sbert.net/) library
- evaluate the base and finetuned embeddings
- visualise the results of the evaluation
- visualize the results of the evaluation

![Embeddings finetuning pipeline with Sentence Transformers and
ZenML](../../../.gitbook/assets/rag-finetuning-embeddings-pipeline.png)
@@ -94,7 +94,7 @@ The finetuning process leverages the capabilities of the Sentence Transformers l
Our model is finetuned, saved in the Hugging Face Hub for easy access and
reference in subsequent steps, but also versioned and tracked within ZenML for
full observability. At this point the pipeline will evaluate the base and
finetuned embeddings and visualise the results.
finetuned embeddings and visualize the results.

<!-- For scarf -->
<figure><img alt="ZenML Scarf" referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" /></figure>
@@ -105,7 +105,7 @@ Step retrieval_evaluation_full_with_reranking has finished in 4m20s.

We can see here a specific example of a failure in the reranking evaluation. It's quite a good one because we can see that the question asked was actually an anomaly in the sense that the LLM has generated two questions and included its meta-discussion of the two questions it generated. Obviously this is not a representative question for the dataset, and if we saw a lot of these we might want to take some time to both understand why the LLM is generating these questions and how we can filter them out.

### Visualising our reranking performance
### Visualizing our reranking performance

Since ZenML can display visualizations in its dashboard, we can showcase the results of our experiments in a visual format. For example, we can plot the failure rates of the retrieval system with and without reranking to see the impact of reranking on the performance.

2 changes: 1 addition & 1 deletion scripts/format.sh
@@ -62,7 +62,7 @@ ruff check $SRC --select F401,F841 --fix --exclude "__init__.py" --isolated
ruff check $SRC --select I --fix --ignore D
ruff format $SRC

# standardises / formats CI yaml files
# standardizes / formats CI yaml files
if [ "$SKIP_YAMLFIX" = false ]; then
yamlfix .github tests --exclude "dependabot.yml"
fi
2 changes: 1 addition & 1 deletion src/zenml/cli/__init__.py
@@ -2186,7 +2186,7 @@ def my_pipeline(...):

You can update a registered service connector by using the `update` command.
Keep in mind that all service connector updates are validated before being
applied. If you want to disable this behaviour please use the `--no-verify`
applied. If you want to disable this behavior please use the `--no-verify`
flag.

```bash
@@ -119,6 +119,15 @@ class AirflowOrchestratorConfig(

    local: bool = True

    @property
    def is_schedulable(self) -> bool:
        """Whether the orchestrator is schedulable or not.

        Returns:
            Whether the orchestrator is schedulable or not.
        """
        return True


class AirflowOrchestratorFlavor(BaseOrchestratorFlavor):
    """Flavor for the Airflow orchestrator."""
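
The hunk above adds an `is_schedulable` property to an orchestrator config. A minimal, self-contained sketch of how such a property could be consumed follows. The base-class default of `False`, the class names `BaseConfig`, `LocalConfig` and `AirflowLikeConfig`, and the helper `ensure_schedulable` are hypothetical and are not taken from this PR.

```python
class BaseConfig:
    """Hypothetical stand-in for an orchestrator config base class."""

    @property
    def is_schedulable(self) -> bool:
        # Assumed default: orchestrators opt in to scheduling explicitly.
        return False


class LocalConfig(BaseConfig):
    """Inherits the assumed default and is therefore not schedulable."""


class AirflowLikeConfig(BaseConfig):
    """Mirrors the override pattern shown in the diff."""

    @property
    def is_schedulable(self) -> bool:
        return True


def ensure_schedulable(config: BaseConfig) -> None:
    """Raise early if a schedule is requested on an unsupported config."""
    if not config.is_schedulable:
        raise RuntimeError(
            f"{type(config).__name__} does not support schedules."
        )


ensure_schedulable(AirflowLikeConfig())  # passes silently
try:
    ensure_schedulable(LocalConfig())
except RuntimeError as err:
    print(err)
```

Exposing the capability as a property on the config means a caller could check it before submitting a run, instead of discovering the limitation inside the orchestrator.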
4 changes: 2 additions & 2 deletions src/zenml/integrations/azure/flavors/azureml.py
@@ -40,8 +40,8 @@ class AzureMLComputeSettings(BaseSettings):

There are three possible use cases for this implementation:

1. Serverless compute (default behaviour):
- The `mode` is set to `serverless` (default behaviour).
1. Serverless compute (default behavior):
- The `mode` is set to `serverless` (default behavior).
- All the other parameters become irrelevant and will throw a
warning if set.

@@ -80,6 +80,15 @@ def is_synchronous(self) -> bool:
        """
        return self.synchronous

    @property
    def is_schedulable(self) -> bool:
        """Whether the orchestrator is schedulable or not.

        Returns:
            Whether the orchestrator is schedulable or not.
        """
        return True


class AzureMLOrchestratorFlavor(BaseOrchestratorFlavor):
    """Flavor for the AzureML orchestrator."""
@@ -102,6 +102,15 @@ def is_remote(self) -> bool:
        """
        return True

    @property
    def is_schedulable(self) -> bool:
        """Whether the orchestrator is schedulable or not.

        Returns:
            Whether the orchestrator is schedulable or not.
        """
        return True


class DatabricksOrchestratorFlavor(BaseOrchestratorFlavor):
    """Databricks orchestrator flavor."""
@@ -162,6 +162,15 @@ def is_synchronous(self) -> bool:
        """
        return self.synchronous

    @property
    def is_schedulable(self) -> bool:
        """Whether the orchestrator is schedulable or not.

        Returns:
            Whether the orchestrator is schedulable or not.
        """
        return True


class VertexOrchestratorFlavor(BaseOrchestratorFlavor):
    """Vertex Orchestrator flavor."""
11 changes: 10 additions & 1 deletion src/zenml/integrations/gcp/orchestrators/vertex_orchestrator.py
@@ -32,7 +32,16 @@
import os
import re
import types
from typing import TYPE_CHECKING, Any, Dict, List, Optional, Tuple, Type, cast
from typing import (
    TYPE_CHECKING,
    Any,
    Dict,
    List,
    Optional,
    Tuple,
    Type,
    cast,
)
from uuid import UUID

from google.api_core import exceptions as google_exceptions
@@ -82,7 +82,7 @@ def inner(*args: Any, **kwargs: Any) -> Any:
)

with create_cli_wrapped_script(
entrypoint, flavour="accelerate"
entrypoint, flavor="accelerate"
) as (
script_path,
output_path,
@@ -81,6 +81,15 @@ def is_remote(self) -> bool:
        """
        return True

    @property
    def is_schedulable(self) -> bool:
        """Whether the orchestrator is schedulable or not.

        Returns:
            Whether the orchestrator is schedulable or not.
        """
        return True


class HyperAIOrchestratorFlavor(BaseOrchestratorFlavor):
    """Flavor for the HyperAI orchestrator."""
@@ -213,6 +213,15 @@ def is_synchronous(self) -> bool:
        """
        return self.synchronous

    @property
    def is_schedulable(self) -> bool:
        """Whether the orchestrator is schedulable or not.

        Returns:
            Whether the orchestrator is schedulable or not.
        """
        return True


class KubeflowOrchestratorFlavor(BaseOrchestratorFlavor):
    """Kubeflow orchestrator flavor."""
@@ -32,7 +32,16 @@

import os
import types
from typing import TYPE_CHECKING, Any, Dict, List, Optional, Tuple, Type, cast
from typing import (
    TYPE_CHECKING,
    Any,
    Dict,
    List,
    Optional,
    Tuple,
    Type,
    cast,
)
from uuid import UUID

import kfp
@@ -125,6 +125,15 @@ def is_synchronous(self) -> bool:
        """
        return self.synchronous

    @property
    def is_schedulable(self) -> bool:
        """Whether the orchestrator is schedulable or not.

        Returns:
            Whether the orchestrator is schedulable or not.
        """
        return True


class KubernetesOrchestratorFlavor(BaseOrchestratorFlavor):
    """Kubernetes orchestrator flavor."""
@@ -31,7 +31,16 @@
"""Kubernetes-native orchestrator."""

import os
from typing import TYPE_CHECKING, Any, Dict, List, Optional, Tuple, Type, cast
from typing import (
    TYPE_CHECKING,
    Any,
    Dict,
    List,
    Optional,
    Tuple,
    Type,
    cast,
)

from kubernetes import client as k8s_client
from kubernetes import config as k8s_config
@@ -221,7 +221,7 @@ def delete_dataset(self, **kwargs: Any) -> None:
def get_dataset(self, **kwargs: Any) -> Any:
"""Gets the dataset metadata for the given name.

If you would like the labelled data, use `get_labeled_data` instead.
If you would like the labeled data, use `get_labeled_data` instead.

Args:
**kwargs: Additional keyword arguments to pass to the Prodigy client.