align with core changes #34

Merged 2 commits on Jan 31, 2024
.github/workflows/ci.yml (2 changes: 1 addition & 1 deletion)

@@ -56,5 +56,5 @@ jobs:
     with:
       stack-name: ${{ matrix.stack-name }}
       python-version: ${{ matrix.python-version }}
-      ref-zenml: ${{ inputs.ref-zenml || 'develop' }}
+      ref-zenml: ${{ inputs.ref-zenml || 'feature/OSSK-357-deprecate-external-artifact-with-non-value-inputs' }}
       ref-template: ${{ inputs.ref-template || github.ref }}
README.md (10 changes: 5 additions & 5 deletions)

@@ -274,18 +274,18 @@ model:

 The process of loading data is similar to training; even the same step function is used, but with the `is_inference` flag.
 
-But inference flow has an important difference - there is no need to fit preprocessing sklearn `Pipeline`, rather we need to reuse one fitted during training on the train set, to ensure that the model object gets the expected input. To do so we will use [ExternalArtifact](https://docs.zenml.io/user-guide/advanced-guide/pipelining-features/configure-steps-pipelines#pass-any-kind-of-data-to-your-steps) with lookup by `model_artifact_name` only to get the preprocessing pipeline fitted during the quality-assured training run. This is possible since we configured the batch inference pipeline to run inside a Model Control Plane version context.
+But the inference flow has an important difference - there is no need to fit the preprocessing sklearn `Pipeline`; rather, we reuse the one fitted during training on the train set, to ensure that the model object gets the expected input. To do so we will use the [Model interface](https://docs.zenml.io/user-guide/starter-guide/track-ml-models#configuring-a-model-in-a-pipeline) with a lookup by artifact name inside a model context to get the preprocessing pipeline fitted during the quality-assured training run. This is possible since we configured the batch inference pipeline to run inside a Model Control Plane version context.
 <details>
 <summary>Code snippet 💻</summary>
 
 ```python
+model = get_pipeline_context().model
 ########## ETL stage ##########
 df_inference, target = data_loader(is_inference=True)
 df_inference = inference_data_preprocessor(
     dataset_inf=df_inference,
-    preprocess_pipeline=ExternalArtifact(
-        model_artifact_name="preprocess_pipeline",
-    ), # this fetches artifact using Model Control Plane
+    # this fetches the artifact using the Model Control Plane
+    preprocess_pipeline=model.get_artifact("preprocess_pipeline"),
     target=target,
 )
 ```
@@ -298,7 +298,7 @@ df_inference = inference_data_preprocessor(
 
 In the drift reporting stage, we will use the [standard step](https://docs.zenml.io/stacks-and-components/component-guide/data-validators/evidently#the-evidently-data-validator) `evidently_report_step` to build an Evidently report assessing certain data quality metrics. `evidently_report_step` has a number of options, but for this example we will build only the `DataQualityPreset` metrics preset to get the number of NA values in the reference and current datasets.
 
-We pass `dataset_trn` from the training pipeline as a `reference_dataset` here. To do so we will use [ExternalArtifact](https://docs.zenml.io/user-guide/advanced-guide/pipelining-features/configure-steps-pipelines#pass-any-kind-of-data-to-your-steps) with lookup by `model_artifact_name` only to get the training dataset used during quality-assured training run. This is possible since we configured the batch inference pipeline to run inside a Model Control Plane version context.
+We pass `dataset_trn` from the training pipeline as the `reference_dataset` here. To do so we will use the [Model interface](https://docs.zenml.io/user-guide/starter-guide/track-ml-models#configuring-a-model-in-a-pipeline) with a lookup by artifact name inside a model context to get the training dataset used during the quality-assured training run. This is possible since we configured the batch inference pipeline to run inside a Model Control Plane version context.
 
 After the report is built we execute another quality gate using the `drift_quality_gate` step, which assesses whether a significant drift in the NA count is observed. If so, execution is stopped with an exception.
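
The `drift_quality_gate` step itself is not part of this diff. As a rough illustration only, a gate of this kind can parse the Evidently report JSON and raise when the NA counts of the two datasets diverge; the exact keys below are assumptions about the report structure, not the template's actual code:

```python
import json

from zenml import step


@step
def drift_quality_gate(report: str, na_drift_tolerance: float = 0.1) -> None:
    """Raise if the NA count drifted too far between datasets (sketch)."""
    # Assumption: the first metric in the report is the DataQualityPreset
    # summary, carrying per-dataset missing-value counts.
    result = json.loads(report)["metrics"][0]["result"]
    reference_na = result["reference"]["number_of_missing_values"]
    current_na = result["current"]["number_of_missing_values"]
    if reference_na and abs(current_na - reference_na) / reference_na > na_drift_tolerance:
        raise RuntimeError(
            "Number of NA values in the scoring dataset deviates "
            "significantly from the training dataset."
        )
```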

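For context (not part of this diff): the "Model Control Plane version context" that makes `model.get_artifact(...)` resolvable is configured at the pipeline level. A minimal sketch under current ZenML naming - the class was still called `ModelVersion` in releases contemporary with this PR, and the model name and version here are placeholders; the real template wires these up via its YAML config:

```python
from zenml import Model, pipeline

# Hypothetical model name/version for illustration only.
inference_model = Model(
    name="e2e_use_case",     # assumed model name
    version="production",    # resolve the quality-assured version by stage
)


@pipeline(model=inference_model)
def my_batch_inference():
    # Inside this function, get_pipeline_context().model returns the model
    # version configured above, so artifacts linked to it by name
    # (e.g. "preprocess_pipeline") can be looked up lazily at run time.
    ...
```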
template/pipelines/batch_inference.py (9 changes: 5 additions & 4 deletions)

@@ -10,7 +10,7 @@
     notify_on_failure,
     notify_on_success,
 )
-from zenml import ExternalArtifact, pipeline
+from zenml import pipeline, get_pipeline_context
 from zenml.integrations.evidently.metrics import EvidentlyMetricConfig
 from zenml.integrations.evidently.steps import evidently_report_step
 from zenml.logger import get_logger

@@ -29,21 +29,22 @@ def {{product_name}}_batch_inference():
     ### ADD YOUR OWN CODE HERE - THIS IS JUST AN EXAMPLE ###
     # Link all the steps together by calling them and passing the output
     # of one step as the input of the next step.
+    model = get_pipeline_context().model
     ########## ETL stage ##########
     df_inference, target, _ = data_loader(
-        random_state=ExternalArtifact(name="random_state"),
+        random_state=model.get_artifact("random_state"),
         is_inference=True
     )
     df_inference = inference_data_preprocessor(
         dataset_inf=df_inference,
-        preprocess_pipeline=ExternalArtifact(name="preprocess_pipeline"),
+        preprocess_pipeline=model.get_artifact("preprocess_pipeline"),
         target=target,
     )
 
 {%- if data_quality_checks %}
     ########## DataQuality stage ##########
     report, _ = evidently_report_step(
-        reference_dataset=ExternalArtifact(name="dataset_trn"),
+        reference_dataset=model.get_artifact("dataset_trn"),
         comparison_dataset=df_inference,
         ignored_cols=["target"],
         metrics=[
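These lookups only succeed because the training pipeline runs in the same Model Control Plane context and links its named outputs to the model version. A minimal sketch of the training side, with an assumed step name and a deliberately simplified preprocessing pipeline:

```python
from typing import Annotated, Tuple

import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from zenml import step


@step
def train_data_preprocessor(dataset_trn: pd.DataFrame) -> Tuple[
    Annotated[pd.DataFrame, "dataset_trn"],
    Annotated[Pipeline, "preprocess_pipeline"],
]:
    """Fit the preprocessing pipeline once, on the training set (sketch)."""
    # Assumed single-transformer pipeline; the real template composes more steps.
    preprocess_pipeline = Pipeline([("imputer", SimpleImputer())])
    dataset_trn = pd.DataFrame(
        preprocess_pipeline.fit_transform(dataset_trn),
        columns=dataset_trn.columns,
    )
    # When the training pipeline runs inside a Model Control Plane version
    # context, these named outputs are linked to the model version, which is
    # what makes model.get_artifact("preprocess_pipeline") resolvable later.
    return dataset_trn, preprocess_pipeline
```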