Upload code to artifact store (#2895)
* WIP

* Docstrings

* Add DB migration

* Add archivable superclass

* Improve build reuse

* Fix gzip for archives

* Better error messages

* Docstrings/mypy

* Remove some unnecessary stuff

* Typo

* Update build context to inherit from new superclass

* Fix unit tests

* Small fixes

* Ignore .zen folder and other small improvements

* Sort and remove duplicates for better build reuse

* Update docker settings to use booleans

* Add code path to pipeline run for frontend

* Move log

* Better docstring

* Remove hub tests

* Try manual cleanup

* Docs

* Fix alembic order
schustmi authored Aug 7, 2024
1 parent a981e50 commit 510f10d
Showing 21 changed files with 915 additions and 258 deletions.
@@ -2,17 +2,23 @@

ZenML determines the root directory of your source files in the following order:

* If you've initialized zenml (`zenml init`), the repository root directory will be used.
* If you've initialized zenml (`zenml init`) in your current working directory or one of its parent directories, the repository root directory will be used.
* Otherwise, the parent directory of the Python file you're executing will be the source root. For example, when running `python /path/to/file.py`, the source root is `/path/to`.

You can specify how the files inside this root directory are handled using the `source_files` attribute on the [DockerSettings](https://sdkdocs.zenml.io/latest/core_code_docs/core-config/#zenml.config.docker_settings.DockerSettings):
You can specify how the files inside this root directory are handled using the following three attributes on the [DockerSettings](https://sdkdocs.zenml.io/latest/core_code_docs/core-config/#zenml.config.docker_settings.DockerSettings):
* `allow_download_from_code_repository`: If this is set to `True` and your files are inside a registered [code repository](../setting-up-a-project-repository/connect-your-git-repository.md) and the repository has no local changes, the files will be downloaded from the code repository and not included in the image.
* `allow_download_from_artifact_store`: If the previous option is disabled or no code repository without local changes exists for the root directory, ZenML will archive and upload your code to the artifact store if this is set to `True`.
* `allow_including_files_in_images`: If both previous options were disabled or not possible, ZenML will include your files in the Docker image if this option is enabled. This means a new Docker image has to be built each time you modify one of your code files.

* The default behavior `download_or_include`: The files will be downloaded if they're inside a registered [code repository](../setting-up-a-project-repository/connect-your-git-repository.md) and the repository has no local changes, otherwise, they will be included in the image.
* If you want your files to be included in the image in any case, set the `source_files` attribute to `include`.
* If you want your files to be downloaded in any case, set the `source_files` attribute to `download`. If this is specified, the files must be inside a registered code repository and the repository must have no local changes, otherwise the Docker build will fail.
* If you want to prevent ZenML from copying or downloading any of your source files, you can do so by setting the `source_files` attribute on the Docker settings to `ignore`. This is an advanced feature and will most likely cause unintended and unanticipated behavior when running your pipelines. If you use this, make sure to copy all the necessary files to the correct paths yourself.
{% hint style="warning" %}
Setting all of the above attributes to `False` is not recommended and will most likely cause unexpected behavior when running your pipelines. If you do this, you are responsible for ensuring that all your files are at the correct paths in the Docker images that will be used to run your pipeline steps.
{% endhint %}
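
The fallback order described by the three attributes above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not ZenML's implementation: `Flags`, `CodeStrategy`, and `resolve_code_strategy` are hypothetical names, and `clean_code_repo_available` stands in for "a registered code repository without local changes exists for the root directory".

```python
from dataclasses import dataclass
from enum import Enum


class CodeStrategy(Enum):
    """Hypothetical labels for where the step code ends up coming from."""

    CODE_REPOSITORY = "download_from_code_repository"
    ARTIFACT_STORE = "download_from_artifact_store"
    INCLUDE_IN_IMAGE = "include_in_image"
    NONE = "none"


@dataclass(frozen=True)
class Flags:
    """Stand-in for the three DockerSettings attributes (all default to True)."""

    allow_download_from_code_repository: bool = True
    allow_download_from_artifact_store: bool = True
    allow_including_files_in_images: bool = True


def resolve_code_strategy(
    flags: Flags, clean_code_repo_available: bool
) -> CodeStrategy:
    """Pick the first allowed and possible option, in the documented order."""
    # 1. Prefer downloading from a code repository without local changes.
    if flags.allow_download_from_code_repository and clean_code_repo_available:
        return CodeStrategy.CODE_REPOSITORY
    # 2. Otherwise fall back to archiving and uploading to the artifact store.
    if flags.allow_download_from_artifact_store:
        return CodeStrategy.ARTIFACT_STORE
    # 3. Otherwise bake the files into the Docker image.
    if flags.allow_including_files_in_images:
        return CodeStrategy.INCLUDE_IN_IMAGE
    # All options disabled: the user must place the files manually.
    return CodeStrategy.NONE
```

With the defaults and no clean code repository, this sketch resolves to the artifact store, which is why changing a code file no longer forces an image rebuild in that case.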

**Which files get included**
## Control which files get downloaded

When downloading files either from a code repository or the artifact store, ZenML downloads all contents of the root directory into the Docker container. To exclude files, track your code in a Git repository and use a [gitignore](https://git-scm.com/docs/gitignore/en) to specify which files should be excluded.

## Control which files get included

When including files in the image, ZenML copies all contents of the root directory into the Docker image. To exclude files and keep the image smaller, use a [.dockerignore file](https://docs.docker.com/engine/reference/builder/#dockerignore-file) in either of the following ways:

@@ -26,6 +32,7 @@ When including files in the image, ZenML copies all contents of the root directo
def my_pipeline(...):
...
```

<!-- For scarf -->
<figure><img alt="ZenML Scarf" referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" /></figure>

@@ -73,7 +73,10 @@ settings:
required_integrations: List[str]
requirements: Union[NoneType, str, List[str]]
skip_build: bool
source_files: SourceFileMode
prevent_build_reuse: bool
allow_including_files_in_images: bool
allow_download_from_code_repository: bool
allow_download_from_artifact_store: bool
target_repository: str
user: Optional[str]
resources:
@@ -133,7 +136,10 @@ steps:
required_integrations: List[str]
requirements: Union[NoneType, str, List[str]]
skip_build: bool
source_files: SourceFileMode
prevent_build_reuse: bool
allow_including_files_in_images: bool
allow_download_from_code_repository: bool
allow_download_from_artifact_store: bool
target_repository: str
user: Optional[str]
resources:
@@ -191,7 +197,10 @@ steps:
required_integrations: List[str]
requirements: Union[NoneType, str, List[str]]
skip_build: bool
source_files: SourceFileMode
prevent_build_reuse: bool
allow_including_files_in_images: bool
allow_download_from_code_repository: bool
allow_download_from_artifact_store: bool
target_repository: str
user: Optional[str]
resources:
2 changes: 1 addition & 1 deletion docs/book/toc.md
@@ -87,7 +87,7 @@
* [Trigger a pipeline from Python Client](how-to/trigger-pipelines/trigger-a-pipeline-from-client.md)
* [Trigger a pipeline from another pipeline](how-to/trigger-pipelines/trigger-a-pipeline-from-another.md)
* [Trigger a pipeline from REST API](how-to/trigger-pipelines/trigger-a-pipeline-from-rest-api.md)
* [🚨 Create and run templates](how-to/create-and-run-templates/README.md)
* [▶️ Create and run templates](how-to/create-and-run-templates/README.md)
* [Create a run template](how-to/create-and-run-templates/create-a-run-template.md)
* [Run a template](how-to/create-and-run-templates/run-a-template.md)
* [📃 Use configuration files](how-to/use-configuration-files/README.md)
60 changes: 43 additions & 17 deletions src/zenml/config/build_configuration.py
@@ -14,11 +14,13 @@
"""Build configuration class."""

import hashlib
import json
from typing import TYPE_CHECKING, Dict, Optional

from pydantic import BaseModel

from zenml.config.docker_settings import DockerSettings, SourceFileMode
from zenml.config.docker_settings import DockerSettings
from zenml.utils import json_utils

if TYPE_CHECKING:
from zenml.code_repositories import BaseCodeRepository
@@ -60,7 +62,14 @@ def compute_settings_checksum(
The checksum.
"""
hash_ = hashlib.md5() # nosec
hash_.update(self.settings.model_dump_json().encode())
settings_json = json.dumps(
self.settings.model_dump(
mode="json", exclude={"prevent_build_reuse"}
),
sort_keys=True,
default=json_utils.pydantic_encoder,
)
hash_.update(settings_json.encode())
if self.entrypoint:
hash_.update(self.entrypoint.encode())
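
Note on the checksum change above: serializing with sorted keys and excluding `prevent_build_reuse` presumably keeps the checksum stable across key order and across a flag that should not affect build reuse. The following standalone sketch illustrates that idea only. `settings_checksum` and the example dictionaries are hypothetical, and the real method works on the `DockerSettings` model and also hashes the entrypoint.

```python
import hashlib
import json
from typing import Any, Dict


def settings_checksum(settings: Dict[str, Any]) -> str:
    """Hash a settings dict so that key order and one flag do not matter."""
    # Drop the flag that should not influence whether a build can be reused.
    relevant = {
        key: value
        for key, value in settings.items()
        if key != "prevent_build_reuse"
    }
    # sort_keys=True makes the JSON text independent of insertion order.
    payload = json.dumps(relevant, sort_keys=True)
    # MD5 is used as a fingerprint here, not for security purposes.
    return hashlib.md5(payload.encode()).hexdigest()  # nosec
```

Two settings dictionaries that differ only in key order or in `prevent_build_reuse` then produce the same digest, while any other change produces a different one.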

@@ -72,7 +81,7 @@ def compute_settings_checksum(
PipelineDockerImageBuilder,
)

pass_code_repo = self.should_download_files(
pass_code_repo = self.should_download_files_from_code_repository(
code_repository=code_repository
)
requirements_files = (
@@ -101,34 +110,51 @@ def should_include_files(
Returns:
Whether files should be included in the image.
"""
if self.settings.source_files == SourceFileMode.INCLUDE:
return True
if self.should_download_files(code_repository=code_repository):
return False

if (
self.settings.source_files == SourceFileMode.DOWNLOAD_OR_INCLUDE
and not code_repository
return self.settings.allow_including_files_in_images

def should_download_files(
self,
code_repository: Optional["BaseCodeRepository"],
) -> bool:
"""Whether files should be downloaded in the image.
Args:
code_repository: Code repository that can be used to download files
inside the image.
Returns:
Whether files should be downloaded in the image.
"""
if self.should_download_files_from_code_repository(
code_repository=code_repository
):
return True

if self.settings.allow_download_from_artifact_store:
return True

return False

def should_download_files(
def should_download_files_from_code_repository(
self,
code_repository: Optional["BaseCodeRepository"],
) -> bool:
"""Whether files should be downloaded in the image.
"""Whether files should be downloaded from the code repository.
Args:
code_repository: Code repository that can be used to download files
inside the image.
Returns:
Whether files should be downloaded in the image.
Whether files should be downloaded from the code repository.
"""
if not code_repository:
return False
if (
code_repository
and self.settings.allow_download_from_code_repository
):
return True

return self.settings.source_files in {
SourceFileMode.DOWNLOAD,
SourceFileMode.DOWNLOAD_OR_INCLUDE,
}
return False
130 changes: 79 additions & 51 deletions src/zenml/config/docker_settings.py
@@ -16,8 +16,7 @@
from enum import Enum
from typing import Any, Dict, List, Optional, Union

from pydantic import BaseModel, Field, model_validator
from pydantic_settings import SettingsConfigDict
from pydantic import BaseModel, ConfigDict, Field, model_validator

from zenml.config.base_settings import BaseSettings
from zenml.logger import get_logger
@@ -49,15 +48,6 @@ def command(self) -> str:
}[self]


class SourceFileMode(Enum):
"""Different methods to handle source files in Docker images."""

INCLUDE = "include"
DOWNLOAD_OR_INCLUDE = "download_or_include"
DOWNLOAD = "download"
IGNORE = "ignore"


class PythonPackageInstaller(Enum):
"""Different installers for python packages."""

@@ -134,10 +124,9 @@ class DockerSettings(BaseSettings):
when the `dockerfile` attribute is set. If this is left empty, the
build context will only contain the Dockerfile.
parent_image_build_config: Configuration for the parent image build.
build_options: DEPRECATED, use parent_image_build_config.build_options
instead.
skip_build: If set to `True`, the parent image will be used directly to
run the steps of your pipeline.
prevent_build_reuse: Prevent the reuse of an existing build.
target_repository: Name of the Docker repository to which the
image should be pushed. This repository will be appended to the
registry URI of the container registry of your stack and should
@@ -171,33 +160,32 @@ class DockerSettings(BaseSettings):
environment: Dictionary of environment variables to set inside the
Docker image.
build_config: Configuration for the main image build.
dockerignore: DEPRECATED, use build_config.dockerignore instead.
copy_files: DEPRECATED, use the `source_files` attribute instead.
copy_global_config: DEPRECATED/UNUSED.
user: If not `None`, will set the user, make it owner of the `/app`
directory which contains all the user code and run the container
entrypoint as this user.
source_files: Defines how the user source files will be handled when
building the Docker image.
* INCLUDE: The files will be included in the Docker image.
* DOWNLOAD: The files will be downloaded when running the image. If
this is specified, the files must be inside a registered code
repository and the repository must have no local changes,
otherwise the build will fail.
* DOWNLOAD_OR_INCLUDE: The files will be downloaded if they're
inside a registered code repository and the repository has no
local changes, otherwise they will be included in the image.
* IGNORE: The files will not be included or downloaded in the image.
If you use this option, you're responsible that all the files
to run your steps exist in the right place.
allow_including_files_in_images: If `True`, code can be included in the
Docker images if code download from a code repository or artifact
store is disabled or not possible.
allow_download_from_code_repository: If `True`, code can be downloaded
from a code repository if possible.
allow_download_from_artifact_store: If `True`, code can be downloaded
from the artifact store.
build_options: DEPRECATED, use parent_image_build_config.build_options
instead.
dockerignore: DEPRECATED, use build_config.dockerignore instead.
copy_files: DEPRECATED/UNUSED.
copy_global_config: DEPRECATED/UNUSED.
source_files: DEPRECATED. Use allow_including_files_in_images,
allow_download_from_code_repository and
allow_download_from_artifact_store instead.
"""

parent_image: Optional[str] = None
dockerfile: Optional[str] = None
build_context_root: Optional[str] = None
build_options: Dict[str, Any] = {}
parent_image_build_config: Optional[DockerBuildConfig] = None
skip_build: bool = False
prevent_build_reuse: bool = False
target_repository: Optional[str] = None
python_package_installer: PythonPackageInstaller = (
PythonPackageInstaller.PIP
@@ -210,49 +198,89 @@ class DockerSettings(BaseSettings):
default=None, union_mode="left_to_right"
)
required_integrations: List[str] = []
required_hub_plugins: List[str] = []
install_stack_requirements: bool = True
apt_packages: List[str] = []
environment: Dict[str, Any] = {}
dockerignore: Optional[str] = None
copy_files: bool = True
copy_global_config: bool = True
user: Optional[str] = None
build_config: Optional[DockerBuildConfig] = None

source_files: SourceFileMode = SourceFileMode.DOWNLOAD_OR_INCLUDE
allow_including_files_in_images: bool = True
allow_download_from_code_repository: bool = True
allow_download_from_artifact_store: bool = True

# Deprecated attributes
build_options: Dict[str, Any] = {}
dockerignore: Optional[str] = None
copy_files: bool = True
copy_global_config: bool = True
source_files: Optional[str] = None
required_hub_plugins: List[str] = []

_deprecation_validator = deprecation_utils.deprecate_pydantic_attributes(
"copy_files", "copy_global_config", "required_hub_plugins"
"copy_files",
"copy_global_config",
"source_files",
"required_hub_plugins",
)

@model_validator(mode="before")
@classmethod
@before_validator_handler
def _migrate_copy_files(cls, data: Dict[str, Any]) -> Dict[str, Any]:
"""Migrates the value from the old copy_files attribute.
def _migrate_source_files(cls, data: Dict[str, Any]) -> Dict[str, Any]:
"""Migrate old source_files values.
Args:
data: The settings values.
data: The model data.
Raises:
ValueError: If an invalid source file mode is specified.
Returns:
The migrated settings values.
The migrated data.
"""
copy_files = data.get("copy_files", None)
source_files = data.get("source_files", None)

if copy_files is None:
if source_files is None:
return data

if data.get("source_files", None):
# Ignore the copy files value in favor of the new source files
replacement_attributes = [
"allow_including_files_in_images",
"allow_download_from_code_repository",
"allow_download_from_artifact_store",
]
if any(v in data for v in replacement_attributes):
logger.warning(
"Both `copy_files` and `source_files` specified for the "
"DockerSettings, ignoring the `copy_files` value."
"Both `source_files` and one of %s specified for the "
"DockerSettings, ignoring the `source_files` value.",
replacement_attributes,
)
elif copy_files is True:
data["source_files"] = SourceFileMode.INCLUDE
elif copy_files is False:
data["source_files"] = SourceFileMode.IGNORE
return data

allow_including_files_in_images = False
allow_download_from_code_repository = False
allow_download_from_artifact_store = False

if source_files == "download":
allow_download_from_code_repository = True
elif source_files == "include":
allow_including_files_in_images = True
elif source_files == "download_or_include":
allow_including_files_in_images = True
allow_download_from_code_repository = True
elif source_files == "ignore":
pass
else:
raise ValueError(f"Invalid source file mode `{source_files}`.")

data["allow_including_files_in_images"] = (
allow_including_files_in_images
)
data["allow_download_from_code_repository"] = (
allow_download_from_code_repository
)
data["allow_download_from_artifact_store"] = (
allow_download_from_artifact_store
)

return data
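
Note on the migration above: the mapping from legacy `source_files` values to the three new booleans can be summarized as a lookup table. This is a standalone sketch that mirrors the branches in `_migrate_source_files`. The names `_LEGACY_SOURCE_FILE_MODES` and `migrate_source_files` are hypothetical and not part of ZenML.

```python
from typing import Dict, Tuple

# Legacy value -> (include in image, code repository, artifact store).
# The artifact store flag stays False for every legacy value, as in the diff.
_LEGACY_SOURCE_FILE_MODES: Dict[str, Tuple[bool, bool, bool]] = {
    "download": (False, True, False),
    "include": (True, False, False),
    "download_or_include": (True, True, False),
    "ignore": (False, False, False),
}


def migrate_source_files(mode: str) -> Dict[str, bool]:
    """Translate a legacy `source_files` value into the new flags."""
    try:
        include, code_repo, artifact_store = _LEGACY_SOURCE_FILE_MODES[mode]
    except KeyError:
        raise ValueError(f"Invalid source file mode `{mode}`.") from None

    return {
        "allow_including_files_in_images": include,
        "allow_download_from_code_repository": code_repo,
        "allow_download_from_artifact_store": artifact_store,
    }
```

One consequence visible in the diff: a configuration that still sets `source_files` never enables the artifact store download, so migrated settings keep their previous behavior rather than picking up the new default.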

@@ -277,7 +305,7 @@ def _validate_skip_build(self) -> "DockerSettings":

return self

model_config = SettingsConfigDict(
model_config = ConfigDict(
# public attributes are immutable
frozen=True,
# prevent extra attributes during model initialization