Add fondant base image #801

Merged
merged 31 commits into from
Jan 25, 2024
Changes from 28 commits
Commits
e4bb3ca
Init base image build script
mrchtr Jan 18, 2024
4e9d62a
Add base image build to ci/cd
mrchtr Jan 19, 2024
d597ec5
Add usage of base image
mrchtr Jan 19, 2024
297d584
Fix typos
mrchtr Jan 19, 2024
4e0fa63
Add 3.11 image
mrchtr Jan 22, 2024
e29b7f4
Addressing comments
mrchtr Jan 22, 2024
a81d3ec
Addressing comments
mrchtr Jan 22, 2024
48df3a9
Merge branch 'main' into feature/build-fondant-base-image
mrchtr Jan 22, 2024
1dcd74b
Merge branch 'main' into feature/build-fondant-base-image
mrchtr Jan 23, 2024
91d1110
Fixing test
mrchtr Jan 23, 2024
27d2946
Rename Dockerfile folder
mrchtr Jan 23, 2024
02c7421
Temporarily exclude release to test build within ci/cd piepline
mrchtr Jan 23, 2024
63b2ad7
Temporarily exclude release to test build within ci/cd piepline
mrchtr Jan 23, 2024
cafbb26
Temporarily exclude release to test build within ci/cd piepline
mrchtr Jan 23, 2024
59566df
Add build for dev image
mrchtr Jan 23, 2024
6d7ba85
Add build for dev image
mrchtr Jan 23, 2024
73573c3
Update ECR image name
mrchtr Jan 23, 2024
042d943
Update ECR image name
mrchtr Jan 23, 2024
3b09c86
Update ECR image name
mrchtr Jan 23, 2024
8d5dbb1
Fixing tag
mrchtr Jan 23, 2024
59af668
Update ecr image
mrchtr Jan 23, 2024
8214878
Revert changes cicd
mrchtr Jan 23, 2024
21a5164
Revert changes cicd
mrchtr Jan 23, 2024
dba8d6e
Revert changes cicd
mrchtr Jan 23, 2024
b1ae8fa
Addressing comments
mrchtr Jan 24, 2024
694176f
Merge branch 'main' into feature/build-fondant-base-image
mrchtr Jan 24, 2024
0cfe025
rerun cicd
mrchtr Jan 24, 2024
7489475
revert cicd
mrchtr Jan 24, 2024
27dfcbc
revert cicd
mrchtr Jan 24, 2024
c5c7b7c
Addressing comments
mrchtr Jan 25, 2024
5528590
Fix test
mrchtr Jan 25, 2024
3 changes: 3 additions & 0 deletions .github/workflows/build.yaml
@@ -53,3 +53,6 @@ jobs:

      - name: Build components
        run: ./scripts/build_components.sh --cache -t $GITHUB_SHA -t dev

      - name: Build base image
        run: ./scripts/build_base_image.sh -t $GITHUB_SHA
3 changes: 3 additions & 0 deletions .github/workflows/prep-release.yaml
@@ -62,6 +62,9 @@ jobs:
      - name: Build data explorer
        run: ./scripts/build_explorer.sh -t $GITHUB_REF_NAME

      - name: Build base image
        run: ./scripts/build_base_image.sh -t $GITHUB_REF_NAME

      - name: Update version in pyproject.toml with tag version
        run: sed -i "s/^version = .*/version = '${{github.ref_name}}'/" pyproject.toml

12 changes: 12 additions & 0 deletions images/Dockerfile
@@ -0,0 +1,12 @@
ARG PYTHON_VERSION
FROM --platform=linux/amd64 python:${PYTHON_VERSION}-slim

# System dependencies
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install git -y

# Install Fondant
ARG FONDANT_VERSION=main
# Quote the requirement so the shell does not try to glob the [extras] brackets
RUN pip3 install "fondant[component,aws,azure,gcp]@git+https://github.com/ml6team/fondant@${FONDANT_VERSION}"

47 changes: 47 additions & 0 deletions scripts/build_base_image.sh
@@ -0,0 +1,47 @@
#!/bin/bash
set -e

function usage {
  echo "Usage: $0 [options]"
  echo "Options:"
  echo "  -t, --tag <value>    Set the tag (default: latest)"
  echo "  -h, --help           Display this help message"
}

# Parse the arguments
while [[ "$#" -gt 0 ]]; do case $1 in
  -t|--tag) tag="$2"; shift;;
  -h|--help) usage; exit;;
  *) echo "Unknown parameter passed: $1"; exit 1;;
esac; shift; done

# Supported Python versions
python_versions=("3.8" "3.9" "3.10" "3.11")

for python_version in "${python_versions[@]}"; do
  BASENAME=fondant-base
  IMAGE_TAG=${tag}-python${python_version}
  full_image_names=()
  # Reset per iteration so tags from earlier Python versions are not re-applied
  args=()

  # create repo if not exists
  aws ecr-public describe-repositories --region us-east-1 --repository-names ${BASENAME} || aws ecr-public create-repository --region us-east-1 --repository-name ${BASENAME}
  full_image_names+=("public.ecr.aws/fndnt/${BASENAME}:${IMAGE_TAG}")
  full_image_names+=("fndnt/${BASENAME}:${IMAGE_TAG}")

  # Add argument for each tag
  for image_name in "${full_image_names[@]}" ; do
    args+=(-t "$image_name")
  done

  for element in "${args[@]}"; do
    echo "$element"
  done

  # Build docker images and push to public ECR and Docker Hub
  docker build --push "${args[@]}" \
    --build-arg="PYTHON_VERSION=${python_version}" \
    --build-arg="FONDANT_VERSION=${tag}" \
    -f "images/Dockerfile" \
    .
done
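To make the tagging scheme easier to follow, here is a minimal Python sketch of which image names the script is meant to produce for each Python version. The helper `image_names` and the example tag `0.9.0` are hypothetical and only illustrate the naming; the registries, base name and Python versions are copied from the script.

```python
from typing import Dict, List

# Values copied from scripts/build_base_image.sh
PYTHON_VERSIONS = ["3.8", "3.9", "3.10", "3.11"]
REGISTRIES = ["public.ecr.aws/fndnt", "fndnt"]
BASENAME = "fondant-base"


def image_names(tag: str) -> Dict[str, List[str]]:
    """Map each Python version to the image names built for it (hypothetical helper)."""
    names: Dict[str, List[str]] = {}
    for python_version in PYTHON_VERSIONS:
        image_tag = f"{tag}-python{python_version}"
        # A fresh list per version, so each build only carries its own tags
        names[python_version] = [
            f"{registry}/{BASENAME}:{image_tag}" for registry in REGISTRIES
        ]
    return names


if __name__ == "__main__":
    # "0.9.0" is an example tag, not a value taken from the PR
    for python_version, names in image_names("0.9.0").items():
        print(python_version, names)
```

Each Python version should end up with exactly two names, one per registry.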
47 changes: 44 additions & 3 deletions src/fondant/pipeline/lightweight_component.py
@@ -1,23 +1,64 @@
import inspect
import itertools
import logging
import sys
import textwrap
import typing as t
from dataclasses import asdict, dataclass
from functools import wraps
from importlib.metadata import version

from fondant.component import BaseComponent, Component

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

MIN_PYTHON_VERSION = (3, 8)
MAX_PYTHON_VERSION = (3, 11)


@dataclass
class Image:
-    base_image: str = "fondant:latest"
+    base_image: str
    extra_requires: t.Optional[t.List[str]] = None
    script: t.Optional[str] = None

    def __post_init__(self):
        if self.base_image is None:
-            # TODO: link to Fondant version
-            self.base_image = "fondant:latest"
+            self.base_image = self.resolve_fndnt_base_image()

        # log info when custom image without Fondant is defined
        # (extra_requires may be None, so fall back to an empty list)
        elif not any(
            dependency.startswith("fondant")
            for dependency in (self.extra_requires or [])
        ):
            msg = (
                "You are not using a Fondant default base image, and Fondant is not part of "
                "your extra requirements. Please make sure that you have installed fondant "
                "inside your container. Alternatively, you can add Fondant to "
                "the extra requirements. \n"
                "E.g. \n"
                '@lightweight_component(..., extra_requires=["fondant"])'
            )

            logger.info(msg)

    @staticmethod
    def resolve_fndnt_base_image(use_ecr_registry=False):
        """Resolve the correct fndnt base image using python version and fondant version."""
        # Use the running Python version if it is supported, otherwise fall back
        # to the latest supported version
        python_version = sys.version_info
        if MIN_PYTHON_VERSION <= python_version[:2] <= MAX_PYTHON_VERSION:
            python_version = f"{python_version.major}.{python_version.minor}"
        else:
            python_version = f"{MAX_PYTHON_VERSION[0]}.{MAX_PYTHON_VERSION[1]}"

        fondant_version = version("fondant")
        basename = (
            "fndnt/fondant-base"
            if not use_ecr_registry
            else "public.ecr.aws/fndnt/fondant-base"
        )
        return f"{basename}:{fondant_version}-python{python_version}"

    def to_dict(self):
        return asdict(self)
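The image resolution is easier to test when the interpreter version and the Fondant version are passed in rather than read from the environment. The following is a minimal sketch under that assumption; `resolve_base_image` is a hypothetical standalone variant, not the method from the PR, and `0.9.0` is an example version.

```python
import sys
from typing import Sequence, Tuple

MIN_PYTHON_VERSION: Tuple[int, int] = (3, 8)
MAX_PYTHON_VERSION: Tuple[int, int] = (3, 11)


def resolve_base_image(
    fondant_version: str,
    python_version: Sequence = sys.version_info,
    use_ecr_registry: bool = False,
) -> str:
    """Hypothetical standalone variant of Image.resolve_fndnt_base_image."""
    major_minor = tuple(python_version[:2])
    # Fall back to the newest supported version when the interpreter is out of range
    if not MIN_PYTHON_VERSION <= major_minor <= MAX_PYTHON_VERSION:
        major_minor = MAX_PYTHON_VERSION
    basename = (
        "public.ecr.aws/fndnt/fondant-base" if use_ecr_registry else "fndnt/fondant-base"
    )
    return f"{basename}:{fondant_version}-python{major_minor[0]}.{major_minor[1]}"


if __name__ == "__main__":
    print(resolve_base_image("0.9.0", (3, 10, 4)))
    print(resolve_base_image("0.9.0", (3, 12, 0), use_ecr_registry=True))
```

Comparing only the `(major, minor)` pair keeps the range check independent of the patch level and release tag that `sys.version_info` also carries.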
10 changes: 9 additions & 1 deletion tests/pipeline/test_pipeline.py
@@ -1,5 +1,7 @@
"""Fondant pipelines test."""
import copy
import sys
from importlib.metadata import version
from pathlib import Path

import dask.dataframe as dd
@@ -83,10 +85,16 @@ def load(self) -> dd.DataFrame:
            )
            return dd.from_pandas(df, npartitions=1)

    basename = "fndnt/fondant-base"
    fondant_version = version("fondant")
    python_version = sys.version_info
    python_version = f"{python_version.major}.{python_version.minor}"
    fondant_image_name = f"{basename}:{fondant_version}-python{python_version}"

    component = ComponentOp.from_ref(Foo, produces={"bar": pa.string()})
    assert component.component_spec._specification == {
        "name": "Foo",
-        "image": "fondant:latest",
+        "image": fondant_image_name,
        "description": "python component",
        "consumes": {"additionalProperties": True},
        "produces": {"additionalProperties": True},
62 changes: 56 additions & 6 deletions tests/pipeline/test_python_component.py
@@ -1,6 +1,8 @@
import json
import re
import sys
import textwrap
from importlib.metadata import version

import dask.dataframe as dd
import pandas as pd
@@ -12,6 +14,15 @@
from fondant.pipeline.compiler import DockerCompiler


@pytest.fixture()
def default_fondant_image():
    basename = "fndnt/fondant-base"
    fondant_version = version("fondant")
    python_version = sys.version_info
    python_version = f"{python_version.major}.{python_version.minor}"
    return f"{basename}:{fondant_version}-python{python_version}"


def test_build_python_script():
    @lightweight_component()
    class CreateData(DaskLoadComponent):
@@ -51,7 +62,7 @@ def load(self) -> dd.DataFrame:
    )


-def test_lightweight_component_sdk():
+def test_lightweight_component_sdk(default_fondant_image, caplog):
    pipeline = Pipeline(
        name="dummy-pipeline",
        base_path="./data",
@@ -93,6 +104,18 @@ def load(self) -> dd.DataFrame:
        "produces": {"x": {"type": "int32"}, "y": {"type": "int32"}},
    }

    # check info log: fondant is not part of the requirements
    msg = (
        "You are not using a Fondant default base image, and Fondant is not part of "
        "your extra requirements. Please make sure that you have installed fondant "
        "inside your container. Alternatively, you can add Fondant to "
        "the extra requirements. \n"
        "E.g. \n"
        '@lightweight_component(..., extra_requires=["fondant"])'
    )

    assert any(msg in record.message for record in caplog.records)

    @lightweight_component()
    class AddN(PandasTransformComponent):
        def __init__(self, n: int, **kwargs):
@@ -110,11 +133,13 @@ def transform(self, dataframe: pd.DataFrame) -> pd.DataFrame:
    )
    assert len(pipeline._graph.keys()) == 1 + 1
    assert pipeline._graph["AddN"]["dependencies"] == ["CreateData"]
    pipeline._graph["AddN"]["operation"].operation_spec.to_json()

    operation_spec_dict = pipeline._graph["AddN"]["operation"].operation_spec.to_dict()
    assert operation_spec_dict == {
        "specification": {
            "name": "AddN",
-            "image": "fondant:latest",
+            "image": default_fondant_image,
            "description": "python component",
            "consumes": {"additionalProperties": True},
            "produces": {"additionalProperties": True},
@@ -160,7 +185,30 @@ def load(self) -> dd.DataFrame:
            )
            return dd.from_pandas(df, npartitions=1)

-    CreateData(produces={}, consumes={})
+    pipeline = Pipeline(
+        name="dummy-pipeline",
+        base_path="./data",
+    )
+
+    pipeline.read(
+        ref=CreateData,
+    )
+
+    assert len(pipeline._graph.keys()) == 1
+    operation_spec = pipeline._graph["CreateData"]["operation"].operation_spec.to_json()
+    operation_spec_without_image = json.loads(operation_spec)
+
+    assert operation_spec_without_image == {
+        "specification": {
+            "name": "CreateData",
+            "image": "python:3.8-slim-buster",
+            "description": "python component",
+            "consumes": {"additionalProperties": True},
+            "produces": {"additionalProperties": True},
+        },
+        "consumes": {},
+        "produces": {},
+    }


def test_invalid_load_component():
@@ -220,7 +268,7 @@ def load(self) -> int:
    CreateData(produces={}, consumes={})


-def test_lightweight_component_decorator_without_parentheses():
+def test_lightweight_component_decorator_without_parentheses(default_fondant_image):
    @lightweight_component
    class CreateData(DaskLoadComponent):
        def load(self) -> dd.DataFrame:
@@ -237,10 +285,12 @@ def load(self) -> dd.DataFrame:

    assert len(pipeline._graph.keys()) == 1
    operation_spec = pipeline._graph["CreateData"]["operation"].operation_spec.to_json()
-    assert json.loads(operation_spec) == {
+    operation_spec_without_image = json.loads(operation_spec)
+
+    assert operation_spec_without_image == {
        "specification": {
            "name": "CreateData",
-            "image": "fondant:latest",
+            "image": default_fondant_image,
            "description": "python component",
            "consumes": {"additionalProperties": True},
            "produces": {"additionalProperties": True},
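The log message asserted in test_lightweight_component_sdk comes from a check on `extra_requires`. Below is a minimal sketch of such a check, assuming that a missing `extra_requires` (`None`) should be treated as an empty list. The names `fondant_is_required`, `check_custom_image` and `ListHandler` are hypothetical and not part of Fondant; the handler only stands in for pytest's `caplog` so the sketch runs without pytest.

```python
import logging
from typing import List, Optional

logger = logging.getLogger("lightweight_component_sketch")
logger.setLevel(logging.INFO)


def fondant_is_required(extra_requires: Optional[List[str]]) -> bool:
    """Return True if any requirement string starts with "fondant" (hypothetical helper)."""
    return any(dep.startswith("fondant") for dep in (extra_requires or []))


def check_custom_image(base_image: str, extra_requires: Optional[List[str]] = None) -> None:
    """Log a hint when a custom image is used without Fondant in the requirements."""
    if not fondant_is_required(extra_requires):
        logger.info(
            "Custom image %s: Fondant is not part of the extra requirements", base_image
        )


class ListHandler(logging.Handler):
    """Minimal handler that stores log messages in a list."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


if __name__ == "__main__":
    handler = ListHandler()
    logger.addHandler(handler)
    check_custom_image("python:3.8-slim-buster")  # logs a hint
    check_custom_image("python:3.8-slim-buster", ["fondant[component]"])  # stays silent
    print(handler.messages)
```

Treating `None` as an empty list means a custom base image without any extra requirements produces the hint instead of raising a `TypeError`.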