diff --git a/docs/art/architecture.png b/docs/art/architecture.png index 8d04d34f..754d67ee 100644 Binary files a/docs/art/architecture.png and b/docs/art/architecture.png differ diff --git a/docs/components/components.md b/docs/components/components.md index 3cdb23f5..73d0ea92 100644 --- a/docs/components/components.md +++ b/docs/components/components.md @@ -71,10 +71,10 @@ We can distinguish two different types of components: - **Custom components** are completely defined and implemented by the user. There are two ways to define a custom component: - - **Lightweight Python Components**: Create a component from a self-contained Python function. + - **Lightweight Components**: Create a component from a self-contained Python function. This is the easiest way to create a custom component. It allows you to define a component without having to build a custom docker image or defining a component specification. - - **Containerized Python Components**: You can build your code into a docker image + - **Containerized Components**: You can build your code into a docker image and write an accompanying component specification that refers to it. This is used for more complex components that require additional dependencies (e.g. GPU support). @@ -85,8 +85,8 @@ We can distinguish two different types of components: ### Custom components -#### Lightweight Python Components -To define a lightweight python component, you can create a self-contained python function that +#### Lightweight Components +To define a lightweight component, you can create a self-contained python function that implements the logic of your component. @@ -119,11 +119,11 @@ _ = dataset.apply( ) ``` -See our [best practices on creating a custom python component](../components/custom_python_component.md). +See our [best practices on creating a lightweight component](../components/lightweight_components.md). 
-#### Containerized Python Components -To define your own containerized custom component, you can build your code into a docker image and write an +#### Containerized Components +To define your own containerized component, you can build your code into a docker image and write an accompanying component specification that refers to it. A typical file structure for a custom component looks like this: @@ -160,12 +160,12 @@ dataset = dataset.apply( ) ``` -See our [best practices on creating a custom containerized component](../components/custom_containerized_component.md). +See our [best practices on creating a containerized component](../components/containerized_components.md). ### Reusable components -Reusable components are out of the box containerized python components from the Fondant Hub that you can easily add +Reusable components are out of the box containerized components from the Fondant Hub that you can easily add to your pipeline: ```python diff --git a/docs/components/custom_containerized_component.md b/docs/components/containerized_components.md similarity index 91% rename from docs/components/custom_containerized_component.md rename to docs/components/containerized_components.md index 1c1b286a..49e0225a 100644 --- a/docs/components/custom_containerized_component.md +++ b/docs/components/containerized_components.md @@ -1,14 +1,14 @@ -# Creating custom containerized components +# Creating containerized components Fondant makes it easy to build data preparation pipelines leveraging reusable components. Fondant provides a lot of [components out of the box](https://fondant.ai/en/latest/components/hub/), but you can also -define your own custom containerized components. +define your own containerized components. Containerized components are useful when you want to share the components within your organization or community. 
If you don't need your component to be shareable, we recommend starting -with a simpler [Python components](../components/custom_python_component.md) instead. +with a simpler [lightweight component](../components/lightweight_components.md) instead. To make sure containerized components are reusable, they should implement a single logical data processing @@ -16,7 +16,7 @@ step (like captioning images or removing Personal Identifiable Information [PII] If a component grows too large, consider splitting it into multiple separate components each tackling one logical part. -To implement a custom containerized component, a couple of files need to be defined: +To implement a containerized component, a couple of files need to be defined: - [Fondant component specification](#fondant-component-specification) - [`main.py` script in a `src` folder](#mainpy-script) diff --git a/docs/components/custom_python_component.md b/docs/components/lightweight_components.md similarity index 92% rename from docs/components/custom_python_component.md rename to docs/components/lightweight_components.md index a9781449..aa3d3c15 100644 --- a/docs/components/custom_python_component.md +++ b/docs/components/lightweight_components.md @@ -1,11 +1,11 @@ -# Creating custom python components +# Creating lightweight components -Python components are a great way to implement custom data processing steps in your pipeline. +Lightweight components are a great way to implement custom data processing steps in your pipeline. They are easy to implement and can be reused across different pipelines. If you want to build more complex components that require additional dependencies (e.g. GPU support), you can -also build a containerized component. See the [containerized component guide](../components/custom_containerized_component.md) for more info. +also build a containerized component. See the [containerized component guide](../components/containerized_components.md) for more info.
-To implement a custom python component, you simply need to create a python script that implements +To implement a lightweight component, you simply need to create a Python script that implements the component logic. Here is an example of a pipeline composed of two custom components, one that creates a dataset and one that adds a number to a column of the dataset: @@ -116,7 +116,7 @@ This will omit the `y` column from the loaded data, which can be useful if you a datasets and want to avoid loading unnecessary data. If you want to publish your component to the Fondant Hub, you will need to convert -it to containerized component. See the [containerized component guide](../components/custom_containerized_component.md) for more info. +it to a containerized component. See the [containerized component guide](../components/containerized_components.md) for more info. **Note:** Python based components also support defining dynamic fields by default. See the [dynamic fields guide](../components/component_spec.md#dynamic-fields) for more info on dynamic fields. diff --git a/docs/documentation_guide.md b/docs/documentation_guide.md index 0845ba75..3f2bf8e5 100644 --- a/docs/documentation_guide.md +++ b/docs/documentation_guide.md @@ -16,12 +16,12 @@ own [custom components](guides/implement_custom_components.md). Learn how to use Fondant to build your own data processing pipeline. -> Design your own fondant [pipeline](pipeline.md) using the Fondant pipeline SDK. --> Use existing [reusable components](components/hub.md) to build your pipeline. --> Build your own custom [python component](components/custom_containerized_component.md) -and share them by packaging them into [containerized component](components/custom_containerized_component.md) using the Fondant component -SDK. +-> Use existing [reusable components](components/hub.md) to build your pipeline.
+-> Build your own custom [lightweight components](components/lightweight_components.md) +and share them by packaging them into [containerized components](components/containerized_components.md) using the Fondant component +SDK. -> Learn how to publish your own [components](components/publishing_components.md) to a container -registry so that you can reuse them in your pipelines. +registry so that you can reuse them in your pipelines. ## Components hub @@ -80,7 +80,7 @@ Learn about some of the more advanced concepts in Fondant. hood. -> Learn how Fondant uses [caching](caching.md) to speed up your pipeline development. -> Find out how Fondant uses [partitions](partitions.md) to parallelize and scale your pipeline and -how you can use it to your advantage. +how you can use it to your advantage. -> Learn how to setup a Kubeflow to run your Fondant pipeline on a [Kubeflow cluster](runners/kfp_infrastructure.md). ## Contributing diff --git a/docs/guides/implement_custom_components.md b/docs/guides/implement_custom_components.md index 78c50cef..c38f57c8 100644 --- a/docs/guides/implement_custom_components.md +++ b/docs/guides/implement_custom_components.md @@ -16,7 +16,7 @@ In this tutorial, we will guide you through the process of implementing your ver component. We will illustrate this by building a transform component that uppercases the `alt_text` of the image dataset. If you want to build a complex custom component or share the component within your organization or even the community, -take a look at how to build [reusable components](../components/custom_containerized_component.md). +take a look at how to build [reusable components](../components/containerized_components.md). This pipeline is an extension of the one introduced in the [previous tutorial](../guides/build_a_simple_pipeline.md). @@ -97,22 +97,22 @@ class UpperCaseTextComponent(PandasTransformComponent): !!! note "IMPORTANT" Note that we have used a decorator `@lightweight_component`.
This decorator is necessary to inform - Fondant that this class is a Python component and can be used as a component in your pipeline. + Fondant that this class is a lightweight component and can be used as a component in your pipeline. We apply the uppercase transformation to the `alt_text` column of the dataframe. Afterward, we return the transformed dataframe from the `transform` method, which Fondant will use to automatically update the index. -The Python components provide an easy way to start with your component implementation. However, the -Python component implementation still allows you to define all advanced component configurations, +Lightweight components provide an easy way to start with your component implementation. However, the +lightweight component implementation still allows you to define all advanced component configurations, including installing extra arguments or defining component arguments. These concepts are more advanced and not needed for quick exploration and experiments. You can find more information on these topics in -the [documentation of the Python components](../components/custom_python_component.md). +the [documentation of the lightweight components](../components/lightweight_components.md). ### Using the component -Now were we have defined our Python component we can start using it in our pipeline. +Now that we have defined our lightweight component, we can start using it in our pipeline. For instance we can put this component at the end of our pipeline. ```python @@ -130,7 +130,7 @@ Now, you can execute the pipeline once more and examine the results. In the fina the `alt_text` is in uppercase. Of course, it is debatable whether uppercasing the alt_text is genuinely useful.
This is just a -constructive and simple example to showcase how to use Python components as glue code within your +constructive and simple example to showcase how to use lightweight components as glue code within your pipeline, helping you connect reusable components to each other. ## Next steps diff --git a/mkdocs.yml b/mkdocs.yml index c2e278af..7f1b8880 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -47,8 +47,8 @@ nav: - Pipeline: pipeline.md - Components: - Components: components/components.md - - Python components: components/custom_python_component.md - - Containerized components: components/custom_containerized_component.md + - Lightweight components: components/lightweight_components.md + - Containerized components: components/containerized_components.md - Component spec: components/component_spec.md - Publishing components: components/publishing_components.md - Runners: diff --git a/src/fondant/core/exceptions.py b/src/fondant/core/exceptions.py index c638e700..66fa7f87 100644 --- a/src/fondant/core/exceptions.py +++ b/src/fondant/core/exceptions.py @@ -27,5 +27,5 @@ class UnsupportedTypeAnnotation(FondantException): """Thrown when an unsupported type annotation is encountered during type inference.""" -class InvalidPythonComponent(FondantException): - """Thrown when a component is not a valid Python component.""" +class InvalidLightweightComponent(FondantException): + """Thrown when a component is not a valid lightweight component.""" diff --git a/src/fondant/pipeline/__init__.py b/src/fondant/pipeline/__init__.py index 0c3f60ac..7db85a67 100644 --- a/src/fondant/pipeline/__init__.py +++ b/src/fondant/pipeline/__init__.py @@ -1,4 +1,8 @@ -from .lightweight_component import Image, PythonComponent, lightweight_component # noqa +from .lightweight_component import ( # noqa + Image, + LightweightComponent, + lightweight_component, +) from .pipeline import ( # noqa VALID_ACCELERATOR_TYPES, VALID_VERTEX_ACCELERATOR_TYPES, diff --git 
a/src/fondant/pipeline/lightweight_component.py b/src/fondant/pipeline/lightweight_component.py index c45079b8..73c15163 100644 --- a/src/fondant/pipeline/lightweight_component.py +++ b/src/fondant/pipeline/lightweight_component.py @@ -60,7 +60,7 @@ def to_dict(self): return asdict(self) -class PythonComponent(BaseComponent): +class LightweightComponent(BaseComponent): @classmethod def image(cls) -> Image: raise NotImplementedError @@ -71,7 +71,7 @@ def lightweight_component( extra_requires: t.Optional[t.List[str]] = None, base_image: t.Optional[str] = None, ): - """Decorator to enable a python component.""" + """Decorator to enable a lightweight component.""" def wrapper(cls): script = build_python_script(cls) @@ -140,7 +140,7 @@ def validate_abstract_methods_are_implemented(cls): ] if len(abstract_methods) >= 1: msg = ( - f"Every required function must be overridden in the PythonComponent. " + f"Every required function must be overridden in the LightweightComponent. " f"Missing implementations for the following functions: {abstract_methods}" ) raise ValueError( @@ -153,12 +153,12 @@ def validate_abstract_methods_are_implemented(cls): # updated=() is needed to prevent an attempt to update the class's __dict__ @wraps(cls, updated=()) - class PythonComponentOp(cls, PythonComponent): + class LightweightComponentOp(cls, LightweightComponent): @classmethod def image(cls) -> Image: return image - return PythonComponentOp + return LightweightComponentOp # Call wrapper with function (`args[0]`) when no additional arguments were passed if args: diff --git a/src/fondant/pipeline/pipeline.py b/src/fondant/pipeline/pipeline.py index 0711c9f3..66fd0830 100644 --- a/src/fondant/pipeline/pipeline.py +++ b/src/fondant/pipeline/pipeline.py @@ -19,10 +19,13 @@ from fondant.component import BaseComponent from fondant.core.component_spec import ComponentSpec, OperationSpec -from fondant.core.exceptions import InvalidPipelineDefinition, InvalidPythonComponent +from 
fondant.core.exceptions import ( + InvalidLightweightComponent, + InvalidPipelineDefinition, +) from fondant.core.manifest import Manifest from fondant.core.schema import Field -from fondant.pipeline import Image, PythonComponent +from fondant.pipeline import Image, LightweightComponent from fondant.pipeline.argument_inference import infer_arguments logger = logging.getLogger(__name__) @@ -212,10 +215,10 @@ def from_ref(cls, ref: t.Any, **kwargs) -> "ComponentOp": or a python component class. """ if inspect.isclass(ref) and issubclass(ref, BaseComponent): - if issubclass(ref, PythonComponent): + if issubclass(ref, LightweightComponent): name = ref.__name__ image = ref.image() - description = ref.__doc__ or "python component" + description = ref.__doc__ or "lightweight component" component_spec = ComponentSpec( name, @@ -236,9 +239,9 @@ def from_ref(cls, ref: t.Any, **kwargs) -> "ComponentOp": **kwargs, ) else: - msg = """Reference is not a valid Python component. + msg = """Reference is not a valid lightweight component. Make sure the component is decorated properly.""" - raise InvalidPythonComponent(msg) + raise InvalidLightweightComponent(msg) elif isinstance(ref, (str, Path)): operation = cls.from_component_yaml( @@ -247,7 +250,7 @@ def from_ref(cls, ref: t.Any, **kwargs) -> "ComponentOp": ) else: msg = f"""Invalid reference type: {type(ref)}. - Expected a string, Path, or a Python component class.""" + Expected a string, Path, or a lightweight component class.""" raise ValueError(msg) return operation @@ -417,7 +420,7 @@ def read( Args: ref: The name of a reusable component, or the path to the directory containing - a custom component, or a python component class. + a containerized component, or a lightweight component class. produces: A mapping to update the fields produced by the operation as defined in the component spec. 
The keys are the names of the fields to be received by the component, while the values are the type of the field, or the name of the field to @@ -638,7 +641,7 @@ def apply( Args: ref: The name of a reusable component, or the path to the directory containing - a custom component, or a python component class. + a custom component, or a lightweight component class. consumes: A mapping to update the fields consumed by the operation as defined in the component spec. The keys are the names of the fields to be received by the component, while the values are the type of the field, or the name of the field to @@ -753,7 +756,7 @@ def write( Args: ref: The name of a reusable component, or the path to the directory containing - a custom component, or a python component class. + a custom component, or a lightweight component class. consumes: A mapping to update the fields consumed by the operation as defined in the component spec. The keys are the names of the fields to be received by the component, while the values are the type of the field, or the name of the field to diff --git a/tests/pipeline/test_pipeline.py b/tests/pipeline/test_pipeline.py index 9309d5dd..3360c7a4 100644 --- a/tests/pipeline/test_pipeline.py +++ b/tests/pipeline/test_pipeline.py @@ -95,7 +95,7 @@ def load(self) -> dd.DataFrame: assert component.component_spec._specification == { "name": "Foo", "image": fondant_image_name, - "description": "python component", + "description": "lightweight component", "consumes": {"additionalProperties": True}, "produces": {"additionalProperties": True}, } @@ -105,7 +105,7 @@ def test_component_op_bad_ref(): with pytest.raises( ValueError, match="""Invalid reference type: <class 'int'>.
- Expected a string, Path, or a Python component class.""", + Expected a string, Path, or a lightweight component class.""", ): ComponentOp.from_ref(123) diff --git a/tests/pipeline/test_python_component.py b/tests/pipeline/test_python_component.py index 40b7eb3a..0272e337 100644 --- a/tests/pipeline/test_python_component.py +++ b/tests/pipeline/test_python_component.py @@ -9,7 +9,7 @@ import pyarrow as pa import pytest from fondant.component import DaskLoadComponent, PandasTransformComponent -from fondant.core.exceptions import InvalidPythonComponent +from fondant.core.exceptions import InvalidLightweightComponent from fondant.pipeline import Pipeline, lightweight_component from fondant.pipeline.compiler import DockerCompiler @@ -96,7 +96,7 @@ def load(self) -> dd.DataFrame: "specification": { "name": "CreateData", "image": "python:3.8-slim-buster", - "description": "python component", + "description": "lightweight component", "consumes": {"additionalProperties": True}, "produces": {"additionalProperties": True}, }, @@ -140,7 +140,7 @@ def transform(self, dataframe: pd.DataFrame) -> pd.DataFrame: "specification": { "name": "AddN", "image": default_fondant_image, - "description": "python component", + "description": "lightweight component", "consumes": {"additionalProperties": True}, "produces": {"additionalProperties": True}, "args": {"n": {"type": "int"}}, @@ -163,7 +163,7 @@ class Foo(DaskLoadComponent): def load(self) -> str: return "bar" - with pytest.raises(InvalidPythonComponent): + with pytest.raises(InvalidLightweightComponent): _ = pipeline.read( ref=Foo, produces={"x": pa.int32(), "y": pa.int32()}, @@ -202,7 +202,7 @@ def load(self) -> dd.DataFrame: "specification": { "name": "CreateData", "image": "python:3.8-slim-buster", - "description": "python component", + "description": "lightweight component", "consumes": {"additionalProperties": True}, "produces": {"additionalProperties": True}, }, @@ -214,7 +214,7 @@ def load(self) -> dd.DataFrame: def 
test_invalid_load_component(): with pytest.raises( # noqa: PT012 ValueError, - match="Every required function must be overridden in the PythonComponent. " + match="Every required function must be overridden in the LightweightComponent. " "Missing implementations for the following functions: \\['load'\\]", ): @@ -291,7 +291,7 @@ def load(self) -> dd.DataFrame: "specification": { "name": "CreateData", "image": default_fondant_image, - "description": "python component", + "description": "lightweight component", "consumes": {"additionalProperties": True}, "produces": {"additionalProperties": True}, },
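The rename is safe because `ComponentOp.from_ref` recognises a lightweight component by type, not by name: the `lightweight_component` decorator returns a class deriving from both the user's class and `LightweightComponent`, and `from_ref` checks `issubclass(ref, LightweightComponent)`. The sketch below approximates that flow in plain Python, assuming simplified stand-ins for `BaseComponent`, `LightweightComponent`, and `InvalidLightweightComponent`. `describe_ref` and `DEFAULT_IMAGE` are hypothetical helpers, and the real `from_ref` builds a `ComponentSpec` instead of returning a dict.

```python
import inspect

# Hypothetical default image; the real decorator falls back to a Fondant base image.
DEFAULT_IMAGE = "example/default-image:latest"


class BaseComponent:
    """Stand-in for fondant.component.BaseComponent."""


class LightweightComponent(BaseComponent):
    """Marker base class that the decorator mixes into the user's class."""

    @classmethod
    def image(cls) -> str:
        raise NotImplementedError


class InvalidLightweightComponent(Exception):
    """Stand-in for fondant.core.exceptions.InvalidLightweightComponent."""


def lightweight_component(*args, base_image: str = DEFAULT_IMAGE):
    """Simplified decorator: usable bare or with keyword arguments."""

    def wrapper(cls):
        class LightweightComponentOp(cls, LightweightComponent):
            @classmethod
            def image(cls) -> str:
                return base_image

        # The diff uses functools.wraps(cls, updated=()); copying these
        # attributes by hand keeps the sketch independent of Python version.
        LightweightComponentOp.__name__ = cls.__name__
        LightweightComponentOp.__qualname__ = cls.__qualname__
        LightweightComponentOp.__doc__ = cls.__doc__
        return LightweightComponentOp

    # Bare @lightweight_component: the decorated class arrives as args[0].
    if args:
        return wrapper(args[0])
    return wrapper


def describe_ref(ref) -> dict:
    """Simplified mirror of the class branch of ComponentOp.from_ref."""
    if inspect.isclass(ref) and issubclass(ref, BaseComponent):
        if issubclass(ref, LightweightComponent):
            return {
                "name": ref.__name__,
                "image": ref.image(),
                # An undocumented class falls back to the default description.
                "description": ref.__doc__ or "lightweight component",
            }
        msg = "Reference is not a valid lightweight component."
        raise InvalidLightweightComponent(msg)
    # Abbreviated version of the ValueError message raised in the diff.
    msg = f"Invalid reference type: {type(ref)}."
    raise ValueError(msg)


@lightweight_component(base_image="python:3.8-slim-buster")
class CreateData(BaseComponent):
    def load(self):
        return [1, 2, 3]


@lightweight_component
class AddN(BaseComponent):
    """Add a constant to a column."""


class PlainComponent(BaseComponent):
    """Not decorated, so describe_ref rejects it."""


print(describe_ref(CreateData))
```

Because dispatch relies on `issubclass`, the renamed strings and class names change nothing about how components are detected. Code that still imports `PythonComponent` from `fondant.pipeline` or catches `InvalidPythonComponent` from `fondant.core.exceptions` will break, since the diff removes both names without adding aliases.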