Update lightweight docs (#817)
Changed the name of the components for consistency. 

Custom components -> can be **Lightweight components** or
**Containerized components**

The name **python lightweight/containerized component** makes a bit less
sense imo since everything is python based

Other changes:
* resized the architecture image since it would take some time to load
in the docs
PhilippeMoussalli authored Jan 29, 2024
1 parent d0f438e commit c45abe3
Showing 13 changed files with 67 additions and 60 deletions.
Binary file modified docs/art/architecture.png
18 changes: 9 additions & 9 deletions docs/components/components.md
@@ -71,10 +71,10 @@ We can distinguish two different types of components:

 - **Custom components** are completely defined and implemented by the user. There are two ways to
   define a custom component:
-    - **Lightweight Python Components**: Create a component from a self-contained Python function.
+    - **Lightweight Components**: Create a component from a self-contained Python function.
       This is the easiest way to create a custom component. It allows you to define a component without
       having to build a custom docker image or defining a component specification.
-    - **Containerized Python Components**: You can build your code into a docker image
+    - **Containerized Components**: You can build your code into a docker image
       and write an accompanying component specification that refers to it. This is used for
       more complex components that require additional dependencies (e.g. GPU support).

@@ -85,8 +85,8 @@ We can distinguish two different types of components:
 ### Custom components


-#### Lightweight Python Components
-To define a lightweight python component, you can create a self-contained python function that
+#### Lightweight Components
+To define a lightweight component, you can create a self-contained python function that
 implements the logic of your component.


@@ -119,11 +119,11 @@ _ = dataset.apply(
 )
 ```

-See our [best practices on creating a custom python component](../components/custom_python_component.md).
+See our [best practices on creating a lightweight component](../components/lightweight_components.md).


-#### Containerized Python Components
-To define your own containerized custom component, you can build your code into a docker image and write an
+#### Containerized Components
+To define your own containerized component, you can build your code into a docker image and write an
 accompanying component specification that refers to it.

 A typical file structure for a custom component looks like this:
@@ -160,12 +160,12 @@ dataset = dataset.apply(
 )
 ```

-See our [best practices on creating a custom containerized component](../components/custom_containerized_component.md).
+See our [best practices on creating a containerized component](../components/containerized_components.md).


 ### Reusable components

-Reusable components are out of the box containerized python components from the Fondant Hub that you can easily add
+Reusable components are out of the box containerized components from the Fondant Hub that you can easily add
 to your pipeline:

 ```python

@@ -1,22 +1,22 @@
-# Creating custom containerized components
+# Creating containerized components

 Fondant makes it easy to build data preparation pipelines leveraging reusable components. Fondant
 provides a lot
 of [components out of the box](https://fondant.ai/en/latest/components/hub/), but you can also
-define your own custom containerized components.
+define your own containerized components.

 Containerized components are useful when you want to share the components within your organization
 or community.
 If you don't need your component to be shareable, we recommend starting
-with a simpler [Python components](../components/custom_python_component.md) instead.
+with a simpler [lightweight components](../components/lightweight_components.md) instead.

 To make sure containerized components are reusable, they should implement a single logical data
 processing
 step (like captioning images or removing Personal Identifiable Information [PII] from text.)
 If a component grows too large, consider splitting it into multiple separate components each
 tackling one logical part.

-To implement a custom containerized component, a couple of files need to be defined:
+To implement a containerized component, a couple of files need to be defined:

 - [Fondant component specification](#fondant-component-specification)
 - [`main.py` script in a `src` folder](#mainpy-script)

@@ -1,11 +1,11 @@
-# Creating custom python components
+# Creating lightweight components

-Python components are a great way to implement custom data processing steps in your pipeline.
+Lightweight components are a great way to implement custom data processing steps in your pipeline.
 They are easy to implement and can be reused across different pipelines. If you want to
 build more complex components that require additional dependencies (e.g. GPU support), you can
-also build a containerized component. See the [containerized component guide](../components/custom_containerized_component.md) for more info.
+also build a containerized component. See the [containerized component guide](../components/containerized_components.md) for more info.

-To implement a custom python component, you simply need to create a python script that implements
+To implement a lightweight component, you simply need to create a python script that implements
 the component logic. Here is an example of a pipeline composed of two custom components,
 one that creates a dataset and one that adds a number to a column of the dataset:

@@ -116,7 +116,7 @@ This will omit the `y` column from the loaded data, which can be useful if you a
 datasets and want to avoid loading unnecessary data.

 If you want to publish your component to the Fondant Hub, you will need to convert
-it to containerized component. See the [containerized component guide](../components/custom_containerized_component.md) for more info.
+it to containerized component. See the [containerized component guide](../components/containerized_components.md) for more info.

 **Note:** Python based components also support defining dynamic fields by default. See the [dynamic fields guide](../components/component_spec.md#dynamic-fields) for more info
 on dynamic fields.
12 changes: 6 additions & 6 deletions docs/documentation_guide.md
@@ -16,12 +16,12 @@ own [custom components](guides/implement_custom_components.md).
 Learn how to use Fondant to build your own data processing pipeline.

 -> Design your own fondant [pipeline](pipeline.md) using the Fondant pipeline SDK.
--> Use existing [reusable components](components/hub.md) to build your pipeline.
--> Build your own custom [python component](components/custom_containerized_component.md)
-and share them by packaging them into [containerized component](components/custom_containerized_component.md) using the Fondant component
-SDK.
+-> Use existing [reusable components](components/hub.md) to build your pipeline.
+-> Build your own custom [lightweight component](components/lightweight_components.md)
+and share them by packaging them into [containerized component](components/containerized_components.md) using the Fondant component
+SDK.
 -> Learn how to publish your own [components](components/publishing_components.md) to a container
-registry so that you can reuse them in your pipelines.
+registry so that you can reuse them in your pipelines.

 ## Components hub

@@ -80,7 +80,7 @@ Learn about some of the more advanced concepts in Fondant.
 hood.
 -> Learn how Fondant uses [caching](caching.md) to speed up your pipeline development.
 -> Find out how Fondant uses [partitions](partitions.md) to parallelize and scale your pipeline and
-how you can use it to your advantage.
+how you can use it to your advantage.
 -> Learn how to setup a Kubeflow to run your Fondant pipeline on a [Kubeflow cluster](runners/kfp_infrastructure.md).

 ## Contributing
14 changes: 7 additions & 7 deletions docs/guides/implement_custom_components.md
@@ -16,7 +16,7 @@ In this tutorial, we will guide you through the process of implementing your ver
 component. We will illustrate this by building a transform component that uppercases the `alt_text` of the image dataset.

 If you want to build a complex custom component or share the component within your organization or even the community,
-take a look at how to build [reusable components](../components/custom_containerized_component.md).
+take a look at how to build [reusable components](../components/containerized_components.md).

 This pipeline is an extension of the one introduced in
 the [previous tutorial](../guides/build_a_simple_pipeline.md).
@@ -97,22 +97,22 @@ class UpperCaseTextComponent(PandasTransformComponent):
 !!! note "IMPORTANT"

     Note that we have used a decorator `@lightweight_component`. This decorator is necessary to inform
-    Fondant that this class is a Python component and can be used as a component in your pipeline.
+    Fondant that this class is a lightweight component and can be used as a component in your pipeline.

 We apply the uppercase transformation to the `alt_text` column of the dataframe. Afterward, we
 return the transformed dataframe from the `transform` method, which Fondant will use to
 automatically update the index.

-The Python components provide an easy way to start with your component implementation. However, the
-Python component implementation still allows you to define all advanced component configurations,
+The lightweight components provide an easy way to start with your component implementation. However, the
+lightweight component implementation still allows you to define all advanced component configurations,
 including installing extra arguments or defining component arguments. These concepts are more
 advanced and not needed for quick exploration and experiments. You can find more information on
 these topics in
-the [documentation of the Python components](../components/custom_python_component.md).
+the [documentation of the lightweight components](../components/lightweight_components.md).

 ### Using the component

-Now were we have defined our Python component we can start using it in our pipeline.
+Now were we have defined our lightweight component we can start using it in our pipeline.
 For instance we can put this component at the end of our pipeline.

 ```python
@@ -130,7 +130,7 @@ Now, you can execute the pipeline once more and examine the results. In the fina
 the `alt_text` is in uppercase.

 Of course, it is debatable whether uppercasing the alt_text is genuinely useful. This is just a
-constructive and simple example to showcase how to use Python components as glue code within your
+constructive and simple example to showcase how to use lightweight components as glue code within your
 pipeline, helping you connect reusable components to each other.

 ## Next steps
4 changes: 2 additions & 2 deletions mkdocs.yml
@@ -47,8 +47,8 @@ nav:
   - Pipeline: pipeline.md
   - Components:
       - Components: components/components.md
-      - Python components: components/custom_python_component.md
-      - Containerized components: components/custom_containerized_component.md
+      - Lightweight components: components/lightweight_components.md
+      - Containerized components: components/containerized_components.md
       - Component spec: components/component_spec.md
       - Publishing components: components/publishing_components.md
   - Runners:
4 changes: 2 additions & 2 deletions src/fondant/core/exceptions.py
@@ -27,5 +27,5 @@ class UnsupportedTypeAnnotation(FondantException):
     """Thrown when an unsupported type annotation is encountered during type inference."""


-class InvalidPythonComponent(FondantException):
-    """Thrown when a component is not a valid Python component."""
+class InvalidLightweightComponent(FondantException):
+    """Thrown when a component is not a valid lightweight component."""
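
Review note: the rename above removes `InvalidPythonComponent` rather than aliasing it, so code that imported the old name would need to switch to `InvalidLightweightComponent`. Handlers that catch the base class are unaffected. A minimal stand-alone sketch of that behaviour follows; the body of `FondantException` and the helpers `check_component` and `describe` are assumptions made for illustration, and only the class names and docstring come from the diff.

```python
class FondantException(Exception):
    """Stand-in for fondant's base exception (body assumed)."""


class InvalidLightweightComponent(FondantException):
    """Thrown when a component is not a valid lightweight component."""


def check_component(is_decorated: bool) -> None:
    # Hypothetical helper: mimics a caller rejecting an undecorated class.
    if not is_decorated:
        raise InvalidLightweightComponent(
            "Reference is not a valid lightweight component."
        )


def describe(is_decorated: bool) -> str:
    # Hypothetical helper: catches the base class, so it does not depend
    # on the concrete exception name that this commit changes.
    try:
        check_component(is_decorated)
    except FondantException as exc:
        return f"{type(exc).__name__}: {exc}"
    return "ok"
```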
6 changes: 5 additions & 1 deletion src/fondant/pipeline/__init__.py
@@ -1,4 +1,8 @@
-from .lightweight_component import Image, PythonComponent, lightweight_component  # noqa
+from .lightweight_component import (  # noqa
+    Image,
+    LightweightComponent,
+    lightweight_component,
+)
 from .pipeline import (  # noqa
     VALID_ACCELERATOR_TYPES,
     VALID_VERTEX_ACCELERATOR_TYPES,
10 changes: 5 additions & 5 deletions src/fondant/pipeline/lightweight_component.py
@@ -60,7 +60,7 @@ def to_dict(self):
         return asdict(self)


-class PythonComponent(BaseComponent):
+class LightweightComponent(BaseComponent):
     @classmethod
     def image(cls) -> Image:
         raise NotImplementedError
@@ -71,7 +71,7 @@ def lightweight_component(
     extra_requires: t.Optional[t.List[str]] = None,
     base_image: t.Optional[str] = None,
 ):
-    """Decorator to enable a python component."""
+    """Decorator to enable a lightweight component."""

     def wrapper(cls):
         script = build_python_script(cls)
@@ -140,7 +140,7 @@ def validate_abstract_methods_are_implemented(cls):
             ]
             if len(abstract_methods) >= 1:
                 msg = (
-                    f"Every required function must be overridden in the PythonComponent. "
+                    f"Every required function must be overridden in the LightweightComponent. "
                     f"Missing implementations for the following functions: {abstract_methods}"
                 )
                 raise ValueError(
@@ -153,12 +153,12 @@ def validate_abstract_methods_are_implemented(cls):

         # updated=() is needed to prevent an attempt to update the class's __dict__
         @wraps(cls, updated=())
-        class PythonComponentOp(cls, PythonComponent):
+        class LightweightComponentOp(cls, LightweightComponent):
             @classmethod
             def image(cls) -> Image:
                 return image

-        return PythonComponentOp
+        return LightweightComponentOp

     # Call wrapper with function (`args[0]`) when no additional arguments were passed
     if args:
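
Review note: the hunks above show only fragments of the decorator, so here is a stripped-down, stdlib-only sketch of the pattern they suggest: a class decorator that works with or without arguments and returns a subclass that keeps the original class name via `wraps(cls, updated=())`. The `Image` fields, the default base image string, and the omission of script building and validation are assumptions for illustration; this is not fondant's actual implementation.

```python
import typing as t
from dataclasses import dataclass
from functools import wraps


@dataclass
class Image:
    """Hypothetical stand-in for fondant's Image (fields assumed)."""

    base_image: str
    extra_requires: t.Optional[t.List[str]] = None


class LightweightComponent:
    """Stand-in marker base class; the real one extends BaseComponent."""

    @classmethod
    def image(cls) -> Image:
        raise NotImplementedError


def lightweight_component(
    *args,
    extra_requires: t.Optional[t.List[str]] = None,
    base_image: t.Optional[str] = None,
):
    """Sketch of a decorator usable as @deco and as @deco(...)."""

    def wrapper(cls):
        # Assumed default; the real decorator resolves its own base image.
        image = Image(base_image or "python:3.10", extra_requires)

        # updated=() skips the __dict__ update, which fails for classes.
        @wraps(cls, updated=())
        class LightweightComponentOp(cls, LightweightComponent):
            @classmethod
            def image(cls) -> Image:
                # Refers to the `image` variable of the enclosing wrapper().
                return image

        return LightweightComponentOp

    # Bare usage passes the class positionally: apply the wrapper directly.
    if args:
        return wrapper(args[0])
    return wrapper


@lightweight_component
class Plain:
    """Decorated without arguments."""


@lightweight_component(base_image="python:3.11", extra_requires=["pandas"])
class Configured:
    """Decorated with keyword arguments."""
```

Because of `wraps`, `Plain.__name__` stays `"Plain"` even though the returned object is the generated subclass, which is why the rename of the inner class to `LightweightComponentOp` is not visible to users of the decorator.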
23 changes: 13 additions & 10 deletions src/fondant/pipeline/pipeline.py
@@ -19,10 +19,13 @@

 from fondant.component import BaseComponent
 from fondant.core.component_spec import ComponentSpec, OperationSpec
-from fondant.core.exceptions import InvalidPipelineDefinition, InvalidPythonComponent
+from fondant.core.exceptions import (
+    InvalidLightweightComponent,
+    InvalidPipelineDefinition,
+)
 from fondant.core.manifest import Manifest
 from fondant.core.schema import Field
-from fondant.pipeline import Image, PythonComponent
+from fondant.pipeline import Image, LightweightComponent
 from fondant.pipeline.argument_inference import infer_arguments

 logger = logging.getLogger(__name__)
@@ -212,10 +215,10 @@ def from_ref(cls, ref: t.Any, **kwargs) -> "ComponentOp":
         or a python component class.
         """
         if inspect.isclass(ref) and issubclass(ref, BaseComponent):
-            if issubclass(ref, PythonComponent):
+            if issubclass(ref, LightweightComponent):
                 name = ref.__name__
                 image = ref.image()
-                description = ref.__doc__ or "python component"
+                description = ref.__doc__ or "lightweight component"

                 component_spec = ComponentSpec(
                     name,
@@ -236,9 +239,9 @@ def from_ref(cls, ref: t.Any, **kwargs) -> "ComponentOp":
                     **kwargs,
                 )
             else:
-                msg = """Reference is not a valid Python component.
+                msg = """Reference is not a valid lightweight component.
                 Make sure the component is decorated properly."""
-                raise InvalidPythonComponent(msg)
+                raise InvalidLightweightComponent(msg)

         elif isinstance(ref, (str, Path)):
             operation = cls.from_component_yaml(
@@ -247,7 +250,7 @@ def from_ref(cls, ref: t.Any, **kwargs) -> "ComponentOp":
             )
         else:
             msg = f"""Invalid reference type: {type(ref)}.
-                Expected a string, Path, or a Python component class."""
+                Expected a string, Path, or a lightweight component class."""
             raise ValueError(msg)
         return operation

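Review note: the three branches of `from_ref` touched above can be summarised with a small stand-alone sketch. The stand-in classes, the function name `resolve_ref`, and the returned labels are assumptions for illustration; the real method builds a `ComponentSpec` and returns a `ComponentOp` instead of a string.

```python
import inspect
import typing as t
from pathlib import Path


class BaseComponent:
    """Stand-in for fondant.component.BaseComponent."""


class LightweightComponent(BaseComponent):
    """Stand-in for the class the decorator mixes in."""


class InvalidLightweightComponent(Exception):
    """Stand-in for the renamed fondant exception."""


def resolve_ref(ref: t.Any) -> str:
    """Return a label for the branch a reference would take (illustrative)."""
    if inspect.isclass(ref) and issubclass(ref, BaseComponent):
        if issubclass(ref, LightweightComponent):
            # Decorated class: spec is derived from the class itself.
            return "lightweight"
        # A component class that was never decorated.
        msg = "Reference is not a valid lightweight component."
        raise InvalidLightweightComponent(msg)
    if isinstance(ref, (str, Path)):
        # Reusable component name or directory of a containerized component.
        return "component_yaml"
    msg = f"Invalid reference type: {type(ref)}."
    raise ValueError(msg)


class Decorated(LightweightComponent):
    """Plays the role of a class returned by @lightweight_component."""


class Undecorated(BaseComponent):
    """A component class without the decorator."""
```

Note that a class unrelated to `BaseComponent` (for example `int` itself) skips the first branch entirely and ends in the `ValueError`, the same as a non-class value such as `123`.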
@@ -417,7 +420,7 @@ def read(
         Args:
             ref: The name of a reusable component, or the path to the directory containing
-                a custom component, or a python component class.
+                a containerized component, or a lightweight component class.
             produces: A mapping to update the fields produced by the operation as defined in the
                 component spec. The keys are the names of the fields to be received by the
                 component, while the values are the type of the field, or the name of the field to
@@ -638,7 +641,7 @@ def apply(
         Args:
             ref: The name of a reusable component, or the path to the directory containing
-                a custom component, or a python component class.
+                a custom component, or a lightweight component class.
             consumes: A mapping to update the fields consumed by the operation as defined in the
                 component spec. The keys are the names of the fields to be received by the
                 component, while the values are the type of the field, or the name of the field to
@@ -753,7 +756,7 @@ def write(
         Args:
             ref: The name of a reusable component, or the path to the directory containing
-                a custom component, or a python component class.
+                a custom component, or a lightweight component class.
             consumes: A mapping to update the fields consumed by the operation as defined in the
                 component spec. The keys are the names of the fields to be received by the
                 component, while the values are the type of the field, or the name of the field to
4 changes: 2 additions & 2 deletions tests/pipeline/test_pipeline.py
@@ -95,7 +95,7 @@ def load(self) -> dd.DataFrame:
     assert component.component_spec._specification == {
         "name": "Foo",
         "image": fondant_image_name,
-        "description": "python component",
+        "description": "lightweight component",
         "consumes": {"additionalProperties": True},
         "produces": {"additionalProperties": True},
     }
@@ -105,7 +105,7 @@ def test_component_op_bad_ref():
     with pytest.raises(
         ValueError,
         match="""Invalid reference type: <class 'int'>.
-                Expected a string, Path, or a Python component class.""",
+                Expected a string, Path, or a lightweight component class.""",
     ):
         ComponentOp.from_ref(123)

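Review note: `pytest.raises(match=...)` checks the message with `re.search`, so the expected string in the test above is treated as a regular expression and each `.` matches any character. The stdlib-only sketch below illustrates the difference `re.escape` makes. The single-line `MESSAGE` is a simplified stand-in for the real multi-line message (its exact whitespace is not visible in this diff), and `matches` is a hypothetical helper.

```python
import re

# Simplified, single-line stand-in for the error message (whitespace assumed).
MESSAGE = "Invalid reference type: <class 'int'>."


def matches(pattern: str, text: str) -> bool:
    """Apply the same check pytest.raises(match=...) uses: re.search."""
    return re.search(pattern, text) is not None


# Using the raw message as the pattern works, but "." is a wildcard.
raw_pattern = MESSAGE

# re.escape turns the message into a pattern that matches it literally.
literal_pattern = re.escape(MESSAGE)

# Differs from MESSAGE only in the final character.
near_miss = "Invalid reference type: <class 'int'>!"
```

With these definitions, `raw_pattern` accepts both `MESSAGE` and `near_miss`, while `literal_pattern` accepts only `MESSAGE`. The test in this commit still passes either way; escaping only matters if a stricter check is wanted.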