Split general component behavior from user implementation #257

Closed
RobbeSneyders opened this issue Jul 3, 2023 · 0 comments · Fixed by #302

Assignees: @RobbeSneyders
Labels: Core (Core framework)

@RobbeSneyders (Member)

Currently, users implement components by subclassing one of the provided component classes, which themselves provide a lot of general component behavior, such as data loading, validation, and writing.

Instead, we could split these concerns: the user implements a plain class with a specific interface, and that class is passed to a general component class that provides the shared behavior.

Current:

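```python
from fondant import PandasTransformComponent

class MyComponent(PandasTransformComponent):

    def setup(self, kwarg1, kwarg2):
        # Receives the component's custom arguments
        ...

    def transform(self, dataframe):
        # Receives a pandas dataframe and returns the transformed dataframe
        ...

# The base class handles argument parsing, data loading, validation and writing
component = MyComponent.from_args()
component.run()
```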

Future:

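```python
from fondant import Component

class MyComponent:  # a plain class, no base class required

    def __init__(self, kwarg1, kwarg2):
        # Custom arguments are passed directly to the constructor
        ...

    def transform(self, dataframe):
        ...

# The general Component class wraps the user implementation and provides
# the shared behavior (loading, validating, writing)
Component(MyComponent).run()
```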

We can still provide an interface that users can subclass, for instance a `typing.Protocol`.
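A minimal sketch of the proposed split, under several assumptions: the names `TransformInterface` and `Component` and the `load`/`validate`/`write` steps are hypothetical, not Fondant's actual API; custom arguments are passed as keyword arguments rather than parsed from the command line; and a list of dicts stands in for a dataframe so the example runs without pandas.

```python
from typing import Any, Dict, List, Protocol, Type, runtime_checkable

Row = Dict[str, Any]  # a list of these stands in for a dataframe (assumption)


@runtime_checkable
class TransformInterface(Protocol):
    """Interface a user implementation is expected to satisfy (hypothetical)."""

    def transform(self, dataframe: List[Row]) -> List[Row]: ...


class Component:
    """General component owning loading, validation and writing (hypothetical)."""

    def __init__(self, implementation: Type[TransformInterface], **kwargs: Any) -> None:
        self.implementation = implementation
        # Assumption: custom arguments arrive here; a real component would
        # more likely parse them from the command line
        self.kwargs = kwargs

    def load(self) -> List[Row]:
        # Placeholder for reading the input dataset
        return [{"text": "hello"}, {"text": "world"}]

    def validate(self, dataframe: List[Row]) -> None:
        # Placeholder check: every output row must still have a "text" column
        if not all("text" in row for row in dataframe):
            raise ValueError("output is missing the 'text' column")

    def write(self, dataframe: List[Row]) -> List[Row]:
        # Placeholder for writing the output; here it is simply returned
        return dataframe

    def run(self) -> List[Row]:
        # Instantiate the user class with only its own arguments, then
        # wrap its transform with the shared behavior
        user_component = self.implementation(**self.kwargs)
        dataframe = user_component.transform(self.load())
        self.validate(dataframe)
        return self.write(dataframe)


class MyComponent:
    """User implementation: a plain class with __init__ and transform."""

    def __init__(self, suffix: str) -> None:
        self.suffix = suffix

    def transform(self, dataframe: List[Row]) -> List[Row]:
        return [{**row, "text": row["text"] + self.suffix} for row in dataframe]


result = Component(MyComponent, suffix="!").run()
print(result)  # [{'text': 'hello!'}, {'text': 'world!'}]
```

Because `TransformInterface` is a `Protocol`, `MyComponent` satisfies it structurally without inheriting from it; users who prefer explicit subclassing can still inherit from the protocol.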

This has some advantages:

  • Component implementations can use `__init__` instead of `setup`, which will be more familiar to users
  • It's easier to test the custom component implementation, as no dummy arguments need to be provided for the general component behavior (see [LLM pipeline] Language filter component #232 (comment))
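
To illustrate the testing advantage, a sketch of a user implementation tested in isolation with a pytest-style test function. `LanguageFilter` and its `language` argument are hypothetical (the name only echoes the component referenced in #232), and the filtering assumes each row already carries a `language` field rather than detecting the language.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]  # a list of these stands in for a dataframe (assumption)


class LanguageFilter:
    """Hypothetical user implementation: keeps rows in a single language."""

    def __init__(self, language: str) -> None:
        # Only the component's own argument; no manifest paths or specs needed
        self.language = language

    def transform(self, dataframe: List[Row]) -> List[Row]:
        # Toy logic: assumes each row already has a "language" field
        return [row for row in dataframe if row["language"] == self.language]


def test_keeps_only_requested_language() -> None:
    component = LanguageFilter(language="en")
    data = [
        {"text": "hello", "language": "en"},
        {"text": "hallo", "language": "nl"},
    ]
    assert component.transform(data) == [{"text": "hello", "language": "en"}]
```

Running `pytest` on this file collects and runs the test; no arguments for the general component behavior are involved.
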
@RobbeSneyders converted this from a draft issue Jul 3, 2023
@RobbeSneyders moved this from Breakdown to Ready for development in Fondant development Jul 3, 2023
@RobbeSneyders moved this from Ready for development to In Progress in Fondant development Jul 4, 2023
@RobbeSneyders added the Core (Core framework) label Jul 4, 2023
@RobbeSneyders self-assigned this Jul 4, 2023
@RobbeSneyders moved this from In Progress to Validation in Fondant development Jul 5, 2023
github-project-automation bot moved this from Validation to Done in Fondant development Jul 18, 2023
satishjasthi pushed a commit to satishjasthi/fondant that referenced this issue Jul 21, 2023
This PR follows up on the PoC presented in ml6team#268

---

Fixes ml6team#257 

It splits the implementation and execution of components, which has some
advantages:

- Pandas components can use `__init__` instead of `setup`, which is
probably more familiar to users
- Other components can use `__init__` as well instead of receiving all
arguments to their `transform` or equivalent method, aligning the
implementation of different component types
- Component implementation and execution should be easier to test
separately
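
A sketch of what aligning component types could look like. All class names here are hypothetical: a loader and a transform both take their arguments in `__init__`, so a single construction path can build either one, whereas the old-style loader received its arguments through `load` itself.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]  # a list of these stands in for a dataframe (assumption)


# Before (hypothetical): arguments arrive through the main method itself
class OldStyleLoader:
    def load(self, n_rows: int) -> List[Row]:
        return [{"id": i} for i in range(n_rows)]


# After: every component type takes its arguments in __init__
class Loader:
    def __init__(self, n_rows: int) -> None:
        self.n_rows = n_rows

    def load(self) -> List[Row]:
        return [{"id": i} for i in range(self.n_rows)]


class Offset:
    def __init__(self, offset: int) -> None:
        self.offset = offset

    def transform(self, dataframe: List[Row]) -> List[Row]:
        return [{"id": row["id"] + self.offset} for row in dataframe]


def instantiate(cls: type, arguments: Dict[str, Any]) -> Any:
    # One construction path works for loaders and transforms alike
    return cls(**arguments)


loader = instantiate(Loader, {"n_rows": 3})
offset = instantiate(Offset, {"offset": 10})
print(offset.transform(loader.load()))  # [{'id': 10}, {'id': 11}, {'id': 12}]
```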

I borrowed the executor terminology from KFP (Kubeflow Pipelines).
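
Taking "executor" to mean the part that runs a component and owns the general arguments, a minimal sketch follows. The `Executor` class, the `--input_manifest_path`/`--output_manifest_path` flags, and `MyComponent` are hypothetical, and the data is passed in directly where a real executor would load it from the input manifest.

```python
import argparse
from typing import Any, Dict, List, Optional, Tuple


class MyComponent:
    """Hypothetical user implementation; it only knows its own argument."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    def transform(self, scores: List[float]) -> List[float]:
        return [score for score in scores if score >= self.threshold]


class Executor:
    """Hypothetical executor: owns CLI parsing and the general arguments."""

    # General arguments handled by the executor, never passed to the user class
    GENERAL_ARGS = ("input_manifest_path", "output_manifest_path")

    def __init__(self, component_cls: type, component_args: Dict[str, type]) -> None:
        self.component_cls = component_cls
        self.component_args = component_args

    def parse(self, argv: Optional[List[str]] = None) -> Tuple[Dict[str, Any], Dict[str, Any]]:
        parser = argparse.ArgumentParser()
        for name in self.GENERAL_ARGS:
            parser.add_argument(f"--{name}", required=True)
        for name, arg_type in self.component_args.items():
            parser.add_argument(f"--{name}", type=arg_type, required=True)
        parsed = vars(parser.parse_args(argv))
        # Split the parsed values into executor-owned and component-owned sets
        general = {name: parsed[name] for name in self.GENERAL_ARGS}
        custom = {name: parsed[name] for name in self.component_args}
        return general, custom

    def execute(self, scores: List[float], argv: Optional[List[str]] = None) -> List[float]:
        # `general` would drive loading and writing in a real executor;
        # here `scores` stands in for the loaded input data
        general, custom = self.parse(argv)
        component = self.component_cls(**custom)
        return component.transform(scores)


executor = Executor(MyComponent, {"threshold": float})
argv = [
    "--input_manifest_path", "in.json",
    "--output_manifest_path", "out.json",
    "--threshold", "0.5",
]
print(executor.execute([0.2, 0.5, 0.9], argv))  # [0.5, 0.9]
```

The user class never sees the manifest paths, which is what removes the need for dummy arguments when testing it.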

---

Fixes ml6team#203 

Since I had to update all the components, I also switched some of them
to subclass `PandasTransformComponent` instead of
`DaskTransformComponent`.

---

These changes open some opportunities for additional improvements, but I
propose to tackle those in separate PRs, as this PR is already quite large
due to all the changes to the components.

- [ ] ml6team#300
- [ ] ml6team#301
Hakimovich99 pushed a commit that referenced this issue Oct 16, 2023
Projects: Archived in project
1 participant