Lightweight Python components #558

Closed
10 of 11 tasks
Tracked by #563
RobbeSneyders opened this issue Oct 27, 2023 · 2 comments
Labels
Core Core framework Ease of use

Comments

@RobbeSneyders
Member

RobbeSneyders commented Oct 27, 2023

The goal of this issue is to let users define custom components in Python only, instead of having to build a complete Docker component with a component spec.


Original description:

The Dockerfile of most components is identical, but the user still needs to provide the whole component structure to create a component, which introduces unnecessary complexity.

I see two options to do this:

  • Build the custom user code on top of a base image and use it like a Docker component.
  • Run the base image and provide the custom user code as an argument. Any additional dependencies are installed at runtime.

In both cases, we could make the base image configurable by the user.

The first option has some downsides:

  • We need access to a user container registry to push to.
  • The base image would still need to be pulled before building and pushing, which can take some time.

The second option also has some downsides:

  • Installs need to happen every time at runtime.
  • It might be hard to pass complex code structures via argument.
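A minimal sketch of the second option, to make the trade-offs concrete. It assumes the user code is shipped as a base64-encoded string and that the base image contains an entrypoint script; `run_component.py`, its `--source` flag, and the `AddOne` class are hypothetical names for illustration, not existing Fondant APIs.

```python
import base64
import shlex

# Hypothetical user code, kept as a string so the sketch is self-contained.
COMPONENT_SOURCE = '''
class AddOne:
    def transform(self, values):
        return [v + 1 for v in values]
'''


def build_command(source, requirements, entrypoint="run_component.py"):
    """Build the argv for a container started from a generic base image.

    `entrypoint` and the `--source` flag are assumptions of this sketch.
    """
    encoded = base64.b64encode(source.encode("utf-8")).decode("ascii")
    install = ""
    if requirements:
        # Dependencies are installed on every run, which is the runtime
        # cost listed as a downside above.
        packages = " ".join(shlex.quote(r) for r in requirements)
        install = f"pip install {packages} && "
    return ["sh", "-c", f"{install}python {entrypoint} --source {encoded}"]


def load_component(encoded, class_name):
    """What the hypothetical entrypoint could do: decode and exec the source."""
    source = base64.b64decode(encoded).decode("utf-8")
    namespace = {}
    exec(source, namespace)
    return namespace[class_name]


command = build_command(COMPONENT_SOURCE, ["pandas==2.1.0"])
encoded_source = command[2].split("--source ")[1]
component_cls = load_component(encoded_source, "AddOne")
result = component_cls().transform([1, 2, 3])
```

Encoding the source keeps quoting problems out of the shell command, but it only works when the code is a single self-contained snippet, which is where the "complex code structures" concern comes from.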

We could have a look at how Kubeflow's lightweight Python components are implemented.

We might also be able to generate the component spec from the Python code:

  • Arguments could be inferred by inspecting the __init__ method.
  • If we find a way to typehint dataframes, we could also infer the consumes and produces section. Tools like Pandera could be helpful in this.
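As a rough illustration of the first point, the standard library's `inspect.signature` is enough to read names, type hints, and defaults from `__init__`. The `FilterLength` class and the shape of the returned dictionary are made up for this sketch; the real component spec format may look different.

```python
import inspect


class FilterLength:
    """Hypothetical component used only to demonstrate the inspection."""

    def __init__(self, min_length: int, max_length: int = 100, language: str = "en"):
        self.min_length = min_length
        self.max_length = max_length
        self.language = language


def infer_arguments(component_cls):
    """Derive an argument description from the __init__ signature."""
    signature = inspect.signature(component_cls.__init__)
    skipped_kinds = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    arguments = {}
    for name, param in signature.parameters.items():
        if name == "self" or param.kind in skipped_kinds:
            continue
        entry = {}
        if param.annotation is not inspect.Parameter.empty:
            # Fall back to str() for annotations without a __name__.
            entry["type"] = getattr(param.annotation, "__name__", str(param.annotation))
        if param.default is not inspect.Parameter.empty:
            entry["default"] = param.default
        arguments[name] = entry
    return arguments


arguments = infer_arguments(FilterLength)
```

Parameters without a type hint simply end up without a `type` entry, so the user would still need a way to declare those explicitly.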

@GeorgesLorre
Collaborator

We will need to be strict with the lightweight components and clearly decide what they can and can't do. Otherwise they will become very complex very fast. We also need to think about how easy it would be for a user to convert a lightweight component to a real custom component (and a reusable one).

I'm a bit concerned these will become the first-class way of doing things, which makes sense since it is the easiest, but it will offload a lot of complexity to the Fondant side.

@RobbeSneyders
Member Author

I think it will indeed become the default way to run custom code, but I believe we need to make that easier to make Fondant more useful. Moving from a Lightweight Python component to a Docker component should only be needed when a user wants to make the component reusable or runs into a limitation of the Lightweight Python component.

I would remove as many limitations as possible, but can see the following:

  • Everything should be encapsulated in the Component class, probably even non-Fondant imports. (Fondant imports could already be needed to define the class; we can add those ourselves like kfp does.)
  • Only support Python dependencies. If a component needs non-Python dependencies, it needs to be dockerized. This will allow us to be more flexible about which base Docker image to run the component on, or even run it outside of Docker (e.g. with a VenvRunner).
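To show what the first limitation could mean in practice, here is a small sketch that checks whether a snippet consists of nothing but class definitions at the top level, using the standard `ast` module. The rule itself (only classes allowed at module level) is my assumption about how "encapsulated" might be enforced; the example classes are hypothetical.

```python
import ast

# Imports live inside the method, so the class can be shipped on its own.
ENCAPSULATED = """
class Root:
    def transform(self, value):
        import math
        return math.sqrt(value)
"""

# The module-level import would be lost if only the class were shipped.
NOT_ENCAPSULATED = """
import math

class Root:
    def transform(self, value):
        return math.sqrt(value)
"""


def top_level_violations(source):
    """List top-level statements that are not class definitions."""
    tree = ast.parse(source)
    return [
        f"line {node.lineno}: {type(node).__name__}"
        for node in tree.body
        if not isinstance(node, ast.ClassDef)
    ]


ok = top_level_violations(ENCAPSULATED)
violations = top_level_violations(NOT_ENCAPSULATED)
```

A check like this could give the user an early, readable error instead of a `NameError` inside the container.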

Moving from a Lightweight Python component to a reusable Docker component should be quite straightforward since the Python code can stay the same. The user will need to add a component specification and Dockerfile, and build it and make it available for remote runners.

I think the additional complexity on the Fondant side can be limited and best effort. E.g. we can try to infer types so the user doesn't need to define the consumes and produces arguments, but they can still do that as a fallback when inference does not succeed for some reason.
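A sketch of what "best effort with a fallback" could look like. It assumes, purely for illustration, that the produced schema is hinted with a `TypedDict` as the return annotation of `transform`; that convention, the `resolve_produces` helper, and the example classes are hypothetical and not part of Fondant.

```python
from typing import TypedDict, get_type_hints


class Captions(TypedDict):
    caption: str
    length: int


class CaptionComponent:
    def transform(self, dataframe) -> Captions:
        ...


class UntypedComponent:
    def transform(self, dataframe):
        ...


def infer_produces(component_cls):
    """Best effort: return a field-to-type mapping, or None if inference fails."""
    try:
        schema = get_type_hints(component_cls.transform).get("return")
        fields = get_type_hints(schema) if schema is not None else {}
    except Exception:
        return None
    return {name: tp.__name__ for name, tp in fields.items()} or None


def resolve_produces(component_cls, produces=None):
    """Use the explicit `produces` when given, otherwise try to infer it."""
    if produces is not None:
        return produces
    inferred = infer_produces(component_cls)
    if inferred is None:
        raise ValueError(
            f"Could not infer produces for {component_cls.__name__}; "
            "please pass it explicitly."
        )
    return inferred


inferred = resolve_produces(CaptionComponent)
explicit = resolve_produces(UntypedComponent, produces={"text": "str"})
```

Letting an explicit `produces` take precedence keeps the inference logic optional: when it fails, the user gets a clear error and can fall back to declaring the schema by hand.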

@RobbeSneyders RobbeSneyders moved this from Ready for development to In Progress in Fondant development Jan 29, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in Fondant development Feb 5, 2024