Validate consumes and infer produces for Lightweight Python components #752

Open
Tracked by #558 ...
RobbeSneyders opened this issue Jan 2, 2024 · 3 comments

Comments

@RobbeSneyders
Member

RobbeSneyders commented Jan 2, 2024

When a user defines a Lightweight Python component (#558), we want to derive from the provided Python code all the information we currently get from the component spec.

For the consumes section, we can assume it matches the schema of the dataset the operation is applied to, possibly altered by the consumes argument passed to the apply method.
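As a rough illustration of the paragraph above, here is a minimal sketch of how the consumes schema could be resolved. Everything in it is hypothetical: `resolve_consumes` is not a Fondant function, schemas are represented as plain dicts of column name to type name, and the `consumes` argument is assumed to map the name used inside the component to the dataset column it reads from.

```python
from typing import Dict, Optional


def resolve_consumes(
    dataset_schema: Dict[str, str],
    consumes: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Return the schema the component would see (hypothetical helper).

    dataset_schema: dataset column name -> type name.
    consumes: assumed semantics, component-side name -> dataset column name.
    """
    if consumes is None:
        # No override: the component consumes the dataset schema as-is.
        return dict(dataset_schema)

    resolved = {}
    for component_name, dataset_column in consumes.items():
        if dataset_column not in dataset_schema:
            raise KeyError(f"Column '{dataset_column}' is not in the dataset schema")
        resolved[component_name] = dataset_schema[dataset_column]
    return resolved


dataset_schema = {"image_url": "string", "width": "int32"}
print(resolve_consumes(dataset_schema))
print(resolve_consumes(dataset_schema, {"url": "image_url"}))
```

Raising on an unknown column is one possible choice here; it surfaces a mismatch between the `consumes` argument and the dataset before any simulation is attempted.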

For the produces section, the user can either provide a schema via the produces argument on the apply method, or we can try to infer it by simulating the transform function. We could do this by generating dummy data based on the consumes schema, and applying the transform method on it.
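The simulation idea could look roughly like the sketch below, which uses plain pandas rather than the pandera-based approach from the PoC. The names `DUMMY_VALUES`, `make_dummy_dataframe` and `infer_produces` are made up for illustration, the schema is assumed to be a dict of column name to pandas dtype string, and the inferred schema simply lists every column of the transform's output.

```python
import pandas as pd

# Hypothetical dummy values per dtype; a real generator would cover more types.
DUMMY_VALUES = {
    "int64": [1, 2, 3],
    "float64": [0.5, 1.5, 2.5],
    "bool": [True, False, True],
    "string": ["a", "b", "c"],
}


def make_dummy_dataframe(consumes_schema):
    """Build a small dataframe whose columns match the consumes schema."""
    return pd.DataFrame(
        {
            name: pd.Series(DUMMY_VALUES[dtype], dtype=dtype)
            for name, dtype in consumes_schema.items()
        }
    )


def infer_produces(transform, consumes_schema):
    """Run the transform on dummy data and read the output dtypes."""
    dummy = make_dummy_dataframe(consumes_schema)
    result = transform(dummy)
    return {str(name): str(dtype) for name, dtype in result.dtypes.items()}


def add_aspect_ratio(dataframe):
    # Example transform: adds one float column derived from two int columns.
    return dataframe.assign(aspect_ratio=dataframe["width"] / dataframe["height"])


print(infer_produces(add_aspect_ratio, {"width": "int64", "height": "int64"}))
```

For this example the inferred schema contains `width` and `height` as `int64` and the new `aspect_ratio` column as `float64`. Note that the transform runs locally, so any packages it imports have to be installed on the machine doing the inference.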

This only makes sense for Transform components since we always expect the user to provide a produces schema for a Read component, and a Write component doesn't produce anything.

If the simulation succeeds, inferring the produces schema this way also validates the consumes schema. A failed simulation does not invalidate it, though, since a failure can have several causes: the consumes schema is incorrect, there is a bug in the component, or there is a bug in the dummy data generation.
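The asymmetry described above (success validates, failure is inconclusive) can be made explicit in the return value of the simulation. The sketch below is purely illustrative: `Outcome`, `SimulationResult` and `simulate` are hypothetical names, and the transform is a plain callable on a dict rather than a real component.

```python
import enum
from typing import Any, Callable, NamedTuple, Optional


class Outcome(enum.Enum):
    VALIDATED = "validated"        # simulation ran: consumes schema is usable
    INCONCLUSIVE = "inconclusive"  # simulation failed: cause is unknown


class SimulationResult(NamedTuple):
    outcome: Outcome
    output: Any
    error: Optional[Exception]


def simulate(transform: Callable[[Any], Any], dummy_input: Any) -> SimulationResult:
    """Run the transform once and classify the result."""
    try:
        output = transform(dummy_input)
    except Exception as exc:
        # The failure could come from the consumes schema, the component
        # itself, or the dummy data, so it is reported and not treated as
        # proof that the consumes schema is wrong.
        return SimulationResult(Outcome.INCONCLUSIVE, None, exc)
    return SimulationResult(Outcome.VALIDATED, output, None)


print(simulate(lambda row: row["width"] * 2, {"width": 2}).outcome)
print(simulate(lambda row: row["height"], {"width": 2}).outcome)
```

Keeping the original exception on the result lets the caller show it to the user, who is in the best position to tell which of the three causes applies.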

@RobbeSneyders
Member Author

See this gist for a quick PoC to simulate transform components using pandera.

@RobbeSneyders RobbeSneyders moved this from Backlog to Ready for development in Fondant development Jan 2, 2024
@RobbeSneyders RobbeSneyders added the Core Core framework label Jan 2, 2024
@RobbeSneyders RobbeSneyders changed the title Validate consumes and infer produces from transform function Validate consumes and infer produces for Lightweight Python components Jan 2, 2024
@mrchtr mrchtr self-assigned this Jan 22, 2024
@mrchtr mrchtr moved this from Ready for development to In Progress in Fondant development Jan 22, 2024
@mrchtr
Contributor

mrchtr commented Jan 30, 2024

Happy to hear additional opinions on #806.
It implements inference of the produces schema for PandasTransformer components, provided that all of the component's requirements are installed on the local machine.

Projects
Status: On hold

2 participants