
[Suggestion]: Make pyspark an optional dependency #456

Open

FredrikBakken opened this issue Aug 28, 2024 · 1 comment

@FredrikBakken
Hi 👋

We are currently experimenting with using sparkdantic for the Spark schema definitions in our pipelines inside Databricks. However, with our current configuration we are restricted to installing dependencies at notebook scope rather than at cluster level, which means we have to run a !pip install command for every dependency at the beginning of each notebook.

We've noticed that installing sparkdantic takes a long time because it also pulls in pyspark as a dependency, even though Spark is already available inside the Databricks environment. A potential solution is to make pyspark an optional dependency rather than a mandatory one; a rough sketch of what that could look like is below.
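
For illustration, a minimal sketch of the guarded-import pattern many Python libraries use for optional dependencies. This is not sparkdantic's actual code; the `require_pyspark` helper and the `pyspark` extras name are hypothetical:

```python
# Hypothetical pyproject.toml change (sketch):
#
#   [project.optional-dependencies]
#   pyspark = ["pyspark>=3.0"]
#
# Users on a plain Python environment would run
#   pip install "sparkdantic[pyspark]"
# while Databricks users, where Spark ships with the runtime,
# would just run
#   pip install sparkdantic

# Inside the library, pyspark is imported lazily so that importing
# sparkdantic itself never requires it:
try:
    from pyspark.sql.types import StructType
except ImportError:  # pyspark not installed; assume the runtime provides Spark
    StructType = None


def require_pyspark() -> None:
    """Hypothetical guard: raise a clear error when Spark features are
    used without pyspark being importable."""
    if StructType is None:
        raise ImportError(
            "pyspark is required for this feature; install it with "
            "`pip install 'sparkdantic[pyspark]'`."
        )
```

On Databricks, the guarded import should succeed against the runtime's bundled pyspark, so nothing extra would be downloaded at notebook startup.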

Any thoughts on this suggestion?

@jaceklaskowski

I've noticed it too lately and would much appreciate this change ❤️
