
[Suggestion]: Make pyspark an optional dependency #456

Open

FredrikBakken opened this issue Aug 28, 2024 · 1 comment

@FredrikBakken
Hi 👋

We are currently experimenting with using sparkdantic for the Spark schema definitions in our pipelines inside Databricks. However, with our current configuration we are restricted to installing dependencies at notebook scope rather than at cluster level, which means we have to run a !pip install command for every dependency at the beginning of each notebook.

We've noticed that installing sparkdantic takes a long time because it also pulls in pyspark as a dependency, even though Spark is already available inside the Databricks environment. A potential solution is to make pyspark an optional dependency rather than a mandatory one; a rough sketch of what that could look like is below.
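
For illustration, a minimal sketch of the guarded-import pattern many Python libraries use for optional dependencies. This is not sparkdantic's actual code; the `require_pyspark` helper and the `pyspark` extras name are hypothetical:

```python
# Hypothetical pyproject.toml change (sketch):
#
#   [project.optional-dependencies]
#   pyspark = ["pyspark>=3.0"]
#
# Users on a plain Python environment would run
#   pip install "sparkdantic[pyspark]"
# while Databricks users, where Spark ships with the runtime,
# would just run
#   pip install sparkdantic

# Inside the library, pyspark is imported lazily so that importing
# sparkdantic itself never requires it:
try:
    from pyspark.sql.types import StructType
except ImportError:  # pyspark not installed; assume the runtime provides Spark
    StructType = None


def require_pyspark() -> None:
    """Hypothetical guard: raise a clear error when Spark features are
    used without pyspark being importable."""
    if StructType is None:
        raise ImportError(
            "pyspark is required for this feature; install it with "
            "`pip install 'sparkdantic[pyspark]'`."
        )
```

On Databricks, the guarded import should succeed against the runtime's bundled pyspark, so nothing extra would be downloaded at notebook startup.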

Any thoughts on this suggestion?

@jaceklaskowski

I've noticed it too lately and would much appreciate this change ❤️
