We are currently experimenting with using `sparkdantic` for our Spark schema definitions in our Databricks pipelines. However, our current configuration requires us to install all dependencies at the notebook scope rather than at the cluster level, which means running a `!pip install` command for each dependency at the beginning of our notebooks.
We've noticed that installing `sparkdantic` takes a long time because it also pulls in `pyspark` as a dependency, even though Spark is already available in the Databricks environment. A potential solution would be to make `pyspark` an optional dependency rather than a mandatory one.
Any thoughts on this suggestion?
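
For illustration, here is a minimal sketch of the pattern being suggested: `pyspark` imported lazily inside the library, with a clear error raised only when Spark functionality is actually used. The `sparkdantic[pyspark]` extra name and the helper below are hypothetical, not part of sparkdantic today.

```python
# Hypothetical sketch: treat pyspark as optional and import it lazily,
# so environments like Databricks (where Spark is already provided)
# can install sparkdantic without pulling pyspark in.

try:
    from pyspark.sql.types import StructType
except ImportError:  # pyspark not installed locally
    StructType = None


def require_pyspark() -> None:
    """Fail with an actionable message if pyspark is unavailable."""
    if StructType is None:
        raise ImportError(
            "pyspark is required for Spark schema generation. "
            "Install it with: pip install 'sparkdantic[pyspark]'"
        )
```

With an extra like this, Databricks users could keep `!pip install sparkdantic` fast, while users outside Spark-provided environments would opt in via the extra.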