
Add option to reuse existing spark session #41

Closed
aaronzo wants to merge 1 commit

Conversation


@aaronzo aaronzo commented May 29, 2022

Fixes #40

Allows an existing Spark session to be reused for the backend.

@aaronzo
Author

aaronzo commented Mar 31, 2023

@WeichenXu123 could you please review this code and approve the workflow?

@WeichenXu123
Collaborator

Sorry for the late response!
One question: I think

SparkSession \
    .builder \
    .appName("JoblibSparkBackend") \
    .getOrCreate()

will reuse the existing Spark session if there is one; otherwise it will create a new one.

Did you find any case where a Spark session already exists, but joblib-spark shuts it down and then starts a new one?

@aaronzo
Author

aaronzo commented Apr 8, 2023

I have run into situations where:

  • I wanted to name the Spark session that joblib used.
  • I had spark as an unused variable in my code and wanted to show explicitly that it was used by passing it in.

Both are pretty minor, but having the option to pass a Spark session explicitly is sometimes nice, even if the behaviour is unchanged.
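
To illustrate the idea, here is a minimal sketch of a constructor that accepts an optional session and otherwise falls back to the getOrCreate() call quoted above. The class name SessionAwareBackend, the helper _get_or_create_session, and the spark parameter are hypothetical names chosen for this example; they are not taken from this PR's diff or from joblib-spark's actual API.

```python
from typing import Any, Optional


def _get_or_create_session() -> Any:
    """Fallback path: reuse the active Spark session or create a new one."""
    # Imported lazily so this sketch can be loaded without pyspark installed.
    from pyspark.sql import SparkSession

    return SparkSession.builder.appName("JoblibSparkBackend").getOrCreate()


class SessionAwareBackend:
    """Hypothetical stand-in for a backend; not joblib-spark's real class."""

    def __init__(self, spark: Optional[Any] = None) -> None:
        # An explicitly passed session is used as-is; otherwise fall back
        # to the getOrCreate() behaviour discussed in the review.
        self.spark = spark if spark is not None else _get_or_create_session()
```

With pyspark installed, a caller could build a session under a name of their choosing, for example SparkSession.builder.appName("my-job").getOrCreate(), and pass it as SessionAwareBackend(spark=session). Calling SessionAwareBackend() with no argument takes the fallback path, so the default behaviour would stay as it is today.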

@aaronzo aaronzo closed this Aug 1, 2023
@aaronzo aaronzo deleted the reuse-spark branch August 1, 2023 23:54
Development

Successfully merging this pull request may close these issues.

Ability to reuse existing spark session for ParallelBackend