Python Interface for SparkFlightManager #1

Open
tokoko opened this issue Jul 30, 2022 · 1 comment

Comments

@tokoko
Owner

tokoko commented Jul 30, 2022

SparkFlightManager should also have a Python interface to enable development of microservices in pyspark. The current SparkFlightManager interface will probably have to be altered, as it's too tightly tied to the Java implementation of FlightServer. It will have to be made general enough to be usable in Python FlightServers as well. For example, streamDistributedFlight currently takes a FlightProducer.ServerStreamListener as a parameter, but there is no such thing in the Python implementation.
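One way to make the interface language-neutral is for `stream_distributed_flight` to return an iterator of batches instead of accepting a listener. This is a minimal sketch under stated assumptions: every class and method name here is hypothetical (not the project's API), and plain lists of dicts stand in for Arrow record batches. A pull-style Python server could simply iterate over the result (with pyarrow, for example, by wrapping it in `pyarrow.flight.GeneratorStream` returned from `do_get`), while a Java server would use a thin adapter that pushes each batch into its `ServerStreamListener`. `push_to_listener` below only mimics that adapter.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List, Protocol

# Hypothetical stand-in for an Arrow record batch: a list of row dicts.
Batch = List[dict]


class PortableFlightManager(ABC):
    """Hypothetical language-neutral interface: produces batches for a ticket
    instead of writing into Java's FlightProducer.ServerStreamListener."""

    @abstractmethod
    def stream_distributed_flight(self, ticket: bytes) -> Iterator[Batch]:
        """Yield the batches belonging to the flight identified by ticket."""


class InMemoryFlightManager(PortableFlightManager):
    """Toy implementation backed by a dict; a real one would read the
    batches from the internal FlightServers."""

    def __init__(self, flights: Dict[bytes, List[Batch]]) -> None:
        self._flights = flights

    def stream_distributed_flight(self, ticket: bytes) -> Iterator[Batch]:
        yield from self._flights.get(ticket, [])


class StreamListener(Protocol):
    """Push-style sink, loosely modelled on the Java listener's putNext/completed."""

    def put_next(self, batch: Batch) -> None: ...

    def completed(self) -> None: ...


def push_to_listener(manager: PortableFlightManager, ticket: bytes,
                     listener: StreamListener) -> None:
    """Adapter for push-style (Java-like) servers: drain the iterator into a listener."""
    for batch in manager.stream_distributed_flight(ticket):
        listener.put_next(batch)
    listener.completed()


class RecordingListener:
    """Minimal listener that just records what it receives."""

    def __init__(self) -> None:
        self.batches: List[Batch] = []
        self.done = False

    def put_next(self, batch: Batch) -> None:
        self.batches.append(batch)

    def completed(self) -> None:
        self.done = True


manager = InMemoryFlightManager({b"sales": [[{"id": 1}], [{"id": 2}, {"id": 3}]]})

# Pull style: what a Python FlightServer would consume.
pulled = list(manager.stream_distributed_flight(b"sales"))

# Push style: what a Java FlightServer needs.
listener = RecordingListener()
push_to_listener(manager, b"sales", listener)
```

The iterator is used as the common denominator because turning pull into push is a short loop, whereas turning a push-only API into an iterator generally needs buffering or an extra thread.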

@tokoko
Owner Author

tokoko commented Jul 31, 2022

Yet another problem is that pyspark uses py4j for communication between the Python and Java virtual machines, meaning that in Python, unlike Java/Scala, the FlightServer and the SparkSession will be running in different processes. The easiest way to reuse FlightManager from Python is with jpype, which runs the JVM in the same process as Python. Since the JVM hosting FlightManager and the JVM hosting the SparkSession are then separate processes, having a single class that both serves requests and calls functions in Scala Spark is not feasible. SparkFlightManager will have to be refactored into a more general DistributedFlightManager class that's independent of Spark.
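A Spark-independent manager could be reduced to bookkeeping: it records which internal FlightServer holds which partition of a flight and hands those endpoints back when a client asks for the flight. The sketch below assumes exactly that split; `Endpoint`, `register_partition` and `endpoints` are hypothetical names, and how the Spark side's registration crosses the process boundary (for example via a Flight action) is deliberately left open.

```python
import threading
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Endpoint:
    """Where one partition of a flight can be fetched from (hypothetical)."""
    ticket: bytes
    location: str  # e.g. "grpc://host:port" of an internal FlightServer


class DistributedFlightManager:
    """Hypothetical Spark-independent manager: it only tracks which internal
    FlightServer holds which partition and never touches a SparkSession."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # requests may arrive on several threads
        self._flights: Dict[str, List[Endpoint]] = {}

    def register_partition(self, flight: str, endpoint: Endpoint) -> None:
        """Record a finished partition reported by the Spark side (transport not shown)."""
        with self._lock:
            self._flights.setdefault(flight, []).append(endpoint)

    def endpoints(self, flight: str) -> List[Endpoint]:
        """Return a copy of the endpoints, e.g. to build a GetFlightInfo response."""
        with self._lock:
            return list(self._flights.get(flight, []))


manager = DistributedFlightManager()
manager.register_partition("sales", Endpoint(b"sales/0", "grpc://worker-1:8815"))
manager.register_partition("sales", Endpoint(b"sales/1", "grpc://worker-2:8815"))
endpoints = manager.endpoints("sales")
```

Because nothing in this class refers to Spark, the same logic could be hosted in a Python FlightServer (through jpype, if kept in the JVM) or in a Scala one.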

The method that converts a DataFrame to an RDD of ArrowRecordBatches and sends them to the Internal FlightServers will become a separate utility that can be reused independently from pyspark via py4j, or called directly from Scala Spark.
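The per-partition logic of such a utility might look roughly like the sketch below. Several assumptions apply: Python lists of row tuples stand in for Arrow record batches, the caller-supplied `send` callable stands in for a Flight DoPut to an internal server, partitions are assigned to servers round-robin, and the `"<flight>/<index>"` ticket naming is invented. All function names are hypothetical. In Spark this loop would run on the executors (e.g. inside `mapPartitionsWithIndex`) rather than sequentially as shown here.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Sequence, Tuple

Row = Tuple
Batch = List[Row]  # hypothetical stand-in for an ArrowRecordBatch


def to_batches(rows: Iterable[Row], max_rows: int) -> Iterator[Batch]:
    """Group a partition's rows into batches of at most max_rows rows."""
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    batch: Batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == max_rows:
            yield batch
            batch = []  # start a fresh list so the yielded one is not mutated
    if batch:
        yield batch  # trailing, possibly smaller, batch


def ship_partitions(partitions: Sequence[Iterable[Row]], servers: Sequence[str],
                    send: Callable[[str, str, Batch], None], flight: str,
                    max_rows: int = 1024) -> Dict[int, str]:
    """Send each partition's batches to one internal server (round-robin) and
    return which server received which partition index."""
    if not servers:
        raise ValueError("at least one internal FlightServer is required")
    placement: Dict[int, str] = {}
    for index, rows in enumerate(partitions):
        server = servers[index % len(servers)]
        for batch in to_batches(rows, max_rows):
            send(server, f"{flight}/{index}", batch)
        placement[index] = server
    return placement


sent: List[Tuple[str, str, Batch]] = []
placement = ship_partitions(
    [[(1,), (2,), (3,)], [(4,)], [(5,), (6,)]],
    servers=["grpc://worker-1:8815", "grpc://worker-2:8815"],
    send=lambda server, ticket, batch: sent.append((server, ticket, batch)),
    flight="sales",
    max_rows=2,
)
```

The returned placement is the kind of information a Spark-independent manager would need in order to tell clients where each partition lives.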
