Read Presto Data into Spark Dataframe #23673
Comments
Can you share more on this? I am confused about why Presto is needed to read the data, rather than just using a Spark DataFrame or Spark SQL. This is not currently supported. Depending on how large the data is and where the driver is running, you can take different approaches: If the data cannot fit on the driver:
@singcha Thanks for your quick response.
We have many Presto views stored in the Hive Metastore, and there are requirements to build Spark pipelines that read data from these Presto views. Some of the Presto views are large and might not fit in the driver memory.
I think option 2 makes more sense because it avoids the overhead of using an intermediate storage location, and the reads can be done within the same Spark session. We could write a new method for this, and I can contribute to the feature.
Hi Team,
I have a requirement to read data from a Presto query, load it into a Spark DataFrame, and do further processing on it in Spark.
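To make the request concrete, here is a minimal sketch of one possible route: Spark's generic JDBC data source pointed at the Presto JDBC driver. This is not a Presto feature and not necessarily either of the approaches discussed in this thread. The host, port, catalog, schema, user, and view name are hypothetical placeholders, the helper names `presto_jdbc_options` and `read_presto_query` are made up for illustration, and the sketch assumes the Presto JDBC driver jar is already on the Spark classpath.

```python
def presto_jdbc_options(host, port, catalog, schema, query, user):
    """Build options for Spark's JDBC data source (hypothetical helper)."""
    return {
        # Presto JDBC URL format: jdbc:presto://host:port/catalog/schema
        "url": f"jdbc:presto://{host}:{port}/{catalog}/{schema}",
        # Driver class shipped in the prestodb presto-jdbc jar
        "driver": "com.facebook.presto.jdbc.PrestoDriver",
        # Spark sends this query to Presto and reads back the result set
        "query": query,
        # The Presto JDBC driver expects a user name
        "user": user,
    }


def read_presto_query(spark, **kwargs):
    """Run a query on Presto and return the result as a Spark DataFrame.

    `spark` is an existing SparkSession; kwargs are passed to
    presto_jdbc_options. Assumes the driver jar is on the classpath.
    """
    opts = presto_jdbc_options(**kwargs)
    return spark.read.format("jdbc").options(**opts).load()


# Example with placeholder values (no cluster is contacted here):
example = presto_jdbc_options(
    host="presto.example.com", port=8080,
    catalog="hive", schema="default",
    query="SELECT * FROM my_presto_view", user="etl",
)
```

With a live session this would be called as `df = read_presto_query(spark, host=..., port=..., catalog=..., schema=..., query=..., user=...)`. With the `query` option Spark reads through a single JDBC connection, so for large views the read will not be parallel. Spark's partitioning options (`partitionColumn`, `lowerBound`, `upperBound`, `numPartitions`) require `dbtable` instead of `query`.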