feat: spark-substrait example #293
base: main
Signed-off-by: MBWhite <[email protected]>
Was able to make some time to look at part of this. Left some questions.
@@ -0,0 +1,587 @@
# Introduction to the Substrait-Spark library

The Substrait-Spark library was recently added to the [substrait-java](https://github.com/substrait-io/substrait-java) project; this library allows Substrait plans to convert to and from Spark Plans.
minor: I wouldn't mention the `recently added` bit here as that's going to become out of date eventually.
- Java 17 or greater
- Docker to start a test Spark Cluster
  - you could use your own cluster, but would need to adjust file locations defined in [SparkHelper](./app/src/main/java/io/substrait/examples/SparkHelper.java)
- [just task runner](https://github.com/casey/just#installation) optional, but very helpful to run the bash commands
minor:
the [just](https://github.com/casey/just#installation) task runner
Makes it clearer that just is the name of the utility. I definitely read it wrong/fast the first time and thought you meant I just needed a task runner 😅
 * This file was generated by the Gradle 'init' task.
 *
 * This project uses @Incubating APIs which are subject to change.
 */
Should we include the `examples` project in the root `settings.gradle.kts` file so that it's part of the standard build? I noticed it wasn't getting picked up by IntelliJ.
@@ -0,0 +1,62 @@
/*
 * This file was generated by the Gradle 'init' task.
minor/non-blocking: For consistency it would be nice to use the Kotlin DSL and not the Groovy DSL, as that is what the other builds use.
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan;
import java.io.IOException;
import java.nio.file.*;
minor: we generally avoid * imports in this repo
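A minimal sketch of what explicit imports could look like, assuming the file only needs `Files`, `Path`, and `Paths` (the diff context does not show which `java.nio.file` classes are actually used, and `ExplicitImportsExample` is a hypothetical class, not part of this PR):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical class for illustration only; it is not part of the PR.
// It shows java.nio.file classes imported one by one instead of via `*`.
public class ExplicitImportsExample {

  // Writes the text to a temporary file and reads it back,
  // using only the explicitly imported classes.
  public static String roundTrip(String text) throws IOException {
    Path file = Files.createTempFile("explicit-imports", ".txt");
    try {
      Files.writeString(file, text);
      return Files.readString(file);
    } finally {
      // Clean up the temporary file whether or not the read succeeded.
      Files.deleteIfExists(file);
    }
  }

  public static void main(String[] args) throws IOException {
    // Assumed relative path, used only to exercise the Paths import.
    Path dir = Paths.get("build", "example");
    System.out.println(dir.getNameCount());
    System.out.println(roundTrip("substrait"));
  }
}
```

Listing each class keeps the file's dependencies visible at a glance and avoids accidental name clashes if another wildcard import is added later.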
@@ -0,0 +1,249 @@
#!/bin/sh
Does this submodule need its own Gradle wrapper, or can it share the top-level one?
This adds a new Substrait-Spark example to the repo.
Note: I've not added spinning up the Spark engine as a test, as I don't want to use up any of the GitHub Actions time or capacity.