feat: spark-substrait example #293

Open
wants to merge 1 commit into
base: main

Conversation

mbwhite
Contributor

@mbwhite commented Sep 9, 2024

This adds a new Substrait-Spark example to the repo:

  • example code on creating and consuming substrait plans
  • details in the example of how the plans are converted
  • place to add further examples
  • add to build process

Note: I've not added running up the Spark engine to test, as I don't want to blow any of the GitHub Actions time or capacity.

@mbwhite marked this pull request as ready for review September 16, 2024 08:15
Member

@vbarua left a comment

Was able to make some time to look at part of this. Left some questions.

@@ -0,0 +1,587 @@
# Introduction to the Substrait-Spark library

The Substrait-Spark library was recently added to the [substrait-java](https://github.com/substrait-io/substrait-java) project; this library allows Substrait plans to convert to and from Spark Plans.
Member

minor: I wouldn't mention the recently added bit here as that's going to become out of date eventually.

- Java 17 or greater
- Docker to start a test Spark Cluster
- you could use your own cluster, but would need to adjust file locations defined in [SparkHelper](./app/src/main/java/io/substrait/examples/SparkHelper.java)
- [just task runner](https://github.com/casey/just#installation) optional, but very helpful to run the bash commands
Member

minor:

the [just](https://github.com/casey/just#installation) task runner

Makes it clearer that just is the name of the utility. I definitely read it wrong/fast the first time and thought you meant I just needed a task runner 😅

* This file was generated by the Gradle 'init' task.
*
* This project uses @Incubating APIs which are subject to change.
*/
Member

Should we include the examples project in the root settings.gradle.kts file so that it's part of the standard build? I noticed it wasn't getting picked up by IntelliJ.

@@ -0,0 +1,62 @@
/*
* This file was generated by the Gradle 'init' task.
Member

minor/non-blocking: For consistency it would be nice to use the Kotlin DSL and not the Groovy DSL, as that is what the other builds use.

import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan;
import java.io.IOException;
import java.nio.file.*;
Member

minor: we generally avoid * imports in this repo
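
For illustration, here is a minimal sketch of what explicit imports could look like in place of `import java.nio.file.*;`. It assumes the file only needs `Files` and `Path` from `java.nio.file`; the classes actually used are not visible in this hunk. The class name `ExplicitImportsExample` and the helper `roundTrip` are hypothetical and not part of the PR.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical example class; not taken from the PR under review.
public class ExplicitImportsExample {

  // Writes the given text to a temporary file and reads it back,
  // touching only the java.nio.file classes imported explicitly above.
  static String roundTrip(String content) throws IOException {
    Path tmp = Files.createTempFile("plan", ".txt");
    try {
      Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
      return new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8);
    } finally {
      // Clean up the temporary file whether or not the read succeeded.
      Files.deleteIfExists(tmp);
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(roundTrip("substrait"));
  }
}
```

Listing each class makes it clear at a glance which parts of `java.nio.file` the file depends on, and avoids name clashes if another wildcard import later exposes a class with the same simple name.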

@@ -0,0 +1,249 @@
#!/bin/sh
Member

Does this submodule need its own Gradle wrapper, or can it share the top-level one?
