[KYUUBI #6577] Add spark sql engine plugin module and define a sql stringify plugin #6578

Open: wants to merge 10 commits into base: master

Conversation

wForget
Member

@wForget wForget commented Aug 2, 2024

🔍 Description

Issue References 🔗

This pull request fixes #6577

Describe Your Solution 🔧

Define a PlanOnlyExecutor interface so that the plan-only mode of the Spark engine can be extended through SPI.

Types of changes 🔖

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Test Plan 🧪

Behavior Without This Pull Request ⚰️

Behavior With This Pull Request 🎉

Related Unit Tests


Checklist 📝

Be nice. Be informative.

@wForget
Member Author

wForget commented Aug 2, 2024

cc @pan3793 @iodone @yaooqinn Could you take a quick look at this? If this is feasible, I will try to replace the current lineage plan only mode implementation.

@wForget wForget self-assigned this Aug 2, 2024
@codecov-commenter

codecov-commenter commented Aug 2, 2024

Codecov Report

Attention: Patch coverage is 0% with 13 lines in your changes missing coverage. Please review.

Project coverage is 0.00%. Comparing base (c94f0d7) to head (5f5494e).
Report is 60 commits behind head on master.

Files with missing lines Patch % Lines
...ubi/engine/spark/operation/PlanOnlyStatement.scala 0.00% 7 Missing ⚠️
...spark/operation/planonly/SQLStringifyPlugins.scala 0.00% 6 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master   #6578    +/-   ##
=======================================
  Coverage    0.00%   0.00%            
=======================================
  Files         677     684     +7     
  Lines       41806   42222   +416     
  Branches     5711    5758    +47     
=======================================
- Misses      41806   42222   +416     


import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan;

public interface PlanOnlyExecutor {
Contributor

Just curious: what kind of functionality is this SPI meant to support?

Member Author

I only intend it to extend how the execution plan is explained.


String mode();

default String execute(SparkPlans plans) {
Member

where is this called?

Member Author

in PlanOnlyStatement

LogicalPlan optimizedPlan();
SparkPlan sparkPlan();
SparkPlan executedPlan();
String mode();
Member

Can we move the mode into execute?

Member Author

We will use SPI to load plugins, so we need to define a qualified name for the plugin to distinguish them.
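
For readers unfamiliar with the mechanism, here is a minimal sketch of how a plugin could be picked by its mode name after being discovered with java.util.ServiceLoader. It is not the code from this PR: the interface name SqlStringifyPlugin, the method stringify, and the class StringifyPluginLookup are hypothetical stand-ins, and the SparkSession argument of the real interface is left out so the example runs without Spark on the classpath.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Optional;
import java.util.ServiceLoader;

public class StringifyPluginLookup {

  /** Hypothetical stand-in for the plugin interface discussed in this thread. */
  public interface SqlStringifyPlugin {
    /** Name that distinguishes this plugin from the others found on the classpath. */
    String mode();

    /** Turns a SQL statement into some string form (assumed signature, no SparkSession). */
    String stringify(String statement);
  }

  /** Returns the first plugin whose mode() matches the requested mode, ignoring case. */
  public static Optional<SqlStringifyPlugin> find(
      Iterable<SqlStringifyPlugin> plugins, String mode) {
    for (SqlStringifyPlugin plugin : plugins) {
      if (plugin.mode().equalsIgnoreCase(mode)) {
        return Optional.of(plugin);
      }
    }
    return Optional.empty();
  }

  /**
   * Discovers providers registered in META-INF/services files on the classpath.
   * ServiceLoader is an Iterable, so it can be passed to find() directly.
   */
  public static Optional<SqlStringifyPlugin> load(String mode) {
    return find(ServiceLoader.load(SqlStringifyPlugin.class), mode);
  }

  public static void main(String[] args) {
    // A toy plugin used only for this demonstration.
    SqlStringifyPlugin upper = new SqlStringifyPlugin() {
      @Override
      public String mode() {
        return "upper";
      }

      @Override
      public String stringify(String statement) {
        return statement.toUpperCase(Locale.ROOT);
      }
    };

    String result = find(Arrays.asList(upper), "UPPER")
        .map(plugin -> plugin.stringify("select 1"))
        .orElse("no plugin for this mode");
    System.out.println(result);
  }
}
```

Because ServiceLoader only returns instances, each provider has to report a name itself, which is why mode() sits on the interface rather than being a parameter of the stringify call.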


public interface SparkPlans {
public interface PlanOnlyExecutor {
Member

How about SQLStringifyPlugin? The output does not have to be a plan.


public interface SparkPlans {
public interface PlanOnlyExecutor {
Member

Can we add a developer guide?


String execute(SparkSession spark, String statement);
Member

maybe toString

@@ -28,6 +28,7 @@ import org.apache.spark.sql.types.StructType
import org.apache.kyuubi.KyuubiSQLException
import org.apache.kyuubi.config.KyuubiConf.{LINEAGE_PARSER_PLUGIN_PROVIDER, OPERATION_PLAN_ONLY_EXCLUDES, OPERATION_PLAN_ONLY_OUT_STYLE}
import org.apache.kyuubi.engine.spark.KyuubiSparkUtil.getSessionConf
import org.apache.kyuubi.engine.spark.operation.planonly.SQLStringifyPlugins
Member

So the plugin is actually a must for the Spark SQL engine? Why do we need to separate it from the engine module?

Member Author

> So the plugin is actually a must for the Spark SQL engine? Why do we need to separate it from the engine module?

Because I don't want to introduce a Kyuubi engine dependency into the Kyuubi Spark extensions.

Member

we have to bundle this when packaging

Member Author

Yes, the kyuubi server plugin also works this way

Member Author

This plugin module will be lightweight, containing only some interface definitions.

Member

Makes sense.

Member

@yaooqinn yaooqinn left a comment

LGTM, except for #6578 (comment)

@wForget
Member Author

wForget commented Aug 16, 2024

@yaooqinn Thanks for your review. I will add a developer guide later.

@wForget wForget changed the title from "[KYUUBI #6577] Define a PlanOnlyExecutor interface to extend plan only mode of spark engine using SPI" to "[KYUUBI #6577] Add spark sql engine plugin module and define a sql stringify plugin" on Aug 16, 2024
@wForget wForget marked this pull request as ready for review August 16, 2024 12:50
@wForget wForget added this to the v1.10.0 milestone Sep 6, 2024
@bowenliang123 bowenliang123 modified the milestones: v1.10.0, v1.11.0 Oct 17, 2024
Development

Successfully merging this pull request may close these issues.

[FEATURE] Define a PlanOnlyExecutor interface to extend plan only mode of spark engine using SPI
5 participants