
Add Spark 4.0.0 Build Profile and Other Supporting Changes [databricks] #10994

Merged: 54 commits merged into NVIDIA:branch-24.08 from SP-9259-POM-changes on Jul 16, 2024

Conversation

@razajafri (Collaborator) commented Jun 6, 2024:

This change adds the profiles needed to build the plugin for Spark 4.0.0

This PR depends on #10993

fixes #9259
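
For context, here is a minimal sketch of what such a version profile typically looks like in a Maven build (the profile id, activation property, and version property follow the common pattern for per-version release profiles; the exact contents of the real profile added by this PR are assumed, not copied from it):

<profile>
  <id>release400</id>
  <activation>
    <!-- selected by passing -Dbuildver=400 on the command line -->
    <property>
      <name>buildver</name>
      <value>400</value>
    </property>
  </activation>
  <properties>
    <spark.version>4.0.0</spark.version>
  </properties>
</profile>

A build against Spark 4.0.0 would then be invoked as, e.g., mvn package -Dbuildver=400.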

@razajafri (author):

This is in draft until #10993 and #10992 are merged.

@razajafri (author):

The pre-commit task is failing for Spark 400 because the private repo changes haven't been merged yet.

Review threads on build/buildall (two), pom.xml, and scala2.13/pom.xml were marked outdated and resolved.
@razajafri marked this pull request as ready for review on June 14, 2024, 17:05.
@razajafri (author):

@gerashegalov do you have any more questions on this PR?

@gerashegalov:

The 400 PR check is failing. We need to wait until the spark-rapids-private artifact is published: https://github.com/NVIDIA/spark-rapids/actions/runs/9519648358/job/26243644463?pr=10994

@mythrocks:

I was wondering if this could be checked in before #11066. It doesn't need to wait, does it?

@gerashegalov:

@razajafri @mythrocks there are two pending issues with this PR, indicated by the failing check:

❌ mvn[compile,RAT,scalastyle,docgen] / package-tests-scala213 (400, true) (pull_request)

  1. buildver=400 cannot be built because the corresponding spark-rapids-private dependency is missing
  2. The buildver=400 check should also be moved under JDK17

It currently fails with

Error:  Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:3.5.0:enforce (enforce-java-version) on project rapids-4-spark-parent_2.13: 
Error:  Rule 0: org.apache.maven.enforcer.rules.version.RequireJavaVersion failed with message:
Error:  Support for Spark 4.0.0 is only available with Java 17+

Update

spark-rapids/pom.xml, lines 871 to 873 (at 4b44903):

<jdk17.buildvers>
  330
</jdk17.buildvers>

We need to exclude 400 from PR checks based on JDK 8 and 11.
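
A hedged sketch of the pom change implied here (the delimiter and any other entries in jdk17.buildvers are assumed): adding 400 to the JDK17-only list keeps it out of the JDK 8/11 check matrix.

<jdk17.buildvers>
  330,
  400
</jdk17.buildvers>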

@razajafri (author):

Depends on #11092

There is a bytecode incompatibility, which is why we are skipping these until we add support for it. For details, please see the following two issues:
NVIDIA#11174
NVIDIA#10203
@razajafri (author):

build

Review thread on this diff:

  <executions>
    <execution>
-     <id>enforce-maven</id>
+     <id>default-cli</id>
@pxLi commented Jul 12, 2024:

Changing the execution id here causes buildver to fail to pick up the correct default value (330) from the Maven lifecycle if buildver= is not specified in the mvn run:

Unexpected buildver value 311 for a Scala 2.13 build, only Apache Spark versions 3.3.0 (330) and higher are supported, no vendor builds such as 330db

@razajafri (author) replied:

Yes, I don't think the enforcer was actually working before changing the id. We need this value from the command line to run the enforcer.
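
As background on why the id matters (general Maven behavior, not specific to this pom): an execution with the id default-cli supplies configuration when the plugin goal is invoked directly from the command line, while a phase-bound execution with any other id runs as part of the build lifecycle. A minimal sketch contrasting the two (the enforcer plugin coordinates are real; the phase binding shown is an assumption):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-enforcer-plugin</artifactId>
  <executions>
    <!-- applies when running the goal directly, e.g. mvn enforcer:enforce -->
    <execution>
      <id>default-cli</id>
      <goals><goal>enforce</goal></goals>
    </execution>
    <!-- runs automatically during the lifecycle because it is bound to a phase -->
    <execution>
      <id>enforce-maven</id>
      <phase>validate</phase>
      <goals><goal>enforce</goal></goals>
    </execution>
  </executions>
</plugin>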

@pxLi replied Jul 12, 2024:

OK, then you are changing the expected behavior of running the mvn command with the default buildver.

This means we can no longer simply run mvn verify without specifying buildver in scala2.13. If this is expected, then we must document it everywhere it applies.

See below.

@pxLi commented Jul 12, 2024:

I ran the command below in the scala2.13 folder:

mvn help:evaluate -Dexpression=buildver -q -DforceStdout

It only showed 311 in this PR; on the latest upstream it prints the correct 330.

Please make sure it prints the correct buildver for the default profile locally:

# try check
mvn help:active-profiles

upstream:

Active Profiles for Project 'com.nvidia:rapids-4-spark-parent_2.13:pom:24.08.0-SNAPSHOT':

The following profiles are active:

 - release330 (source: com.nvidia:rapids-4-spark-parent_2.13:24.08.0-SNAPSHOT)

vs
this PR:

Active Profiles for Project 'com.nvidia:rapids-4-spark-parent_2.13:pom:24.08.0-SNAPSHOT':

The following profiles are active:

 - enforce-cloudera-jdk-version (source: com.nvidia:rapids-4-spark-parent_2.13:24.08.0-SNAPSHOT)

Maven is prioritizing it over the default release activation (commenting out this profile yields the correct buildver). I think you could just add the enforcer to the cdh profiles directly if it's only a JDK version check.
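
For reference, the interaction described here is standard Maven behavior: a profile marked activeByDefault is switched off as soon as any other profile in the same POM is activated by other means, including a <jdk> trigger. A minimal sketch (profile ids taken from this thread; their contents are assumed):

<profiles>
  <profile>
    <id>release330</id>
    <activation>
      <!-- deactivated whenever any sibling profile activates -->
      <activeByDefault>true</activeByDefault>
    </activation>
  </profile>
  <profile>
    <id>enforce-cloudera-jdk-version</id>
    <activation>
      <!-- activates on JDK 17+, which knocks out release330 above -->
      <jdk>[17,)</jdk>
    </activation>
  </profile>
</profiles>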

A collaborator commented:

@razajafri @gerashegalov Please note I've merged the PR dropping the 31x shims: #11159

Some files changed in this PR now conflict (they were modified or deleted in #11159). Can you help sync those files with the TOT source on the upstream branch-24.08? Thanks!

@NvTimLiu commented Jul 12, 2024:

> (quoting @pxLi's comment above)

It looks like when building Scala 2.13 + JDK17, the enforce-cloudera-jdk-version profile is enabled by the condition <jdk>[17,)</jdk>.

Then release330 is disabled unless we manually add -Dbuildver=330; otherwise buildver falls back to the initial value of 311.

@razajafri (author) replied:

I created enforce-cloudera-jdk-version to avoid code duplication, since Maven doesn't allow profile inheritance. I have updated the cloudera profiles now.
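
A hedged sketch of what moving the check directly into a Cloudera profile could look like (the profile id, activation value, and version range are assumptions for illustration; only the enforcer plugin coordinates and the requireJavaVersion rule name are standard Maven):

<profile>
  <id>release330cdh</id>
  <activation>
    <property>
      <name>buildver</name>
      <value>330cdh</value>
    </property>
  </activation>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-enforcer-plugin</artifactId>
        <executions>
          <execution>
            <id>enforce-cloudera-jdk-version</id>
            <goals><goal>enforce</goal></goals>
            <configuration>
              <rules>
                <!-- hypothetical rule; the real constraint may differ -->
                <requireJavaVersion>
                  <version>[17,)</version>
                </requireJavaVersion>
              </rules>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</profile>

Because the enforcer now lives inside the property-activated profile, no separate <jdk>-activated profile competes with the activeByDefault release profile.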

@razajafri (author) replied:

This is what I see now, running

mvn help:active-profiles

in the root project dir:

Active Profiles for Project 'com.nvidia:rapids-4-spark-parent_2.12:pom:24.08.0-SNAPSHOT':

The following profiles are active:

 - release320 (source: com.nvidia:rapids-4-spark-parent_2.12:24.08.0-SNAPSHOT)

This is what I see in the scala2.13 folder:

Active Profiles for Project 'com.nvidia:rapids-4-spark-parent_2.13:pom:24.08.0-SNAPSHOT':

The following profiles are active:

 - release330 (source: com.nvidia:rapids-4-spark-parent_2.13:24.08.0-SNAPSHOT)

@razajafri (author):

build

@gerashegalov left a review comment:

This PR contains different source code changes. Can you reconcile the title with the PR intent?

Technically, more shimming is needed due to a recent upstream change:

14:47:51,237 [ERROR] [Error] /home/user/gits/NVIDIA/spark-rapids.worktrees/spark400/sql-plugin/src/main/spark332db/scala/org/apache/spark/sql/hive/rapids/shims/GpuRowBasedHiveGenericUDFShim.scala:36: type mismatch;
 found   : Any
 required: () => Any
14:47:51,240 [ERROR] one error found

A review thread on pom.xml was marked outdated and resolved.
@razajafri changed the title from "POM Changes for Spark 4.0.0 [databricks]" to "Add Spark 4.0.0 Build Profile and Other Supporting Changes [databricks]" on Jul 15, 2024.
@gerashegalov left a review comment:

LGTM, let us make sure that all unresolved issues are captured as GitHub issues.

Review thread on lines 191 to 201:

- needs: cache-dependencies
+ needs: set-scala213-versions
A collaborator commented:

We need an issue on how to better handle the cache for scala2.13 build dependencies.


@gerashegalov:

build

@razajafri merged commit a41a616 into NVIDIA:branch-24.08 on Jul 16, 2024. 43 checks passed.

@razajafri deleted the SP-9259-POM-changes branch on July 16, 2024, 14:21.
Labels: Spark 4.0+ (Spark 4.0+ issues)
Projects: None yet
Development: Successfully merging this pull request may close these issues:
[FEA] Create Spark 4.0.0 shim and build env
7 participants