Add Spark 4.0.0 Build Profile and Other Supporting Changes [databricks] #10994
Conversation
Force-push: b7f175f → 5d2b867
The pre-commit task is failing for Spark 400 because the private repo changes haven't been merged.
@gerashegalov do you have any more questions on this PR?
The 400 PR check is failing. We need to wait until the spark-rapids-private artifact is published: https://github.com/NVIDIA/spark-rapids/actions/runs/9519648358/job/26243644463?pr=10994
I was wondering if this could be checked in before #11066. It doesn't need to wait, does it?
@razajafri @mythrocks there are two pending issues with this PR indicated by the failing check ❌ mvn[compile,RAT,scalastyle,docgen] / package-tests-scala213 (400, true) (pull_request)
It currently fails at Lines 871 to 873 in 4b44903.
We need to exclude 400 from PR checks based on JDK 8 and 11.
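Excluding the 400 build from the JDK 8/11 check jobs could be expressed with a matrix exclusion along these lines. This is a hedged sketch: the matrix key names (`spark-version`, `java-version`) are illustrative assumptions, not the repository's actual workflow configuration.

```yaml
# Illustrative sketch only; key names are assumptions, not the
# project's actual GitHub Actions workflow.
strategy:
  matrix:
    spark-version: [330, 350, 400]
    java-version: [8, 11, 17]
    exclude:
      # Spark 400 needs JDK 17+, so drop the JDK 8/11 combinations
      - spark-version: 400
        java-version: 8
      - spark-version: 400
        java-version: 11
```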
Depends on #11092
There is a bytecode incompatibility, which is why we are skipping these until we add support for it. For details, please see the following two issues: NVIDIA#11174 NVIDIA#10203
build
scala2.13/pom.xml (Outdated)
    <executions>
      <execution>
-       <id>enforce-maven</id>
+       <id>default-cli</id>
Changing the execution id here causes the build to miss the correct default buildver (330) from the Maven lifecycle when buildver is not specified on the mvn command line:
Unexpected buildver value 311 for a Scala 2.13 build, only Apache Spark versions 3.3.0 (330) and higher are supported, no vendor builds such as 330db
Yes, I don't think the enforcer was actually working before the id change. We need this value from the command line to run the enforcer.
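For context, an execution whose id is `default-cli` is how Maven supplies configuration to a goal invoked directly from the command line (e.g. `mvn enforcer:enforce`). A minimal sketch of that mechanism, assuming an enforcer rule on `buildver`; the rule and message below are illustrative, not the project's actual POM:

```xml
<!-- Hedged sketch of the default-cli mechanism, not the project's POM.
     The default-cli execution's configuration applies when the goal is
     run directly from the command line. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-enforcer-plugin</artifactId>
  <executions>
    <execution>
      <id>default-cli</id>
      <configuration>
        <rules>
          <!-- requireProperty is a standard enforcer rule; the property
               and message here are illustrative assumptions -->
          <requireProperty>
            <property>buildver</property>
            <message>buildver must be set, e.g. -Dbuildver=330</message>
          </requireProperty>
        </rules>
      </configuration>
    </execution>
  </executions>
</plugin>
```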
OK, then you are changing the expected behavior of running the mvn command with the default buildver. This means we can no longer simply run mvn verify without specifying buildver in scala2.13. If this is expected, we must document it for all occurrences.
See below.
I ran the following in the scala2.13 folder:
mvn help:evaluate -Dexpression=buildver -q -DforceStdout
In this PR it only shows 311; on the latest upstream it prints the correct 330.
Please make sure it prints the correct buildver for the default profile locally.
# try checking the active profiles:
mvn help:active-profiles
upstream:
Active Profiles for Project 'com.nvidia:rapids-4-spark-parent_2.13:pom:24.08.0-SNAPSHOT':
The following profiles are active:
- release330 (source: com.nvidia:rapids-4-spark-parent_2.13:24.08.0-SNAPSHOT)
vs
this PR:
Active Profiles for Project 'com.nvidia:rapids-4-spark-parent_2.13:pom:24.08.0-SNAPSHOT':
The following profiles are active:
- enforce-cloudera-jdk-version (source: com.nvidia:rapids-4-spark-parent_2.13:24.08.0-SNAPSHOT)
Maven is prioritizing it over the default release activation (commenting out this profile yields the correct buildver). I think you could just add the enforcer to the cdh profiles directly if it's only a JDK version check.
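This matches documented Maven behavior: a profile marked `<activeByDefault>` is deactivated whenever any other profile in the same POM activates. A minimal sketch of that interaction — the profile ids come from this discussion, but the bodies are illustrative assumptions, not the project's actual POM:

```xml
<!-- Sketch of the Maven activation semantics under discussion;
     not the project's actual POM contents. -->
<profiles>
  <profile>
    <id>release330</id>
    <activation>
      <!-- only active while no other profile in this POM activates -->
      <activeByDefault>true</activeByDefault>
    </activation>
    <properties>
      <buildver>330</buildver>
    </properties>
  </profile>
  <profile>
    <id>enforce-cloudera-jdk-version</id>
    <activation>
      <!-- JDK 17+ activates this profile, which switches off release330's
           activeByDefault activation, so buildver falls back to 311 -->
      <jdk>[17,)</jdk>
    </activation>
  </profile>
</profiles>
```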
@razajafri @gerashegalov Please note I've merged the PR dropping the 31x shims: #11159
Some file changes in this PR conflict with it [modified or deleted in PR #11159]; can you sync those files with the TOT source on the upstream branch-24.08? Thanks!
Looks like when building Scala 2.13 + JDK 17, the enforce-cloudera-jdk-version profile is activated by the condition <jdk>[17,)</jdk>.
Then the 330 profile is disabled unless we manually add -Dbuildver=330; otherwise buildver falls back to its initial value of 311.
I created the enforce-cloudera-jdk-version profile to avoid code duplication, as Maven doesn't allow profile inheritance. I have updated the Cloudera profiles now.
This is what I see now running
mvn help:active-profiles
in the root project dir:
Active Profiles for Project 'com.nvidia:rapids-4-spark-parent_2.12:pom:24.08.0-SNAPSHOT':
The following profiles are active:
- release320 (source: com.nvidia:rapids-4-spark-parent_2.12:24.08.0-SNAPSHOT)
This is what I see in the scala2.13 folder:
Active Profiles for Project 'com.nvidia:rapids-4-spark-parent_2.13:pom:24.08.0-SNAPSHOT':
The following profiles are active:
- release330 (source: com.nvidia:rapids-4-spark-parent_2.13:24.08.0-SNAPSHOT)
build
This PR contains different source code changes. Can you reconcile the title with the PR intent?
Technically, more shimming is needed due to a recent upstream change:
14:47:51,237 [ERROR] [Error] /home/user/gits/NVIDIA/spark-rapids.worktrees/spark400/sql-plugin/src/main/spark332db/scala/org/apache/spark/sql/hive/rapids/shims/GpuRowBasedHiveGenericUDFShim.scala:36: type mismatch;
found : Any
required: () => Any
14:47:51,240 [ERROR] one error found
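The mismatch above can be reproduced in miniature. This is a hedged sketch, not the actual Spark or plugin API: `register` and the surrounding names are hypothetical, chosen only to show a callee that now expects a thunk (`() => Any`) where an eager `Any` value used to suffice.

```scala
// Hypothetical minimal reproduction; not the actual Spark API.
object ThunkMismatch {
  // The callee now expects a by-need thunk rather than an eager value.
  def register(f: () => Any): Unit = println(f())

  def main(args: Array[String]): Unit = {
    val result: Any = 42
    // register(result)     // type mismatch; found: Any, required: () => Any
    register(() => result)  // wrapping the value in a thunk satisfies () => Any
  }
}
```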
Force-push: 8d74758 → e177024
Force-push: e177024 → 5b0e36d
LGTM, let us make sure that all unresolved issues are captured as GH issues.
needs: cache-dependencies
needs: set-scala213-versions
We need an issue to track how to better handle the cache for Scala 2.13 build dependencies.
build
This change adds the profiles needed to build the plugin for Spark 4.0.0.
This PR depends on #10993
fixes #9259