Add Spark 4.0.0 Build Profile and Other Supporting Changes [databricks] #10994

Merged 54 commits on Jul 16, 2024. Changes shown from 49 of the 54 commits.

Commits (54)
0c3a1ba
POM changes for Spark 4.0.0
razajafri Jun 5, 2024
1631ab4
validate buildver and scala versions
razajafri Jun 7, 2024
3271bfd
more pom changes
razajafri Jun 7, 2024
5d2b867
fixed the scala-2.12 comment
razajafri Jun 7, 2024
58806bd
more fixes for scala-2.13 pom
razajafri Jun 7, 2024
540c732
addressed comments
razajafri Jun 7, 2024
5faecf8
add in shim check to account for 400
razajafri Jun 14, 2024
5de0fff
add 400 for premerge tests against jdk 17
razajafri Jun 25, 2024
bb784f0
temporarily remove 400 from snapshotScala213
razajafri Jun 25, 2024
140eb7b
Merge remote-tracking branch 'origin/branch-24.08' into HEAD
razajafri Jun 25, 2024
41e1982
fixed 2.13 pom
razajafri Jun 25, 2024
85a2f6d
Remove 400 from jdk17 as it will compile with Scala 2.12
razajafri Jun 25, 2024
f7f5a98
github workflow changes
razajafri Jun 25, 2024
74dd568
added quotes to pom-directory
razajafri Jun 25, 2024
bd1bc70
update version defs to include scala 213 jdk 17
razajafri Jun 25, 2024
46eb751
Merge remote-tracking branch 'origin/branch-24.08' into HEAD
razajafri Jun 27, 2024
2b15ab2
Cross-compile all shims from JDK17 to JDK8
gerashegalov Jun 27, 2024
f7f8edf
dummy
gerashegalov Jun 27, 2024
ac0ecba
undo api pom change
gerashegalov Jun 27, 2024
c644ce8
Add preview1 to the allowed shim versions
gerashegalov Jun 27, 2024
9d182b3
Scala 2.13 to require JDK17
gerashegalov Jun 28, 2024
96e0843
Merge pull request #3 from gerashegalov/spark400crosscompile
razajafri Jun 28, 2024
b51c08b
Removed unused import left over from https://github.com/razajafri/spa…
razajafri Jun 28, 2024
1b9beb5
Setup JAVA_HOME before caching
razajafri Jun 30, 2024
a173f35
Only upgrade the Scala plugin for Scala 2.13
razajafri Jul 1, 2024
6138cc8
Regenerate Scala 2.13 poms
razajafri Jul 1, 2024
1faabd4
Remove 330 from JDK17 builds for Scala 2.12
razajafri Jul 1, 2024
0cf0036
Revert "Remove 330 from JDK17 builds for Scala 2.12"
razajafri Jul 1, 2024
a7b42c6
Downgrade scala.plugin.version for cloudera
razajafri Jul 1, 2024
8d0f8ca
Updated comment to include the issue
razajafri Jul 1, 2024
45b0d57
Upgrading the scala.maven.plugin version to 4.9.1 which is the same a…
razajafri Jul 3, 2024
eb09f98
Downgrade scala-maven-plugin for Cloudera
razajafri Jul 3, 2024
4407dbf
revert mvn verify changes
razajafri Jul 3, 2024
0e4e45a
Avoid cache for JDK 17
razajafri Jul 3, 2024
bd10267
Handle the change for UnaryPositive now extending RuntimeReplaceable
razajafri Jul 3, 2024
2982a59
Removing 330 from jdk17.buildvers as we only support Scala2.13 and fi…
razajafri Jul 3, 2024
319aefc
Update Scala 2.13 poms
razajafri Jul 3, 2024
835fbb4
Merge remote-tracking branch 'origin/branch-24.08' into HEAD
razajafri Jul 5, 2024
dfbb149
fixed scala2.13 verify to actually use the scala2.13/pom.xml
razajafri Jul 8, 2024
8ffc4f1
Added missing csv files
razajafri Jul 8, 2024
f599413
Merge remote-tracking branch 'origin/branch-24.08' into HEAD
razajafri Jul 9, 2024
0544c6f
Skip Opcode tests
razajafri Jul 12, 2024
a43a68d
Merge remote-tracking branch 'origin/branch-24.08' into HEAD
razajafri Jul 12, 2024
2cf7351
upmerged and fixed the new compile error introduced
razajafri Jul 12, 2024
cabcda0
addressed review comments
razajafri Jul 12, 2024
c19fccf
Merge remote-tracking branch 'origin/branch-24.08' into HEAD
razajafri Jul 12, 2024
feeabfc
Removed jdk17 cloudera check and moved it inside the 321,330 and 332 …
razajafri Jul 12, 2024
54a3ee4
fixed upmerge conflicts
razajafri Jul 12, 2024
48fe626
reverted renaming of id
razajafri Jul 12, 2024
166e4c6
Merge remote-tracking branch 'origin/branch-24.08' into SP-9259-POM-c…
razajafri Jul 15, 2024
6cbce78
Fixed HiveGenericUDFShim
razajafri Jul 15, 2024
8d0351b
addressed review comments
razajafri Jul 15, 2024
5b0e36d
reverted the debugging code
razajafri Jul 15, 2024
66419b8
generated Scala 2.13 poms
razajafri Jul 15, 2024
126 changes: 90 additions & 36 deletions .github/workflows/mvn-verify-check.yml
@@ -40,10 +40,9 @@ jobs:
runs-on: ubuntu-latest
outputs:
dailyCacheKey: ${{ steps.generateCacheKey.outputs.dailyCacheKey }}
defaultSparkVersion: ${{ steps.allShimVersionsStep.outputs.defaultSparkVersion }}
sparkTailVersions: ${{ steps.allShimVersionsStep.outputs.tailVersions }}
sparkJDKVersions: ${{ steps.allShimVersionsStep.outputs.jdkVersions }}
scala213Versions: ${{ steps.allShimVersionsStep.outputs.scala213Versions }}
defaultSparkVersion: ${{ steps.all212ShimVersionsStep.outputs.defaultSparkVersion }}
sparkTailVersions: ${{ steps.all212ShimVersionsStep.outputs.tailVersions }}
sparkJDKVersions: ${{ steps.all212ShimVersionsStep.outputs.jdkVersions }}
steps:
- uses: actions/checkout@v4 # refs/pull/:prNumber/merge
- uses: actions/setup-java@v4
@@ -69,7 +68,7 @@ jobs:
set -x
max_retry=3; delay=30; i=1
while true; do
for pom in pom.xml scala2.13/pom.xml
for pom in pom.xml
do
mvn ${{ env.COMMON_MVN_FLAGS }} --file $pom help:evaluate -pl dist \
-Dexpression=included_buildvers \
@@ -89,7 +88,7 @@
}
done
- name: all shim versions
id: allShimVersionsStep
id: all212ShimVersionsStep
run: |
set -x
. jenkins/version-def.sh
@@ -113,30 +112,12 @@
jdkHeadVersionArrBody=$(printf ",{\"spark-version\":\"%s\",\"java-version\":8}" "${SPARK_BASE_SHIM_VERSION}")
# jdk11
jdk11VersionArrBody=$(printf ",{\"spark-version\":\"%s\",\"java-version\":11}" "${SPARK_SHIM_VERSIONS_JDK11[@]}")
# jdk17
jdk17VersionArrBody=$(printf ",{\"spark-version\":\"%s\",\"java-version\":17}" "${SPARK_SHIM_VERSIONS_JDK17[@]}")
# jdk
jdkVersionArrBody=$jdkHeadVersionArrBody$jdk11VersionArrBody$jdk17VersionArrBody
jdkVersionArrBody=$jdkHeadVersionArrBody$jdk11VersionArrBody
jdkVersionArrBody=${jdkVersionArrBody:1}
jdkVersionJsonStr=$(printf {\"include\":[%s]} $jdkVersionArrBody)
echo "jdkVersions=$jdkVersionJsonStr" >> $GITHUB_OUTPUT

SCALA_BINARY_VER=2.13
. jenkins/version-def.sh
svArrBodyNoSnapshot=$(printf ",{\"spark-version\":\"%s\",\"isSnapshot\":false}" "${SPARK_SHIM_VERSIONS_NOSNAPSHOTS[@]}")
svArrBodyNoSnapshot=${svArrBodyNoSnapshot:1}
# get private artifact version
privateVer=$(mvn help:evaluate -q -pl dist -Dexpression=spark-rapids-private.version -DforceStdout)
# do not add empty snapshot versions or when private version is released one (does not include snapshot shims)
if [[ ${#SPARK_SHIM_VERSIONS_SNAPSHOTS_ONLY[@]} -gt 0 && $privateVer == *"-SNAPSHOT" ]]; then
svArrBodySnapshot=$(printf ",{\"spark-version\":\"%s\",\"isSnapshot\":true}" "${SPARK_SHIM_VERSIONS_SNAPSHOTS_ONLY[@]}")
svArrBodySnapshot=${svArrBodySnapshot:1}
svJsonStr=$(printf {\"include\":[%s]} $svArrBodyNoSnapshot,$svArrBodySnapshot)
else
svJsonStr=$(printf {\"include\":[%s]} $svArrBodyNoSnapshot)
fi

echo "scala213Versions=$svJsonStr" >> $GITHUB_OUTPUT

package-tests:
needs: cache-dependencies
@@ -187,27 +168,51 @@ jobs:
}
done

set-scala213-versions:
runs-on: ubuntu-latest
outputs:
scala213Versions: ${{ steps.all213ShimVersionsStep.outputs.scala213Versions }}
sparkJDK17Versions: ${{ steps.all213ShimVersionsStep.outputs.jdkVersions }}
steps:
- uses: actions/checkout@v4 # refs/pull/:prNumber/merge

- id: all213ShimVersionsStep
run: |
set -x
SCALA_BINARY_VER=2.13
. jenkins/version-def.sh
svArrBodyNoSnapshot=$(printf ",{\"spark-version\":\"%s\",\"isSnapshot\":false}" "${SPARK_SHIM_VERSIONS_NOSNAPSHOTS[@]}")
svArrBodyNoSnapshot=${svArrBodyNoSnapshot:1}
# get private artifact version
privateVer=$(mvn help:evaluate -q -pl dist -Dexpression=spark-rapids-private.version -DforceStdout)
svJsonStr=$(printf {\"include\":[%s]} $svArrBodyNoSnapshot)

echo "scala213Versions=$svJsonStr" >> $GITHUB_OUTPUT

# jdk17
jdk17VersionArrBody=$(printf ",{\"spark-version\":\"%s\",\"java-version\":17}" "${SPARK_SHIM_VERSIONS_JDK17_SCALA213[@]}")

jdkVersionArrBody=$jdk17VersionArrBody
jdkVersionArrBody=${jdkVersionArrBody:1}
jdkVersionJsonStr=$(printf {\"include\":[%s]} $jdkVersionArrBody)
echo "jdkVersions=$jdkVersionJsonStr" >> $GITHUB_OUTPUT

package-tests-scala213:
needs: cache-dependencies
needs: set-scala213-versions
Comment on lines -191 to +201 (Collaborator): We need an issue on how to better handle the cache for the Scala 2.13 build dependencies.
continue-on-error: ${{ matrix.isSnapshot }}
strategy:
matrix: ${{ fromJSON(needs.cache-dependencies.outputs.scala213Versions) }}
matrix: ${{ fromJSON(needs.set-scala213-versions.outputs.scala213Versions) }}
fail-fast: false
runs-on: ubuntu-latest
steps:

- uses: actions/checkout@v4 # refs/pull/:prNumber/merge

- name: Setup Java and Maven Env
uses: actions/setup-java@v4
with:
distribution: adopt
java-version: 8

- name: Cache local Maven repository
uses: actions/cache@v4
with:
path: ~/.m2
key: ${{ needs.cache-dependencies.outputs.dailyCacheKey }}
java-version: 17

- name: check runtime before tests
run: |
@@ -218,7 +223,7 @@
run: |
# https://github.com/NVIDIA/spark-rapids/issues/8847
# specify expected versions
export JAVA_HOME=${JAVA_HOME_8_X64}
export JAVA_HOME=${JAVA_HOME_17_X64}
export PATH=${JAVA_HOME}/bin:${PATH}
java -version && mvn --version && echo "ENV JAVA_HOME: $JAVA_HOME, PATH: $PATH"
# verify Scala 2.13 build files
@@ -246,8 +251,57 @@
}
done

verify-213-modules:
needs: set-scala213-versions
runs-on: ubuntu-latest
strategy:
matrix: ${{ fromJSON(needs.set-scala213-versions.outputs.sparkJDK17Versions) }}
steps:
- uses: actions/checkout@v4 # refs/pull/:prNumber/merge

- name: Setup Java and Maven Env
uses: actions/setup-java@v4
with:
distribution: adopt
java-version: 17

- name: check runtime before tests
run: |
env | grep JAVA
java -version && mvn --version && echo "ENV JAVA_HOME: $JAVA_HOME, PATH: $PATH"

- name: Build JDK
run: |
# https://github.com/NVIDIA/spark-rapids/issues/8847
# specify expected versions
export JAVA_HOME=${JAVA_HOME_${{ matrix.java-version }}_X64}
export PATH=${JAVA_HOME}/bin:${PATH}
java -version && mvn --version && echo "ENV JAVA_HOME: $JAVA_HOME, PATH: $PATH"
# verify Scala 2.13 build files
./build/make-scala-version-build-files.sh 2.13
# verify git status
if [ -n "$(echo -n $(git status -s | grep 'scala2.13'))" ]; then
git add -N scala2.13/* && git diff 'scala2.13/*'
echo "Generated Scala 2.13 build files don't match what's in repository"
exit 1
fi
# change to Scala 2.13 Directory
cd scala2.13
# test command, will retry for 3 times if failed.
max_retry=3; delay=30; i=1
while true; do
mvn verify \
-P "individual,pre-merge" -Dbuildver=${{ matrix.spark-version }} \
${{ env.COMMON_MVN_FLAGS }} && break || {
if [[ $i -le $max_retry ]]; then
echo "mvn command failed. Retry $i/$max_retry."; ((i++)); sleep $delay; ((delay=delay*2))
else
echo "mvn command failed. Exit 1"; exit 1
fi
}
done

verify-all-modules:
verify-all-212-modules:
needs: cache-dependencies
runs-on: ubuntu-latest
strategy:
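For context on the printf-based matrix assembly in the workflow above: each step builds one JSON fragment per shim version, strips the leading comma, and wraps the result in an `include` object that `fromJSON` can feed into a job matrix. A minimal standalone sketch of that pattern — the shim versions here are placeholders, not taken from jenkins/version-def.sh:

```bash
#!/usr/bin/env bash
# Sketch of the matrix-JSON assembly used in all212ShimVersionsStep and
# all213ShimVersionsStep; the version list below is hypothetical.
SPARK_SHIM_VERSIONS_NOSNAPSHOTS=("330" "341" "400")

# printf repeats the format once per array element, yielding ",{...},{...}".
svArrBody=$(printf ',{"spark-version":"%s","isSnapshot":false}' \
  "${SPARK_SHIM_VERSIONS_NOSNAPSHOTS[@]}")
svArrBody=${svArrBody:1}   # drop the leading comma
svJsonStr=$(printf '{"include":[%s]}' "$svArrBody")
echo "$svJsonStr"
# => {"include":[{"spark-version":"330","isSnapshot":false},...]}
```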
19 changes: 19 additions & 0 deletions aggregator/pom.xml
@@ -715,5 +715,24 @@
</dependency>
</dependencies>
</profile>
<!-- #if scala-2.13 --><!--
<profile>
<id>release400</id>
<activation>
<property>
<name>buildver</name>
<value>400</value>
</property>
</activation>
<dependencies>
<dependency>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-delta-stub_${scala.binary.version}</artifactId>
<version>${project.version}</version>
<classifier>${spark.version.classifier}</classifier>
</dependency>
</dependencies>
</profile>
--><!-- #endif scala-2.13 -->
</profiles>
</project>
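The `#if scala-2.13` / `#endif scala-2.13` markers keep the new release400 profile commented out in the Scala 2.12 pom; it only becomes active when the Scala 2.13 poms are regenerated. A rough illustration of how such a comment toggle can be flipped — the real transformation lives in build/make-scala-version-build-files.sh and may differ in detail:

```bash
# Hypothetical sed-based toggle: uncomment blocks guarded by the
# scala-2.13 markers when producing scala2.13/pom.xml.
sed -e 's|<!-- #if scala-2.13 --><!--|<!-- #if scala-2.13 -->|g' \
    -e 's|--><!-- #endif scala-2.13 -->|<!-- #endif scala-2.13 -->|g' \
    pom.xml > scala2.13/pom.xml
```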
3 changes: 1 addition & 2 deletions build/buildall
@@ -161,7 +161,6 @@ if [[ "$DIST_PROFILE" == *Scala213 ]]; then
SCALA213=1
fi


# include options to mvn command
export MVN="mvn -Dmaven.wagon.http.retryHandler.count=3 ${MVN_OPT}"

@@ -196,7 +195,7 @@ case $DIST_PROFILE in
SPARK_SHIM_VERSIONS=($(versionsFromDistProfile "minimumFeatureVersionMix"))
;;

3*)
[34]*)
<<< $DIST_PROFILE IFS="," read -ra SPARK_SHIM_VERSIONS
INCLUDED_BUILDVERS_OPT="-Dincluded_buildvers=$DIST_PROFILE"
unset DIST_PROFILE
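The widened case pattern is the functional change here: `3*` only matched comma-separated buildver lists beginning with a 3.x shim, while `[34]*` also accepts lists that begin with 400. A small sketch of the dispatch, with a hypothetical profile string:

```bash
DIST_PROFILE="400,350"   # hypothetical value passed to buildall
case $DIST_PROFILE in
  [34]*)  # any buildver list whose first entry starts with 3 or 4
    IFS="," read -ra SPARK_SHIM_VERSIONS <<< "$DIST_PROFILE"
    echo "shims: ${SPARK_SHIM_VERSIONS[*]}"   # => shims: 400 350
    ;;
  *)
    echo "unrecognized profile: $DIST_PROFILE" >&2
    ;;
esac
```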
4 changes: 3 additions & 1 deletion build/shimplify.py
@@ -188,6 +188,7 @@ def __csv_as_arr(str_val):
__dirs_to_derive_shims = sorted(__csv_ant_prop_as_arr('shimplify.dirs'))

__all_shims_arr = sorted(__csv_ant_prop_as_arr('all.buildvers'))
__allScala213_shims_arr = sorted(__csv_ant_prop_as_arr('allScala213.buildvers'))

__log = logging.getLogger('shimplify')
__log.setLevel(logging.DEBUG if __should_trace else logging.INFO)
@@ -372,7 +373,8 @@ def __generate_symlinks():

def __map_version_array(shim_json_string):
shim_ver = str(json.loads(shim_json_string).get('spark'))
assert shim_ver in __all_shims_arr, "all.buildvers in pom.xml does not contain %s" % shim_ver
assert shim_ver in __all_shims_arr or shim_ver in __allScala213_shims_arr, "all.buildvers or " \
"allScala213.buildvers in pom.xml does not contain %s" % shim_ver
return shim_ver

def __traverse_source_tree_of_all_shims(src_type, func):
8 changes: 8 additions & 0 deletions dist/pom.xml
@@ -110,6 +110,14 @@
</included_buildvers>
</properties>
</profile>
<profile>
<id>jdk17-scala213-test</id>
<properties>
<included_buildvers>
${jdk17.scala213.buildvers}
</included_buildvers>
</properties>
</profile>
<profile>
<id>jdk17-test</id>
<properties>
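The new `jdk17-scala213-test` profile mirrors the existing `jdk17-test` profile but selects the `${jdk17.scala213.buildvers}` shim list. It can be inspected the same way the workflow queries `included_buildvers` — a sketch, assuming it is run from a local checkout:

```bash
# Evaluate which buildvers the new profile selects (sketch; run from the
# repository root once the profile exists in dist/pom.xml).
mvn help:evaluate -q -pl dist -Pjdk17-scala213-test \
    -Dexpression=included_buildvers -DforceStdout
```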
10 changes: 5 additions & 5 deletions dist/scripts/binary-dedupe.sh
@@ -168,12 +168,12 @@ function verify_same_sha_for_unshimmed() {
# TODO currently RapidsShuffleManager is "removed" from /spark* by construction in
# dist pom.xml via ant. We could delegate this logic to this script
# and make both simpler
if [[ ! "$class_file_quoted" =~ (com/nvidia/spark/rapids/spark[34].*/.*ShuffleManager.class|org/apache/spark/sql/rapids/shims/spark[34].*/ProxyRapidsShuffleInternalManager.class) ]]; then
if [[ ! "$class_file_quoted" =~ com/nvidia/spark/rapids/spark[34].*/.*ShuffleManager.class ]]; then

if ! grep -q "/spark.\+/$class_file_quoted" "$SPARK_SHARED_TXT"; then
echo >&2 "$class_file is not bitwise-identical across shims"
exit 255
fi
if ! grep -q "/spark.\+/$class_file_quoted" "$SPARK_SHARED_TXT"; then
echo >&2 "$class_file is not bitwise-identical across shims"
exit 255
fi
fi
}

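The tightened pattern drops the `ProxyRapidsShuffleInternalManager` alternative, so that class is no longer exempt from the bitwise-identity check. A quick way to see the effect — the class path below is illustrative:

```bash
# Under the new pattern only the per-shim ShuffleManager classes are exempt;
# the proxy class now falls through to the identity check.
class_file_quoted="org/apache/spark/sql/rapids/shims/spark341/ProxyRapidsShuffleInternalManager.class"
if [[ ! "$class_file_quoted" =~ com/nvidia/spark/rapids/spark[34].*/.*ShuffleManager.class ]]; then
  echo "checked for bitwise identity across shims"
fi
```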
42 changes: 35 additions & 7 deletions jdk-profiles/pom.xml
@@ -31,17 +31,45 @@
<version>24.08.0-SNAPSHOT</version>
<profiles>
<profile>
<id>jdk9plus</id>
<properties>
<scala.plugin.version>4.6.1</scala.plugin.version>
<maven.compiler.source>${java.specification.version}</maven.compiler.source>
<maven.compiler.release>${maven.compiler.source}</maven.compiler.release>
<maven.compiler.target>${maven.compiler.source}</maven.compiler.target>
</properties>
<id>jdk8</id>
<activation>
<jdk>8</jdk>
</activation>
<build>
<pluginManagement>
<plugins>
<plugin>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<version>${scala.plugin.version}</version>
<configuration>
<target>${java.major.version}</target>
</configuration>
</plugin>
</plugins>
</pluginManagement>
</build>
</profile>
<profile>
<id>jdk9plus</id>
<activation>
<!-- activate for all java versions after 9 -->
<jdk>[9,)</jdk>
</activation>
<build>
<pluginManagement>
<plugins>
<plugin>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<version>${scala.plugin.version}</version>
<configuration>
<release>${java.major.version}</release>
</configuration>
</plugin>
</plugins>
</pluginManagement>
</build>
</profile>
</profiles>
</project>
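The split replaces a single unconditional jdk9plus profile with two JDK-activated profiles: on JDK 8 the scala-maven-plugin passes the compiler a `-target`, while on JDK 9+ it passes `--release`, which additionally validates API usage against the requested platform version. A sketch of the difference with plain javac — the file name is hypothetical:

```bash
# --release N (JDK 9+) compiles against the JDK N API and fails fast on
# newer-API usage; -source/-target only set language level and bytecode
# version, so newer APIs can slip through and break at runtime on JDK 8.
javac --release 8 Foo.java
javac -source 8 -target 8 Foo.java
```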
3 changes: 3 additions & 0 deletions jenkins/version-def.sh
@@ -125,6 +125,9 @@ SPARK_SHIM_VERSIONS_JDK11=("${SPARK_SHIM_VERSIONS_ARR[@]}")
# jdk17 cases
set_env_var_SPARK_SHIM_VERSIONS_ARR -Pjdk17-test
SPARK_SHIM_VERSIONS_JDK17=("${SPARK_SHIM_VERSIONS_ARR[@]}")
# jdk17 scala213 cases
set_env_var_SPARK_SHIM_VERSIONS_ARR -Pjdk17-scala213-test
SPARK_SHIM_VERSIONS_JDK17_SCALA213=("${SPARK_SHIM_VERSIONS_ARR[@]}")
# databricks shims
set_env_var_SPARK_SHIM_VERSIONS_ARR -Pdatabricks
SPARK_SHIM_VERSIONS_DATABRICKS=("${SPARK_SHIM_VERSIONS_ARR[@]}")
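The new `SPARK_SHIM_VERSIONS_JDK17_SCALA213` array is what the `set-scala213-versions` workflow job consumes. A minimal sketch of populating it locally, assuming a checkout of this branch with jenkins/version-def.sh present:

```bash
# Sketch: populate and print the JDK17 + Scala 2.13 shim list.
SCALA_BINARY_VER=2.13
. jenkins/version-def.sh
printf 'jdk17/scala2.13 shim: %s\n' "${SPARK_SHIM_VERSIONS_JDK17_SCALA213[@]}"
```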