Skip to content

Commit

Permalink
Merge pull request #341 from SurajAralihalli/branch-23.12.1-main
Browse files Browse the repository at this point in the history
[Doc] Update readme, ipynb files for 23.12.1 version [skip ci]
  • Loading branch information
SurajAralihalli authored Jan 8, 2024
2 parents 4d45eb0 + 3f04b4a commit 9e97b9c
Show file tree
Hide file tree
Showing 20 changed files with 28 additions and 28 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@ Navigate to your home directory in the UI and select **Create** > **File** from
create an `init.sh` scripts with contents:
```bash
#!/bin/bash
sudo wget -O /databricks/jars/rapids-4-spark_2.12-23.12.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.12.0/rapids-4-spark_2.12-23.12.0.jar
sudo wget -O /databricks/jars/rapids-4-spark_2.12-23.12.1.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.12.1/rapids-4-spark_2.12-23.12.1.jar
```
1. Select the Databricks Runtime Version from one of the supported runtimes specified in the
Prerequisites section.
Expand Down Expand Up @@ -68,7 +68,7 @@ create an `init.sh` scripts with contents:
```bash
spark.rapids.sql.python.gpu.enabled true
spark.python.daemon.module rapids.daemon_databricks
spark.executorEnv.PYTHONPATH /databricks/jars/rapids-4-spark_2.12-23.12.0.jar:/databricks/spark/python
spark.executorEnv.PYTHONPATH /databricks/jars/rapids-4-spark_2.12-23.12.1.jar:/databricks/spark/python
```
Note that since python memory pool require installing the cudf library, so you need to install cudf library in
each worker nodes `pip install cudf-cu11 --extra-index-url=https://pypi.nvidia.com` or disable python memory pool
Expand Down
2 changes: 1 addition & 1 deletion docs/get-started/xgboost-examples/csp/databricks/init.sh
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
sudo rm -f /databricks/jars/spark--maven-trees--ml--10.x--xgboost-gpu--ml.dmlc--xgboost4j-gpu_2.12--ml.dmlc__xgboost4j-gpu_2.12__1.5.2.jar
sudo rm -f /databricks/jars/spark--maven-trees--ml--10.x--xgboost-gpu--ml.dmlc--xgboost4j-spark-gpu_2.12--ml.dmlc__xgboost4j-spark-gpu_2.12__1.5.2.jar

sudo wget -O /databricks/jars/rapids-4-spark_2.12-23.12.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.12.0/rapids-4-spark_2.12-23.12.0.jar
sudo wget -O /databricks/jars/rapids-4-spark_2.12-23.12.1.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.12.1/rapids-4-spark_2.12-23.12.1.jar
sudo wget -O /databricks/jars/xgboost4j-gpu_2.12-1.7.1.jar https://repo1.maven.org/maven2/ml/dmlc/xgboost4j-gpu_2.12/1.7.1/xgboost4j-gpu_2.12-1.7.1.jar
sudo wget -O /databricks/jars/xgboost4j-spark-gpu_2.12-1.7.1.jar https://repo1.maven.org/maven2/ml/dmlc/xgboost4j-spark-gpu_2.12/1.7.1/xgboost4j-spark-gpu_2.12-1.7.1.jar
ls -ltr
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ For simplicity export the location to these jars. All examples assume the packag
### Download the jars

Download the RAPIDS Accelerator for Apache Spark plugin jar
* [RAPIDS Spark Package](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.12.0/rapids-4-spark_2.12-23.12.0.jar)
* [RAPIDS Spark Package](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.12.1/rapids-4-spark_2.12-23.12.1.jar)

### Build XGBoost Python Examples

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ For simplicity export the location to these jars. All examples assume the packag
### Download the jars

1. Download the RAPIDS Accelerator for Apache Spark plugin jar
* [RAPIDS Spark Package](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.12.0/rapids-4-spark_2.12-23.12.0.jar)
* [RAPIDS Spark Package](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.12.1/rapids-4-spark_2.12-23.12.1.jar)

### Build XGBoost Scala Examples

Expand Down
6 changes: 3 additions & 3 deletions examples/ML+DL-Examples/Spark-cuML/pca/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,9 +12,9 @@ User can also download the release jar from Maven central:

[rapids-4-spark-ml_2.12-22.02.0-cuda11.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark-ml_2.12/22.02.0/rapids-4-spark-ml_2.12-22.02.0-cuda11.jar)

[rapids-4-spark_2.12-23.12.0.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.12.0/rapids-4-spark_2.12-23.12.0.jar)
[rapids-4-spark_2.12-23.12.1.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.12.1/rapids-4-spark_2.12-23.12.1.jar)

Note: This demo could only work with v22.02.0 spark-ml version, and only compatible with spark-rapids versions prior to 23.12.0 . Please do not update the version in release.
Note: This demo could only work with v22.02.0 spark-ml version, and only compatible with spark-rapids versions prior to 23.12.1 . Please do not update the version in release.

## Sample code

Expand Down Expand Up @@ -49,7 +49,7 @@ It is assumed that a Standalone Spark cluster has been set up, the `SPARK_MASTER

``` bash
RAPIDS_ML_JAR=PATH_TO_rapids-4-spark-ml_2.12-22.02.0-cuda11.jar
PLUGIN_JAR=PATH_TO_rapids-4-spark_2.12-23.12.0.jar
PLUGIN_JAR=PATH_TO_rapids-4-spark_2.12-23.12.1.jar
jupyter toree install \
--spark_home=${SPARK_HOME} \
Expand Down
4 changes: 2 additions & 2 deletions examples/ML+DL-Examples/Spark-cuML/pca/spark-submit.sh
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@

# Note that the last rapids-4-spark-ml release version is 22.02.0, snapshot version is 23.04.0-SNPASHOT, please do not update the version in release
ML_JAR=/root/.m2/repository/com/nvidia/rapids-4-spark-ml_2.12/22.02.0/rapids-4-spark-ml_2.12-22.02.0.jar
PLUGIN_JAR=/root/.m2/repository/com/nvidia/rapids-4-spark_2.12/23.12.0/rapids-4-spark_2.12-23.12.0.jar
PLUGIN_JAR=/root/.m2/repository/com/nvidia/rapids-4-spark_2.12/23.12.1/rapids-4-spark_2.12-23.12.1.jar
Note: The last rapids-4-spark-ml release version is 22.02.0, snapshot version is 23.04.0-SNPASHOT.

$SPARK_HOME/bin/spark-submit \
Expand All @@ -40,4 +40,4 @@ $SPARK_HOME/bin/spark-submit \
--conf spark.network.timeout=1000s \
--jars $ML_JAR,$PLUGIN_JAR \
--class com.nvidia.spark.examples.pca.Main \
/workspace/target/PCAExample-23.12.0-SNAPSHOT.jar
/workspace/target/PCAExample-23.12.1-SNAPSHOT.jar
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@
"import os\n",
"# Change to your cluster ip:port and directories\n",
"SPARK_MASTER_URL = os.getenv(\"SPARK_MASTER_URL\", \"spark:your-ip:port\")\n",
"RAPIDS_JAR = os.getenv(\"RAPIDS_JAR\", \"/your-path/rapids-4-spark_2.12-23.12.0.jar\")\n"
"RAPIDS_JAR = os.getenv(\"RAPIDS_JAR\", \"/your-path/rapids-4-spark_2.12-23.12.1.jar\")\n"
]
},
{
Expand Down
2 changes: 1 addition & 1 deletion examples/UDF-Examples/RAPIDS-accelerated-UDFs/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -163,7 +163,7 @@ then do the following inside the Docker container.

### Get jars from Maven Central

[rapids-4-spark_2.12-23.12.0.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.12.0/rapids-4-spark_2.12-23.12.0.jar)
[rapids-4-spark_2.12-23.12.1.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.12.1/rapids-4-spark_2.12-23.12.1.jar)


### Launch a local mode Spark
Expand Down
2 changes: 1 addition & 1 deletion examples/UDF-Examples/RAPIDS-accelerated-UDFs/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,7 @@
<cuda.version>cuda11</cuda.version>
<scala.binary.version>2.12</scala.binary.version>
<!-- Depends on release version, Snapshot version is not published to the Maven Central -->
<rapids4spark.version>23.12.0</rapids4spark.version>
<rapids4spark.version>23.12.1</rapids4spark.version>
<spark.version>3.1.1</spark.version>
<scala.version>2.12.15</scala.version>
<udf.native.build.path>${project.build.directory}/cpp-build</udf.native.build.path>
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -73,7 +73,7 @@
"Setting default log level to \"WARN\".\n",
"To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).\n",
"2022-11-30 06:57:40,550 WARN resource.ResourceUtils: The configuration of cores (exec = 2 task = 1, runnable tasks = 2) will result in wasted resources due to resource gpu limiting the number of runnable tasks per executor to: 1. Please adjust your configuration.\n",
"2022-11-30 06:57:54,195 WARN rapids.RapidsPluginUtils: RAPIDS Accelerator 23.12.0 using cudf 23.12.0.\n",
"2022-11-30 06:57:54,195 WARN rapids.RapidsPluginUtils: RAPIDS Accelerator 23.12.1 using cudf 23.12.0.\n",
"2022-11-30 06:57:54,210 WARN rapids.RapidsPluginUtils: spark.rapids.sql.multiThreadedRead.numThreads is set to 20.\n",
"2022-11-30 06:57:54,214 WARN rapids.RapidsPluginUtils: RAPIDS Accelerator is enabled, to disable GPU support set `spark.rapids.sql.enabled` to false.\n",
"2022-11-30 06:57:54,214 WARN rapids.RapidsPluginUtils: spark.rapids.sql.explain is set to `NOT_ON_GPU`. Set it to 'NONE' to suppress the diagnostics logging about the query placement on the GPU.\n",
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -9,15 +9,15 @@
"Dataset is derived from Fannie Mae’s [Single-Family Loan Performance Data](http://www.fanniemae.com/portal/funding-the-market/data/loan-performance-data.html) with all rights reserved by Fannie Mae. Refer to these [instructions](https://github.com/NVIDIA/spark-rapids-examples/blob/branch-23.12/docs/get-started/xgboost-examples/dataset/mortgage.md) to download the dataset.\n",
"\n",
"### 2. Download needed jars\n",
"* [rapids-4-spark_2.12-23.12.0.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.12.0/rapids-4-spark_2.12-23.12.0.jar)\n",
"* [rapids-4-spark_2.12-23.12.1.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.12.1/rapids-4-spark_2.12-23.12.1.jar)\n",
"\n",
"\n",
"### 3. Start Spark Standalone\n",
"Before running the script, please setup Spark standalone mode\n",
"\n",
"### 4. Add ENV\n",
"```\n",
"$ export SPARK_JARS=rapids-4-spark_2.12-23.12.0.jar\n",
"$ export SPARK_JARS=rapids-4-spark_2.12-23.12.1.jar\n",
"$ export PYSPARK_DRIVER_PYTHON=jupyter \n",
"$ export PYSPARK_DRIVER_PYTHON_OPTS=notebook\n",
"```\n",
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -63,7 +63,7 @@
"Setting default log level to \"WARN\".\n",
"To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).\n",
"2022-11-25 09:34:43,952 WARN resource.ResourceUtils: The configuration of cores (exec = 4 task = 1, runnable tasks = 4) will result in wasted resources due to resource gpu limiting the number of runnable tasks per executor to: 1. Please adjust your configuration.\n",
"2022-11-25 09:34:58,155 WARN rapids.RapidsPluginUtils: RAPIDS Accelerator 23.12.0 using cudf 23.12.0.\n",
"2022-11-25 09:34:58,155 WARN rapids.RapidsPluginUtils: RAPIDS Accelerator 23.12.1 using cudf 23.12.0.\n",
"2022-11-25 09:34:58,171 WARN rapids.RapidsPluginUtils: spark.rapids.sql.multiThreadedRead.numThreads is set to 20.\n",
"2022-11-25 09:34:58,175 WARN rapids.RapidsPluginUtils: RAPIDS Accelerator is enabled, to disable GPU support set `spark.rapids.sql.enabled` to false.\n",
"2022-11-25 09:34:58,175 WARN rapids.RapidsPluginUtils: spark.rapids.sql.explain is set to `NOT_ON_GPU`. Set it to 'NONE' to suppress the diagnostics logging about the query placement on the GPU.\n"
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -84,7 +84,7 @@
"22/11/24 06:14:06 INFO org.apache.spark.SparkEnv: Registering BlockManagerMaster\n",
"22/11/24 06:14:06 INFO org.apache.spark.SparkEnv: Registering BlockManagerMasterHeartbeat\n",
"22/11/24 06:14:06 INFO org.apache.spark.SparkEnv: Registering OutputCommitCoordinator\n",
"22/11/24 06:14:07 WARN com.nvidia.spark.rapids.RapidsPluginUtils: RAPIDS Accelerator 23.12.0 using cudf 23.12.0.\n",
"22/11/24 06:14:07 WARN com.nvidia.spark.rapids.RapidsPluginUtils: RAPIDS Accelerator 23.12.1 using cudf 23.12.0.\n",
"22/11/24 06:14:07 WARN com.nvidia.spark.rapids.RapidsPluginUtils: spark.rapids.sql.multiThreadedRead.numThreads is set to 20.\n",
"22/11/24 06:14:07 WARN com.nvidia.spark.rapids.RapidsPluginUtils: RAPIDS Accelerator is enabled, to disable GPU support set `spark.rapids.sql.enabled` to false.\n",
"22/11/24 06:14:07 WARN com.nvidia.spark.rapids.RapidsPluginUtils: spark.rapids.sql.explain is set to `NOT_ON_GPU`. Set it to 'NONE' to suppress the diagnostics logging about the query placement on the GPU.\n"
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -20,14 +20,14 @@
"Refer to these [instructions](https://github.com/NVIDIA/spark-rapids-examples/blob/branch-23.12/docs/get-started/xgboost-examples/dataset/mortgage.md) to download the dataset.\n",
"\n",
"### 2. Download needed jars\n",
"* [rapids-4-spark_2.12-23.12.0.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.12.0/rapids-4-spark_2.12-23.12.0.jar)\n",
"* [rapids-4-spark_2.12-23.12.1.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.12.1/rapids-4-spark_2.12-23.12.1.jar)\n",
"\n",
"### 3. Start Spark Standalone\n",
"Before Running the script, please setup Spark standalone mode\n",
"\n",
"### 4. Add ENV\n",
"```\n",
"$ export SPARK_JARS=rapids-4-spark_2.12-23.12.0.jar\n",
"$ export SPARK_JARS=rapids-4-spark_2.12-23.12.1.jar\n",
"\n",
"```\n",
"\n",
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -62,7 +62,7 @@
"Setting default log level to \"WARN\".\n",
"To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).\n",
"2022-11-30 08:02:10,103 WARN resource.ResourceUtils: The configuration of cores (exec = 2 task = 1, runnable tasks = 2) will result in wasted resources due to resource gpu limiting the number of runnable tasks per executor to: 1. Please adjust your configuration.\n",
"2022-11-30 08:02:23,737 WARN rapids.RapidsPluginUtils: RAPIDS Accelerator 23.12.0 using cudf 23.12.0.\n",
"2022-11-30 08:02:23,737 WARN rapids.RapidsPluginUtils: RAPIDS Accelerator 23.12.1 using cudf 23.12.0.\n",
"2022-11-30 08:02:23,752 WARN rapids.RapidsPluginUtils: spark.rapids.sql.multiThreadedRead.numThreads is set to 20.\n",
"2022-11-30 08:02:23,756 WARN rapids.RapidsPluginUtils: RAPIDS Accelerator is enabled, to disable GPU support set `spark.rapids.sql.enabled` to false.\n",
"2022-11-30 08:02:23,757 WARN rapids.RapidsPluginUtils: spark.rapids.sql.explain is set to `NOT_ON_GPU`. Set it to 'NONE' to suppress the diagnostics logging about the query placement on the GPU.\n",
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -19,14 +19,14 @@
"All data could be found at https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page\n",
"\n",
"### 2. Download needed jars\n",
"* [rapids-4-spark_2.12-23.12.0.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.12.0/rapids-4-spark_2.12-23.12.0.jar)\n",
"* [rapids-4-spark_2.12-23.12.1.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.12.1/rapids-4-spark_2.12-23.12.1.jar)\n",
"\n",
"### 3. Start Spark Standalone\n",
"Before running the script, please setup Spark standalone mode\n",
"\n",
"### 4. Add ENV\n",
"```\n",
"$ export SPARK_JARS=rapids-4-spark_2.12-23.12.0.jar\n",
"$ export SPARK_JARS=rapids-4-spark_2.12-23.12.1.jar\n",
"$ export PYSPARK_DRIVER_PYTHON=jupyter \n",
"$ export PYSPARK_DRIVER_PYTHON_OPTS=notebook\n",
"```\n",
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -73,7 +73,7 @@
"Setting default log level to \"WARN\".\n",
"To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).\n",
"2022-11-30 07:51:19,480 WARN resource.ResourceUtils: The configuration of cores (exec = 2 task = 1, runnable tasks = 2) will result in wasted resources due to resource gpu limiting the number of runnable tasks per executor to: 1. Please adjust your configuration.\n",
"2022-11-30 07:51:33,277 WARN rapids.RapidsPluginUtils: RAPIDS Accelerator 23.12.0 using cudf 23.12.0.\n",
"2022-11-30 07:51:33,277 WARN rapids.RapidsPluginUtils: RAPIDS Accelerator 23.12.1 using cudf 23.12.0.\n",
"2022-11-30 07:51:33,292 WARN rapids.RapidsPluginUtils: spark.rapids.sql.multiThreadedRead.numThreads is set to 20.\n",
"2022-11-30 07:51:33,295 WARN rapids.RapidsPluginUtils: RAPIDS Accelerator is enabled, to disable GPU support set `spark.rapids.sql.enabled` to false.\n",
"2022-11-30 07:51:33,295 WARN rapids.RapidsPluginUtils: spark.rapids.sql.explain is set to `NOT_ON_GPU`. Set it to 'NONE' to suppress the diagnostics logging about the query placement on the GPU.\n",
Expand Down
4 changes: 2 additions & 2 deletions examples/XGBoost-Examples/taxi/notebooks/scala/taxi-ETL.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -19,14 +19,14 @@
"All data could be found at https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page\n",
"\n",
"### 2. Download needed jar\n",
"* [rapids-4-spark_2.12-23.12.0.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.12.0/rapids-4-spark_2.12-23.12.0.jar)\n",
"* [rapids-4-spark_2.12-23.12.1.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.12.1/rapids-4-spark_2.12-23.12.1.jar)\n",
"\n",
"### 3. Start Spark Standalone\n",
"Before running the script, please setup Spark standalone mode\n",
"\n",
"### 4. Add ENV\n",
"```\n",
"$ export SPARK_JARS=rapids-4-spark_2.12-23.12.0.jar\n",
"$ export SPARK_JARS=rapids-4-spark_2.12-23.12.1.jar\n",
"\n",
"```\n",
"\n",
Expand Down

Large diffs are not rendered by default.

Large diffs are not rendered by default.

0 comments on commit 9e97b9c

Please sign in to comment.