I had searched in the issues and found no similar issues.
What happened
DolphinScheduler version: 3.2.2
Deployment: pseudo-cluster
Spark is deployed in a standalone cluster, version: 3.5.4
Resource files are stored using MinIO S3
The configuration files involved are api-server/conf/common.properties and worker-server/conf/common.properties; the main changes are as follows:
The rest of the configuration is kept at its defaults. After starting the services, the jar file can be uploaded normally.
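For reference, the storage section of common.properties for MinIO typically looks like the following in 3.2.x (the endpoint, bucket, and credentials below are placeholders; property names should be double-checked against your release):

```properties
# Use S3-compatible storage (MinIO) for resource files
resource.storage.type=S3
resource.storage.upload.base.path=/dolphinscheduler

# MinIO connection details -- placeholders, replace with your own
resource.aws.access.key.id=minioadmin
resource.aws.secret.access.key=minioadmin
resource.aws.region=us-east-1
resource.aws.s3.bucket.name=dolphinscheduler
resource.aws.s3.endpoint=http://192.168.11.17:9000
```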
Then select the SPARK component in the workflow, select the Jar package uploaded to MinIO, and select cluster as the deployment method.
Then run the workflow instance, and the output log attachment is as follows:
[INFO] 2025-01-24 13:53:34.674 +0800 - ********************************* Execute task instance *************************************
[INFO] 2025-01-24 13:53:34.675 +0800 - ***********************************************************************************************
[INFO] 2025-01-24 13:53:34.677 +0800 - Final Shell file is:
[INFO] 2025-01-24 13:53:34.677 +0800 - ****************************** Script Content *****************************************************************
[INFO] 2025-01-24 13:53:34.677 +0800 - #!/bin/bash
BASEDIR=$(cd `dirname $0`; pwd)
cd $BASEDIR
export SPARK_HOME=/opt/spark-3.5.4-bin-hadoop3
${SPARK_HOME}/bin/spark-submit --master spark://192.168.11.17:7077 --deploy-mode cluster --class org.apache.spark.examples.JavaSparkPi --conf spark.driver.cores=1 --conf spark.driver.memory=512M --conf spark.executor.instances=2 --conf spark.executor.cores=2 --conf spark.executor.memory=2G /tmp/dolphinscheduler/exec/process/default/131329535157952/131329769571008_2/6/6/spark-examples_2.12-3.5.4.jar
[INFO] 2025-01-24 13:53:34.678 +0800 - ****************************** Script Content *****************************************************************
[INFO] 2025-01-24 13:53:34.678 +0800 - Executing shell command : sudo -u default -i /tmp/dolphinscheduler/exec/process/default/131329535157952/131329769571008_2/6/6/6_6.sh
[INFO] 2025-01-24 13:53:34.687 +0800 - process start, process id is: 172698
[INFO] 2025-01-24 13:53:37.688 +0800 - ->
25/01/24 13:53:37 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
25/01/24 13:53:37 INFO SecurityManager: Changing view acls to: default
25/01/24 13:53:37 INFO SecurityManager: Changing modify acls to: default
25/01/24 13:53:37 INFO SecurityManager: Changing view acls groups to:
25/01/24 13:53:37 INFO SecurityManager: Changing modify acls groups to:
25/01/24 13:53:37 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: default; groups with view permissions: EMPTY; users with modify permissions: default; groups with modify permissions: EMPTY
[INFO] 2025-01-24 13:53:38.691 +0800 - ->
25/01/24 13:53:37 INFO Utils: Successfully started service 'driverClient' on port 39639.
25/01/24 13:53:37 INFO TransportClientFactory: Successfully created connection to /192.168.11.17:7077 after 57 ms (0 ms spent in bootstraps)
25/01/24 13:53:38 INFO ClientEndpoint: ... waiting before polling master for driver state
25/01/24 13:53:38 INFO ClientEndpoint: Driver successfully submitted as driver-20250124135338-0056
[INFO] 2025-01-24 13:53:43.693 +0800 - ->
25/01/24 13:53:43 INFO ClientEndpoint: State of driver-20250124135338-0056 is ERROR
25/01/24 13:53:43 ERROR ClientEndpoint: Exception from cluster was: java.nio.file.NoSuchFileException: /tmp/dolphinscheduler/exec/process/default/131329535157952/131329769571008_2/6/6/spark-examples_2.12-3.5.4.jar
java.nio.file.NoSuchFileException: /tmp/dolphinscheduler/exec/process/default/131329535157952/131329769571008_2/6/6/spark-examples_2.12-3.5.4.jar
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixCopyFile.copy(UnixCopyFile.java:526)
at sun.nio.fs.UnixFileSystemProvider.copy(UnixFileSystemProvider.java:253)
at java.nio.file.Files.copy(Files.java:1274)
at org.apache.spark.util.Utils$.copyRecursive(Utils.scala:681)
at org.apache.spark.util.Utils$.copyFile(Utils.scala:652)
at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:725)
at org.apache.spark.util.Utils$.fetchFile(Utils.scala:467)
at org.apache.spark.deploy.worker.DriverRunner.downloadUserJar(DriverRunner.scala:162)
at org.apache.spark.deploy.worker.DriverRunner.prepareAndRunDriver(DriverRunner.scala:179)
at org.apache.spark.deploy.worker.DriverRunner$$anon$2.run(DriverRunner.scala:99)
25/01/24 13:53:43 INFO ShutdownHookManager: Shutdown hook called
25/01/24 13:53:43 INFO ShutdownHookManager: Deleting directory /tmp/spark-2af4f41d-c583-4698-9d8e-546a656bcf17
[INFO] 2025-01-24 13:53:43.695 +0800 - process has exited. execute path:/tmp/dolphinscheduler/exec/process/default/131329535157952/131329769571008_2/6/6, processId:172698 ,exitStatusCode:255 ,processWaitForStatus:true ,processExitValue:255
[INFO] 2025-01-24 13:53:43.697 +0800 - Start finding appId in /opt/apache-dolphinscheduler-3.2.2-bin/worker-server/logs/20250124/131329769571008/2/6/6.log, fetch way: log
[INFO] 2025-01-24 13:53:43.698 +0800 -
***********************************************************************************************
[INFO] 2025-01-24 13:53:43.699 +0800 - ********************************* Finalize task instance ************************************
[INFO] 2025-01-24 13:53:43.699 +0800 - ***********************************************************************************************
From the error message we can see that, although the jar package on MinIO was selected when configuring the workflow, DolphinScheduler still passed the local temporary path to spark-submit at runtime. In cluster deploy mode the Spark driver runs on a cluster worker node, which has no copy of that local file, so the driver fails to fetch the user jar and the task errors out.
What you expected to happen
Tasks can be submitted and run normally.
How to reproduce
You can reproduce it by following the steps above.
Anything else
The above problem will occur as long as DolphinScheduler and Spark Driver are not running on the same node.
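A possible workaround until this is fixed is to hand Spark a jar URL that every cluster node can resolve, e.g. an s3a:// path pointing at the same MinIO bucket, instead of the worker-local temp path. The sketch below (bucket name, key, and credentials are placeholders) only assembles and prints the command; actually running it requires the hadoop-aws and AWS SDK jars on every Spark node so the driver can read s3a:// URLs:

```shell
#!/bin/bash
# Sketch of a cluster-mode submit whose user jar is fetched from MinIO by the
# driver itself, instead of from a DolphinScheduler-local temp directory.
SPARK_HOME=${SPARK_HOME:-/opt/spark-3.5.4-bin-hadoop3}
JAR_URL="s3a://dolphinscheduler/spark-examples_2.12-3.5.4.jar"  # placeholder bucket/key

CMD="${SPARK_HOME}/bin/spark-submit \
  --master spark://192.168.11.17:7077 \
  --deploy-mode cluster \
  --class org.apache.spark.examples.JavaSparkPi \
  --conf spark.hadoop.fs.s3a.endpoint=http://192.168.11.17:9000 \
  --conf spark.hadoop.fs.s3a.access.key=minioadmin \
  --conf spark.hadoop.fs.s3a.secret.key=minioadmin \
  --conf spark.hadoop.fs.s3a.path.style.access=true \
  ${JAR_URL}"

# Print instead of executing: without hadoop-aws on the Spark nodes the
# submit would fail for a different reason than the one in this report.
echo "$CMD"
```

Alternatively, switching the task to client deploy mode avoids the problem entirely, because the driver then runs on the same node as the DolphinScheduler worker and can read the local temp path.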
@SbloodyS I just tried it and still get the same error, because the main package is required, and the selection in the resources below doesn't seem to take effect.
I uploaded the package by selecting Resources->Upload Files, and then selected the package in Main Package and Resources. Is this OK?