[BUG] Unable to use Spark Rapids with Spark Thrift Server #9966

LIN-Yu-Ting · 2023-11-27T23:28:53Z

LIN-Yu-Ting
Nov 27, 2023

Describe the bug

Our objective is to activate Spark Rapids (SQLPlugin) with Spark Thrift Server. However, we encountered some exception related to ClassNotFound. For your reference, Spark Thrift Server is also known as Distributed SQL Engine.

Steps/Code to reproduce bug

You need to launch Spark Thrift Server with $SPARK_HOME/sbin/start-thriftserver.sh with following steps:

You can not use official Spark package. Instead, Spark with Thrift Server package needs to be compiled as reference
Copy rapids-4-spark.jar to $SPARK_HOME/jars/rapids-4-spark.jar.
Prepare a spark-defaults.conf using following spark configurations under $SPARK_HOME/conf/spark-defaults.conf
Launch Thrift Server with command

bash $SPARK_HOME/sbin/start-thriftserver.sh --driver-class-path $SPARK_HOME/jars/rapids-4-spark.jar --master spark://MASTER_IP:7077

Once thrift server is launched, execute any SQL commands such as SHOW TABLES, SELECT * FROM table.

Expected behavior

Under folder $SPARK_HOME/logs, you will see a log related to Spark Thrift Server with following exception:

14:18:30.827 ERROR SparkExecuteStatementOperation - Error executing query with 71c4098a-6ba1-4320-80c7-d5da9c36427e, currentState RUNNING, 
java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: com/nvidia/spark/rapids/RuleNotFoundRunnableCommandMeta
	at com.nvidia.spark.rapids.GpuOverrides$.wrapRunnableCmd(GpuOverrides.scala:3782) ~[spark3xx-common/:?]
	at com.nvidia.spark.rapids.ExecutedCommandExecMeta.<init>(GpuExecutedCommandExec.scala:89) ~[spark3xx-common/:?]
	at com.nvidia.spark.rapids.GpuOverrides$.$anonfun$commonExecs$7(GpuOverrides.scala:3865) ~[spark3xx-common/:?]
	at com.nvidia.spark.rapids.ReplacementRule.wrap(GpuOverrides.scala:210) ~[spark3xx-common/:?]
	at com.nvidia.spark.rapids.GpuOverrides$.$anonfun$wrapPlan$1(GpuOverrides.scala:3802) ~[spark3xx-common/:?]
	at scala.Option.map(Option.scala:230) ~[scala-library-2.12.15.jar:?]
	at com.nvidia.spark.rapids.GpuOverrides$.wrapPlan(GpuOverrides.scala:3802) ~[spark3xx-common/:?]
	at com.nvidia.spark.rapids.GpuOverrides$.wrapAndTagPlan(GpuOverrides.scala:4203) ~[spark3xx-common/:?]
	at com.nvidia.spark.rapids.GpuOverrides.applyOverrides(GpuOverrides.scala:4530) ~[spark3xx-common/:?]
	at com.nvidia.spark.rapids.GpuOverrides.$anonfun$applyWithContext$3(GpuOverrides.scala:4415) ~[spark3xx-common/:?]
	at com.nvidia.spark.rapids.GpuOverrides$.logDuration(GpuOverrides.scala:451) ~[spark3xx-common/:?]
	at com.nvidia.spark.rapids.GpuOverrides.$anonfun$applyWithContext$1(GpuOverrides.scala:4412) ~[spark3xx-common/:?]
	at com.nvidia.spark.rapids.GpuOverrideUtil$.$anonfun$tryOverride$1(GpuOverrides.scala:4378) ~[spark3xx-common/:?]
	at com.nvidia.spark.rapids.GpuOverrides.applyWithContext(GpuOverrides.scala:4432) ~[spark3xx-common/:?]
	at com.nvidia.spark.rapids.GpuOverrides.apply(GpuOverrides.scala:4405) ~[spark3xx-common/:?]
	at com.nvidia.spark.rapids.GpuOverrides.apply(GpuOverrides.scala:4401) ~[spark3xx-common/:?]
	at org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions.$anonfun$apply$1(Columnar.scala:553) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions.$anonfun$apply$1$adapted(Columnar.scala:553) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) ~[scala-library-2.12.15.jar:?]
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) ~[scala-library-2.12.15.jar:?]
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) ~[scala-library-2.12.15.jar:?]
	at org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions.apply(Columnar.scala:553) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions.apply(Columnar.scala:514) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.execution.QueryExecution$.$anonfun$prepareForExecution$1(QueryExecution.scala:440) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
	at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126) ~[scala-library-2.12.15.jar:?]
	at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122) ~[scala-library-2.12.15.jar:?]
	at scala.collection.immutable.List.foldLeft(List.scala:91) ~[scala-library-2.12.15.jar:?]
	at org.apache.spark.sql.execution.QueryExecution$.prepareForExecution(QueryExecution.scala:439) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executedPlan$1(QueryExecution.scala:158) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111) ~[spark-catalyst_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:185) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:510) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:185) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:184) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:158) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:151) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.execution.QueryExecution.simpleString(QueryExecution.scala:204) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:249) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:218) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:103) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:95) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:94) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584) ~[spark-catalyst_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176) ~[spark-catalyst_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:584) ~[spark-catalyst_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30) ~[spark-catalyst_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267) ~[spark-catalyst_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263) ~[spark-catalyst_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) ~[spark-catalyst_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) ~[spark-catalyst_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:560) ~[spark-catalyst_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:94) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:81) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:79) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:220) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:622) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:617) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:651) ~[spark-sql_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:291) ~[spark-hive-thriftserver_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.runInternal(SparkExecuteStatementOperation.scala:216) ~[spark-hive-thriftserver_2.12-3.3.0.jar:3.3.0]
	at org.apache.hive.service.cli.operation.Operation.run(Operation.java:277) ~[spark-hive-thriftserver_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkOperation$$super$run(SparkExecuteStatementOperation.scala:43) ~[spark-hive-thriftserver_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.hive.thriftserver.SparkOperation.$anonfun$run$1(SparkOperation.scala:45) ~[spark-hive-thriftserver_2.12-3.3.0.jar:3.3.0]
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) ~[scala-library-2.12.15.jar:?]
	at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:79) ~[spark-hive-thriftserver_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:63) ~[spark-hive-thriftserver_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:43) ~[spark-hive-thriftserver_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.hive.thriftserver.SparkOperation.run(SparkOperation.scala:45) ~[spark-hive-thriftserver_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.hive.thriftserver.SparkOperation.run$(SparkOperation.scala:43) ~[spark-hive-thriftserver_2.12-3.3.0.jar:3.3.0]
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.run(SparkExecuteStatementOperation.scala:43) ~[spark-hive-thriftserver_2.12-3.3.0.jar:3.3.0]
	at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:484) ~[spark-hive-thriftserver_2.12-3.3.0.jar:3.3.0]
	at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatement(HiveSessionImpl.java:460) ~[spark-hive-thriftserver_2.12-3.3.0.jar:3.3.0]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_382]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_382]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_382]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_382]
	at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:71) ~[spark-hive-thriftserver_2.12-3.3.0.jar:3.3.0]
	at org.apache.hive.service.cli.session.HiveSessionProxy.lambda$invoke$0(HiveSessionProxy.java:58) ~[spark-hive-thriftserver_2.12-3.3.0.jar:3.3.0]
	at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_382]
	at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_382]
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878) ~[hadoop-client-api-3.3.2.jar:?]
	at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:58) ~[spark-hive-thriftserver_2.12-3.3.0.jar:3.3.0]
	at com.sun.proxy.$Proxy45.executeStatement(Unknown Source) ~[?:?]
	at org.apache.hive.service.cli.CLIService.executeStatement(CLIService.java:280) ~[spark-hive-thriftserver_2.12-3.3.0.jar:3.3.0]
	at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:456) ~[spark-hive-thriftserver_2.12-3.3.0.jar:3.3.0]
	at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1557) ~[hive-service-rpc-3.1.2.jar:3.1.2]
	at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1542) ~[hive-service-rpc-3.1.2.jar:3.1.2]
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) ~[libthrift-0.12.0.jar:0.12.0]
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) ~[libthrift-0.12.0.jar:0.12.0]
	at org.apache.thrift.server.TServlet.doPost(TServlet.java:83) ~[libthrift-0.12.0.jar:0.12.0]
	at org.apache.hive.service.cli.thrift.ThriftHttpServlet.doPost(ThriftHttpServlet.java:180) ~[spark-hive-thriftserver_2.12-3.3.0.jar:3.3.0]
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:523) ~[jakarta.servlet-api-4.0.3.jar:4.0.3]
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:590) ~[jakarta.servlet-api-4.0.3.jar:4.0.3]
	at org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:799) ~[spark-core_2.12-3.3.0.jar:3.3.0]
	at org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:550) ~[spark-core_2.12-3.3.0.jar:3.3.0]
	at org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) ~[spark-core_2.12-3.3.0.jar:3.3.0]
	at org.sparkproject.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624) ~[spark-core_2.12-3.3.0.jar:3.3.0]
	at org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) ~[spark-core_2.12-3.3.0.jar:3.3.0]
	at org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440) ~[spark-core_2.12-3.3.0.jar:3.3.0]
	at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188) ~[spark-core_2.12-3.3.0.jar:3.3.0]
	at org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501) ~[spark-core_2.12-3.3.0.jar:3.3.0]
	at org.sparkproject.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594) ~[spark-core_2.12-3.3.0.jar:3.3.0]
	at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186) ~[spark-core_2.12-3.3.0.jar:3.3.0]
	at org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355) ~[spark-core_2.12-3.3.0.jar:3.3.0]
	at org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) ~[spark-core_2.12-3.3.0.jar:3.3.0]
	at org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) ~[spark-core_2.12-3.3.0.jar:3.3.0]
	at org.sparkproject.jetty.server.Server.handle(Server.java:516) ~[spark-core_2.12-3.3.0.jar:3.3.0]
	at org.sparkproject.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487) ~[spark-core_2.12-3.3.0.jar:3.3.0]
	at org.sparkproject.jetty.server.HttpChannel.dispatch(HttpChannel.java:732) ~[spark-core_2.12-3.3.0.jar:3.3.0]
	at org.sparkproject.jetty.server.HttpChannel.handle(HttpChannel.java:479) ~[spark-core_2.12-3.3.0.jar:3.3.0]
	at org.sparkproject.jetty.server.HttpConnection.onFillable(HttpConnection.java:277) ~[spark-core_2.12-3.3.0.jar:3.3.0]
	at org.sparkproject.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) ~[spark-core_2.12-3.3.0.jar:3.3.0]
	at org.sparkproject.jetty.io.FillInterest.fillable(FillInterest.java:105) ~[spark-core_2.12-3.3.0.jar:3.3.0]
	at org.sparkproject.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:555) ~[spark-core_2.12-3.3.0.jar:3.3.0]
	at org.sparkproject.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:410) ~[spark-core_2.12-3.3.0.jar:3.3.0]
	at org.sparkproject.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:164) ~[spark-core_2.12-3.3.0.jar:3.3.0]
	at org.sparkproject.jetty.io.FillInterest.fillable(FillInterest.java:105) ~[spark-core_2.12-3.3.0.jar:3.3.0]
	at org.sparkproject.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104) ~[spark-core_2.12-3.3.0.jar:3.3.0]
	at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338) ~[spark-core_2.12-3.3.0.jar:3.3.0]
	at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315) ~[spark-core_2.12-3.3.0.jar:3.3.0]
	at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173) ~[spark-core_2.12-3.3.0.jar:3.3.0]
	at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131) ~[spark-core_2.12-3.3.0.jar:3.3.0]
	at org.sparkproject.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409) ~[spark-core_2.12-3.3.0.jar:3.3.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_382]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_382]
	at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_382]
Caused by: java.lang.NoClassDefFoundError: com/nvidia/spark/rapids/RuleNotFoundRunnableCommandMeta
	... 135 more

Environment details (please complete the following information)

Environment location: Standalone
Spark configuration settings related to the issue
spark.driver.cores 1
spark.driver.memory 2g
spark.driver.extraClassPath /home/spark-current/jars/rapids-4-spark.jar
spark.executor.cores 1
spark.executor.memory 6g
spark.executor.resource.gpu.discoveryScript /tmp/getGpusResources.sh
spark.executor.resource.gpu.amount 1
spark.executor.extraClassPath /home/spark-current/jars/rapids-4-spark.jar
spark.task.resource.gpu.amount 0.5
spark.task.cpus 1
spark.dynamicAllocation.enabled true
spark.shuffle.service.enabled true
spark.dynamicAllocation.minExecutors 1
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator org.bdgenomics.adam.serialization.ADAMKryoRegistrator
spark.executor.extraJavaOptions -XX:+UseG1GC
spark.hadoop.io.compression.codecs org.seqdoop.hadoop_bam.util.BGZFEnhancedGzipCodec
spark.local.dir /mnt
spark.rapids.memory.pinnedPool.size 2g
spark.rapids.sql.concurrentGpuTasks 2
spark.rapids.sql.csv.read.double.enabled true
spark.rapids.sql.hasNans true
spark.rapids.sql.explain ALL
spark.plugins com.nvidia.spark.SQLPlugin

Additional context

These exceptions only happen with Thrift Server. With the same configurations, I am able to launch spark-shell and execute whatever sql commands that I want.

Answered by gerashegalov

Dec 4, 2023

Going back to our multi-shim production jar, It looks like there is a race condition that affects all the sessions if the user classloader deployment --jars is used.

After superset sessions make GpuOverrides throw NoClassDeFoundErrors, connecting a single beeline session reproduces the same NoClassDeFoundError. The good news is that the reverse is also true. After the HiveThriftServer2 is "pre-warmed" with a single session from beeline, superset's metadata queries start succeeding without NoClassDefFoundError.

So another workaround is to run a single session beeline

$SPARK_HOME/bin/beeline -u jdbc:hive2://localhost:10000 -e 'USE `default`; SHOW FUNCTIONS; SHOW SCHEMAS; SHOW TABLES IN `def…

View full answer

jlowe · 2023-11-28T15:13:12Z

jlowe
Nov 28, 2023
Maintainer

Have you tried not placing the RAPIDS Accelerator jar in the Spark jars directory and instead of specifying the driver/executor classpath via the command-line and configs, use the --jars flag instead? e.g.:

spark330/sbin/start-thriftserver.sh --jars rapids-4-spark_2.12-23.10.0.jar --conf spark.plugins=com.nvidia.spark.SQLPlugin --conf spark.rapids.sql.explain=ALL --master spark://sparkhost:7077

This worked for me, I was able to show tables and select elements from a table using pyhive to connect to the Spark thriftserver and verified via the Hive thiftserver log that the RAPIDS Accelerator was being used during the queries.

0 replies

gerashegalov · 2023-11-29T02:38:04Z

gerashegalov
Nov 29, 2023
Collaborator

I have tried to reproduce as well with different modalities for jar submission. So far I have not been able to.

You can not use official Spark package. Instead, Spark with Thrift Server package needs to be compiled as reference

This looks like an outdated note, hive-thriftserver profile is enabled in standard builds

$ cat ~/dist/spark-3.5.0-bin-hadoop3/RELEASE 
Spark 3.5.0 (git revision ce5ddad9903) built for Hadoop 3.3.4
Build flags: -B -Pmesos -Pyarn -Pkubernetes -Psparkr -Pscala-2.12 -Phadoop-3 -Phive -Phive-thriftserver

Copy rapids-4-spark.jar to $SPARK_HOME/jars/rapids-4-spark.jar.

This has worked for me too, but this is my least-favorite deployment option. It typically is only required with a standalone mode (your case) but only when the RapidsShuffleManager is used as well (not present in your conf). It does not look like the case here but mixing --jars (spark.jars) while the jar is in $SPARK_HOME/jars may cause issues #5758 with this symptom. spark.jars is preferable.

--driver-class-path $SPARK_HOME/jars/rapids-4-spark.jar

this is not necessary when you already placed your jar under $SPARK_HOME/jars

In your setup it looks cleanest if you remove the jar from $SPARK_HOME/jars and start thriftserver with --jars.

0 replies

LIN-Yu-Ting · 2023-11-29T02:54:43Z

LIN-Yu-Ting
Nov 29, 2023
Author

Hi @gerashegalov @jlowe. Thanks for your helps. Here is a youtube link which shows how I encountered this error.

Furthermore, I am using Spark 3.3.0 instead of newest spark version 3.5.0 so that I need recompile my spark package. I can try with spark 3.5.0 later.

0 replies

gerashegalov · 2023-11-29T03:48:40Z

gerashegalov
Nov 29, 2023
Collaborator

Thanks for the demo @LIN-Yu-Ting. I was using beeline to connect to the thriftserver.
Can you check if $SPARK_HOME/beeline works for you? Maybe the issue originates in Superset?

Standard Spark build for 3.3.0 works with beeline for me. And again I am not sure why you need to recompile Spark for hivethriftserver. It should already be there.

cat ~/dist/spark-3.3.0-bin-hadoop3/RELEASE 
Spark 3.3.0 (git revision f74867bddf) built for Hadoop 3.3.2
Build flags: -B -Pmesos -Pyarn -Pkubernetes -Psparkr -Pscala-2.12 -Phadoop-3 -Phive -Phive-thriftserver

At any rate, can you provide your exact build command to double-check if it is about custom build?

0 replies

LIN-Yu-Ting · 2023-11-29T07:12:54Z

LIN-Yu-Ting
Nov 29, 2023
Author

@gerashegalov
I have tried both beeline and pyhive package and as you said it is able to execute SQL query without exception.

However, when I execute SQL query using Superset through PyHive then I got the above exception, which is quite weird.

0 replies

LIN-Yu-Ting · 2023-11-29T07:27:04Z

LIN-Yu-Ting
Nov 29, 2023
Author

@jlowe @gerashegalov I got more information from logs of Spark Thrift Server which might give us more insight. Actually, error occurs when superset is executing a command

SHOW VIEWS IN `table`

Can you please try from your side to execute this command to see whether you can reproduce errors or not ? Thanks a lot.

07:17:30.296 WARN  HiveConf - HiveConf of name hive.server2.thrift.http.bind.host does not exist
07:19:46.238 WARN  HiveConf - HiveConf of name hive.server2.thrift.http.bind.host does not exist
07:19:46.239 INFO  DAGScheduler - Asked to cancel job group e36c180c-0f3e-423c-8226-319b29bb656a
07:19:46.239 INFO  SparkExecuteStatementOperation - Close statement with e36c180c-0f3e-423c-8226-319b29bb656a
07:19:46.810 WARN  HiveConf - HiveConf of name hive.server2.thrift.http.bind.host does not exist
07:19:46.810 INFO  SparkExecuteStatementOperation - Submitting query 'SHOW VIEWS IN singlecellrnaonedb' with 47d611b8-1fb1-47a4-a49d-484313c8c2b7
07:19:46.811 INFO  SparkExecuteStatementOperation - Running query with 47d611b8-1fb1-47a4-a49d-484313c8c2b7
07:19:46.815 ERROR GpuOverrideUtil - Encountered an exception applying GPU overrides java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: com/nvidia/spark/rapids/RuleNotFoundRunnableCommandMeta
java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: com/nvidia/spark/rapids/RuleNotFoundRunnableCommandMeta

Note:
I have tried to execute

SHOW VIEWS IN `table`

from my side to either beeline or PyHive. All are executable. Only SQL query from superset will fail.

0 replies

LIN-Yu-Ting · 2023-11-29T10:49:53Z

LIN-Yu-Ting
Nov 29, 2023
Author

Might the error be possibly generated by multiple sessions in Spark Thrift JDBC server ?

It seems that Superset will generate multiple sessions to Spark Thrift Server as following SparkUI shown.

When query is sent by either beeline or PyHive package, then only one session exists in Spark Thrift Server. In this case, no matter what SQL command you are executing, everything is fine.

0 replies

gerashegalov · 2023-11-29T18:49:26Z

gerashegalov
Nov 29, 2023
Collaborator

We have seen issues with multisession notebooks such as Databricks. I will give a try on Superset, not familiar with it yet.

0 replies

gerashegalov · 2023-11-29T22:32:49Z

gerashegalov
Nov 29, 2023
Collaborator

@LIN-Yu-Ting another suggestion to quickly unblock while we are looking at it.

Classloading issues are likely to go away if you build our artifact from scratch using the instructions for a single-spark-version build

To this end check out or download the source for the version tag.

In your case the Apache Spark version you want to build for is 3.3.0 which can be accomplished by running from local repo's root dir:

mvn package -pl dist -am -Dbuildver=330 -DallowConventionalDistJar=true -DskipTests

Since the tests are skipped you do not need a GPU on the machine used for the build.

The artifact will be under: dist/target/rapids-4-spark_2.12-<version>-cuda<cuda.version>.jar

0 replies

LIN-Yu-Ting · 2023-11-30T16:08:42Z

LIN-Yu-Ting
Nov 30, 2023
Author

@gerashegalov Thanks for providing this workaround. I have tried to build locally and replace the jar. However, unfortunately, I still got the same error as before. Anyway, I appreciate.

0 replies

gerashegalov · 2023-11-30T17:29:17Z

gerashegalov
Nov 30, 2023
Collaborator

@LIN-Yu-Ting Can you double-check that your jar is indeed "conventional"?

The output from running this command below should be 0

$ jar tvf dist/target/rapids-4-spark_2.12-23.12.0-SNAPSHOT-cuda11.jar | grep -c spark3xx
0

0 replies

LIN-Yu-Ting · 2023-12-01T03:01:09Z

LIN-Yu-Ting
Dec 1, 2023
Author

@gerashegalov Here is the printscreen after executing this command:

0 replies

tgravescs · 2023-12-01T14:39:24Z

tgravescs
Dec 1, 2023
Maintainer

with the new jar you built can you place it back into the $SPARK_HOME/jars/ directory and try it there? remove it from the --jars parameter.

0 replies

gerashegalov · 2023-12-01T19:23:17Z

gerashegalov
Dec 1, 2023
Collaborator

Thanks a lot for confirming @LIN-Yu-Ting.

Can we try one more thing? Can you start the thrift server with additional params to enable verbose classloading: --driver-java-options=-verbose:class --conf spark.executor.extraJavaOptions=-verbose:class

and grep the the thrift server / Driver log for rapids-4-spark jar to rule out additional jars on the classpath

$ grep -o 'Loaded.*rapids-4-spark.*\.jar*' $SPARK_HOME/logs/spark-gshegalov-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-gshegalov-dual-5760.out  | cut -d: -f 3 | sort -u

/some/path/rapids-4-spark_2.12-23.10.0-cuda11.jar

In branch-24.02 we also have a new feature of detecting duplicate jars automatically #9654. You may want to try this #9867 (comment) again but with HEAD of branch-24.02. You can try the default but better add --conf spark.rapids.sql.allowMultipleJars=NEVER to the thrift server conf.

0 replies

gerashegalov · 2023-12-03T07:37:05Z

gerashegalov
Dec 3, 2023
Collaborator

I was able to reproduce this NoClassDefFoundError.

I confirmed that it goes away with the simple -DconventionalDistJar=true build when it is used in a static classpath
$SPARK_HOME/jar or via --driver-class-path / spark.executor.extraClassPath

Even with the simple jar we have NoClassDefFoundError if the jar is passed via --jars

While the workaround for NoClassDefFoundError is correct, there are some org.apache.thrift.transport.TTransportException for metadata queries but I see them with the CPU Spark as well. But generally I can run SQL queries in SQL Lab fine.

0 replies

LIN-Yu-Ting · 2023-12-04T02:10:56Z

LIN-Yu-Ting
Dec 4, 2023
Author

I confirmed that with simple -DallowConventionalDistJar=true build and with a static classpath $SPARK_HOME/jar, there is no NoClassDefFoundError anymore.

Thanks a lot @gerashegalov.

0 replies

gerashegalov · 2023-12-04T19:07:07Z

gerashegalov
Dec 4, 2023
Collaborator

Going back to our multi-shim production jar, It looks like there is a race condition that affects all the sessions if the user classloader deployment --jars is used.

After superset sessions make GpuOverrides throw NoClassDeFoundErrors, connecting a single beeline session reproduces the same NoClassDeFoundError. The good news is that the reverse is also true. After the HiveThriftServer2 is "pre-warmed" with a single session from beeline, superset's metadata queries start succeeding without NoClassDefFoundError.

So another workaround is to run a single session beeline

$SPARK_HOME/bin/beeline -u jdbc:hive2://localhost:10000 -e 'USE `default`; SHOW FUNCTIONS; SHOW SCHEMAS; SHOW TABLES IN `default`; SHOW VIEWS IN `default`;'

before allowing traffic to the thrift server from superset.

We should review the usage of lazy vals

2 replies

sam-goodwin May 4, 2024

Is there a long term fix for this? I am running into this problem and it is painful and cryptic.

This workaround doesn't work when using Nessie/Iceberg jars because it complains about not being able to parse the response - probably something to do with the nessie/iceberg syntax.

sam-goodwin May 4, 2024

I have worked around this issue by putting the jars in /lib/spark/jars. Doesn't seem ideal but does work. Couldn't get the spark.executor.extraClassPath to work.

[BUG] Unable to use Spark Rapids with Spark Thrift Server #9966

LIN-Yu-Ting Nov 27, 2023

Replies: 17 comments · 2 replies

jlowe Nov 28, 2023 Maintainer

gerashegalov Nov 29, 2023 Collaborator

LIN-Yu-Ting Nov 29, 2023 Author

gerashegalov Nov 29, 2023 Collaborator

LIN-Yu-Ting Nov 29, 2023 Author

LIN-Yu-Ting Nov 29, 2023 Author

LIN-Yu-Ting Nov 29, 2023 Author

gerashegalov Nov 29, 2023 Collaborator

gerashegalov Nov 29, 2023 Collaborator

LIN-Yu-Ting Nov 30, 2023 Author

gerashegalov Nov 30, 2023 Collaborator

LIN-Yu-Ting Dec 1, 2023 Author

tgravescs Dec 1, 2023 Maintainer

gerashegalov Dec 1, 2023 Collaborator

gerashegalov Dec 3, 2023 Collaborator

LIN-Yu-Ting Dec 4, 2023 Author

gerashegalov Dec 4, 2023 Collaborator

sam-goodwin May 4, 2024

sam-goodwin May 4, 2024

LIN-Yu-Ting
Nov 27, 2023

Replies: 17 comments 2 replies

jlowe
Nov 28, 2023
Maintainer

gerashegalov
Nov 29, 2023
Collaborator

LIN-Yu-Ting
Nov 29, 2023
Author

gerashegalov
Nov 29, 2023
Collaborator

LIN-Yu-Ting
Nov 29, 2023
Author

LIN-Yu-Ting
Nov 29, 2023
Author

LIN-Yu-Ting
Nov 29, 2023
Author

gerashegalov
Nov 29, 2023
Collaborator

gerashegalov
Nov 29, 2023
Collaborator

LIN-Yu-Ting
Nov 30, 2023
Author

gerashegalov
Nov 30, 2023
Collaborator

LIN-Yu-Ting
Dec 1, 2023
Author

tgravescs
Dec 1, 2023
Maintainer

gerashegalov
Dec 1, 2023
Collaborator

gerashegalov
Dec 3, 2023
Collaborator

LIN-Yu-Ting
Dec 4, 2023
Author

gerashegalov
Dec 4, 2023
Collaborator