After following the documentation to connect Databricks to PyCharm, I am unable to run the sample example from https://docs.azuredatabricks.net/user-guide/dev-tools/db-connect.html#run-examples-from-your-ide because I get an error. Note that the connection itself seems to work, because at the beginning the client checks the cluster status and waits for it to start; the error only occurs once the Spark command executes.
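For reference, my main.py is essentially the sample from the linked docs; this is a minimal reconstruction based on the `Testing simple count` line and line 7 of the traceback below, so the exact file may differ slightly:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # Databricks Connect routes this session to the remote cluster

print("Testing simple count")
# range(100).count() is executed on the cluster; this is the call that fails
print(spark.range(100).count())
```

The full output is: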
```
19/04/24 15:07:06 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
19/04/24 15:07:08 WARN MetricsSystem: Using default name SparkStatusTracker for source because neither spark.metrics.namespace nor spark.app.id is set.
Testing simple count
19/04/24 15:07:10 WARN HTTPClient: Setting proxy configuration for HTTP client based on env var HTTPS_PROXY=https://proxy_name
19/04/24 15:07:13 WARN SparkClientManager: Cluster 1108-095209-xxx in state PENDING, waiting for it to start running...
19/04/24 15:07:24 WARN SparkClientManager: Cluster 1108-095209-xxx in state PENDING, waiting for it to start running...
19/04/24 15:07:34 WARN SparkClientManager: Cluster 1108-095209-xxx in state PENDING, waiting for it to start running...
Traceback (most recent call last):
  File "C:/Users/my_name/PycharmProjects/Databricks/main.py", line 7, in <module>
    print(spark.range(100).count())
  File "C:\Users\my_name\AppData\Local\Continuum\anaconda3\envs\dbconnect\lib\site-packages\pyspark\sql\session.py", line 337, in range
    jdf = self._jsparkSession.range(0, int(start), int(step), int(numPartitions))
  File "C:\Users\my_name\AppData\Local\Continuum\anaconda3\envs\dbconnect\lib\site-packages\py4j\java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "C:\Users\my_name\AppData\Local\Continuum\anaconda3\envs\dbconnect\lib\site-packages\pyspark\sql\utils.py", line 63, in deco
    return f(*a, **kw)
  File "C:\Users\my_name\AppData\Local\Continuum\anaconda3\envs\dbconnect\lib\site-packages\py4j\protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o20.range.
: java.lang.NoClassDefFoundError: com/trueaccord/scalapb/GeneratedMessage
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(Unknown Source)
    at java.security.SecureClassLoader.defineClass(Unknown Source)
    at java.net.URLClassLoader.defineClass(Unknown Source)
    at java.net.URLClassLoader.access$100(Unknown Source)
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at com.databricks.service.SparkServiceRPCClientStub.com$databricks$service$SparkServiceRPCClientStub$$buildRpc(SparkServiceRPCClientStub.scala:352)
    at com.databricks.service.SparkServiceRPCClientStub$$anonfun$pollStatuses$1.apply(SparkServiceRPCClientStub.scala:458)
    at com.databricks.service.SparkServiceRPCClientStub$$anonfun$pollStatuses$1.apply(SparkServiceRPCClientStub.scala:457)
    at com.databricks.spark.util.Log4jUsageLogger.recordOperation(UsageLogger.scala:161)
    at com.databricks.spark.util.UsageLogging$class.recordOperation(UsageLogger.scala:286)
    at com.databricks.service.SparkServiceRPCClientStub.recordOperation(SparkServiceRPCClientStub.scala:48)
    at com.databricks.service.SparkServiceRPCClientStub.pollStatuses(SparkServiceRPCClientStub.scala:457)
    at com.databricks.service.SparkServiceRPCClientStub.com$databricks$service$SparkServiceRPCClientStub$$pollAndUpdateStatuses0(SparkServiceRPCClientStub.scala:428)
    at com.databricks.service.SparkServiceRPCClientStub$$anonfun$pollAndUpdateStatuses$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SparkServiceRPCClientStub.scala:409)
    at com.databricks.service.SparkServiceRPCClientStub$$anonfun$pollAndUpdateStatuses$1$$anonfun$apply$mcV$sp$1.apply(SparkServiceRPCClientStub.scala:407)
    at com.databricks.service.SparkServiceRPCClientStub$$anonfun$pollAndUpdateStatuses$1$$anonfun$apply$mcV$sp$1.apply(SparkServiceRPCClientStub.scala:407)
    at com.databricks.service.SparkServiceRPCClientStub.com$databricks$service$SparkServiceRPCClientStub$$withPollLock(SparkServiceRPCClientStub.scala:419)
    at com.databricks.service.SparkServiceRPCClientStub$$anonfun$pollAndUpdateStatuses$1.apply$mcV$sp(SparkServiceRPCClientStub.scala:406)
    at com.databricks.service.SparkServiceRPCClientStub$$anonfun$pollAndUpdateStatuses$1.apply(SparkServiceRPCClientStub.scala:404)
    at com.databricks.service.SparkServiceRPCClientStub$$anonfun$pollAndUpdateStatuses$1.apply(SparkServiceRPCClientStub.scala:404)
    at com.databricks.spark.util.Log4jUsageLogger.recordOperation(UsageLogger.scala:161)
    at com.databricks.spark.util.UsageLogging$class.recordOperation(UsageLogger.scala:286)
    at com.databricks.service.SparkServiceRPCClientStub.recordOperation(SparkServiceRPCClientStub.scala:48)
    at com.databricks.service.SparkServiceRPCClientStub.pollAndUpdateStatuses(SparkServiceRPCClientStub.scala:404)
    at com.databricks.service.SparkServiceRPCClientStub$$anonfun$getServerHadoopConf$1.apply(SparkServiceRPCClientStub.scala:382)
    at com.databricks.service.SparkServiceRPCClientStub$$anonfun$getServerHadoopConf$1.apply(SparkServiceRPCClientStub.scala:381)
    at com.databricks.service.SparkServiceRPCClientStub.com$databricks$service$SparkServiceRPCClientStub$$withPollLock(SparkServiceRPCClientStub.scala:419)
    at com.databricks.service.SparkServiceRPCClientStub.getServerHadoopConf(SparkServiceRPCClientStub.scala:381)
    at com.databricks.service.SparkClient$.getServerHadoopConf(SparkClient.scala:211)
    at com.databricks.spark.util.SparkClientContext$.getServerHadoopConf(SparkClientContext.scala:217)
    at org.apache.spark.SparkContext$$anonfun$hadoopConfiguration$1.apply(SparkContext.scala:316)
    at org.apache.spark.SparkContext$$anonfun$hadoopConfiguration$1.apply(SparkContext.scala:311)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
    at org.apache.spark.SparkContext.hadoopConfiguration(SparkContext.scala:310)
    at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:66)
    at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:145)
    at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:145)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:145)
    at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:144)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:291)
    at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1175)
    at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:170)
    at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:169)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:169)
    at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:166)
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:193)
    at org.apache.spark.sql.SparkSession.range(SparkSession.scala:609)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
    at py4j.Gateway.invoke(Gateway.java:295)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:251)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ClassNotFoundException: com.trueaccord.scalapb.GeneratedMessage
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    ... 67 more
Process finished with exit code 1
```
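For completeness, the client-side setup I had followed from the same docs page was roughly the following (the pinned version below is a placeholder; the docs say it must match the cluster's Databricks Runtime version):

```
pip uninstall pyspark
pip install -U databricks-connect==5.3.*
databricks-connect configure
databricks-connect test
```

Could the NoClassDefFoundError for com/trueaccord/scalapb/GeneratedMessage indicate a mismatch between the databricks-connect client version and the cluster's runtime?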