
Troubleshooting

Jan Ehmueller edited this page Jul 27, 2017 · 20 revisions

Checksum Exception when uploading a file to HDFS

Delete the stale local checksum file .<filename>.crc (a hidden dot-file next to the file) in the local file system, then retry the upload.
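For example, assuming the upload failed for a file named data.csv (a placeholder name; substitute the file that actually failed):

```shell
# Hadoop keeps the local checksum as a hidden dot-file next to the original.
# -f keeps the command quiet if the checksum file is already gone.
rm -f .data.csv.crc
```

Afterwards, retry the upload (e.g., hdfs dfs -put data.csv).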



Spark program crashes due to lack of memory

  • Remove the .cache() call on the input RDD.
  • When using a shuffle operator (e.g., groupByKey, reduceByKey, repartition), specify / increase the numPartitions parameter.
    • More partitions → better parallelization and a smaller per-partition memory footprint
    • The current number of partitions can be read via .getNumPartitions
  • Add --executor-memory 19G as a parameter for spark-submit or -e 19 as parameter for spark.sh.
  • Add --conf spark.storage.memoryFraction=0.4 as a parameter for spark-submit (lowers the fraction of the heap reserved for cached data from the default 0.6, leaving more memory for execution)
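Put together, a spark-submit call combining the flags above might look like this (the main class and jar name are placeholders for your own job):

```shell
spark-submit \
  --class de.example.MyJob \
  --executor-memory 19G \
  --conf spark.storage.memoryFraction=0.4 \
  myjob.jar
```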



Spark Shell does not connect

Error:

17/03/31 13:15:03 ERROR bonecp.PoolWatchThread: Error in trying to obtain a connection. Retrying in 7000ms
java.sql.SQLException: A read-only user or a user in a read-only database is not permitted to disable read-only mode on a connection.

Delete the Derby lock file metastore_db/dbex.lck in the current working directory; it is left behind when a previous session did not shut down cleanly.
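Assuming you start the shell from the same working directory each time, a one-liner does it:

```shell
# Remove the stale Derby lock; -f keeps it quiet if the file is already gone.
rm -f metastore_db/dbex.lck
```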



View log of Spark program

yarn logs -applicationId <applicationId> | less
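To jump straight to failures instead of paging through the whole log, the output can be filtered (the application id below is a placeholder; get the real one from the YARN ResourceManager UI or yarn application -list):

```shell
yarn logs -applicationId application_1490000000000_0001 | grep -iE 'error|exception'
```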



Cassandra Timeout

https://github.com/bpn1/ingestion/issues/427

Add --conf spark.cassandra.output.throughput_mb_per_sec=0.4 as a parameter for spark-submit (this value works for 23 executors). If the timeout still occurs, decrease the value; otherwise you may increase it to speed up the job.
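The flag is passed straight to spark-submit; a sketch of the invocation, with placeholder class and jar names:

```shell
spark-submit \
  --class de.example.MyJob \
  --conf spark.cassandra.output.throughput_mb_per_sec=0.4 \
  myjob.jar
```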



companies.jar is missing

Where to get the companies.jar.



companies.jar does not work

How to get the companies.jar to work.



EOF Exception in SBT Console

Error message: Error wrapping InputStream in GZIPInputStream: java.io.EOFException.

Solution.



IntelliJ IDEA misconfiguration

java.lang.NoClassDefFoundError: org/apache/spark/SparkContext

or

org.apache.spark.SparkException: A master URL must be set in your configuration

Delete the .idea folder in the project directory.
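From a shell in the project root, the stale IDE metadata can be cleared like this; afterwards re-import the project in IntelliJ (File → Open → build.sbt):

```shell
# Remove IntelliJ's generated project metadata; it is recreated on re-import.
rm -rf .idea
```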

Scalatest tests do not start

In the build.sbt, replace "com.holdenkarau" % "spark-testing-base_2.11" % "2.1.0_0.6.0" % "provided", with

"com.holdenkarau" % "spark-testing-base_2.11" % "2.1.0_0.6.0" % "test" excludeAll(
	ExclusionRule(organization = "org.scalacheck"),
	ExclusionRule(organization = "org.scalactic"),
	ExclusionRule(organization = "org.scalatest")
),

Refresh the sbt settings locally, then revert the build.sbt without refreshing again. For reference, see https://github.com/holdenk/spark-testing-base/issues/170.