# Troubleshooting
- Delete `<filename>.crc` in the local file system.
- Remove `.cache` from the input file.
- When using a shuffle operator (e.g., `groupBy`, `reduceBy`, `repartition`), specify / increase the `numPartitions` parameter.
  - More partitions → better parallelization, better memory partitioning.
  - The number of partitions can be read via `.numPartitions`.
- Add `--executor-memory 19G` as a parameter for `spark-submit`, or `-e 19` as a parameter for `spark.sh`.
- Add `--conf spark.storage.memoryFraction=0.4` as a parameter for `spark-submit` (decreases the fraction of heap used for Spark's cache, leaving more main memory for execution).
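Taken together, the `spark-submit` flags above might be combined into a single invocation. This is only a sketch: the class name `de.example.MyJob` and the jar `application.jar` are placeholders, not part of this project.

```shell
# Sketch of a spark-submit call using the memory-related flags above.
# Class name and jar are placeholders for the actual application.
spark-submit \
  --class de.example.MyJob \
  --executor-memory 19G \
  --conf spark.storage.memoryFraction=0.4 \
  application.jar
```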
Error:

```
17/03/31 13:15:03 ERROR bonecp.PoolWatchThread: Error in trying to obtain a connection. Retrying in 7000ms
java.sql.SQLException: A read-only user or a user in a read-only database is not permitted to disable read-only mode on a connection.
```

Delete the file `metastore_db/dbex.lck` in the current directory.
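Assuming the job was started from the current directory, removing the stale Derby lock might look like:

```shell
# Remove the stale Derby lock file left behind by a crashed metastore
# session; -f makes this a no-op if the file is already gone.
rm -f metastore_db/dbex.lck
```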
Read the logs of a YARN application via:

```
yarn logs -applicationId <applicationId> | less
```
https://github.com/bpn1/ingestion/issues/427
Add `--conf spark.cassandra.output.throughput_mb_per_sec=0.4` as a parameter for `spark-submit` (works for 23 executors). If the error still occurs, decrease the value; otherwise, you may increase it to accelerate the execution.
Where to get the `companies.jar`.
How to get the `companies.jar` to work.
Error message: `Error wrapping InputStream in GZIPInputStream: java.io.EOFException`.
```
java.lang.NoClassDefFoundError: org/apache/spark/SparkContext
```

or

```
org.apache.spark.SparkException: A master URL must be set in your configuration
```

Delete the `.idea` folder in the project directory.
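From the project directory, for instance:

```shell
# Delete IntelliJ IDEA's generated project metadata; the IDE will
# recreate it on the next project import.
rm -rf .idea
```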
In the `build.sbt`, replace

```scala
"com.holdenkarau" % "spark-testing-base_2.11" % "2.1.0_0.6.0" % "provided",
```

with

```scala
"com.holdenkarau" % "spark-testing-base_2.11" % "2.1.0_0.6.0" % "test" excludeAll(
  ExclusionRule(organization = "org.scalacheck"),
  ExclusionRule(organization = "org.scalactic"),
  ExclusionRule(organization = "org.scalatest")
),
```

Refresh the settings locally, then reset the `build.sbt` without refreshing. For a reference, see https://github.com/holdenk/spark-testing-base/issues/170.