Tried to reproduce the benchmark test made here:
https://dzone.com/articles/joining-a-billion-rows-20x-faster-than-apache-spar
Basically, it is ~26x slower (25.95 - 26.31 sec) than plain Apache Spark (0.97 - 0.98 sec) on my laptop:
macOS: Catalina (10.15)
Processor Name: Quad-Core Intel Core i7
Processor Speed: 2.9 GHz
Total Number of Cores: 4
Memory: 16 GB
Oracle jdk1.8.0_201.jdk
scala-sdk-2.11.8
snappydata-cluster_2.11:1.2.0 or 1.1.0
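For context, roughly how such timings can be taken from the shell (a minimal sketch only: the time helper is illustrative and is not part of the DZone benchmark, whose exact join query is not reproduced here):

import org.apache.spark.sql.SparkSession

// Illustrative helper: wall-clock timing of a Spark action.
def time[T](label: String)(body: => T): T = {
  val start = System.nanoTime()
  val result = body
  println(f"$label took ${(System.nanoTime() - start) / 1e9}%.2f sec")
  result
}

val spark = SparkSession.builder().appName("billion-row-benchmark").getOrCreate()
val big = spark.range(1000L * 1000 * 1000).toDF("id")  // 1 billion rows
time("count") { big.count() }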
RuntimeMemoryManager org.apache.spark.memory.SnappyUnifiedMemoryManager@4c398c80 configuration:
Total Usable Heap = 2.9 GB (3082926162)
Storage Pool = 1470.1 MB (1541463081)
Execution Pool = 1470.1 MB (1541463081)
Max Storage Pool Size = 2.3 GB (2466340929)
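The reported pool sizes can be cross-checked from the shell with a standard SparkContext call (a small sketch; nothing Snappy-specific is assumed, and spark is the shell session):

// Print max and remaining memory available for caching per executor,
// to confirm the ~2.9 GB usable heap reported above.
spark.sparkContext.getExecutorMemoryStatus.foreach {
  case (executor, (maxMem, remaining)) =>
    println(s"$executor: max=${maxMem / (1024 * 1024)} MB, free=${remaining / (1024 * 1024)} MB")
}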
I'm sure I have not tuned my environment well enough, but I thought it was still important to post this issue, since it is not related to Spark itself but to the Snappy Spark distribution:
val rangeData = spark.range(1000L * 1000 * 1000).toDF()  // a 1-billion-row DataFrame
rangeData.cache()   // mark it for in-memory caching
rangeData.count()   // action that materializes the cache
leads to the error:
java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:863)
at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:102)
at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:90)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1366)
at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:104)
at org.apache.spark.storage.BlockManager.getLocalValues(BlockManager.scala:468)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:704)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:41)
...
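For what it's worth, the exception itself is the long-standing Spark limitation that a block spilled to disk cannot be memory-mapped past Integer.MAX_VALUE bytes (~2 GB). A possible workaround sketch, assuming the oversized block comes from too few partitions (the value 200 is arbitrary, and spark is the shell session):

import org.apache.spark.storage.StorageLevel

// More partitions keep each cached block well under 2 GB, so DiskStore
// never has to memory-map an oversized file.
val rangeData = spark.range(1000L * 1000 * 1000)
  .repartition(200)
  .toDF()
rangeData.persist(StorageLevel.MEMORY_AND_DISK_SER)  // spill serialized blocks
rangeData.count()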
This can also be reproduced with just about any other big DataFrame that cannot fit into memory.
To its credit, the memory manager warns about it in the logs:
20/05/29 01:13:53 WARN SnappyUnifiedMemoryManager: Could not allocate memory for rdd_4_0 of _SPARK_CACHE_ size=1084871037. Memory pool size 2164224048
20/05/29 01:13:53 WARN MemoryStore: Not enough space to cache rdd_4_0 in memory! (computed 2.0 GB so far)
20/05/29 01:13:53 INFO MemoryStore: Memory use = 0.0 B (blocks) + 2.0 GB (scratch space shared across 1 tasks(s)) = 2.0 GB. Storage limit = 2.9 GB.
20/05/29 01:13:53 WARN BlockManager: Persisting block rdd_4_0 to disk instead.
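A hedged alternative repro (assumption: any DataFrame whose single spilled block crosses the 2 GB mmap limit takes the same code path; the columns here are arbitrary):

// Hypothetical variant: a wide DataFrame forced into one partition, so the
// lone cached block exceeds 2 GB when it spills to disk.
val wide = spark.range(200L * 1000 * 1000)
  .selectExpr("id", "cast(id as string) as s", "id * 2 as doubled")
  .coalesce(1)
wide.cache()
wide.count()  // expected to fail in DiskStore.getBytes like the trace above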