Tried to reproduce the benchmark test made here:
https://dzone.com/articles/joining-a-billion-rows-20x-faster-than-apache-spar
Basically, it is ~26x slower (25.95 - 26.31 sec) than plain Apache Spark (0.97 - 0.98 sec) on my laptop:
macOS: Catalina (10.15)
Processor Name: Quad-Core Intel Core i7
Processor Speed: 2.9 GHz
Total Number of Cores: 4
Memory: 16 GB
Oracle jdk1.8.0_201.jdk
scala-sdk-2.11.8
snappydata-cluster_2.11:1.2.0 or 1.1.0
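For context, roughly how such timings can be taken from the shell (a minimal sketch only: the time helper is illustrative and is not part of the DZone benchmark, whose exact join query is not reproduced here):

import org.apache.spark.sql.SparkSession

// Illustrative helper: wall-clock timing of a Spark action.
def time[T](label: String)(body: => T): T = {
  val start = System.nanoTime()
  val result = body
  println(f"$label took ${(System.nanoTime() - start) / 1e9}%.2f sec")
  result
}

val spark = SparkSession.builder().appName("billion-row-benchmark").getOrCreate()
val big = spark.range(1000L * 1000 * 1000).toDF("id")  // 1 billion rows
time("count") { big.count() }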
RuntimeMemoryManager org.apache.spark.memory.SnappyUnifiedMemoryManager@4c398c80 configuration:
Total Usable Heap = 2.9 GB (3082926162)
Storage Pool = 1470.1 MB (1541463081)
Execution Pool = 1470.1 MB (1541463081)
Max Storage Pool Size = 2.3 GB (2466340929)
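The reported pool sizes can be cross-checked from the shell with a standard SparkContext call (a small sketch; nothing Snappy-specific is assumed, and spark is the shell session):

// Print max and remaining memory available for caching per executor,
// to confirm the ~2.9 GB usable heap reported above.
spark.sparkContext.getExecutorMemoryStatus.foreach {
  case (executor, (maxMem, remaining)) =>
    println(s"$executor: max=${maxMem / (1024 * 1024)} MB, free=${remaining / (1024 * 1024)} MB")
}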
I'm sure I have not tuned my environment well enough, but I thought it was still important to post this issue, since it is not related to Spark itself but to the Snappy Spark distribution:
val rangeData = spark.range(1000L * 1000 * 1000).toDF()  // a 1-billion-row DataFrame
rangeData.cache()   // mark it for in-memory caching
rangeData.count()   // action that materializes the cache
leads to the error:
java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:863)
at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:102)
at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:90)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1366)
at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:104)
at org.apache.spark.storage.BlockManager.getLocalValues(BlockManager.scala:468)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:704)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:41)
...
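For what it's worth, the exception itself is the long-standing Spark limitation that a block spilled to disk cannot be memory-mapped past Integer.MAX_VALUE bytes (~2 GB). A possible workaround sketch, assuming the oversized block comes from too few partitions (the value 200 is arbitrary, and spark is the shell session):

import org.apache.spark.storage.StorageLevel

// More partitions keep each cached block well under 2 GB, so DiskStore
// never has to memory-map an oversized file.
val rangeData = spark.range(1000L * 1000 * 1000)
  .repartition(200)
  .toDF()
rangeData.persist(StorageLevel.MEMORY_AND_DISK_SER)  // spill serialized blocks
rangeData.count()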
This can also be reproduced with just about any other big DataFrame that cannot fit into memory.
To its credit, the memory manager warns about it in the logs:
20/05/29 01:13:53 WARN SnappyUnifiedMemoryManager: Could not allocate memory for rdd_4_0 of _SPARK_CACHE_ size=1084871037. Memory pool size 2164224048
20/05/29 01:13:53 WARN MemoryStore: Not enough space to cache rdd_4_0 in memory! (computed 2.0 GB so far)
20/05/29 01:13:53 INFO MemoryStore: Memory use = 0.0 B (blocks) + 2.0 GB (scratch space shared across 1 tasks(s)) = 2.0 GB. Storage limit = 2.9 GB.
20/05/29 01:13:53 WARN BlockManager: Persisting block rdd_4_0 to disk instead.
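A hedged alternative repro (assumption: any DataFrame whose single spilled block crosses the 2 GB mmap limit takes the same code path; the columns here are arbitrary):

// Hypothetical variant: a wide DataFrame forced into one partition, so the
// lone cached block exceeds 2 GB when it spills to disk.
val wide = spark.range(200L * 1000 * 1000)
  .selectExpr("id", "cast(id as string) as s", "id * 2 as doubled")
  .coalesce(1)
wide.cache()
wide.count()  // expected to fail in DiskStore.getBytes like the trace above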