Describe the bug
The ParquetCachedBatchSerializer has no code to acquire the GPU semaphore before it reads data in using the GPU, and GpuInMemoryTableScanExec does not do this either. This means tasks can start GPU work without holding the semaphore, which increases the pressure on GPU memory.
To make things worse, if we do run out of memory, the serializer code has no retry block, so we are relying entirely on spilling to recover.
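For context, other GPU scans in the plugin gate their device work roughly like the sketch below. This is not the serializer's actual code, just an illustration of what is missing; it assumes the plugin's existing GpuSemaphore.acquireIfNecessary and RmmRapidsRetryIterator.withRetryNoSplit helpers, with a caller-supplied decode function standing in for the real cached-batch Parquet read:

import org.apache.spark.TaskContext
import org.apache.spark.sql.vectorized.ColumnarBatch

import com.nvidia.spark.rapids.GpuSemaphore
import com.nvidia.spark.rapids.RmmRapidsRetryIterator.withRetryNoSplit

// Sketch: take the semaphore before any device allocation, then wrap the
// decode in a retry block so a retryable OOM can be handled after a spill.
def readCachedBatchOnGpu(decode: () => ColumnarBatch): ColumnarBatch = {
  // Blocks until this task is allowed onto the GPU, bounding how many
  // tasks can allocate device memory at once.
  GpuSemaphore.acquireIfNecessary(TaskContext.get())
  // Re-executes the body if RMM reports a retryable allocation failure.
  withRetryNoSplit {
    decode()
  }
}

The missing semaphore acquisition and the missing retry block are separate problems: the first bounds concurrent GPU memory users, the second lets an individual allocation failure roll back and try again.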
Steps/Code to reproduce bug
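Setup assumed by the repro (not shown in the original report): the plugin and the cached-batch serializer have to be enabled at startup, and the RMM pool needs to be small enough to hit the limit. Something along these lines; the fraction here is just an example, not the value used in the original run:

import org.apache.spark.sql.SparkSession

// Assumption: these are startup-only settings and cannot be changed on a
// live session. allocFraction=0.1 is an illustrative way to keep the RMM
// pool small enough to trigger the failure.
val spark = SparkSession.builder()
  .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
  .config("spark.sql.cache.serializer",
    "com.nvidia.spark.ParquetCachedBatchSerializer")
  .config("spark.rapids.memory.gpu.allocFraction", "0.1")
  .getOrCreate()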
// produce 32 GiB of uncompressed data (RLE should make the cached version very small)
val df = spark.range(4294967296L).cache()
// Generate the cached data and process it...
df.filter("id > 100").selectExpr("COUNT(DISTINCT id)").show()
// First run passes, so run it again to read the cached data...
df.filter("id > 100").selectExpr("COUNT(DISTINCT id)").show()
The second run fails with an OOM:
25/01/21 19:09:55 ERROR Executor: Exception in task 6.0 in stage 3.0 (TID 23)
java.lang.OutOfMemoryError: Could not allocate native memory: std::bad_alloc: out_of_memory: RMM failure at:/home/roberte/src/spark-rapids-jni/target/libcudf/cmake-build/_deps/rmm-src/include/rmm/mr/device/limiting_resource_adaptor.hpp:152: Exceeded memory limit
at ai.rapids.cudf.Table.readParquet(Native Method)
at ai.rapids.cudf.Table.readParquet(Table.java:1433)
at ai.rapids.cudf.Table.readParquet(Table.java:1400)
at ai.rapids.cudf.Table.readParquet(Table.java:1413)
at com.nvidia.spark.rapids.ParquetCachedBatchSerializer.$anonfun$convertCachedBatchToColumnarInternal$1(ParquetCachedBatchSerializer.scala:500)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
at com.nvidia.spark.rapids.CollectTimeIterator.$anonfun$hasNext$1(GpuMetrics.scala:282)
at com.nvidia.spark.rapids.CollectTimeIterator.$anonfun$hasNext$1$adapted(GpuMetrics.scala:281)
...
Expected behavior
We should be able to run even with low memory.
I think I have a fix for the semaphore problem, but I need some more time to evaluate it. I will probably split this up and file a separate issue for the retry code, as that looks to be quite a bit more complicated.