Our QA found the following error message while they were debugging something else, and rightly called out that it was confusing:
24/02/06 04:02:51 ERROR RapidsExecutorPlugin: Exception in the executor plugin, shutting down!
java.lang.IllegalArgumentException: The pool allocation of -498.875 MB (calculated from
spark.rapids.memory.gpu.allocFraction (=1.0) and 141.125 MB free memory) was less than the minimum allocation of
20252.015625 (calculated from spark.rapids.memory.gpu.minAllocFraction (=0.25) and 81008.0625 MB total memory)
The negative number appears because the reserve amount is missing from the message. There are other things to clean up too: the minimum allocation number has no unit, and all of these values should be reported in MiB, not MB.
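To make the missing step explicit, here is the arithmetic behind the logged numbers. It takes the 640 MiB reserve value from the proposed wording in this issue. With that reserve subtracted from free memory before applying `allocFraction`, the negative pool size and the minimum allocation both fall out exactly:

```latex
\text{pool} = (\text{gpu.free} - \text{reserve}) \times \text{allocFraction}
            = (141.125 - 640) \times 1.0 = -498.875\ \text{MiB}

\text{minAlloc} = \text{gpu.total} \times \text{minAllocFraction}
                = 81008.0625 \times 0.25 = 20252.015625\ \text{MiB}
```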
We could say:
The pool allocation of -498.875 MiB (gpu.free: 141.125 MiB, spark.rapids.memory.gpu.allocFraction: 1.0,
spark.rapids.memory.gpu.reserve: 640 MiB => (gpu.free - reserve) * allocFraction = -498.875 MiB) was less than the minimum
allocation of 20252.016 MiB (gpu.total: 81008.063 MiB, spark.rapids.memory.gpu.minAllocFraction: 0.25 => gpu.total *
minAllocFraction = 20252.016 MiB). Please ensure that the GPU has enough free memory, or adjust configuration accordingly.
I added some line breaks to the messages above to make them easier to read.
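A rough sketch of how the proposed message could be built is below. The class name `PoolAllocationCheck` and its methods are hypothetical and are not the actual spark-rapids code. The sketch shows every input to both calculations and rounds MiB values to at most three decimal places, as the proposed wording does. The sample inputs are the values from the QA log plus the 640 MiB reserve from the proposal.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical sketch only: not the spark-rapids implementation; names are illustrative.
public class PoolAllocationCheck {

    // Format a MiB quantity with at most 3 decimal places, dropping trailing zeros.
    static String mib(double value) {
        return BigDecimal.valueOf(value)
                .setScale(3, RoundingMode.HALF_UP)
                .stripTrailingZeros()
                .toPlainString() + " MiB";
    }

    // Build the message proposed in this issue, including the reserve in the calculation.
    static String message(double freeMiB, double reserveMiB, double allocFraction,
                          double totalMiB, double minAllocFraction) {
        double pool = (freeMiB - reserveMiB) * allocFraction;
        double minAlloc = totalMiB * minAllocFraction;
        return "The pool allocation of " + mib(pool)
                + " (gpu.free: " + mib(freeMiB)
                + ", spark.rapids.memory.gpu.allocFraction: " + allocFraction
                + ", spark.rapids.memory.gpu.reserve: " + mib(reserveMiB)
                + " => (gpu.free - reserve) * allocFraction = " + mib(pool)
                + ") was less than the minimum allocation of " + mib(minAlloc)
                + " (gpu.total: " + mib(totalMiB)
                + ", spark.rapids.memory.gpu.minAllocFraction: " + minAllocFraction
                + " => gpu.total * minAllocFraction = " + mib(minAlloc)
                + "). Please ensure that the GPU has enough free memory, "
                + "or adjust configuration accordingly.";
    }

    // Throw only when the computed pool falls below the configured minimum.
    static void check(double freeMiB, double reserveMiB, double allocFraction,
                      double totalMiB, double minAllocFraction) {
        double pool = (freeMiB - reserveMiB) * allocFraction;
        double minAlloc = totalMiB * minAllocFraction;
        if (pool < minAlloc) {
            throw new IllegalArgumentException(
                    message(freeMiB, reserveMiB, allocFraction, totalMiB, minAllocFraction));
        }
    }

    public static void main(String[] args) {
        // Free/total memory and fractions from the QA log; reserve from the proposed message.
        try {
            check(141.125, 640, 1.0, 81008.0625, 0.25);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Using `BigDecimal` with `stripTrailingZeros` keeps whole values such as the reserve readable ("640 MiB") while still showing three decimals where they matter ("20252.016 MiB").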