[FEA] Adjust the spill size heuristic rule #1415
Labels
feature request
New feature or request
user_tools
Scope the wrapper module running CSP, QualX, and reports (python)
Is your feature request related to a problem? Please describe.
The current spill size heuristic uses a fixed threshold of 10G, which can be changed through the env var `RAPIDS_USER_TOOLS_SPILL_BYTES_THRESHOLD`.
This rule does not account for how many worker nodes and how many CPU cores the CPU job uses.
The more resources a job has (especially the more SSDs used by each executor), the smaller the negative impact of the spill.
Describe the solution you'd like
Use Total_Spill_Size / Total_number_of_CPU_COREs > certain_threshold as the new rule.
E.g., if the per_CPU_CORE_spill is > 1GB, then we disqualify this job.
(Of course, if you have a better algorithm, feel free to share.)
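
A minimal sketch of the proposed rule, to make the arithmetic concrete. This is not the existing user_tools implementation: the function names, the way core counts are passed in, and the reading of "1GB" as 1024**3 bytes are all assumptions made for illustration.

```python
# Illustrative sketch only; names and defaults below are hypothetical.

# Assumed default: 1 GB per CPU core, interpreted here as 1024**3 bytes.
DEFAULT_PER_CORE_SPILL_THRESHOLD = 1 * 1024**3


def per_core_spill_bytes(total_spill_bytes, total_cpu_cores):
    """Return the total spill size divided by the total number of CPU cores."""
    if total_cpu_cores <= 0:
        raise ValueError("total_cpu_cores must be positive")
    return total_spill_bytes / total_cpu_cores


def should_disqualify(total_spill_bytes, num_workers, cores_per_worker,
                      threshold=DEFAULT_PER_CORE_SPILL_THRESHOLD):
    """Disqualify the job when the per-core spill exceeds the threshold."""
    total_cores = num_workers * cores_per_worker
    return per_core_spill_bytes(total_spill_bytes, total_cores) > threshold
```

For example, a job that spills 200 GB on 10 workers with 16 cores each has a per-core spill of 1.25 GB and would be disqualified, while the same cluster spilling 100 GB (0.625 GB per core) would not, even though both exceed the current fixed 10G threshold.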