[FEA] Look at switching algorithms on split and retry in aggregation #8432
Labels
feature request
New feature or request
reliability
Features to improve reliability or bugs that severely impact the reliability of the plugin
Is your feature request related to a problem? Please describe.
Aggregations are kind of complicated. We currently have a sort fallback, but that might change (see #8391).
However, the fallback to sort, or whatever replaces it, is triggered when the intermediate size grows larger than the target batch size. If the user configured the job incorrectly, so that there is not enough memory for an input of the target batch size to complete, then we might want to look at falling back to another algorithm sooner.
For example, I have seen stack traces in some extreme tests where we try to concat batches together and cannot because we are out of memory. I think this is likely due to fragmentation, but the concat code does not have split-and-retry handling. It probably should fall back to doing sort-based aggregations instead.
This should not be that common, but I thought we should still track it as something we could do to improve reliability.
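To make the idea concrete, here is a rough, self-contained sketch of the control flow I have in mind. It is not plugin code: every name (`SimulatedOom`, `concat_batches`, `aggregate`, and so on) is hypothetical, memory is modeled as a plain row-count limit, and the "sort-based" path is just a per-batch sort followed by a streaming merge so that no full concat is needed.

```python
import heapq
from collections import defaultdict
from itertools import groupby


class SimulatedOom(Exception):
    """Hypothetical stand-in for an out-of-memory error during concat."""


def concat_batches(batches, memory_limit_rows):
    """Concatenate batches, failing if the result exceeds the simulated limit."""
    total = sum(len(b) for b in batches)
    if total > memory_limit_rows:
        raise SimulatedOom(f"cannot concat {total} rows")
    return [row for b in batches for row in b]


def hash_aggregate(rows):
    """Sum values per key over one fully concatenated batch."""
    sums = defaultdict(int)
    for key, value in rows:
        sums[key] += value
    return dict(sums)


def sort_aggregate(batches):
    """Sort each batch on its own, then merge the sorted runs lazily.

    Only one group is materialized at a time, so no full concat is required.
    """
    runs = [sorted(b, key=lambda r: r[0]) for b in batches]
    merged = heapq.merge(*runs, key=lambda r: r[0])
    return {
        key: sum(value for _, value in group)
        for key, group in groupby(merged, key=lambda r: r[0])
    }


def aggregate(batches, memory_limit_rows):
    """Try the concat-based path first; switch algorithms if concat runs out of memory."""
    try:
        rows = concat_batches(batches, memory_limit_rows)
        return hash_aggregate(rows), "hash"
    except SimulatedOom:
        # Assumption: instead of failing the task, fall back to the sort-based path.
        return sort_aggregate(batches), "sort"
```

For example, `aggregate([[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]], memory_limit_rows=3)` cannot concat the four input rows, so it takes the sort path and still produces the same per-key sums as the hash path would.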