[DISCUSSION]: Unified approach for joins to output batches close to `batch_size` #14238

comphead · 2025-01-22T16:27:37Z

`BatchCoalescer` is not used in joins yet, since CoalesceBatchesExec appears after joins that have a filter, in case the output batches have a lower row count than the target batch size. So why can't we follow the same pattern in SMJ? If collecting batches in the join itself is more performant, shouldn't we refactor the other joins as well?

On the other hand, BatchSplitter is used in other joins, and SMJ could (should) have it too, as there is no other way of splitting the batches according to target batch size.

I've thought about this, and I believe the most optimal solution is to make all join operators capable of performing both coalescing and splitting in a built-in manner. This is because the output of a join can either be smaller or larger than the target batch size. Ideally, there should be no need (or only minimal need) for CoalesceBatchesExec.

To achieve this built-in coalescing and splitting, we can leverage existing tools like BatchSplitter and BatchCoalescer (although there are no current examples of BatchCoalescer being used in joins). My suggestion is to generalize these tools so they can be utilized by any operator and applied wherever this mechanism is needed. As this pattern becomes more common, it will be easier to expand its usage and simplify its application.

Originally posted by @berkaysynnada in #14160 (comment)
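
To make the "built-in coalescing and splitting" idea concrete, here is a minimal sketch of a helper that does both. It is not the DataFusion `BatchCoalescer` or `BatchSplitter` API: `Batch` is a hypothetical stand-in for a `RecordBatch`, and `Rebatcher` is an invented name used only for illustration.

```rust
/// Hypothetical stand-in for an Arrow `RecordBatch`: just a vector of row values.
type Batch = Vec<u64>;

/// Buffers incoming rows and re-emits them in batches of exactly `target` rows.
/// Only the final batch, returned by `finish`, may be smaller.
struct Rebatcher {
    target: usize,
    buffer: Batch,
}

impl Rebatcher {
    fn new(target: usize) -> Self {
        assert!(target > 0, "target batch size must be positive");
        Self { target, buffer: Vec::new() }
    }

    /// Accepts one input batch of any size and returns every full batch
    /// that can be produced so far (possibly none, possibly several).
    fn push(&mut self, input: Batch) -> Vec<Batch> {
        self.buffer.extend(input);
        let mut out = Vec::new();
        while self.buffer.len() >= self.target {
            // `split_off` leaves the first `target` rows in `buffer` and
            // returns the tail; swap them so the head is emitted.
            let rest = self.buffer.split_off(self.target);
            out.push(std::mem::replace(&mut self.buffer, rest));
        }
        out
    }

    /// Flushes the remaining rows, if any, at end of input.
    fn finish(&mut self) -> Option<Batch> {
        if self.buffer.is_empty() {
            None
        } else {
            Some(std::mem::take(&mut self.buffer))
        }
    }
}
```

An operator would call `push` for each batch it produces and `finish` when its input is exhausted. Note that this sketch copies rows and still receives the oversized batch as input, so it evens out batch sizes but does not by itself bound peak memory.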


comphead · 2025-01-22T16:40:15Z

The direction proposed by @berkaysynnada is worth discussing. The nature of joins doesn't guarantee the output batch size in rows. It can be much smaller, or even empty, because of filtering, and it can be much larger because of join explosions.

The idea is to discuss how we can make the output batches after joins more uniform and close to the configured batch_size.

One option is to use BatchSplitter or BatchCoalescer plan nodes after the join.
Another is to align the batches inside the join itself, either by giving it the coalescer/splitter or by writing a custom implementation.

korowa · 2025-01-24T05:32:38Z

I'd suggest renaming the "splitting" part of the problem to "restricting" -- if a join is able to produce a batch that needs to be split (even if this batch exists only internally), then that may already be an issue, which may hurt in some specific cases. I also think that BatchSplitter in its current implementation (where it already has a batch to split) does not solve the problem, but just tries to fix/hide it (in addition, if the batches to be split are large enough to start causing memory issues, BatchSplitter doesn't seem to be able to help).

In this case (for splitting / restricting), I think, what @berkaysynnada suggests:

to make all join operators capable of performing both coalescing and splitting in a built-in manner

is a better fit than separate operators on top of join -- each join operator should by itself be able to limit / restrict its internally created record batches to prevent excessive accumulation of data in memory (or at least, if it's required, to track them via memory reservations).
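
As a rough illustration of "restricting" rather than "splitting", the sketch below shows a join that stops producing output once `batch_size` rows are ready and resumes from a saved cursor on the next call, so an oversized batch never exists even internally. Everything here is hypothetical: `BoundedJoin` is an invented name, the join is a naive nested loop over key slices, and none of it reflects how the DataFusion join operators are actually implemented.

```rust
/// Hypothetical nested-loop equi-join over key slices that never materializes
/// more than `batch_size` output rows at a time.
struct BoundedJoin<'a> {
    left: &'a [u32],
    right: &'a [u32],
    batch_size: usize,
    // Cursor into the (left, right) pair space; resumed on every call.
    i: usize,
    j: usize,
}

impl<'a> BoundedJoin<'a> {
    fn new(left: &'a [u32], right: &'a [u32], batch_size: usize) -> Self {
        assert!(batch_size > 0, "batch size must be positive");
        Self { left, right, batch_size, i: 0, j: 0 }
    }

    /// Returns the next batch of matching (left index, right index) pairs,
    /// or `None` once the pair space is exhausted.
    fn next_batch(&mut self) -> Option<Vec<(usize, usize)>> {
        let mut out = Vec::with_capacity(self.batch_size);
        while self.i < self.left.len() {
            while self.j < self.right.len() {
                let (i, j) = (self.i, self.j);
                self.j += 1;
                if self.left[i] == self.right[j] {
                    out.push((i, j));
                    if out.len() == self.batch_size {
                        // Stop here; the cursor remembers where to resume.
                        return Some(out);
                    }
                }
            }
            self.i += 1;
            self.j = 0;
        }
        if out.is_empty() { None } else { Some(out) }
    }
}
```

With `left = [1, 1, 2]`, `right = [1, 1, 1, 3]` and `batch_size = 4`, the six matching pairs come out as one batch of four followed by one batch of two. The output buffer is capped at `batch_size` entries regardless of how badly the join explodes.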

comphead · 2025-01-24T17:06:24Z

Thanks @korowa, I totally agree from the memory perspective: having a splitter won't help, as the memory is already allocated for the batch.

However, the other path, related to coalescing, might help downstream nodes or the direct consumer avoid struggling with a swarm of small batches. The more uniform method for all joins is to call CoalesceBatchesExec just after the join execution; however, a built-in approach might be more efficient.

korowa · 2025-01-24T19:13:28Z

However, the other path, related to coalescing, might help downstream nodes or the direct consumer avoid struggling with a swarm of small batches

I don't have a strong opinion here -- intuitively it seems like embedding a coalescer into filtering operators (not only joins) could be beneficial for query execution time just because there will be fewer operators in the pipeline, but it should still be checked and somehow measured.

I'll try to come up with a POC this weekend for a coalescer in e.g. FilterExec (this one just seems to be the easiest to implement) -- the idea is that if it works well for filters, then joins would also benefit from it; otherwise, having a separate operator would make more sense (at least for now).
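
For reference, a filter with an embedded coalescer could look roughly like the sketch below. This is not the actual POC and not the `FilterExec` implementation: `CoalescingFilter` is a hypothetical name, rows are plain `i64` values instead of a `RecordBatch`, and the emit rule (flush once at least `target` rows have survived) is an assumption chosen for simplicity.

```rust
/// Hypothetical filter operator that coalesces its own output instead of
/// relying on a separate coalescing operator downstream.
struct CoalescingFilter<P: Fn(&i64) -> bool> {
    predicate: P,
    target: usize,
    pending: Vec<i64>,
}

impl<P: Fn(&i64) -> bool> CoalescingFilter<P> {
    fn new(predicate: P, target: usize) -> Self {
        assert!(target > 0, "target batch size must be positive");
        Self { predicate, target, pending: Vec::new() }
    }

    /// Filters one input batch. Surviving rows are buffered, and an output
    /// batch is returned only once at least `target` rows have accumulated.
    /// The returned batch may overshoot `target` by less than one input batch.
    fn push(&mut self, input: &[i64]) -> Option<Vec<i64>> {
        // Borrow the predicate separately so `pending` can be mutated.
        let predicate = &self.predicate;
        self.pending
            .extend(input.iter().copied().filter(|v| predicate(v)));
        if self.pending.len() >= self.target {
            Some(std::mem::take(&mut self.pending))
        } else {
            None
        }
    }

    /// Flushes whatever is still buffered at end of input.
    fn finish(&mut self) -> Option<Vec<i64>> {
        if self.pending.is_empty() {
            None
        } else {
            Some(std::mem::take(&mut self.pending))
        }
    }
}
```

A highly selective predicate then yields a few near-`target` batches rather than many tiny ones, at the cost of copying surviving rows into the buffer. Whether that copy is cheaper than running a separate operator is exactly what would need to be measured.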

comphead mentioned this issue Jan 22, 2025

Merge SortMergeJoin filtered batches into larger batches #14160

Merged

comphead mentioned this issue Jan 22, 2025

Add tests for filtered SortMergeJoin output size #14239

Open
