[DISCUSSION]: Unified approach for joins to output batches close to batch_size #14238

Open
comphead opened this issue Jan 22, 2025 · 4 comments

Comments

@comphead
Contributor

comphead commented Jan 22, 2025

`BatchCoalescer` is not used in joins yet, because CoalesceBatchesExec is placed after joins that have a filter, where the output batches might have a lower row count than the target batch size. So why can't we follow the same pattern in SMJ? And if collecting batches inside the join itself is more performant, shouldn't we refactor the other joins as well?

On the other hand, BatchSplitter is used in other joins, and SMJ could (should) have it too, as there is no other way of splitting the batches according to target batch size.

I've thought about this, and I believe the most optimal solution is to make all join operators capable of performing both coalescing and splitting in a built-in manner. This is because the output of a join can either be smaller or larger than the target batch size. Ideally, there should be no need (or only minimal need) for CoalesceBatchesExec.

To achieve this built-in coalescing and splitting, we can leverage existing tools like BatchSplitter and BatchCoalescer (although there are no current examples of BatchCoalescer being used in joins). My suggestion is to generalize these tools so they can be utilized by any operator and applied wherever this mechanism is needed. As this pattern becomes more common, it will be easier to expand its usage and simplify its application.

Originally posted by @berkaysynnada in #14160 (comment)
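
To make the idea of built-in coalescing and splitting concrete, here is a minimal sketch in Rust. It is not DataFusion code: `Rebatcher` and its methods are hypothetical names, and a batch is modeled as a plain `Vec<i64>` rather than an Arrow `RecordBatch`. It assumes an operator can hand every batch it produces, small or large, to one helper that emits batches of exactly the target size.

```rust
/// Hypothetical helper (not a DataFusion type): re-chunks a stream of
/// batches so every emitted batch has exactly `target` rows, except
/// possibly the last one returned by `finish`.
/// A "batch" is modeled as a plain Vec<i64> of row values.
pub struct Rebatcher {
    target: usize,
    buffer: Vec<i64>,
}

impl Rebatcher {
    pub fn new(target: usize) -> Self {
        assert!(target > 0, "target batch size must be positive");
        Self { target, buffer: Vec::new() }
    }

    /// Accepts a batch of any size and returns zero or more full batches.
    /// Small inputs are coalesced in the buffer; large inputs are split.
    pub fn push(&mut self, batch: Vec<i64>) -> Vec<Vec<i64>> {
        self.buffer.extend(batch);
        let mut out = Vec::new();
        while self.buffer.len() >= self.target {
            // `split_off` leaves the first `target` rows in the buffer and
            // returns the tail; swap them so the full batch is emitted.
            let rest = self.buffer.split_off(self.target);
            out.push(std::mem::replace(&mut self.buffer, rest));
        }
        out
    }

    /// Flushes whatever is left once the input is exhausted.
    pub fn finish(&mut self) -> Option<Vec<i64>> {
        if self.buffer.is_empty() {
            None
        } else {
            Some(std::mem::take(&mut self.buffer))
        }
    }
}
```

An operator would call `push` for every batch it produces and `finish` once its input is exhausted. Oversized input is still fully buffered before it is split, so this sketch only evens out batch sizes; it does not bound memory.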

@comphead
Contributor Author

The direction proposed by @berkaysynnada is worth discussing. By its nature, a join doesn't guarantee the size of its output batches in rows. A batch can be much smaller, or even empty, because of filtering, and it can be much larger because of join explosions.

The idea is to discuss how we can make the output batches after joins more uniform and close to the configured batch_size.

One option is to apply BatchSplitter or BatchCoalescer as plan nodes after the join runs.
Another is to align the batches inside the join itself, either by embedding the coalescer/splitter or with a custom implementation.

@korowa
Contributor

korowa commented Jan 24, 2025

I'd suggest renaming the "splitting" part of the problem to "restricting" -- if a join is able to produce a batch that needs to be split (even if this batch exists only internally), then that may already be an issue, which may hurt in some specific cases. I also think that BatchSplitter in its current implementation (when it already has a batch to split) does not solve the problem, but just tries to fix/hide it (in addition, if the batches to be split are large enough to start causing memory issues, BatchSplitter doesn't seem to be able to help).

In this case (for splitting / restricting), I think, what @berkaysynnada suggests:

to make all join operators capable of performing both coalescing and splitting in a built-in manner

is a better fit than separate operators on top of join -- each join operator should by itself be able to limit / restrict its internally created record batches to prevent excessive accumulation of data in memory (or at least, if it's required, to track them via memory reservations).
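
A rough illustration of "restricting" rather than "splitting", under simplifying assumptions: the sketch below is a cross join over two in-memory slices that never materializes more than `batch_size` output rows at once. `ChunkedCrossJoin` is a hypothetical name and none of this is DataFusion code; rows are plain `i64` values, and a real join would track positions in its own matching state instead of a single row counter.

```rust
/// Hypothetical sketch (not DataFusion code): a cross join over two
/// in-memory inputs that emits its output in chunks of at most
/// `batch_size` rows, so no oversized batch ever exists internally.
pub struct ChunkedCrossJoin<'a> {
    left: &'a [i64],
    right: &'a [i64],
    batch_size: usize,
    // Index of the next output row, in row-major order over left x right.
    next_row: usize,
}

impl<'a> ChunkedCrossJoin<'a> {
    pub fn new(left: &'a [i64], right: &'a [i64], batch_size: usize) -> Self {
        assert!(batch_size > 0, "batch_size must be positive");
        Self { left, right, batch_size, next_row: 0 }
    }
}

impl<'a> Iterator for ChunkedCrossJoin<'a> {
    type Item = Vec<(i64, i64)>;

    fn next(&mut self) -> Option<Self::Item> {
        let total = self.left.len() * self.right.len();
        if self.next_row >= total {
            return None;
        }
        // Build at most `batch_size` rows, then remember where to resume.
        let end = (self.next_row + self.batch_size).min(total);
        let width = self.right.len();
        let batch: Vec<(i64, i64)> = (self.next_row..end)
            .map(|i| (self.left[i / width], self.right[i % width]))
            .collect();
        self.next_row = end;
        Some(batch)
    }
}
```

Because the operator resumes from `next_row` on each call, the peak size of any output batch is bounded by `batch_size` regardless of how large the full join result is, which is the property a splitter applied after the fact cannot provide.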

@comphead
Contributor Author

Thanks @korowa, I totally agree from the memory perspective: having a splitter won't help, as the memory is already allocated for the batch.

However, the other path, coalescing, might help downstream nodes or the direct consumer avoid struggling with a swarm of small batches. The more uniform method for all joins is to call CoalesceBatchesExec right after the join executes; however, a built-in approach might be more efficient.

@korowa
Contributor

korowa commented Jan 24, 2025

However, the other path, coalescing, might help downstream nodes or the direct consumer avoid struggling with a swarm of small batches

I don't have a strong opinion here -- intuitively, embedding a coalescer into filtering operators (not only joins) could benefit query execution time simply because there would be fewer operators in the pipeline, but it should still be checked and measured.

I'll try to come up with a POC this weekend for a coalescer in e.g. FilterExec (it just seems to be the easiest to implement) -- the idea is that if it works well for filters, then joins would also benefit from it; otherwise, having a separate operator would make more sense (at least for now).
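
As a sketch of what a coalescer embedded in a filtering operator might look like, the function below filters and coalesces in a single pass. It is an assumption-laden illustration, not the POC mentioned above and not the DataFusion FilterExec: `filter_coalesced` is a hypothetical name, batches are plain `Vec<i64>` values, and the whole input is passed in at once instead of being streamed.

```rust
/// Hypothetical sketch (not the DataFusion FilterExec): applies `predicate`
/// to each input batch and buffers the surviving rows, emitting a batch
/// whenever `target` rows have accumulated.
pub fn filter_coalesced<P>(input: Vec<Vec<i64>>, predicate: P, target: usize) -> Vec<Vec<i64>>
where
    P: Fn(&i64) -> bool,
{
    assert!(target > 0, "target batch size must be positive");
    let mut out = Vec::new();
    let mut buffer: Vec<i64> = Vec::with_capacity(target);
    for batch in input {
        for row in batch.into_iter().filter(|r| predicate(r)) {
            buffer.push(row);
            if buffer.len() == target {
                // Emit a full batch and start a fresh buffer.
                out.push(std::mem::replace(&mut buffer, Vec::with_capacity(target)));
            }
        }
    }
    // Flush the final, possibly smaller, batch.
    if !buffer.is_empty() {
        out.push(buffer);
    }
    out
}
```

Since filtering can only shrink a batch, this operator never needs to split; it only has to hold back small results until enough rows have accumulated. Whether fusing the two steps is actually faster than a separate coalescing operator is the part that would need to be measured.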
