Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

only iterate over unique subset of icols #6630

Open
ben-schwen opened this issue Nov 30, 2024 · 3 comments
Open

only iterate over unique subset of icols #6630

ben-schwen opened this issue Nov 30, 2024 · 3 comments

Comments

@ben-schwen
Copy link
Member

ben-schwen commented Nov 30, 2024

          On first read I think "this can be doing a lot of duplicate work since it's basically `for (icol in icols) icol==icols`", but actually it only applies to the subset `icols[icols != xcols & i_merge_type %in% c("integer", "double", "complex")]`.

Originally posted by @MichaelChirico in #6603 (comment)

minimal example

x = data.table(a=1L)
y = data.table(c=1L, d=1)
y[x, on=.(c == a, d == a)]

In bmerge we iterate over icols with for (a in seq_along(icols), hence in this case we epxlore twice a=1 while once would be enough.

Corresponding bmerge line:

if (nrow(i)) for (a in seq_along(icols)) {

@TusharNaugain
Copy link

          On first read I think "this can be doing a lot of duplicate work since it's basically `for (icol in icols) icol==icols`", but actually it only applies to the subset `icols[icols != xcols & i_merge_type %in% c("integer", "double", "complex")]`.

Originally posted by @MichaelChirico in #6603 (comment)

minimal example

x = data.table(a=1L)
y = data.table(c=1L, d=1)
y[x, on=.(c == a, d == a)]

In bmerge we iterate over icols with for (a in seq_along(icols), hence in this case we epxlore twice a=1 while once would be enough.

Corresponding bmerge line:

if (nrow(i)) for (a in seq_along(icols)) {

Hey there,

I noticed that in the current implementation, we iterate over icols which might cause redundant processing. To optimize this, we can create a unique subset of icols that excludes elements from xcols and matches the required i_merge_type. This way, each element is processed only once.

Here's a code snippet to achieve this

const icols = [/* your array of columns /];
const xcols = [/
your array of columns to exclude /];
const i_merge_type = [/
your merge type */];
const uniqueIcols = icols.filter(icol => !xcols.includes(icol) && i_merge_type.includes(icol));
for (const a in uniqueIcols) {

}
By using filter, we ensure that the iteration is done over a unique subset, thus avoiding duplicate work. This should help streamline the processing in bmerge.

@Abhishek2634
Copy link

Hi @ben-schwen can we use this line: if (nrow(i)) for (a in seq_along(unique(icols)))
if its okay, can raise a PR for this ?

@MichaelChirico
Copy link
Member

I don't think it will be so easy -- icols and xcols are paired vectors, and that loop assumes we can pull icols[a] and xcols[a]. Something like cbind(icols, xcols)[!duplicated(icols), ] also won't work since it might drop some xcols that need to be checked.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants