Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error messages from merge() somewhat confusingly "swap" 'x' and 'i' prefixes #6641

Open
MichaelChirico opened this issue Dec 9, 2024 · 4 comments
Assignees

Comments

@MichaelChirico
Copy link
Member

DT1=data.table(a=factor('a'))
DT2=data.table(a=1L)
merge(DT1, DT2, by='a')
# Error in bmerge(i, x, leftcols, rightcols, roll, rollends, nomatch, mult,  : 
#   Incompatible join types: x.a (integer) and i.a (factor). Factor columns must join to factor or character columns.

Note that the x. prefix applies to DT2, while the i. prefix applies to DT1, while the user provided them in order (DT1, DT2).

At root is that under the hood, merge(x,y, ...) is constructed as a join with y[x, ...].

It would be most consistent with users' expectations of merge() if the prefixes were DT1 : x., DT2: y., or perhaps even better if the errors matched suffixes=. A really subtle (and back-incompatible) way to thread the needle here would be to change the merge.data.table() defaults to be suffixes=c(".i", ".x").

A different approach (and probably the most practical at this point) is just to emphasize this consideration in the docs.

@rikivillalba
Copy link
Contributor

rikivillalba commented Dec 9, 2024

Perhaps one can raise an errorCondition with a custom 'call' argument such that 'bmerge(i, x, leftcols,...' becomes 'bmerge(i = DT2, x = DT1, leftcols,...' though not standard

@MichaelChirico
Copy link
Member Author

Great idea! I think we can populate custom attributes of the error object raised on the custom class and use those more reliably & transparently than messing with the call.

@Abhishek2634
Copy link

Abhishek2634 commented Dec 18, 2024

Hi @MichaelChirico, I would like to work on this issue. can you assign me this issue, As i have figure out what to do.
in file bmerge.R, line no. 81 is the error message stopf("Incompatible join types: %s (%s) and %s (%s). Factor columns must join to factor or character columns.", xname, x_merge_type, iname, i_merge_type). i have change this to stopf("Incompatible join types: %s (%s) and %s (%s). Factor columns must join to factor or character columns.", iname, i_merge_type, xname, x_merge_type) and its working fine with output error stating factor if D1 is factor type, and D2 as integer type resp.

for reference,
DT1=data.table(a=factor('a'))
DT2=data.table(a=1L)
merge(DT1, DT2, by='a')

its error output will be: Incompatible join types: i.a (factor) and x.a (integer). Factor columns must join to factor or character columns.

@MichaelChirico
Copy link
Member Author

Thanks @Abhishek2634, the issue with your proposed fix is it's incorrect for the "primary" merge interface, namely DT1[DT2, on='a'].

The root issue here is that merge(DT1, DT2, by='a') eventually runs DT2[DT1, on='a'], i.e., in the reverse order of the arguments to merge(). And moreover, DT2[DT1, on='a'] uses x prefix to refer to the "outer" table, i prefix for the inner table (matching the names of the arguments to [), whereas the merge() arguments are named x and y, respectively.

The goal of this issue is to match the output to users' expectations in both cases (joins with x[i] and merge(x, y)).

It will probably require some understanding of condition objects in R:

https://adv-r.hadley.nz/conditions.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants