Doc fix? Clarify behavior of unequal joins/`on` binary operators #6623
Comments
possibly related to non-equi join naming #6298 (comment)
Is the behavior I'm describing expected? (It seems so from the existing docs, but seeing it possibly related to bugs, maybe not.) If so, would you accept a PR for the doc fixes I've described?
I currently see these docs on ?data.table
I understand that your concern is whether expressions can be used with binary operators as in the last bullet point of the docs above. Would the following addition answer your question?
how about the addition below?
Please file a PR, it would be appreciated.
I believe that is the same as the non-equi join naming issue, right? #6298 (comment)
Please check the doc changes suggested here https://github.com/Rdatatable/data.table/pull/3093/files#diff-810e8fb9987626f409e07bd5832078a942e5f89ea222c228a6b923b7dab41817 and tell me if that addresses your issue.
See also what is written in this section of the new join vignette https://rdatatable.gitlab.io/data.table/articles/datatable-joins.html#operations-after-joining which I believe documents how the joined columns are named. (Maybe we should add a link from the ?data.table man page to the join vignette?)
Yes, that would clarify! Yes, the "old behavior" mentioned in the linked PR is exactly what I found confusing. I think this is indeed the "non-equi join naming issue". Even if this behavior will soon be phased out, I think it would still be nice to explain the current behavior clearly in user-facing documentation.
I think the vignette still does not quite fully demonstrate or explain this behavior. Putting together "By default all columns are taking their source from the the x table," from the Operations after Joining section and "it returns all rows from the i table" from the Non equi joins section (and that the date in the joined table is
Reading that section did help me simplify my solution:
But I think it would still be really tricky to figure out the
Summary
I think there may be a documentation gap specifically with `on` and binary operators. (Or maybe there's a vignette somewhere that I'm missing?)
Details
Specifically, it wasn't clear to me from the documentation:
`i`.
All of these diverge from how SQL `ON` works, so IMO they are worth pointing out to the user.
What Currently Exists
In an attempt to understand the proper usage of `on`, I read the `on` section of the data.table help and skimmed the linked Secondary indices and auto indexing vignette looking for more explanation of non-equi joins.
After I resolved the problem, I did see that non-equi joins are covered here: https://rdatatable.gitlab.io/data.table/articles/datatable-joins.html#non-equi-join However, the example doesn't make the three points above terribly clear, as it has columns with the same name on either side of the join, although they are certainly shown implicitly.
Example
Here is a slightly simplified version of my use-case: trying to join on `y` and `foo` such that `foo - 2 < y < foo`.
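The original code for this example did not survive in the thread, so here is a minimal sketch of what such a join might look like. The tables `x` and `i`, the `id` column, the sample values, and the helper column `lo` are all made up for illustration. It assumes, as I understand the current behavior, that `on` takes column names rather than expressions like `foo - 2`, and that the left side of each operator refers to a column of `x` while the right side refers to a column of `i`.

```r
library(data.table)

# Hypothetical toy data: x holds the y values, i holds the foo bounds.
x <- data.table(id = 1:4, y = c(1, 3, 5, 7))
i <- data.table(foo = c(4, 8))

# on= takes column names, not expressions, so `foo - 2` is precomputed
# as a helper column (the name `lo` is invented for this sketch).
i[, lo := foo - 2]

# In each condition the left-hand name is a column of x and the
# right-hand name is a column of i: lo < y and y < foo.
raw <- x[i, on = .(y > lo, y < foo)]

# Asking for the columns explicitly avoids relying on the default naming:
# x.y is the original y from x, i.lo and i.foo are the bounds from i.
res <- x[i, on = .(y > lo, y < foo),
         .(id, y = x.y, lo = i.lo, foo = i.foo)]

print(raw)
print(res)
```

Comparing the two printed tables is the quickest way to see the naming behavior discussed in this issue: in `raw` the join columns are expected to carry names from `x` while holding the bound values from `i`, whereas `res` spells out which table each value comes from via the `x.` and `i.` prefixes.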
Suggested Remedy
data.table(
help page bullet point under `on`
that describes non-eq joins to the Non-equi Joins section of the Joins vignette