-
Notifications
You must be signed in to change notification settings - Fork 130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature Request: join_many_to_one() and join_one_to_one() #427
Comments
@sfirke , If I make a PR for this, do you think it would be of interest? (And no worries if you think it's out of scope.) |
Sorry I missed this the first time. Can you explain a little more? I'm reading that as a wrapper for |
You've effectively got it. For In my personal package, I have a more complex |
If you don't put this in janitor, I'd love to see your personal code. I also run into these questions after a join. Looking at the row numbers before and after tells you something, but this sounds like it's much more specific. |
@jzadra , I just made a PR for them. Yes, the personal code is much more extensive in its testing. It creates a row number column for each of the input data.frames and then checks that the output row numbering is as-expected for missingness, uniqueness, etc. |
@billdenney Sounds great. It would be cool if it also provided a summary of which variables were used in the join match that ended up producing multiple rows (and showing the value of those variables), or that didn't have a match and resulted in NAs for the variables from the table being added. Happy to help with this, I think I'd use it a lot. |
@jzadra , It does that summary reporting, too. 😄 |
@billdenney Sweet! Is your personal package public? |
It's kinda, sorta public. It is public in that it's on GitHub. You're welcome to check it out (it's my bsd.report repo). But, it should not be relied upon to be stable. I change things in it all the time and do not consider the API to be stable at all. And, I'm aware of the parts of it that are well-thought-through and tested vs the ones that are one-off hacks that got some job done for a given day. |
Any reason why no suggesting this for the dplyr package? I think many people have similar issues when joining, and I prefer the idea of having all join functions in the same package - mentally easier to remember that one package does everything related to a topic rather than splitting across several. I also think that you join_control is the better selling point than the join_one_to_one(). Perhaps the latter convenience wrapper is better suited for jantitor? |
@sda030 , That's a good question. I thought I had proposed something like this in dplyr in the past, but I had not. I'll put a feeler out over there to see if it would be of interest for dplyr. |
@sda030 , And I've now added an issue over there to see if it would be of interest. |
I often need to perform join operations which confirm that the join is either 1:1 or many:1. I have written functions for this in my personal package, but I think that they may be of general interest. That said, they are a bit of a stretch for normal
janitor
functionality, so if you don't think it's a good fit, no worries.The text was updated successfully, but these errors were encountered: