Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow data.frame row-binding comparison (Fix #50) #284

Merged
merged 6 commits into from
Apr 20, 2019

Conversation

billdenney
Copy link
Collaborator

Fix #50

@sfirke
Copy link
Owner

sfirke commented Apr 12, 2019

You are the best! I will review in the AM. If you have any other thoughts, additions, etc. for the 4/21 CRAN release let me know. I could send a PR with my adorn_cumulative() for you to review, per #238 - I've used it in my own work and found it acceptable.

@sfirke
Copy link
Owner

sfirke commented Apr 14, 2019

Bah I prob can't get to this in full until at least Monday afternoon, sorry. But I'm excited to get this merged for 1.2 submission this week!

One thing I wondered while at the park today and just tried out: I don't think this runs on a list. Which is a common use case for bind_rows, specifically the classic sequence to read similar files in a folder: list.files() %>% purrr::map(read_excel) %>% bind_rows() but then bind_rows fails because of dirty data and so the user would switch to compare_df_types() to diagnose - but I think it would choke b/c the input would be a list.

Do you think it would be worthwhile to allow it to take either several unquoted data.frames or a list of data.frames, like how dplyr::bind_rows() can take either? And if so how hard would that be to add?

cars2 <- mtcars %>% mutate(big_cyl = cyl * 100)
identical(bind_rows(mtcars, cars2), bind_rows(list(mtcars, cars2))) # both work
compare_df_types(list(mtcars, cars2)) ## Errors

@billdenney
Copy link
Collaborator Author

It took a bit to work through the complexity of the mixed inputs, but the updated PR works for me with list inputs.

Copy link
Owner

@sfirke sfirke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! I did a formal review to better organize my thoughts, hope you don't mind. I think most of it is cosmetic, like naming etc. Let me know what you think. I am so excited to see this function in action, I think it fills a big hole in the data cleaning world and your programming of it is 💯

NEWS.md Show resolved Hide resolved
NEWS.md Outdated Show resolved Hide resolved
R/compare_df_types.R Outdated Show resolved Hide resolved
R/compare_df_types.R Outdated Show resolved Hide resolved
R/compare_df_types.R Outdated Show resolved Hide resolved
R/compare_df_types.R Outdated Show resolved Hide resolved
R/compare_df_types.R Outdated Show resolved Hide resolved
R/compare_df_types.R Outdated Show resolved Hide resolved
R/compare_df_types.R Outdated Show resolved Hide resolved
R/compare_df_types.R Outdated Show resolved Hide resolved
@billdenney
Copy link
Collaborator Author

I think that I got everything, but if I missed something, please let me know.

#' than the default method.
#'
#' @param x The object to describe
#' @param strict_description Should the
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Should the" - then this trails off

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what to put here. I think I'll merge it - it's the last thing holding it up! - then play with the function; it'll be a test of my understanding it, to be able to figure out what you were going to write here 😆

@sfirke
Copy link
Owner

sfirke commented Apr 18, 2019

I think this is 99.9% done and I am really stoked about the function. We could merge even w/o these last details if needed at this point but might as well address here if you're up for it. Thank you for this contribution and for being open to the feedback.

There are two comments I have above about documentation that will take <5 minutes to fix. And then the only other item from before is whether the default value of bind_method should be bind_rows() or rbind(). I am enough tidyverse-immersed that I care more about bind_rows, so would be inclined to put that one first as this is a tidyverse-adjacent package. But I'm glad we're offering both and could go either way. Do you use rbind or bind_rows, and what do you prefer as the default here?

@billdenney
Copy link
Collaborator Author

In my mind, the decision about default method is your call as it is a choice of what fits best with the package. For the documentation changes, please make them to clarify as you see fits best (I won’t be back to my computer before the planned release day).

@sfirke
Copy link
Owner

sfirke commented Apr 19, 2019 via email

sfirke added 2 commits April 19, 2019 16:59
swapping it in for rbind, since this is a tidyverse-aligned package and my quick poll shows more peers using dplyr::bind_rows
@sfirke sfirke merged commit 51aa5df into sfirke:master Apr 20, 2019
@sfirke
Copy link
Owner

sfirke commented Apr 20, 2019

This rules! 🎉

@billdenney billdenney deleted the compare_df_types branch March 7, 2022 15:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

function to compare data.frames that should be the same - variable names and col type consistency
2 participants