build_taxonomy() should check for scientificnames that occur more than once #49
…or returning first duplicate also set the expectation for the warning class
Codecov Report
All modified and coverable lines are covered by tests ✅

Coverage Diff
            main      #49      +/-
Coverage    93.18%    93.81%   +0.63%
Files       11        11
Lines       88        97       +9
Hits        82        91       +9
Misses      6         6
based on janitor::remove_empty()
@peterdesmet I'm currently causing 4 warnings when testing how withr::with_options() is my first reflex, but I was wondering if there is a way to do this without using an extra (dev) dependency.
@PietrH after looking for some alternatives in frictionless, I decided to keep using
Ready for review.
Nice!
- I have the warning, it now lists the duplicates.
- I would indeed not mention this in the docs (it's an edge case)
- Nice that cols with only NA are removed. These are not necessarily the result of removed duplicates, so I have updated that test.
I have removed (currently 3) instances where we test on the error message in addition to the error class. From now on, testing on the class only is sufficient (note that a different approach is used in frictionless).
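To illustrate what testing on class only means, here is a minimal base R sketch. The function `warn_duplicates()`, the helper `warning_class()`, and the class name `example_warning_duplicates` are made up for this example and are not taken from the package.

```r
# Hypothetical function that signals a classed warning when values repeat.
warn_duplicates <- function(x) {
  dups <- unique(x[duplicated(x)])
  if (length(dups) > 0) {
    warning(warningCondition(
      paste("Duplicates:", paste(dups, collapse = ", ")),
      class = "example_warning_duplicates" # made-up class name
    ))
  }
  invisible(x)
}

# Hypothetical helper: return the class vector of the first warning
# signalled while evaluating `expr`, or NULL if no warning occurs.
warning_class <- function(expr) {
  tryCatch(
    {
      expr
      NULL
    },
    warning = function(w) class(w)
  )
}

classes <- warning_class(warn_duplicates(c("a", "b", "a")))
no_warning <- warning_class(warn_duplicates(c("a", "b")))

# The check depends on the class only, so rewording the message
# does not break it.
has_class <- "example_warning_duplicates" %in% classes
```

In a testthat suite the same idea is usually written as `expect_warning(code, class = "...")`, which checks the condition class instead of matching the message text.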
build_taxonomy() didn't take into account that a species might be mentioned twice in x$taxonomic, so we'll warn users when this happens and only use the first one.
scientificName collisions
The function now drops columns that only contain NA from the output; this can be an artefact of filtering out the duplicate scientificNames. I was inspired by janitor::remove_empty() but didn't create a helper as it was only a few lines and I wasn't sure about reuse. Would build_taxonomy() be more readable if this part was wrapped into a helper?
Should we mention the behaviour from this PR in the function documentation? If it's only expected to happen very rarely, it might add more confusion than clarity.
I welcome all pointers on improving readability!
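As a rough sketch of the behaviour described above, and of what a small helper could look like, the base R code below keeps the first record per scientificName, warns with the duplicated names, and then drops columns that contain only NA. It assumes the taxa are already in a data frame; `drop_na_cols()`, `first_per_name()`, and the warning class name are hypothetical and not the code in this PR.

```r
# Hypothetical helper: drop columns that contain only NA (base R only).
drop_na_cols <- function(df) {
  keep <- vapply(df, function(col) !all(is.na(col)), logical(1))
  df[, keep, drop = FALSE]
}

# Hypothetical sketch: keep the first record per scientificName and
# warn, listing the names that occurred more than once.
first_per_name <- function(taxa) {
  is_dup <- duplicated(taxa$scientificName)
  if (any(is_dup)) {
    dup_names <- unique(taxa$scientificName[is_dup])
    warning(warningCondition(
      paste0(
        "Duplicate scientificName found, keeping the first record: ",
        paste(dup_names, collapse = ", ")
      ),
      class = "example_warning_duplicate_names" # made-up class name
    ))
  }
  # Removing duplicates can leave columns with only NA behind.
  drop_na_cols(taxa[!is_dup, , drop = FALSE])
}

taxa <- data.frame(
  scientificName = c("Anas platyrhynchos", "Vulpes vulpes", "Anas platyrhynchos"),
  taxonRank = c("species", "species", "species"),
  vernacularName = c(NA, NA, "mallard"),
  stringsAsFactors = FALSE
)

# The second "Anas platyrhynchos" row is dropped; vernacularName is then
# all NA and is removed as well.
result <- suppressWarnings(first_per_name(taxa))
```

Wrapping the column filter in a named helper keeps the main function body focused on the duplicate handling, at the cost of one more internal function to maintain.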