Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build_taxonomy() should check for scientificnames that occur more than once #49

Conversation

PietrH
Copy link
Member

@PietrH PietrH commented Mar 25, 2024

build_taxonomy() didn't take into account that a species might be mentioned twice in x$taxonomic, so we'll warn users when this happens and only use the first one.

  • The warning doesn't specify what species is duplicated, this might be a nice to get
  • We are only checking on scientificName collisions
  • If some fields are provided in the first record, and some in the second, only the one's from the first record will be retained.

The function now drops columns that only contain NA from the output, this can be an artefact from filtering out the duplicate scientificNames. I was inspired by janitor::remove_empty() but didn't create a helper as it was only a few lines and I wasn't sure about reuse. Would build_taxonomy() be more readable if this part was wrapped into a helper?

Should we mention the behaviour from this PR in the function documentation?, if it's only expected to happen very rarely, it might add more confusion than clarity.

I welcome all pointers into improving readability!

Copy link

codecov bot commented Mar 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.81%. Comparing base (a6f768c) to head (86f51bd).
Report is 5 commits behind head on main.

❗ Current head 86f51bd differs from pull request most recent head 3749320. Consider uploading reports for the commit 3749320 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #49      +/-   ##
==========================================
+ Coverage   93.18%   93.81%   +0.63%     
==========================================
  Files          11       11              
  Lines          88       97       +9     
==========================================
+ Hits           82       91       +9     
  Misses          6        6              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@PietrH
Copy link
Member Author

PietrH commented Mar 25, 2024

@peterdesmet I'm currently causing 4 warnings when testing how build_taxonomy() handles situations where there's multiple scientificNames provided in x$taxonomic. How do I silence these warnings during testing?

withr::with_options() is my first reflex, but I was wondering if there is a way to do this without using an extra (dev) dependency.

@PietrH PietrH self-assigned this Mar 25, 2024
@PietrH PietrH marked this pull request as ready for review March 25, 2024 14:28
@peterdesmet
Copy link
Member

@PietrH after looking for some alternatives in frictionless, I decided to keep using suppressWarnings() around the function. It's easy to understand and doesn't add a dependency.

@PietrH
Copy link
Member Author

PietrH commented Apr 11, 2024

Ready for review.

Copy link
Member

@peterdesmet peterdesmet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

  • I have the warning, it now lists the duplicates.
  • I would indeed not mention this in the docs (it's an edge case)
  • Nice that cols with only NA are removed. These could not only be the result of removed duplicates, so have updated that test

I have removed (currently 3) instances where we test on the error message in addition to the error class. From now only, only on class is sufficient (note that a different approach is used in frictionless).

@peterdesmet peterdesmet merged commit c62c3ce into main Apr 25, 2024
7 checks passed
@peterdesmet peterdesmet deleted the 45-build_taxonomy-should-check-for-scientificnames-that-occur-more-than-once branch April 25, 2024 12:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

build_taxonomy() should check for scientificNames that occur more than once
2 participants