`build_taxonomy()` should check for scientificnames that occur more than once #49

PietrH · 2024-03-25T11:32:31Z

build_taxonomy() didn't take into account that a species might be mentioned twice in x$taxonomic, so we now warn users when this happens and only use the first record.

The warning doesn't specify which species is duplicated; this might be nice to have.
We are only checking for scientificName collisions.
If some fields are provided in the first record and some in the second, only the ones from the first record will be retained.
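
A minimal base R sketch of the behaviour described above, not the package's actual code. The function name `drop_duplicate_names()`, the warning class and the example data are hypothetical, and the PR itself builds its warning with cli (per the commit messages) rather than with `warningCondition()`.

```r
# Hypothetical stand-in for the deduplication step in build_taxonomy()
drop_duplicate_names <- function(taxa) {
  # duplicated() flags the second and later occurrences of a name
  is_dup <- duplicated(taxa$scientificName)
  if (any(is_dup)) {
    dup_names <- unique(taxa$scientificName[is_dup])
    warning(warningCondition(
      paste0(
        "Duplicate scientificName(s) found, keeping the first record of each: ",
        paste(dup_names, collapse = ", ")
      ),
      class = "example_warning_duplicate_names" # Hypothetical class name
    ))
  }
  # Only the first record per scientificName is retained
  taxa[!is_dup, , drop = FALSE]
}

taxa <- data.frame(
  scientificName = c("Anas platyrhynchos", "Anas platyrhynchos", "Vulpes vulpes"),
  vernacularName = c(NA, "mallard", "red fox")
)

# Emits the warning and returns two rows
deduplicated <- drop_duplicate_names(taxa)
```

Here the vernacular name "mallard" is lost, because it was only provided in the second record for that name.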

The function now drops columns that only contain NA from the output; such columns can be an artefact of filtering out the duplicate scientificNames. I was inspired by janitor::remove_empty() but didn't create a helper, as it was only a few lines and I wasn't sure about reuse. Would build_taxonomy() be more readable if this part was wrapped in a helper?
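
For illustration, this is roughly what such a helper could look like in base R. The name `drop_empty_cols()` and the example data are hypothetical; this is a sketch of the idea, not the code in the PR and not janitor's implementation.

```r
# Hypothetical helper: keep only columns that have at least one non-NA value
drop_empty_cols <- function(df) {
  has_value <- vapply(df, function(col) !all(is.na(col)), logical(1))
  df[, has_value, drop = FALSE]
}

taxa <- data.frame(
  scientificName = c("Anas platyrhynchos", "Vulpes vulpes"),
  vernacularName = c(NA_character_, NA_character_),
  taxonRank = c("species", NA)
)

cleaned <- drop_empty_cols(taxa)
names(cleaned)
# "scientificName" "taxonRank"
```

Columns with some NA values (like `taxonRank` here) are kept; only columns that are NA throughout are dropped.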

Should we mention the behaviour from this PR in the function documentation? If it's only expected to happen very rarely, it might add more confusion than clarity.

I welcome all pointers on improving readability!

codecov · 2024-03-25T11:36:39Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.81%. Comparing base (a6f768c) to head (86f51bd).
Report is 5 commits behind head on main.

❗ Current head 86f51bd differs from pull request most recent head 3749320. Consider uploading reports for the commit 3749320 to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #49      +/-   ##
==========================================
+ Coverage   93.18%   93.81%   +0.63%     
==========================================
  Files          11       11              
  Lines          88       97       +9     
==========================================
+ Hits           82       91       +9     
  Misses          6        6

PietrH · 2024-03-25T14:21:00Z

@peterdesmet I'm currently causing 4 warnings when testing how build_taxonomy() handles situations where multiple identical scientificNames are provided in x$taxonomic. How do I silence these warnings during testing?

withr::with_options() is my first reflex, but I was wondering if there is a way to do this without using an extra (dev) dependency.

peterdesmet · 2024-03-25T14:50:55Z

@PietrH after looking for some alternatives in frictionless, I decided to keep using suppressWarnings() around the function. It's easy to understand and doesn't add a dependency.
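
A sketch of that approach in a testthat test, assuming testthat is available. `noisy_first()` is a hypothetical stand-in for a function that warns but still returns a usable value; it is not build_taxonomy().

```r
library(testthat)

# Hypothetical function that warns on duplicates and keeps the first of each
noisy_first <- function(x) {
  if (anyDuplicated(x) > 0) {
    warning("Duplicates found, keeping the first of each.")
  }
  x[!duplicated(x)]
}

test_that("noisy_first() keeps the first occurrence", {
  # The warning is not under test here, so silence it around the call
  result <- suppressWarnings(noisy_first(c("a", "a", "b")))
  expect_identical(result, c("a", "b"))
})
```

Note that suppressWarnings() silences every warning raised by the call, so the warning itself should be covered by a separate test.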

PietrH · 2024-04-11T08:43:31Z

Ready for review.

peterdesmet

Nice!

I have updated the warning; it now lists the duplicates.
I would indeed not mention this in the docs (it's an edge case).
Nice that cols with only NA are removed. These are not necessarily the result of removed duplicates, so I have updated that test.

I have removed (currently 3) instances where we test on the error message in addition to the error class. From now on, testing on the class only is sufficient (note that a different approach is used in frictionless).
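
A sketch of what testing on the class only can look like, assuming testthat (3rd edition) is available. The function `warn_duplicates()` and the class name are hypothetical; the class names actually used in the package are not shown in this conversation.

```r
library(testthat)

# Hypothetical function that signals a classed warning
warn_duplicates <- function() {
  warning(warningCondition(
    "Duplicate names found.",
    class = "example_warning_duplicate_names" # Hypothetical class name
  ))
  invisible(NULL)
}

test_that("warn_duplicates() signals the expected class", {
  # Matching on class keeps the test stable when the message wording changes
  expect_warning(warn_duplicates(), class = "example_warning_duplicate_names")
})
```

The same `class` argument exists on `expect_error()`, so errors can be tested the same way.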

PietrH added 6 commits March 25, 2024 12:13

add test to warn for duplicate scientificNames in x$taxonomic

e914822

add expectation for placeholder warning message + create empty test f…

5f112b8

…or returning first duplicate also set the expectation for the warning class

Update expected warning message

a8276cc

store created data.frame as object

1418741

use cli to create warning, base for duplicate detection

45511b5

return at end of function, drop duplicates, keep all columns

9d0e676

PietrH linked an issue Mar 25, 2024 that may be closed by this pull request

build_taxonomy() should check for scientificNames that occur more than once #45

Closed

PietrH added 9 commits March 25, 2024 14:54

store data.frame in object so we can drop columns later

83e2c29

drop empty columns, explicit return for clarity

c7f8eba

based on janitor::remove_empty()

test for repeated scientificNames

4565ada

add empty test for outputting empty columns

857c946

Stylr

44685dd

Test that build_taxonomy() doesn't return NA

4ecf89a

Capitalize comments

d8861a8

Add comment about expectation

1293f98

add expectation for exact columns we want to return

86f51bd

PietrH self-assigned this Mar 25, 2024

PietrH marked this pull request as ready for review March 25, 2024 14:28

PietrH requested review from peterdesmet and damianooldoni March 25, 2024 14:28

suppressWarnings()

3749320

peterdesmet added 3 commits April 25, 2024 13:58

Update warning message and list duplicate names

b11df22

Simplify tests + make empty columns more general

ba20518

Don't test on error message (just on class)

14aacdc

peterdesmet approved these changes Apr 25, 2024

peterdesmet merged commit c62c3ce into main Apr 25, 2024
7 checks passed

peterdesmet deleted the 45-build_taxonomy-should-check-for-scientificnames-that-occur-more-than-once branch April 25, 2024 12:07
