
New function merge_camtrapdp() #112

Merged: 147 commits, merged on Nov 21, 2024

Conversation

@sannegovaert (Member) commented Jul 25, 2024

fix #75

Remarks

  • Assume that `locationID`s and `individualID`s need not be unique, since the same location or individual can occur in different data packages.
  • Assume that duplicated IDs occur between packages, not within a single package.
  • If duplicates are present, all values of that identifier get a prefix.
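A minimal base-R sketch of the prefixing rule in these remarks, assuming two packages and one identifier that must be unique (`deploymentID` is an assumed example). The function `prefix_if_duplicated()`, the `"_"` separator and the prefixes `"a"`/`"b"` are hypothetical; this is not the package's `check_duplicate_ids()` or `add_prefix()`.

```r
# Hypothetical helper, not camtrapdp's check_duplicate_ids()/add_prefix().
# Applies the remarks above to one identifier that must be unique
# (assumed example: deploymentID).
prefix_if_duplicated <- function(ids_x, ids_y, prefix_x, prefix_y) {
  # Only look for duplicates *between* the two packages, not within one
  shared <- intersect(ids_x[!is.na(ids_x)], ids_y[!is.na(ids_y)])
  if (length(shared) == 0) {
    return(list(x = ids_x, y = ids_y))  # no clash: leave IDs untouched
  }
  # A clash exists: prefix *all* values of this identifier, keeping NA as NA
  prefix_all <- function(ids, prefix) {
    ifelse(is.na(ids), NA_character_, paste0(prefix, "_", ids))
  }
  list(x = prefix_all(ids_x, prefix_x), y = prefix_all(ids_y, prefix_y))
}

res <- prefix_if_duplicated(c("d1", "d2"), c("d2", "d3"), "a", "b")
res$x  # "a_d1" "a_d2"
res$y  # "b_d2" "b_d3"
```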

Helper functions
I created 6 helper functions; maybe they can be simplified or reduced in number.
They now live in utils.R.

| Function | Role in `merge_camtrapdp()` | Suggested location |
|---|---|---|
| `check_duplicate_ids()` | Add prefix to identifiers with duplicates | Helper function in `utils.R` |
| `add_prefix()` | Add prefix to identifiers with duplicates | Helper function in `utils.R` |
| `normalize_list()` | Remove duplicated elements of lists | Helper function in `utils.R` or a separate `.R` file |
| `is_subset()` | Remove duplicated elements of lists | Helper function in `utils.R` or a separate `.R` file |
| `update_unique()` | Remove duplicated elements of lists | Helper function in `utils.R` or a separate `.R` file |
| `remove_duplicates()` | Remove duplicated elements of lists | Helper function in `utils.R` or a separate `.R` file |
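The four list helpers share one goal: dropping duplicated elements when list-valued metadata (for example contributors) from two packages is combined. A rough base-R sketch of that idea, assuming elements are named lists and that an element whose fields are all contained in another element counts as a duplicate. `normalize_element()`, `is_subset_of()` and `dedupe_list()` are hypothetical names, not the helpers listed in the table, and the contributor entries are made-up sample data.

```r
# Hypothetical sketch, not the camtrapdp helpers in the table above.
# Normalize an element so field order and NULL fields do not matter
normalize_element <- function(x) {
  x <- x[!vapply(x, is.null, logical(1))]
  if (!is.null(names(x))) x <- x[order(names(x))]
  x
}

# TRUE if every field of `a` exists in `b` with an identical value
is_subset_of <- function(a, b) {
  all(names(a) %in% names(b)) &&
    all(vapply(names(a), function(n) identical(a[[n]], b[[n]]), logical(1)))
}

dedupe_list <- function(elements) {
  normalized <- lapply(elements, normalize_element)
  # 1. Drop exact duplicates; duplicated() compares list elements as whole objects
  unique_els <- normalized[!duplicated(normalized)]
  # 2. Drop elements fully contained in another remaining element
  keep <- vapply(seq_along(unique_els), function(i) {
    others <- seq_along(unique_els)[-i]
    !any(vapply(others, function(j) {
      is_subset_of(unique_els[[i]], unique_els[[j]])
    }, logical(1)))
  }, logical(1))
  unique_els[keep]
}

contributors <- list(
  list(title = "Contributor A", role = "contributor"),
  list(role = "contributor", title = "Contributor A"),  # same, other field order
  list(title = "Contributor A"),                        # subset of the first
  list(title = "Contributor B", role = "rightsHolder")
)
length(dedupe_list(contributors))  # 2
```

Note that the sketch returns the normalized (name-sorted) elements rather than the originals.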


codecov bot commented Jul 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.91%. Comparing base (4e52a04) to head (722b71f).
Report is 149 commits behind head on main.

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main     #112      +/-   ##
==========================================
+ Coverage   99.89%   99.91%   +0.01%     
==========================================
  Files          23       25       +2     
  Lines         983     1151     +168     
==========================================
+ Hits          982     1150     +168     
  Misses          1        1              
```


@PietrH (Member) commented Nov 21, 2024

@peterdesmet: in 0f9a28a, why remove the subdir in tempdir()? The subdir guarantees that there are no filename collisions in tempdir(), which is shared across the whole R session:

  • Avoids accidental collisions (use of the same filename) between tests
  • Avoids collisions with a developer writing to tempdir()

This is much less of an issue on CI, but I'm still curious what downsides you see.
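A base-R sketch of the per-test subdirectory approach described above. The function `write_in_own_dir()` and the example filename `"datapackage.json"` are hypothetical; `tempfile()` returns a fresh, not-yet-existing path inside `tempdir()`, so two calls never share a directory.

```r
# Sketch: give each test its own subdirectory of tempdir(), which is shared
# by the whole R session. "datapackage.json" is only an example filename.
write_in_own_dir <- function(filename = "datapackage.json") {
  temp_dir <- tempfile(pattern = "test-")  # unique, not-yet-existing path
  dir.create(temp_dir)
  # Remove only this subdirectory, never tempdir() itself
  on.exit(unlink(temp_dir, recursive = TRUE), add = TRUE)
  path <- file.path(temp_dir, filename)
  writeLines("{}", path)
  list(dir = temp_dir, written = file.exists(path))
}

first <- write_in_own_dir()
second <- write_in_own_dir()
first$dir != second$dir  # TRUE: same filename, different directories
dir.exists(first$dir)    # FALSE: removed when the function exited
```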

@peterdesmet (Member) commented

@PietrH regarding removing subdir

  • I assumed that `on.exit(unlink(temp_dir, recursive = TRUE))` clears everything from the tempdir after each test anyway? Then collisions would only happen if a developer did not copy that line.
  • The same subdir was used across tests, so collisions could still happen?

My reasoning was mainly: only create subdirs if this is necessary within a test.

Co-Authored-By: Pieter Huybrechts <[email protected]>
@PietrH (Member) commented Nov 21, 2024


Fair enough. I prefer leaving the clearing of tempdir() up to the OS, and using withr if I really need to reset state, since you could still leave tempdir() intact when a function fails before unlink() is called. withr really shines for things like environment variables and (re)setting random seeds; I don't think its use is warranted here. My curiosity is satisfied!
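To make that failure mode concrete, a base-R sketch: cleanup registered with `on.exit()` still runs when the function later errors, so a directory is only left behind if the error happens before `on.exit()` is reached. Both function names are hypothetical.

```r
# Sketch: on.exit() cleanup runs when the function exits normally *or*
# via an error, as long as it was registered before the error.
fails_after_registering <- function(dir) {
  dir.create(dir)
  on.exit(unlink(dir, recursive = TRUE), add = TRUE)
  stop("test failed")  # cleanup still runs
}

fails_before_registering <- function(dir) {
  dir.create(dir)
  stop("test failed")  # error before on.exit(): dir is left behind
  on.exit(unlink(dir, recursive = TRUE), add = TRUE)
}

d1 <- tempfile()
try(fails_after_registering(d1), silent = TRUE)
dir.exists(d1)  # FALSE

d2 <- tempfile()
try(fails_before_registering(d2), silent = TRUE)
dir.exists(d2)  # TRUE
unlink(d2, recursive = TRUE)
```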

@peterdesmet requested a review from @PietrH on November 21, 2024 15:06

@PietrH (Member) left a comment


I only checked my previous remarks. Well done Sanne! This wasn't an easy development 👍

@peterdesmet merged commit 4b4ff1a into main on Nov 21, 2024
9 checks passed
@peterdesmet deleted the merge_datasets branch on November 21, 2024 15:16
Successfully merging this pull request may close this issue: Create merge() to merge datasets
3 participants