-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FIX] Include relaxed FPR into the DP algorithm. #224
Conversation
Documentation preview available at https://docs.seqan.de/preview/seqan/hibf/224 |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #224 +/- ##
==========================================
- Coverage 99.68% 99.63% -0.06%
==========================================
Files 49 50 +1
Lines 1914 1911 -3
Branches 5 5
==========================================
- Hits 1908 1904 -4
- Misses 6 7 +1 ☔ View full report in Codecov by Sentry. |
std::vector<seqan::hibf::layout::layout::max_bin> expected_max_bins{{{1}, 22}, {{2}, 22}}; | ||
std::vector<seqan::hibf::layout::layout::max_bin> expected_max_bins{{{2}, 22}, {{3}, 0}}; | ||
|
||
std::vector<seqan::hibf::layout::layout::user_bin> expected_user_bins{{{}, 0, 1, 7}, | ||
{{1}, 0, 22, 4}, | ||
{{1}, 22, 21, 5}, | ||
{{1}, 43, 21, 6}, | ||
{{2}, 0, 22, 0}, | ||
{{2}, 22, 21, 2}, | ||
{{2}, 43, 21, 3}, | ||
{{}, 3, 1, 1}}; | ||
{{}, 1, 1, 6}, | ||
{{2}, 0, 22, 3}, | ||
{{2}, 22, 21, 4}, | ||
{{2}, 43, 21, 5}, | ||
{{3}, 0, 42, 1}, | ||
{{3}, 42, 11, 0}, | ||
{{3}, 53, 11, 2}}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Input: (no union estimation enabled)
user bin ID | 0 1 2 3 4 5 6 7
size | 500, 1000, 500, 500, 500, 500, 500, 500
Before:
Top Level | 7 | 4,5,6 | 1 | 0,2,3 |
/ \
1st Level | 22x4 | 21x5 | 21x6 | | 22x0 | 21x2 | 21x3 |
After
Top Level | 7 | 6 | 3,4,5 | 0,1,2 |
/ \
1st Level | 22x3 | 21x4 | 21x5 | | 11x0 | 42x1 | 11x2 |
Afterwards it's better to merge the large UB-1. Since merging 0,1,2 (500+500+1000) corrected by the relaxed false positive rate (0.3 vs. 0.05) is smaller than having UB-1 with a size of 1000 at the top level as a single bin. This result is intuitive for the change.
bdb53fe
to
616c282
Compare
Otherwise, returns double, and may cause integer-to-double promotion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed a review, check it out
double const numerator = | ||
std::log1p(-std::exp(std::log(config_.maximum_fpr) / config_.number_of_hash_functions)); | ||
double const denominator = | ||
std::log1p(-std::exp(std::log(config_.relaxed_fpr) / config_.number_of_hash_functions)); | ||
relaxed_fpr_correction_factor = numerator / denominator; | ||
assert(relaxed_fpr_correction_factor <= 1.0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would at least test if the variable is non-zero, no?
I don't really like that the members must be st outside but are used only here..
LGTM Follow Up: 1:
|
This PR makes the DP aware of the relaxed false positive rate by down-scaling the size of merged bins.