Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MRG: fix performance regression in manysearch by removing unnecessary downsampling #464

Merged
merged 10 commits into from
Oct 8, 2024

Conversation

ctb
Copy link
Collaborator

@ctb ctb commented Oct 2, 2024

Tackles #463

This PR adjusts manysearch so that downsampling is only done on sketches when actually needed. Prior to this, the downsample_max_hash/downsample_scaled code was running on sketches even when there was no need.

This PR also adds --ignore-abundance to manysearch which optionally turns off potentially expensive abundance estimation (which in practice is not that expensive, apparently; see #463).

Fixes #466

@ctb ctb changed the title WIP: add support for ignoring abundance in manysearch WIP: fix performance regression in manysearch Oct 6, 2024
@ctb ctb changed the title WIP: fix performance regression in manysearch MRG: fix performance regression in manysearch by removing unnecessary downsampling Oct 6, 2024
@ctb
Copy link
Collaborator Author

ctb commented Oct 6, 2024

@bluegenes ready for review & merge!

@ctb
Copy link
Collaborator Author

ctb commented Oct 7, 2024

bump :)

// avoid calculating details unless there is overlap
let overlap = query
.minhash
.count_common(against_mh, true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, count_common handles downsampling only if needed. Didn't realized downsample_scaled always downsampled, even if not needed!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep!

Copy link
Contributor

@bluegenes bluegenes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@ctb ctb merged commit 88e406f into main Oct 8, 2024
1 check passed
@ctb ctb deleted the toggle_manysearch_abund branch October 8, 2024 18:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

refactor nested code?
2 participants