Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MISC] Extend insert_iterator #237

Merged
merged 4 commits into from
Oct 24, 2024
Merged

[MISC] Extend insert_iterator #237

merged 4 commits into from
Oct 24, 2024

Conversation

eseiler
Copy link
Member

@eseiler eseiler commented Oct 18, 2024

For our compute_sketches benchmark, this is around 10-14 times faster.

We could also add an overload for the ibf, which then just calls emplace.
This could be used in insert_into_ibf and the IBF ctor, where we currently first put the hashes in a set or vector before inserting.

Numbers are iterations.
Clang apparently optimizes way better than GCC.
Clang with inline = remove [[gnu::noinline]] from void invoke_without_inlining

Benchmark GCC before GCC Clang before Clang Clang with inline
manual_construct/64 32 32 44 54 41
manual_construct/128 34 34 52 54 42
manual_construct/256 34 34 54 54 44
manual_construct/512 33 33 51 54 44
manual_construct/1024 34 28 52 54 44
config_construct/64 8 21 10 28 14
config_construct/128 9 20 11 30 16
config_construct/256 9 20 11 30 16
config_construct/512 9 18 11 29 16
config_construct/1024 9 16 12 26 15
config_and_max_construct/64 13 27 17 39 23
config_and_max_construct/128 13 28 17 39 23
config_and_max_construct/256 13 28 18 40 23
config_and_max_construct/512 11 28 17 40 24
config_and_max_construct/1024 13 27 18 40 24

@seqan-actions seqan-actions added lint [INTERNAL] used for linting and removed lint [INTERNAL] used for linting labels Oct 18, 2024
@seqan-actions
Copy link
Member

Documentation preview available at https://docs.seqan.de/preview/seqan/hibf/237

Copy link

codecov bot commented Oct 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.58%. Comparing base (6e1fd5a) to head (76202c1).
Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #237      +/-   ##
==========================================
- Coverage   99.63%   99.58%   -0.05%     
==========================================
  Files          51       51              
  Lines        1929     1951      +22     
  Branches        5        5              
==========================================
+ Hits         1922     1943      +21     
- Misses          7        8       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@seqan-actions seqan-actions added lint [INTERNAL] used for linting and removed lint [INTERNAL] used for linting labels Oct 21, 2024
@seqan-actions seqan-actions added lint [INTERNAL] used for linting and removed lint [INTERNAL] used for linting labels Oct 23, 2024
@seqan-actions seqan-actions added lint [INTERNAL] used for linting and removed lint [INTERNAL] used for linting labels Oct 23, 2024
@eseiler eseiler changed the title [WIP] Extend insert_iterator [MISC] Extend insert_iterator Oct 23, 2024
@seqan-actions seqan-actions added lint [INTERNAL] used for linting and removed lint [INTERNAL] used for linting labels Oct 23, 2024
@seqan-actions seqan-actions added lint [INTERNAL] used for linting and removed lint [INTERNAL] used for linting labels Oct 23, 2024
@seqan-actions seqan-actions added lint [INTERNAL] used for linting and removed lint [INTERNAL] used for linting labels Oct 23, 2024
@seqan-actions seqan-actions added lint [INTERNAL] used for linting and removed lint [INTERNAL] used for linting labels Oct 23, 2024
@seqan-actions seqan-actions added lint [INTERNAL] used for linting and removed lint [INTERNAL] used for linting labels Oct 23, 2024
@seqan-actions seqan-actions added lint [INTERNAL] used for linting and removed lint [INTERNAL] used for linting labels Oct 23, 2024
@seqan-actions seqan-actions added lint [INTERNAL] used for linting and removed lint [INTERNAL] used for linting labels Oct 23, 2024
Copy link
Member

@smehringer smehringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah sorry! one more thing: When moving to a fast layout or partitioned IBF I need to compute not only hyperloglog sketches but at the same time minHash sketches. Before I had the kmer set and then computed the respective sketches from that. Now I could use the function type I guess? Or I could still use the set.

(min hash sketches is a small heap that gets updated with every new kmer for the XXX smallest kmers in the set).

explicit constexpr insert_iterator(function_t & fun) : ptr{std::addressof(fun)}, type{data_type::function}
{}

[[gnu::always_inline, gnu::flatten]] constexpr insert_iterator & operator=(uint64_t const value) noexcept
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can this function be constexpr? The type in the switch case is not?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type would be known in a constant expression, but what isn't allowed in constexpr is casting from void *.
Made it inline instead.

@seqan-actions seqan-actions added lint [INTERNAL] used for linting and removed lint [INTERNAL] used for linting labels Oct 24, 2024
@eseiler
Copy link
Member Author

eseiler commented Oct 24, 2024

ah sorry! one more thing: When moving to a fast layout or partitioned IBF I need to compute not only hyperloglog sketches but at the same time minHash sketches. Before I had the kmer set and then computed the respective sketches from that. Now I could use the function type I guess? Or I could still use the set.

(min hash sketches is a small heap that gets updated with every new kmer for the XXX smallest kmers in the set).

Didn't really get faster when using a function.
The main thing is that you cannot get rid of kmers, because you still need it.

@smehringer smehringer merged commit 9e631fd into seqan:main Oct 24, 2024
37 checks passed
@eseiler eseiler deleted the infra/sketchy branch October 24, 2024 14:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants