[FIX] Actually reduce memory usage #192
base: main
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

```
@@           Coverage Diff           @@
##             main     #192   +/-   ##
=======================================
  Coverage   99.63%   99.63%
=======================================
  Files          51       51
  Lines        1930     1930
  Branches        5        5
=======================================
  Hits         1923     1923
  Misses          7        7
```
I thought we did this deliberately, because it is faster if `kmers` doesn't have to reallocate memory every time?
Usually yes, but in this particular case we wanted to reduce the memory consumption, because we do the recursion afterwards. We didn't want to keep the `kmers` around, because they are not needed anymore, and `clear` doesn't deallocate. I can also run raptor with and without this change and check how it affects timings and RAM.
I think this change needs at least one benchmark on RefSeq to check that we don't degrade performance too much.
Yes, I'll run RefSeq with and without. I'd figure this is more or less I/O-limited, but let's see :)
Force-pushed from 4f1d194 to abad908
Documentation preview available at https://docs.seqan.de/preview/seqan/hibf/192
On 40k RefSeq genomes with tmax 256, there is no change in RAM usage with this patch.
I guess that makes sense. Then the only (probable) alternative is to read the files multiple times
or use fewer threads :D
For this PR, we could also think about some refactoring. For example, we could do something like

```cpp
auto & ibf = hibf.ibf_vector[ibf_pos];
{
    robin_hood::unordered_flat_set<uint64_t> kmers{};

    auto initialise_max_bin_kmers = [&]() -> size_t
    {
        if (current_node.max_bin_is_merged())
        {
            // Recursively initialise the favourite child first.
            technical_bin_to_ibf_id[current_node.max_bin_index] =
                hierarchical_build(hibf,
                                   kmers,
                                   current_node.children[current_node.favourite_child_idx.value()],
                                   data,
                                   false);
            return 1;
        }
        else // max bin is not a merged bin
        {
            // We assume that the max record is at the beginning of the list of remaining records.
            auto const & record = current_node.remaining_records[0];
            build::compute_kmers(kmers, data, record);
            build::update_user_bins(technical_bin_to_user_bin_id, record);
            return record.number_of_technical_bins;
        }
    };

    // Initialise lower level IBF.
    size_t const max_bin_tbs = initialise_max_bin_kmers();
    ibf = construct_ibf(parent_kmers, kmers, max_bin_tbs, current_node, data, is_root);
}

// Parse all other children (merged bins) of the current IBF.
auto loop_over_children = [&]()
{
    /* ... */
};
loop_over_children();

robin_hood::unordered_flat_set<uint64_t> kmers{};

// If the max bin was a merged bin, process all remaining records;
// otherwise, the first one has already been processed.
size_t const start{current_node.max_bin_is_merged() ? 0u : 1u};
for (size_t i = start; i < current_node.remaining_records.size(); ++i)
{
    // ...
}
```

We put the …

Or something else...
The capacity remains the same after calling `clear`. I also renamed `kmers` to `local_kmers` in `loop_over_children`, because it might be shadowing `kmers`.