Aggressively shrink ancient storages when shrink isn't too busy. #2946

Open · wants to merge 13 commits into master
Conversation


@dmakarov dmakarov commented Sep 16, 2024

Problem

Ancient packing when skipping rewrites has some non-ideal behavior.
An ancient storage may never meet the 90%(?) threshold for shrinking. However, every dead account that an ancient storage keeps present causes that account to remain in the in-memory index and starts a chain reaction of other accounts, such as zero-lamport accounts, that must be kept alive.

Summary of Changes

Add another slot for shrinking when the number of shrink candidate slots is too small (fewer than 10). The additional slot's storage is the one with the largest number of dead bytes. This aggressively shrinks ancient storages, even when they are below the normal threshold, and keeps the system moving toward the ideal of storing each non-zero account exactly once and holding no zero-lamport accounts.
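
A minimal sketch of the selection logic, assuming illustrative names (MIN_SHRINK_CANDIDATES, maybe_add_ancient_candidate, and the bare (slot, dead_bytes) tuples are placeholders, not the actual accounts-db code):

// Shrink is considered "not too busy" below this many candidates (the value 10 comes from the PR description).
const MIN_SHRINK_CANDIDATES: usize = 10;

// Queue the ancient storage with the most dead bytes for shrinking, even if
// it is below the normal shrink threshold.
fn maybe_add_ancient_candidate(
    shrink_candidates: &mut Vec<u64>,  // slots already queued for shrink
    ancient_dead_bytes: &[(u64, u64)], // (slot, dead_bytes) per ancient storage
) {
    if shrink_candidates.len() >= MIN_SHRINK_CANDIDATES {
        return; // shrink is busy enough; don't add extra work
    }
    if let Some((slot, _dead_bytes)) = ancient_dead_bytes
        .iter()
        .max_by_key(|(_slot, dead_bytes)| *dead_bytes)
    {
        shrink_candidates.push(*slot);
    }
}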

reworked #2849

@dmakarov dmakarov force-pushed the packing branch 2 times, most recently from 9a2729a to 838e063 on September 16, 2024 21:44
@dmakarov dmakarov marked this pull request as ready for review September 17, 2024 01:09
@dmakarov dmakarov force-pushed the packing branch 2 times, most recently from 4d869c2 to 003ba1f on September 27, 2024 18:12
@jeffwashington

@dmakarov, where did you end up on this? We can move this forward Monday. Hopefully you have a machine running this?

@dmakarov
Author

@dmakarov, where did you end up on this? We can move this forward Monday. Hopefully you have a machine running this?

I had it running on my dev machine for a few days. Now I added stat counters and will restart it with the new stats. I’d like to experiment a bit more with this.

@dmakarov dmakarov force-pushed the packing branch 2 times, most recently from b18e934 to 1bfd6d0 on September 27, 2024 20:59
@jeffwashington

Please add a test that shows we add an ancient storage to shrink. It may be helpful to refactor the shrink fn so that it calculates the storages to shrink in a separate fn, so we can just check the output of that fn. Or, you can do a more full test which actually verifies the capacity is what you expect after running shrink. There should be tests that do this similarly already; look for tests that call shrink_candidate_slots.
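
A rough shape for such a test, as a hedged sketch: the setup helper and its returned values below are hypothetical placeholders, and the exact shrink_candidate_slots / get_slot_storage_entry signatures may differ from the real accounts-db test utilities.

#[test]
fn test_ancient_storage_added_to_shrink() {
    // Hypothetical setup: an ancient storage containing dead accounts, with
    // fewer than 10 regular shrink candidates queued.
    let (db, ancient_slot, expected_live_bytes) = setup_db_with_one_ancient_storage();

    // Run shrink; with this change, the ancient storage should be selected
    // even though it is below the normal shrink threshold.
    db.shrink_candidate_slots(&EpochSchedule::default());

    // Verify the storage capacity shrank to just the live bytes
    // (account data lengths plus per-account metadata).
    let store = db.storage.get_slot_storage_entry(ancient_slot).unwrap();
    assert_eq!(store.capacity(), expected_live_bytes);
}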

@dmakarov
Author

dmakarov commented Oct 1, 2024

please add a test that shows we add an ancient storage to shrink.

Yes, I'm working on it.

@jeffwashington

Here's what we get with this change and the rest of the ancient packing tweaks:
[chart: number of ancient storages added to pack]

Purple and blue are the prior code, which could add up to 10 ancient storages to pack. Now we're consistently adding 1. Are we making enough progress? I'm not sure. It seems like it may take 4 days to shrink each ancient storage once.

@jeffwashington

jeffwashington commented Oct 1, 2024

jw10 is the machine I just added this change onto (with the other ancient packing changes).
This PR is definitely not wrong; it may be insufficient. A metric to watch is the number of in-mem index entries. We'll have in-mem index entries if dead accounts accumulate in ancient storages. Dead accounts mean refcount > 1, so those entries remain in memory.
[chart: in-mem index entries]

Another metric is the number of zero-lamport ancient accounts. This will be higher if we aren't aggressively cleaning ancient storages.
[chart: zero-lamport ancient accounts]

@jeffwashington jeffwashington changed the title from "Tweak ancient packing algorithm" to "Aggressively shrink ancient storages when shrink isn't too busy." Oct 1, 2024
@jeffwashington

This PR should have zero effect unless skipping rewrites is enabled via the CLI.

@jeffwashington jeffwashington self-requested a review October 1, 2024 15:49
HaoranYi previously approved these changes Oct 1, 2024

@HaoranYi HaoranYi left a comment

The idea in the PR looks good to me.
I don't see much downside to adding one ancient storage to shrink when we are not busy.

&& *capacity == store.capacity()
&& Self::is_candidate_for_shrink(self, &store)
{
*capacity = 0;

Can we not overload the u64, and instead create an enum to indicate whether this storage is pre- or post-shrink?

Author

I don't know. Isn't capacity checked for being 0 in other logic? If we add an enum, wouldn't we still have to set capacity to 0 here for other code to work correctly?

We are overloading the 0 here, yes. We could use an enum and it would be much clearer what we're trying to do:
{AlreadyShrunk, CanBeShrunk(capacity: u64)}
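
A minimal sketch of that enum, assuming illustrative names (CandidateCapacity and its methods are not the actual implementation):

enum CandidateCapacity {
    // The storage has already been shrunk during this pass.
    AlreadyShrunk,
    // The storage can still be shrunk; holds the capacity recorded when it
    // became a candidate.
    CanBeShrunk(u64),
}

impl CandidateCapacity {
    // Replaces the "*capacity = 0" overload with an explicit state change.
    fn mark_shrunk(&mut self) {
        *self = CandidateCapacity::AlreadyShrunk;
    }

    // Returns the recorded capacity only while the storage is still shrinkable.
    fn capacity_if_shrinkable(&self) -> Option<u64> {
        match self {
            CandidateCapacity::CanBeShrunk(capacity) => Some(*capacity),
            CandidateCapacity::AlreadyShrunk => None,
        }
    }
}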


@jeffwashington jeffwashington Oct 1, 2024

We could also just remove the element from the vec, but that could be expensive. I was assuming marking it as 'already shrunk' would be sufficient. Maybe none of this is necessary, because we'll see that the new capacity doesn't match the old capacity and skip it anyway... Then we don't need to iter mut at all and can just iter. That seems simplest of all, and we already have to handle that case anyway.

This does cause us to look up way more storages.
An oldie but goodie: https://en.wikichip.org/wiki/schlemiel_the_painter%27s_algorithm

Author

What is the suggested change? Not to change capacity?

The enum is fine with me, as is iterating. Alternatively, keep the vec sorted in reverse and pop the last element off the end, reducing the count. This would not require a re-allocation and would avoid revisiting ancient storages we already previously shrunk.
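
A small sketch of the reverse-sorted alternative, with a plain (slot, dead_bytes) tuple standing in for the real candidate type (assumed for illustration):

// Keep candidates sorted ascending by dead bytes so the best candidate
// (most dead bytes) sits at the end of the vec.
fn sort_candidates(candidates: &mut Vec<(u64, u64)>) {
    candidates.sort_by_key(|(_slot, dead_bytes)| *dead_bytes);
}

// Popping from the end is O(1), requires no re-allocation, and never revisits
// storages that were already taken off the list and shrunk.
fn take_best_candidate(candidates: &mut Vec<(u64, u64)>) -> Option<(u64, u64)> {
    candidates.pop()
}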

@dmakarov
Author

dmakarov commented Oct 2, 2024

It looks like I need to rebase to fix the vulnerability check errors.

@dmakarov
Author

dmakarov commented Oct 2, 2024

Rebased to resolve conflicts. I'm still working on unit tests. When they're ready, I'll renew the review requests. Thanks.

@dmakarov
Author

dmakarov commented Oct 4, 2024

Conflicts again, so no choice but to rebase.

@dmakarov
Author

dmakarov commented Oct 9, 2024

a more full test which actually verifies the capacity is what you expect after running shrink.

Do you mean the capacity of an ancient slot added to the shrink candidates should be 0, or that the test should verify some other capacity? If the latter, what capacity did you mean?

@jeffwashington

a more full test which actually verifies the capacity is what you expect after running shrink.

Do you mean the capacity of an ancient slot added to the shrink candidates should be 0, or that the test should verify some other capacity? If the latter, what capacity did you mean?

The capacity of the storage itself shrinks to what you'd expect.

Comment on lines 12116 to 12119
// At this point the dead ancient account should be removed
// and storage capacity shrunk to the sum of live bytes of
// accounts it holds. This is the data lengths of the
// accounts plus the length of their metadata.

Can we scan the storage to ensure the modified_account_pubkey is not present?

Author

Do you have any particular scan method in mind, or just iterating over stored_accounts? Please be more specific.

Author

Also, can you explain why checking the capacity of the storage is not sufficient in this specific test case?

can you explain why checking the capacity of the storage is not sufficient in this specific test case?

So for two reasons:

  1. The sizes are specific to the AppendVec storage format. This means when we move to Tiered Storage, we'll need to tweak the test to support both formats, as the stored sizes will be different.
  2. To be defensive against refactors. To me the test reads as if it is trying to ensure the dead ancient account has been removed from the shrunk storage, and the way to guarantee that is by checking the accounts in the storage and making sure none of them is the to-be-removed one.

do you have any particular scan method in mind or just iterating over stored_accounts?

Yes, I think iterating over stored accounts will work here.

Another option would be to do something like this:

store.accounts.scan_pubkeys(|pubkey| {
    assert_ne!(pubkey, modified_account_pubkey);
});

And more info related to the 136 byte storage overhead. In append_vec.rs, there is this function:

append_vec::aligned_stored_size(data_len)

which can handle both the 136 byte overhead and the 8-byte alignment requirement for AppendVec entries. So we could use that in the assert as:

assert_eq!(
    created_accounts.capacity,
    append_vec::aligned_stored_size(1000) + append_vec::aligned_stored_size(2000)
);

Comment on lines +510 to +514
(
    "initial_candidates_count",
    self.initial_candidates_count.swap(0, Ordering::Relaxed),
    i64
),

Can all the changes to this file be reverted?

Author

why?

It looks like there are no functional changes, just moved lines. But I still had to look and spend time to realize there were no actual changes to the metrics. IMO this feels orthogonal to the problem that this PR is solving. In other words, I don't understand why this file was changed.

HaoranYi previously approved these changes Oct 14, 2024

@HaoranYi HaoranYi left a comment

I reviewed the new test. It looks correct to me.
Thanks!
