Feat/lazy load deployment #2519
Would love some unit tests for this as well! An easy way to do this would be to use a
synthesizer/process/src/lib.rs
Outdated
// try to lazy load the stack
#[cfg(feature = "rocks")]
let store = ConsensusStore::<N, ConsensusDB<N>>::open(storage_mode);
#[cfg(not(feature = "rocks"))]
let store = ConsensusStore::<N, ConsensusMemory<N>>::open(storage_mode);
It may be easier to either have C: ConsensusStorage as a generic or to pass in storage somehow.
It might break some APIs, but it feels cleaner than reloading the storage at each instance.
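To make the suggestion above concrete, here is a minimal sketch of a process that is generic over its storage and receives an already-open handle, so the storage is opened once rather than on every lazy load. The trait, the `MemoryStorage` backend, the `lazy_load` method, and the string-based deployments are all hypothetical stand-ins for illustration; they are not the snarkVM API.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical stand-in for a consensus storage trait (not the snarkVM definition).
trait ConsensusStorage {
    fn get_deployment(&self, program_id: &str) -> Option<String>;
}

// Hypothetical in-memory backend, used here only for illustration.
struct MemoryStorage {
    deployments: HashMap<String, String>,
}

impl ConsensusStorage for MemoryStorage {
    fn get_deployment(&self, program_id: &str) -> Option<String> {
        self.deployments.get(program_id).cloned()
    }
}

// The process keeps a shared handle that the caller opened once,
// instead of opening the storage inside every lazy-load call.
struct Process<C: ConsensusStorage> {
    store: Option<Arc<C>>,
}

impl<C: ConsensusStorage> Process<C> {
    fn new(store: Option<Arc<C>>) -> Self {
        Self { store }
    }

    // Looks up a deployment through the already-open handle.
    // Returns `None` when no store is attached or the program is unknown.
    fn lazy_load(&self, program_id: &str) -> Option<String> {
        self.store.as_ref()?.get_deployment(program_id)
    }
}
```

The trade-off raised in this thread still applies: the `C` type parameter has to be named by every caller that holds a `Process`, which is where the cascading comes from.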
yes, this was the original idea. but adding the generic cascaded everywhere... Will look into that.
This cascades too much. The perf hit to open storage is about 2s (on a 15GB ledger). That's not negligible, but hopefully we don't need to keep so many programs in memory.
found a way to avoid opening storage every time 🎉
When running the snarkOS Typically, the logs look similar to this:
1 validator logs:
Yes, all nodes had it. On AWS, validator 0 worked; all other validators got stuck on this:
(the
did you put
    }
}

impl<N: Network> Process<N> {
    /// Initializes a new process.
    #[inline]
    pub fn load() -> Result<Self> {
        // Assumption: this is only called in test code.
        Process::load_from_storage(Some(aleo_std::StorageMode::Development(0)))
Noting this may be the source of our troubles, and we may want to enclose the function in #[cfg(any(test, feature = "test"))] if possible.
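A minimal sketch of what that gating could look like, assuming a crate feature named "test" exists. The `StorageMode` enum and the `String` error type below are local stand-ins for illustration, not the `aleo_std` or snarkVM definitions.

```rust
// Local stand-in for a storage mode enum (an assumption for illustration).
#[derive(Debug, Clone, Copy, PartialEq)]
enum StorageMode {
    Production,
    Development(u16),
}

#[derive(Debug)]
struct Process {
    storage_mode: Option<StorageMode>,
}

impl Process {
    // Always available: the caller chooses the storage mode explicitly.
    fn load_from_storage(storage_mode: Option<StorageMode>) -> Result<Self, String> {
        Ok(Self { storage_mode })
    }

    // Compiled only for unit tests or when the (assumed) "test" feature is enabled,
    // so production code cannot pick up the development storage by accident.
    #[cfg(any(test, feature = "test"))]
    fn load() -> Result<Self, String> {
        Self::load_from_storage(Some(StorageMode::Development(0)))
    }
}
```

With the attribute in place, a non-test build that calls `Process::load()` fails to compile, which surfaces the mistake at build time rather than at runtime.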
Yes this code needs to be updated.
marking as draft again.
4aa125b looks good.
UPDATE: See new comment below.
If enclosing fn load() in something like #[cfg(any(test, feature = "test"))] is too difficult (perhaps because it's called by both tests and benchmarks?), another, safer approach could be to get rid of fn load() altogether. It should take you just one search and replace to call Process::load_from_storage(Some(aleo_std::StorageMode::Development(0))) directly, and that keeps the aleo_std::StorageMode::Development(0) logic contained within tests and benchmarks. That makes a future developer less likely to make mistakes.
I realise also that the current approach can lead to deadlocks on the storage when running tests, because all the tests will try to access the same database... Looks like we also didn't run CI on this PR yet (you can ask @ljedrz for how he has been running CI with a separate -ci branch).
New proposal: what if we let fn load use store: None? Just like load_from_storage, it should load the credits.aleo program because the tests need it. To avoid adding too many lines of code, you can still abstract similar logic into a shared function.
We cannot use None, as this breaks many tests.
Clear. Then we should rename load to load_testing_only.
@@ -0,0 +1,92 @@
// Copyright (C) 2019-2023 Aleo Systems Inc.
This license needs to be updated, see e.g. https://github.com/AleoNet/snarkVM/blob/mainnet-staging/fields/src/fp12_2over3over2.rs#L1
@@ -93,9 +96,12 @@ package = "snarkvm-utilities"
path = "../../utilities"
version = "=0.16.19"

[dependencies.tracing]
Hate to nit, but these are ordered alphabetically, should be moved after [dependencies.serde_json]
@@ -78,6 +78,9 @@ package = "snarkvm-ledger-store"
path = "../../ledger/store"
version = "=0.16.19"

[dependencies.lru]
Same nit, all the non-Aleo dependencies are below
Closed by #2553
Motivation
We've seen that RSS size is correlated with deployments loaded (and staying) in memory. This PR keeps MAX_STACKS programs in memory (chosen arbitrarily at 1000).
Test Plan
Tested by running a local devnet, deploying programs, and executing them, then restarting the network and trying to execute these programs again. The Stacks were lazily loaded from storage and execution worked. Also tested executing programs that are not deployed.
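The behaviour described above (a bounded set of stacks in memory, with misses lazily loaded from storage) can be sketched as a small least-recently-used cache. The diff adds the lru crate as a dependency; the sketch below uses only the standard library instead, and the `Stack` struct, `StackCache` type, `get_or_load` method, and the loader callback are hypothetical names for illustration, not the PR's actual code. MAX_STACKS is set to 2 here so that eviction is easy to observe; the PR description uses 1000.

```rust
use std::collections::{HashMap, VecDeque};

// Cache capacity. The PR description uses 1000; 2 is used here for illustration.
const MAX_STACKS: usize = 2;

// Hypothetical stand-in for a loaded program stack.
#[derive(Debug, Clone, PartialEq)]
struct Stack {
    program_id: String,
}

struct StackCache {
    stacks: HashMap<String, Stack>,
    // Program IDs ordered from least recently used (front) to most recently used (back).
    order: VecDeque<String>,
    // Number of successful loads from storage, tracked to make cache misses observable.
    loads: usize,
}

impl StackCache {
    fn new() -> Self {
        Self { stacks: HashMap::new(), order: VecDeque::new(), loads: 0 }
    }

    // Marks a program as the most recently used entry.
    fn touch(&mut self, program_id: &str) {
        self.order.retain(|id| id != program_id);
        self.order.push_back(program_id.to_string());
    }

    // Returns the cached stack, or lazily loads it via `load_from_storage` on a miss.
    // Returns `None` if the loader cannot find the program (e.g. it was never deployed).
    fn get_or_load<F>(&mut self, program_id: &str, load_from_storage: F) -> Option<Stack>
    where
        F: FnOnce(&str) -> Option<Stack>,
    {
        if let Some(stack) = self.stacks.get(program_id).cloned() {
            self.touch(program_id);
            return Some(stack);
        }
        // Cache miss: consult storage before touching the cache.
        let stack = load_from_storage(program_id)?;
        self.loads += 1;
        // Evict the least recently used stack when the cache is full.
        if self.stacks.len() >= MAX_STACKS {
            if let Some(evicted) = self.order.pop_front() {
                self.stacks.remove(&evicted);
            }
        }
        self.stacks.insert(program_id.to_string(), stack.clone());
        self.touch(program_id);
        Some(stack)
    }
}
```

A lookup for a program that is not deployed returns `None` without evicting anything, which mirrors the last case in the test plan. The `touch` helper is linear in the cache size; that is acceptable for a sketch but is one reason to prefer a dedicated LRU implementation in practice.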