Support for requesting max SHM size from SDL #179
Per Feb 20 call: we are leaning towards implementing full support for SHM (not just the workaround with bid attributes). @boz is planning to take this on (thanks, Adam!).

In the interim, @troian and @chainzero are going to look into the workaround: using bid attributes plus a daemon running on the provider that checks the attributes and applies SHM using kubectl commands.
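The interim workaround described above would rest on Kubernetes memory-backed `emptyDir` volumes. A minimal sketch of the kind of pod spec such a daemon might apply (hypothetical example; the pod name, image, and sizes are illustrative and not taken from this thread):

```yaml
# Hypothetical example: a memory-backed emptyDir volume mounted at /dev/shm.
# Kubernetes backs medium: Memory volumes with tmpfs; sizeLimit caps the SHM size.
apiVersion: v1
kind: Pod
metadata:
  name: shm-demo              # illustrative name
spec:
  containers:
    - name: app
      image: ubuntu:22.04
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: dshm
          mountPath: /dev/shm
  volumes:
    - name: dshm
      emptyDir:
        medium: Memory        # tmpfs-backed
        sizeLimit: 1Gi        # requested max SHM size
```

A daemon could apply a patch along these lines (e.g. via `kubectl patch` or `kubectl apply`) to deployments whose bids carry the relevant attribute.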
* Implement `"ram"` storage class with "empty dir" memory-backed volumes.
* No changes to resource accounting - service memory size must include size allocated to ram storage.

refs akash-network/support#179
Signed-off-by: Adam Bozanich <[email protected]>

* Add SDL support for `"ram"` storage class.
* `"ram"` volumes cannot be persistent or `ReadOnly`.

refs akash-network/support#179
Signed-off-by: Adam Bozanich <[email protected]>
March 4th, 2024:
* Implement `"ram"` storage class with "empty dir" memory-backed volumes.
* No changes to resource accounting - service memory size must include size allocated to ram storage.

refs akash-network/support#179
Signed-off-by: Adam Bozanich <[email protected]>
March 12th, 2024:
Does not need a network upgrade. No SDL changes.
* Implement `"ram"` storage class with "empty dir" memory-backed volumes.
* No changes to resource accounting - service memory size must include size allocated to ram storage.

refs akash-network/support#179
Signed-off-by: Adam Bozanich <[email protected]>
Akash Network 0.32.2: SHM doesn't seem to be working yet.

Provider attributes:

SDL:

After send-manifest:
SDL (persistent volume + /dev/shm): in the case of two volumes, a persistent volume plus shm (ram), I'm getting "manifest version validation failed" from the provider.

SDL:

Client:

Provider (v0.5.9):
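For context, a two-volume manifest like the one described might look roughly like this (a hypothetical sketch assembled from the Akash SDL conventions and the `"ram"` class named in the commits above; service name, image, sizes, and the `beta2` class are illustrative assumptions, and the exact syntax may differ):

```yaml
# Hypothetical SDL fragment: one persistent volume plus one ram-backed volume.
services:
  app:
    image: myorg/trainer:latest     # illustrative image
    params:
      storage:
        data:
          mount: /data
        shm:
          mount: /dev/shm
profiles:
  compute:
    app:
      resources:
        cpu:
          units: 2
        memory:
          size: 4Gi                 # per the commits, must also cover the ram volume
        storage:
          - name: data
            size: 100Gi
            attributes:
              persistent: true
              class: beta2          # assumed persistent storage class
          - name: shm
            size: 1Gi
            attributes:
              class: ram            # memory-backed; cannot be persistent or ReadOnly
```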
* feat(cluster/kube/builder): `"ram"` storage class
* Implement `"ram"` storage class with "empty dir" memory-backed volumes.
* No changes to resource accounting - service memory size must include size allocated to ram storage.
* feat(shm): add e2e tests

refs akash-network/support#179
Signed-off-by: Adam Bozanich <[email protected]>
Signed-off-by: Artur Troian <[email protected]>
Co-authored-by: Adam Bozanich <[email protected]>
I've tested provider-services 0.5.11 - everything is working there.
Is your feature request related to a problem? Please describe.
Customers (particularly AI/ML training workloads) frequently need multiple services to share storage. For example, one service that downloads and labels data is CPU-bound, while another that uses that data for training is GPU-bound; the two can run in parallel but need access to large shared memory. We currently don't allow the max SHM size to be controlled by the user, which makes it hard to run such workloads.
Describe the solution you'd like
Support specifying and requesting the SHM size as part of the SDL.
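Based on the linked commits, this took the shape of a `"ram"` storage class in the SDL. A minimal sketch of what such a request might look like (hypothetical service name, image, and sizes; the syntax is assumed from the commit messages elsewhere in this thread, not verified):

```yaml
# Hypothetical minimal SDL fragment requesting a fixed SHM size via a
# memory-backed ("ram" class) volume mounted at /dev/shm.
services:
  trainer:
    image: myorg/trainer:latest   # illustrative
    params:
      storage:
        shm:
          mount: /dev/shm
profiles:
  compute:
    trainer:
      resources:
        cpu:
          units: 4
        memory:
          size: 8Gi               # per the commits: must include the ram volume size
        storage:
          - name: shm
            size: 2Gi
            attributes:
              class: ram          # cannot be persistent or ReadOnly
```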
Describe alternatives you've considered
Note that we have tested applying these changes manually on the provider side during our work with Thumper training on the FoundryStaking provider.
Additional context
No response