Implement pre-packed blobs serialization on disk and their memory mapping on load #23069
Merged
and pre-packed blob sharing when weight sharing is not enabled. Memory-map pre-packed blobs. Recurse into subgraphs in `ToGraphProtoWithExternalInitializers` to make sure all big weights are serialized along with their pre-packs that are to be shared between the subgraphs.
yuslepukhin commented Dec 10, 2024
yuslepukhin force-pushed the yuslepukhin/prepack_serialize branch from 7fc9a93 to fab27a7 on December 10, 2024 23:00
tianleiwu reviewed Dec 12, 2024
skottmckay reviewed Dec 13, 2024
include/onnxruntime/core/session/onnxruntime_session_options_config_keys.h (outdated, resolved)
edgchen1 reviewed Dec 13, 2024
skottmckay approved these changes Dec 20, 2024
tianleiwu approved these changes Dec 20, 2024
tarekziade pushed a commit to tarekziade/onnxruntime that referenced this pull request Jan 10, 2025
…ping on load (microsoft#23069)
Description
Pre-packing is a feature that allows kernels to rearrange weight data to gain performance at inference time.
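To make the idea concrete, here is a minimal sketch of one kind of rearrangement a kernel might perform. It is an illustration only, not ONNX Runtime's actual packing code (real kernels use layouts tuned to their instruction sets), and `PrePackTranspose` and `MatVecPrePacked` are hypothetical names. The weight matrix is reordered once, at load time, so that the inner loop at inference time reads contiguous memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pre-pack step: B is K x N in row-major order. It is rewritten
// once into column-major order so each output column is contiguous.
std::vector<float> PrePackTranspose(const std::vector<float>& b, std::size_t k, std::size_t n) {
  assert(b.size() == k * n);
  std::vector<float> packed(k * n);
  for (std::size_t row = 0; row < k; ++row) {
    for (std::size_t col = 0; col < n; ++col) {
      packed[col * k + row] = b[row * n + col];
    }
  }
  return packed;
}

// Computes y = x * B using the pre-packed (column-major) weights.
// Each output element walks one contiguous run of k floats.
std::vector<float> MatVecPrePacked(const std::vector<float>& x,
                                   const std::vector<float>& packed,
                                   std::size_t k, std::size_t n) {
  assert(x.size() == k && packed.size() == k * n);
  std::vector<float> y(n, 0.0f);
  for (std::size_t col = 0; col < n; ++col) {
    const float* column = packed.data() + col * k;
    for (std::size_t i = 0; i < k; ++i) {
      y[col] += x[i] * column[i];
    }
  }
  return y;
}
```

The packed buffer is derived data: without a sharing mechanism, every kernel instance that pre-packs the same weight keeps its own copy.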
Currently, pre-packed blobs are shared only when cross-session weight sharing is enabled, and only for the weights that the user has marked as shared. Otherwise, the data resides on the heap and is owned by the kernels, so it may be duplicated.
This change enables pre-packed data to be stored on disk alongside the external initializers.
The pre-packed blobs are memory-mapped and loaded into either the cross-session (X-session) shared container or a new container that shares pre-packed blobs within the session.
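Memory-mapping a blob can be sketched with POSIX `mmap`. This assumes a file in which each pre-packed blob is addressed by a byte offset and a length, similar to how ONNX external initializers are located; that layout and the `MappedFile` class are hypothetical, and ONNX Runtime's own cross-platform implementation may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical read-only mapping of a file holding serialized pre-packed blobs.
class MappedFile {
 public:
  explicit MappedFile(const char* path) {
    int fd = ::open(path, O_RDONLY);
    if (fd < 0) return;
    struct stat st {};
    if (::fstat(fd, &st) == 0 && st.st_size > 0) {
      void* p = ::mmap(nullptr, static_cast<std::size_t>(st.st_size), PROT_READ,
                       MAP_PRIVATE, fd, 0);
      if (p != MAP_FAILED) {
        data_ = static_cast<const std::uint8_t*>(p);
        size_ = static_cast<std::size_t>(st.st_size);
      }
    }
    ::close(fd);  // a mapping stays valid after its file descriptor is closed
  }
  ~MappedFile() {
    if (data_ != nullptr) ::munmap(const_cast<std::uint8_t*>(data_), size_);
  }
  MappedFile(const MappedFile&) = delete;
  MappedFile& operator=(const MappedFile&) = delete;

  bool ok() const { return data_ != nullptr; }
  std::size_t size() const { return size_; }

  // Returns a pointer to `length` bytes at `offset`, or nullptr if out of range.
  const std::uint8_t* Blob(std::size_t offset, std::size_t length) const {
    if (data_ == nullptr || offset > size_ || length > size_ - offset) return nullptr;
    return data_ + offset;
  }

 private:
  const std::uint8_t* data_ = nullptr;
  std::size_t size_ = 0;
};
```

Because the mapping is read-only and file-backed, pages are loaded on demand from the page cache instead of being copied onto the heap.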
With the new approach, pre-packed blobs are always owned by the shared container, using the existing pre-pack sharing mechanism. When X-session sharing is enabled, the external container owns the data.
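This ownership rule can be sketched as a store that owns every blob while kernels hold only non-owning pointers. `PrePackedStore`, `KernelSketch`, `SelectStore`, and the string keys are hypothetical and do not mirror ONNX Runtime's actual container API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical store that owns pre-packed blobs. Kernels receive non-owning
// pointers, so several kernels can use one copy of the same packed weight.
class PrePackedStore {
 public:
  // Returns the blob registered under `key`, inserting `blob` if the key is new.
  // try_emplace leaves `blob` untouched when `key` already exists, so later
  // callers get the first copy back instead of storing a duplicate.
  // Pointers to unordered_map values stay valid when the map rehashes.
  const std::vector<std::uint8_t>* GetOrInsert(const std::string& key,
                                               std::vector<std::uint8_t> blob) {
    auto result = blobs_.try_emplace(key, std::move(blob));
    return &result.first->second;
  }

  std::size_t size() const { return blobs_.size(); }

 private:
  std::unordered_map<std::string, std::vector<std::uint8_t>> blobs_;
};

// A kernel only borrows its packed weights; the store must outlive it.
struct KernelSketch {
  const std::vector<std::uint8_t>* packed = nullptr;
};

// Chooses the owner of the blobs: an externally supplied store shared across
// sessions when one is given, otherwise a store owned by the session itself.
PrePackedStore* SelectStore(PrePackedStore* cross_session, PrePackedStore& session_owned) {
  return cross_session != nullptr ? cross_session : &session_owned;
}
```

Keeping ownership in one place means the kernels never free the data, which is what allows the blobs to live in a memory-mapped region instead of on the heap.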
A separate container owned by the root `SessionState` owns and shares the data when X-session sharing is not enabled.
To facilitate this new approach, we introduce a new container that works in two modes. In the first mode, when an optimized model is being saved and saving of pre-packed weights is enabled, the new container records the pre-packed blobs and serializes them to disk using the existing `ToGraphProtoWithExternalInitializers` function. To externalize the pre-packed weights, we introduce a new session option, `kOrtSessionOptionsSavePrePackedConstantInitializers`.
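The two modes of the new container might look like the following sketch. In save mode it records blobs and lays them out back to back in one buffer, noting an offset and length for each; in load mode it holds only a pointer to already-mapped bytes plus that index, and hands out views. `PrePackedBlobContainer`, `BlobLocation`, the key strings, and the 64-byte default alignment are assumptions for illustration, not the format ONNX Runtime writes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical location of one blob inside the serialized file.
struct BlobLocation {
  std::size_t offset = 0;
  std::size_t length = 0;
};

// Hypothetical two-mode container (not ONNX Runtime's class).
class PrePackedBlobContainer {
 public:
  // --- Save mode: record blobs produced by kernels while pre-packing. ---
  void Record(const std::string& key, std::vector<std::uint8_t> blob) {
    recorded_[key] = std::move(blob);
  }

  // Appends every recorded blob to `out`, zero-padding so each blob starts at
  // an offset that is a multiple of `alignment`, and returns the locations so
  // they can be stored in the model next to the external initializers.
  std::map<std::string, BlobLocation> Serialize(std::vector<std::uint8_t>& out,
                                                std::size_t alignment = 64) const {
    assert(alignment > 0);
    std::map<std::string, BlobLocation> index;
    for (const auto& [key, blob] : recorded_) {
      std::size_t offset = (out.size() + alignment - 1) / alignment * alignment;
      out.resize(offset, 0);
      out.insert(out.end(), blob.begin(), blob.end());
      index[key] = BlobLocation{offset, blob.size()};
    }
    return index;
  }

  // --- Load mode: `base` must outlive the container, e.g. a file mapping. ---
  static PrePackedBlobContainer FromMapped(const std::uint8_t* base,
                                           std::map<std::string, BlobLocation> index) {
    PrePackedBlobContainer c;
    c.base_ = base;
    c.index_ = std::move(index);
    return c;
  }

  // Returns a non-owning pointer to the blob, or nullptr if the key is unknown.
  const std::uint8_t* Find(const std::string& key, std::size_t* length) const {
    auto it = index_.find(key);
    if (base_ == nullptr || it == index_.end()) return nullptr;
    *length = it->second.length;
    return base_ + it->second.offset;
  }

 private:
  std::map<std::string, std::vector<std::uint8_t>> recorded_;  // save mode
  const std::uint8_t* base_ = nullptr;                           // load mode
  std::map<std::string, BlobLocation> index_;                    // load mode
};
```

In load mode nothing is copied: the container is just an index over bytes that the operating system maps in from disk.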
Note that pre-packing needs to be enabled (it is by default) for this to work. The `ToGraphProtoWithExternalInitializers` function is modified to recurse into subgraphs to make sure local initializer names are properly accounted for.
In the second mode, the container simply holds the pre-packed weights memory-mapped from disk and shares them with the kernels.
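Recursing into subgraphs can be illustrated with a simplified graph model. The sketch walks the main graph and every nested subgraph (for example, the bodies of control-flow nodes such as `If` or `Loop`) and collects the initializers large enough to be written externally. Qualifying each name with its graph path is one assumed way to keep subgraph-local names from colliding; `GraphSketch`, `CollectExternalInitializers`, and the size threshold are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified graph model (not ONNX Runtime's Graph class).
struct GraphSketch {
  std::string name;
  std::vector<std::pair<std::string, std::size_t>> initializers;  // name, size in bytes
  std::vector<GraphSketch> subgraphs;  // e.g. bodies of control-flow nodes
};

// Walks `graph` and, recursively, all of its subgraphs, collecting the
// initializers of at least `threshold` bytes that would be written externally.
// Names are prefixed with the graph path because a subgraph may declare a
// local initializer whose name also appears in another graph.
void CollectExternalInitializers(const GraphSketch& graph, const std::string& path,
                                 std::size_t threshold, std::vector<std::string>& out) {
  const std::string scope = path.empty() ? graph.name : path + "/" + graph.name;
  for (const auto& [init_name, bytes] : graph.initializers) {
    if (bytes >= threshold) out.push_back(scope + ":" + init_name);
  }
  for (const GraphSketch& sub : graph.subgraphs) {
    CollectExternalInitializers(sub, scope, threshold, out);
  }
}
```

Without the recursion, a large weight that lives only inside a subgraph would be missed, and so would the pre-packed blob derived from it.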
Motivation and Context
Reduce the memory used by pre-packed initializers and externalize them to disk.