Implement pre-packed blobs serialization on disk and their memory mapping on load #23069

Merged
merged 10 commits into from
Dec 20, 2024

Conversation

yuslepukhin

@yuslepukhin yuslepukhin commented Dec 10, 2024

Description

Pre-packing is a feature that allows kernels to rearrange weight data
to gain performance at inference time.

Currently, pre-packed blobs are shared only when cross-session weight sharing is enabled, and only for those weights that the user marks as shared. Otherwise, the data resides on the heap and is owned by the kernels, so it may be duplicated.

This change enables pre-packed data to be stored on disk alongside the external initializers.
The pre-packed blobs are memory mapped and loaded into either the cross-session (X-session) shared container
or a new container that shares pre-packed blobs within the session.

With the new approach, pre-packed blobs are always owned by a shared container, using the existing pre-pack mechanism for sharing. When X-session sharing is enabled, the external container owns the data.
When X-session sharing is not enabled, a separate container owned by the root SessionState owns and shares the data.

To facilitate this new approach, we introduce a new container that works in two modes. In the first mode, when an optimized model is being saved and pre-packed weights saving is enabled, the new container records pre-packed blobs and serializes them to disk using the existing ToGraphProtoWithExternalInitializers function.

To externalize the pre-packed weights, we introduce a new session option, kOrtSessionOptionsSavePrePackedConstantInitializers. Note that pre-packing must be enabled (the default) for this to work.
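A minimal sketch of how the option might be set from Python. The string values below are what I believe the C++ constants resolve to; treat them as assumptions and verify them against `onnxruntime_session_options_config_keys.h` for your ORT version. The helper names `prepack_saving_entries` and `apply_entries` are hypothetical and exist only for this illustration.

```python
# Assumed string values of the ORT config-key constants (verify before use).
SAVE_PREPACKED_KEY = "session.save_external_prepacked_constant_initializers"
EXTERNAL_FILE_KEY = "session.optimized_model_external_initializers_file_name"


def prepack_saving_entries(external_file_name):
    """Return the config entries that ask ORT to externalize pre-packed blobs."""
    return {
        # File that receives external initializers (and, with this change,
        # the pre-packed blobs stored next to them).
        EXTERNAL_FILE_KEY: external_file_name,
        # "1" turns on saving of pre-packed constant initializers.
        SAVE_PREPACKED_KEY: "1",
    }


def apply_entries(session_options, entries):
    """Apply entries to any object exposing add_session_config_entry(key, value)."""
    for key, value in entries.items():
        session_options.add_session_config_entry(key, value)
```

With onnxruntime installed, `session_options` would be an `onnxruntime.SessionOptions` instance with `optimized_model_filepath` set, so that creating the session writes the optimized model and its external data file.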

The ToGraphProtoWithExternalInitializers function is modified to recurse into subgraphs to make sure we properly account for local initializer names.
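The recursion can be pictured with a toy model. This is not the ORT implementation: graphs are plain dicts, `externalize` is a hypothetical helper, and the path string is just one way to keep a subgraph-local initializer distinct from an outer one with the same name.

```python
def externalize(graph, min_size, offset=0, path="main"):
    """Recursively assign external-file offsets to large initializers.

    Returns (entries, next_offset). Each entry is
    (graph_path, initializer_name, offset, size_in_bytes).
    """
    entries = []
    # Initializers local to this graph that are big enough to externalize.
    for name, data in graph["initializers"].items():
        if len(data) >= min_size:
            entries.append((path, name, offset, len(data)))
            offset += len(data)
    # Recurse into subgraphs held by node attributes (e.g. If/Loop bodies).
    for node in graph["nodes"]:
        for attr_name, subgraph in node.get("subgraphs", {}).items():
            sub_path = f"{path}/{node['name']}.{attr_name}"
            sub_entries, offset = externalize(subgraph, min_size, offset, sub_path)
            entries.extend(sub_entries)
    return entries, offset
```

Carrying the running offset through the recursion keeps all graphs writing into one external file without overlapping regions.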

In the second mode, the container simply holds the pre-packed weights memory-mapped from disk and shares them with the kernels.
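The two modes can be sketched roughly as follows, using only the Python standard library. `PrepackedContainer` and its methods are hypothetical names for illustration; the actual container in this PR is C++ and serializes through ToGraphProtoWithExternalInitializers rather than a separate index.

```python
import mmap


class PrepackedContainer:
    """Toy stand-in for a container that records blobs or serves mapped ones."""

    def __init__(self):
        self._blobs = {}   # key -> bytes (save mode) or memoryview (load mode)
        self._mmap = None
        self._view = None

    # Mode 1: record blobs while the optimized model is being saved.
    def record(self, key, blob):
        self._blobs[key] = bytes(blob)

    def serialize(self, path):
        """Write all blobs to one file and return {key: (offset, length)}."""
        index, offset = {}, 0
        with open(path, "wb") as f:
            for key, blob in self._blobs.items():
                f.write(blob)
                index[key] = (offset, len(blob))
                offset += len(blob)
        return index

    # Mode 2: memory-map the file and hand out read-only views.
    @classmethod
    def load(cls, path, index):
        self = cls()
        with open(path, "rb") as f:
            self._mmap = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        self._view = memoryview(self._mmap)
        for key, (offset, length) in index.items():
            # Slicing a memoryview does not copy; pages load on first access.
            self._blobs[key] = self._view[offset:offset + length]
        return self

    def get(self, key):
        """Every caller gets the same view, so no per-kernel copy is made."""
        return self._blobs[key]

    def close(self):
        """Release all views before closing the mapping."""
        for blob in self._blobs.values():
            if isinstance(blob, memoryview):
                blob.release()
        self._blobs.clear()
        if self._view is not None:
            self._view.release()
            self._view = None
        if self._mmap is not None:
            self._mmap.close()
            self._mmap = None
```

Because the views are backed by the mapped file, the OS decides how much of the data is resident, and the container stays the single owner of the data while kernels only borrow views.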

Motivation and Context

Reduce the memory used by pre-packed initializers and externalize them.

 and pre-packed blobs sharing when weights sharing is not enabled.
 Memory map pre-packed blobs.
 Recurse into subgraphs in ToGraphProtoWithExternalInitializers to make sure all big weights are serialized along with the pre-packs that are to be shared between the subgraphs.
@yuslepukhin yuslepukhin force-pushed the yuslepukhin/prepack_serialize branch from 7fc9a93 to fab27a7 Compare December 10, 2024 23:00
@yuslepukhin yuslepukhin marked this pull request as ready for review December 11, 2024 18:38
@yuslepukhin yuslepukhin requested a review from edgchen1 December 12, 2024 22:47
@skottmckay skottmckay left a comment

:shipit:

@yuslepukhin yuslepukhin merged commit 00b262d into main Dec 20, 2024
96 checks passed
@yuslepukhin yuslepukhin deleted the yuslepukhin/prepack_serialize branch December 20, 2024 18:49
tarekziade pushed a commit to tarekziade/onnxruntime that referenced this pull request Jan 10, 2025
…ping on load (microsoft#23069)
