A relatively optimal model automatic partitioning scheme #744
ExcellentHH started this conversation in Ideas
Replies: 1 comment 1 reply
-
Best bet is to call gen-settings on the partitions -- this will give you an exact number of rows used. Ultimately something akin to BFS may work -- it gets a bit complex for larger / branching models (where this would really shine).
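For example, a rough way to do this from a script might look like the sketch below. The CLI flags (`-M`, `--settings-path`) and the settings fields read back (`num_rows`, `run_args.logrows`) are assumptions and may differ between ezkl versions; check them against the actual gen-settings output.

```python
# Sketch: measure the exact row usage of each candidate partition by running
# `ezkl gen-settings` on the split ONNX files, as suggested above.
# Flag names and settings fields below are assumptions -- verify them against
# the gen-settings output of your ezkl version.
import json
import subprocess

def rows_for_partition(model_path: str, settings_path: str) -> dict:
    """Run gen-settings on one sub-model and return its row-related fields."""
    subprocess.run(
        ["ezkl", "gen-settings", "-M", model_path, "--settings-path", settings_path],
        check=True,
    )
    with open(settings_path) as f:
        settings = json.load(f)
    # Field names are illustrative; inspect settings.json to see what your
    # version actually reports (rows used, logrows, lookup ranges, ...).
    return {
        "num_rows": settings.get("num_rows"),
        "logrows": settings.get("run_args", {}).get("logrows"),
    }

# Example: compare two candidate splits of the same network.
for part in ["part_0.onnx", "part_1.onnx"]:
    print(part, rows_for_partition(part, f"settings_{part}.json"))
```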
-
When I first tried ezkl, I greatly admired the efforts of the ezkl project team in making the verification of ONNX models simple and accessible. However, a common issue that arose during usage was the large scale of the models to be verified, leading to constraint counts exceeding 2^26. This resulted in significantly prolonged proving time and even instances of insufficient hardware memory, limiting the utility of ezkl on nodes with weaker computing capabilities. Hence, I considered partitioning a model into multiple sub-models and committing to the values shared among them to ensure consistency. It was delightful to discover that recent ezkl updates adopt a similar approach, employing hash functions or commitment schemes to commit to the model's intermediate feature values, as described in https://blog.ezkl.xyz/post/splitting/.

Personally, I believe there is still room for optimization here, and it does not conflict with the GPU acceleration supported by the current version of ezkl. Clearly, different ways of partitioning the model will result in different numbers of intermediate feature values to be committed. Therefore, when partitioning the model, we need to consider not only the proving and verification costs of the normal model inference process but also the additional costs arising from committing the intermediate feature values, while staying within the upper limit on the number of constraints the hardware can handle during proving and verification.
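For concreteness, cutting a model at an intermediate tensor could look something like the sketch below, using `onnx.utils.extract_model`. The model path and the tensor names (`input`, `relu_3_out`, `output`) are placeholders for whatever boundary one chooses; the commitment to the cut tensor's values is then handled by ezkl's split-proving machinery described in the blog post.

```python
# Sketch: split one ONNX model into two sub-models at a chosen intermediate
# tensor, so the second proof can consume a commitment to that tensor's values.
# Tensor names below are hypothetical placeholders.
import onnx

full_model = "network.onnx"
cut_tensor = "relu_3_out"  # intermediate feature map shared by both halves

# Everything from the graph input up to (and producing) the cut tensor.
onnx.utils.extract_model(full_model, "part_0.onnx",
                         input_names=["input"], output_names=[cut_tensor])

# Everything from the cut tensor to the final output; the cut tensor becomes
# this sub-model's input, whose values are committed/hashed for consistency.
onnx.utils.extract_model(full_model, "part_1.onnx",
                         input_names=[cut_tensor], output_names=["output"])
```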
My initial idea was to traverse the ONNX graph with a DFS or BFS, estimating the number of rows each layer needs during layout from the parameters of its layer type (e.g., input size, kernel size, stride, and padding for Conv layers), plus the additional overhead generated by commitments, in order to determine a relatively optimal partitioning of the model. However, I ran into difficulty calculating the rows: I am currently unsure how to deduce the required rows from the parameters of the different layer types, or how to calculate the rows needed for a commitment from the number of elements being committed. Could you provide some guidance on this? Alternatively, do you have any better insights on partitioning strategies?
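To make the idea concrete, here is a rough sketch of the traversal I have in mind. `estimate_rows` and `commitment_rows` are just placeholder cost models (exactly the part I do not know how to fill in); as noted in the reply, the exact numbers would have to come from running gen-settings on each candidate split, and a real implementation for branching models would need a dependency-aware BFS rather than a linear pass.

```python
# Sketch of a greedy partition search over an ONNX graph under a row budget.
# estimate_rows() and commitment_rows() are placeholders, not real cost models.
from collections import deque
import onnx

def estimate_rows(node: onnx.NodeProto) -> int:
    # Placeholder: a real cost model would use input size, kernel size,
    # stride, padding, etc. per op type.
    return 1

def commitment_rows(num_outputs: int) -> int:
    # Placeholder: crude proxy using the number of output tensors, not the
    # number of committed elements.
    return num_outputs

def greedy_partition(model_path: str, row_budget: int = 2**26):
    graph = onnx.load(model_path).graph
    partitions, current, used = [], [], 0
    # ONNX stores nodes in topological order, so a simple queue over
    # graph.node suffices for chain-like models; branching models would
    # need a proper BFS over data dependencies.
    queue = deque(graph.node)
    while queue:
        node = queue.popleft()
        cost = estimate_rows(node)
        # Overhead of committing the tensors that would cross a cut here.
        cut_cost = commitment_rows(len(node.output))
        if current and used + cost + cut_cost > row_budget:
            partitions.append(current)
            current, used = [], 0
        current.append(node.name or node.op_type)
        used += cost
    if current:
        partitions.append(current)
    return partitions

print(greedy_partition("network.onnx"))
```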
Any insights are appreciated, thanks for the hard work!