PoC of lowering compilation time using Python threading #3
base: develop
    current_group.operations, current_group.quantizers, quantized_model, quantized_model_graph
)

modified_models.append(modified_model)
A general comment: the proposed solution is not memory-optimal. It keeps as many copies of the model in memory as there are groups in groups_to_rank, and for some models the number of groups to rank is in the hundreds. For huge models, the algorithm will probably run out of memory and crash.
Changes
The step logged as
INFO:nncf:Calculating ranking score for groups of quantizers
has been shortened from ~4 minutes to a little over 2 minutes.
Related tickets
119274