[ONNX] Add support for correct quantization of MatMul (#1917)
### Changes
Add correct handling of MatMul ops during quantization, covering both the activation-only case and the activation-with-weight case.
- Introduce ONNXLayerAttributes, which are assigned to every NNCFNode.
- Split weight_port_ids into constant_port_ids and possible_weight_ports; possible_weight_ports are used to determine weights dynamically.
- Add logic to determine whether a node has a weight (_get_weight_edge_name).
- Add transpose attribute for the GEMM node.

### Reason for changes
To get the most optimized performance after quantization.

### Related tickets
112530, 95156

### Tests
- Synthetic models were added to the graph tests.
- gpt-2 and bertsquad were added to the graph tests.
- Added a test on scales after quantization.
- Added a test on the transpose_axis func.
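The constant_port_ids / possible_weight_ports split above can be sketched as follows. This is a minimal illustration using hypothetical `Node`/`Graph` structures, not NNCF's actual classes: an input port of a MatMul is only treated as a weight port if the tensor feeding it is a graph constant (an ONNX initializer); for an activation-only MatMul no port qualifies.

```python
from dataclasses import dataclass

@dataclass
class Node:
    op_type: str
    inputs: list  # names of the input tensors, by port index

@dataclass
class Graph:
    constants: set  # names of constant tensors (ONNX initializers)

def possible_weight_ports(node):
    # For MatMul/Gemm, either input could in principle carry the weight,
    # so both ports are candidates that must be checked dynamically.
    return [0, 1] if node.op_type in ("MatMul", "Gemm") else []

def constant_port_ids(node, graph):
    # Candidate ports that are actually fed by constants.
    # Empty for an activation-only MatMul.
    return [i for i in possible_weight_ports(node)
            if node.inputs[i] in graph.constants]

g = Graph(constants={"W"})
act_only = Node("MatMul", ["X1", "X2"])  # activation x activation
act_weight = Node("MatMul", ["X", "W"])  # activation x weight

print(constant_port_ids(act_only, g))    # []
print(constant_port_ids(act_weight, g))  # [1]
```

This mirrors why the dynamic check is needed: unlike Conv, a MatMul has no fixed weight port, so the quantizer must inspect the graph to decide whether weight quantization applies at all.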
Showing 56 changed files with 16,774 additions and 6,038 deletions.