Hi all,
I'm trying to quantize a ResNet18 ONNX model to FP8 using the command below:
python -m modelopt.onnx.quantization --onnx_path=resnet18.onnx --quantize_mode=fp8 --output_path=resnet18_fp8.onnx
This produces the FP8-quantized ONNX file. However, when I build the engine with trtexec (TensorRT 10.7), the errors below show up. The resnet18_fp8.trt plan file is still created and can be loaded for inference, but inference is slower than with the FP16 engine. Could anyone help me analyze why these errors happen? Is this the right way to quantize to FP8, or is there a better approach?
Build command:
trtexec --onnx=resnet18_fp8.onnx --fp8 --fp16 --saveEngine=resnet18_fp8.trt
Logs (partial):
[12/17/2024-08:28:27] [I] [TRT] Loaded 2570905 bytes of timing cache from ../timingcache
[12/17/2024-08:28:27] [I] [TRT] Global timing cache in use. Profiling results in this builder pass will be stored.
[12/17/2024-08:28:27] [I] [TRT] Compiler backend is used during engine build.
[12/17/2024-08:28:27] [E] Error[9]: Error Code: 9: Skipping tactic 0x0000000000000000 due to exception [type.cpp:infer_type:145] Could not infer output types for operation: 40: quantize: /maxpool/MaxPool_output_0_QuantizeLinear_Output'-(fp8[7,64,192,256][]so[], mem_prop=0) | /maxpool/MaxPool_output_0'-(f32[7,64,192,256][]so[3,2,1,0]p[0,0,0,0], mem_prop=0), /maxpool/MaxPool_output_0_QuantizeLinear scale weightsHalf-0.0448303H:(f16[][]so[], mem_prop=0)<entry>, stream = 0 // /maxpool/MaxPool_output_0_QuantizeLinear, axis = 0, No matching rules found for input operand types
[12/17/2024-08:28:27] [E] Error[9]: Error Code: 9: Skipping tactic 0x0000000000000000 due to exception [type.cpp:infer_type:145] Could not infer output types for operation: 41: quantize: onnx__Conv_199_QuantizeLinear_Output'-(fp8[64,64,3,3][]so[], mem_prop=0) | onnx__Conv_199_constantFloat-{0.0190582, -0.0767822, -0.0035038, -0.0632935, -0.242554, -0.0754395, -0.0421753, -0.140137, ...}(f32[64,64,3,3][576,9,3,1]so[3,2,1,0], mem_prop=0)<entry>, onnx__Conv_199_QuantizeLinear scale weightsHalf-{0.00673676, 0.00426483, 0.00368881, 0.00978088, 0.00222588, 0.00957489, 0.00311661, 0.00770187, ...}(f16[64][1]so[0], mem_prop=0)<entry>, stream = 0 // onnx::Conv_199_QuantizeLinear, axis = 0, No matching rules found for input operand types
[12/17/2024-08:28:27] [E] Error[9]: Error Code: 9: Skipping tactic 0x0000000000000000 due to exception [type.cpp:infer_type:145] Could not infer output types for operation: 43: quantize: onnx__Conv_202_QuantizeLinear_Output'-(fp8[64,64,3,3][]so[], mem_prop=0) | onnx__Conv_202_constantFloat-{0.00885773, -0.0023632, -0.00169086, -0.00882721, -0.00554657, -0.0158234, 0.0228271, 0.0340271, ...}(f32[64,64,3,3][576,9,3,1]so[3,2,1,0], mem_prop=0)<entry>, onnx__Conv_202_QuantizeLinear scale weightsHalf-{0.00281715, 0.00680923, 0.00307655, 0.00264931, 0.00502396, 0.00421524, 0.00231934, 0.00679016, ...}(f16[64][1]so[0], mem_prop=0)<entry>, stream = 0 // onnx::Conv_202_QuantizeLinear, axis = 0, No matching rules found for input operand types
[12/17/2024-08:28:28] [E] Error[9]: Error Code: 9: Skipping tactic 0x0000000000000000 due to exception [type.cpp:infer_type:145] Could not infer output types for operation: 41: quantize: onnx__Conv_205_QuantizeLinear_Output'-(fp8[64,64,3,3][]so[], mem_prop=0) | onnx__Conv_205_constantFloat-{-0.0250549, -0.00530624, 0.00528336, -0.00950623, 0.0484924, 0.0267334, -0.0103683, 0.0666504, ...}(f32[64,64,3,3][576,9,3,1]so[3,2,1,0], mem_prop=0)<entry>, onnx__Conv_205_QuantizeLinear scale weightsHalf-{0.00647736, 0.0150909, 0.00805664, 0.0115967, 0.00300217, 0.0188751, 0.00980377, 0.0198517, ...}(f16[64][1]so[0], mem_prop=0)<entry>, stream = 0 // onnx::Conv_205_QuantizeLinear, axis = 0, No matching rules found for input operand types
[12/17/2024-08:28:28] [E] Error[9]: Error Code: 9: Skipping tactic 0x0000000000000000 due to exception [type.cpp:infer_type:145] Could not infer output types for operation: 43: quantize: onnx__Conv_208_QuantizeLinear_Output'-(fp8[128,64,3,3][]so[], mem_prop=0) | onnx__Conv_208_constantFloat-{-0.0284271, -0.0438232, -0.0544739, 0.0280457, -0.00587082, -0.0399475, 0.0474243, 0.0346985, ...}(f32[128,64,3,3][576,9,3,1]so[3,2,1,0], mem_prop=0)<entry>, onnx__Conv_208_QuantizeLinear scale weightsHalf-{0.0019722, 0.00211334, 0.00171089, 0.00146389, 0.00204468, 0.00277519, 0.00121593, 0.00243187, ...}(f16[128][1]so[0], mem_prop=0)<entry>, stream = 0 // onnx::Conv_208_QuantizeLinear, axis = 0, No matching rules found for input operand types
[12/17/2024-08:28:28] [E] Error[9]: Error Code: 9: Skipping tactic 0x0000000000000000 due to exception [type.cpp:infer_type:145] Could not infer output types for operation: 73: quantize: onnx__Conv_214_QuantizeLinear_Output'-(fp8[128,64,1,1][]so[], mem_prop=0) | onnx__Conv_214_constantFloat-{0.0108261, -0.211548, 0.00858307, 0.00590515, -0.0189819, 0.0228424, 0.101624, 0.00101185, ...}(f32[128,64,1,1][64,1,1,1]so[3,2,1,0], mem_prop=0)<entry>, onnx__Conv_214_QuantizeLinear scale weightsHalf-{0.00953674, 0.00139427, 0.00201035, 0.0103455, 0.00303841, 0.00309563, 0.00431824, 0.00777435, ...}(f16[128][1]so[0], mem_prop=0)<entry>, stream = 0 // onnx::Conv_214_QuantizeLinear, axis = 0, No matching rules found for input operand types
[12/17/2024-08:28:28] [E] Error[9]: Error Code: 9: Skipping tactic 0x0000000000000000 due to exception [type.cpp:infer_type:145] Could not infer output types for operation: 43: quantize: onnx__Conv_217_QuantizeLinear_Output'-(fp8[128,128,3,3][]so[], mem_prop=0) | onnx__Conv_217_constantFloat-{-0.000638008, -0.00498962, -0.0051384, 0.0160065, 0.00120068, 0.00375938, 0.0061264, -0.0106125, ...}(f32[128,128,3,3][1152,9,3,1]so[3,2,1,0], mem_prop=0)<entry>, onnx__Conv_217_QuantizeLinear scale weightsHalf-{0.00455475, 0.00330925, 0.00310898, 0.00251007, 0.00468826, 0.00300598, 0.00230789, 0.00641632, ...}(f16[128][1]so[0], mem_prop=0)<entry>, stream = 0 // onnx::Conv_217_QuantizeLinear, axis = 0, No matching rules found for input operand types
[12/17/2024-08:28:28] [E] Error[9]: Error Code: 9: Skipping tactic 0x0000000000000000 due to exception [type.cpp:infer_type:145] Could not infer output types for operation: 41: quantize: onnx__Conv_220_QuantizeLinear_Output'-(fp8[128,128,3,3][]so[], mem_prop=0) | onnx__Conv_220_constantFloat-{-0.0142441, 0.00442123, -0.000795364, -0.00779343, -0.0171051, -0.0213165, 0.00555801, 0.0090332, ...}(f32[128,128,3,3][1152,9,3,1]so[3,2,1,0], mem_prop=0)<entry>, onnx__Conv_220_QuantizeLinear scale weightsHalf-{0.00333023, 0.00293732, 0.00777817, 0.0100327, 0.00688934, 0.00879669, 0.00788116, 0.0043335, ...}(f16[128][1]so[0], mem_prop=0)<entry>, stream = 0 // onnx::Conv_220_QuantizeLinear, axis = 0, No matching rules found for input operand types
[12/17/2024-08:28:28] [E] Error[9]: Error Code: 9: Skipping tactic 0x0000000000000000 due to exception [type.cpp:infer_type:145] Could not infer output types for operation: 43: quantize: onnx__Conv_223_QuantizeLinear_Output'-(fp8[256,128,3,3][]so[], mem_prop=0) | onnx__Conv_223_constantFloat-{-0.00917053, -0.00958252, -0.00919342, -0.00304222, 0.00871277, 0.00569916, -0.00856781, 0.000209093, ...}(f32[256,128,3,3][1152,9,3,1]so[3,2,1,0], mem_prop=0)<entry>, onnx__Conv_223_QuantizeLinear scale weightsHalf-{0.00279427, 0.00354576, 0.00203514, 0.00306511, 0.0017395, 0.00217438, 0.00256538, 0.00204659, ...}(f16[256][1]so[0], mem_prop=0)<entry>, stream = 0 // onnx::Conv_223_QuantizeLinear, axis = 0, No matching rules found for input operand types
[12/17/2024-08:28:28] [E] Error[9]: Error Code: 9: Skipping tactic 0x0000000000000000 due to exception [type.cpp:infer_type:145] Could not infer output types for operation: 73: quantize: onnx__Conv_229_QuantizeLinear_Output'-(fp8[256,128,1,1][]so[], mem_prop=0) | onnx__Conv_229_constantFloat-{0.00479889, -0.0113983, -0.0102463, 0.00808716, -0.0241241, -0.0216675, -0.00894928, 0.0211182, ...}(f32[256,128,1,1][128,1,1,1]so[3,2,1,0], mem_prop=0)<entry>, onnx__Conv_229_QuantizeLinear scale weightsHalf-{0.00191116, 0.00129223, 0.0010891, 0.00492096, 0.00165272, 0.0015316, 0.00255966, 0.0020771, ...}(f16[256][1]so[0], mem_prop=0)<entry>, stream = 0 // onnx::Conv_229_QuantizeLinear, axis = 0, No matching rules found for input operand types
[12/17/2024-08:28:28] [E] Error[9]: Error Code: 9: Skipping tactic 0x0000000000000000 due to exception [type.cpp:infer_type:145] Could not infer output types for operation: 43: quantize: onnx__Conv_232_QuantizeLinear_Output'-(fp8[256,256,3,3][]so[], mem_prop=0) | onnx__Conv_232_constantFloat-{0.0268097, 0.0266266, 0.0213165, 0.0276489, 0.0305939, 0.0314331, 0.0134048, 0.00744629, ...}(f32[256,256,3,3][2304,9,3,1]so[3,2,1,0], mem_prop=0)<entry>, onnx__Conv_232_QuantizeLinear scale weightsHalf-{0.00175095, 0.00225449, 0.00241852, 0.00459671, 0.00247955, 0.0021534, 0.00415802, 0.00390434, ...}(f16[256][1]so[0], mem_prop=0)<entry>, stream = 0 // onnx::Conv_232_QuantizeLinear, axis = 0, No matching rules found for input operand types
[12/17/2024-08:28:28] [E] Error[9]: Error Code: 9: Skipping tactic 0x0000000000000000 due to exception [type.cpp:infer_type:145] Could not infer output types for operation: 41: quantize: onnx__Conv_235_QuantizeLinear_Output'-(fp8[256,256,3,3][]so[], mem_prop=0) | onnx__Conv_235_constantFloat-{-0.0624695, -0.0383911, -0.0323181, -0.0254364, -0.0111465, -0.0106201, -0.00262451, 0.0349426, ...}(f32[256,256,3,3][2304,9,3,1]so[3,2,1,0], mem_prop=0)<entry>, onnx__Conv_235_QuantizeLinear scale weightsHalf-{0.00393677, 0.00412369, 0.00238228, 0.00409698, 0.00298119, 0.00185871, 0.00465012, 0.00352478, ...}(f16[256][1]so[0], mem_prop=0)<entry>, stream = 0 // onnx::Conv_235_QuantizeLinear, axis = 0, No matching rules found for input operand types
[12/17/2024-08:28:29] [E] Error[9]: Error Code: 9: Skipping tactic 0x0000000000000000 due to exception [type.cpp:infer_type:145] Could not infer output types for operation: 43: quantize: onnx__Conv_238_QuantizeLinear_Output'-(fp8[512,256,3,3][]so[], mem_prop=0) | onnx__Conv_238_constantFloat-{-0.00893402, -0.0145874, -0.0167847, 0.0157166, 0.0183868, 0.0223846, 0.0335083, 0.0255432, ...}(f32[512,256,3,3][2304,9,3,1]so[3,2,1,0], mem_prop=0)<entry>, onnx__Conv_238_QuantizeLinear scale weightsHalf-{0.00216866, 0.00140667, 0.00249672, 0.00204086, 0.00139999, 0.00204086, 0.00163174, 0.00369263, ...}(f16[512][1]so[0], mem_prop=0)<entry>, stream = 0 // onnx::Conv_238_QuantizeLinear, axis = 0, No matching rules found for input operand types
[12/17/2024-08:28:29] [E] Error[9]: Error Code: 9: Skipping tactic 0x0000000000000000 due to exception [type.cpp:infer_type:145] Could not infer output types for operation: 73: quantize: onnx__Conv_244_QuantizeLinear_Output'-(fp8[512,256,1,1][]so[], mem_prop=0) | onnx__Conv_244_constantFloat-{0.00789642, 0.00282097, 0.0231323, 0.00636673, 0.013443, -0.0136948, 0.00136471, -0.067688, ...}(f32[512,256,1,1][256,1,1,1]so[3,2,1,0], mem_prop=0)<entry>, onnx__Conv_244_QuantizeLinear scale weightsHalf-{0.00360107, 0.00611877, 0.00516129, 0.0156555, 0.00354767, 0.00492859, 0.0070076, 0.0146866, ...}(f16[512][1]so[0], mem_prop=0)<entry>, stream = 0 // onnx::Conv_244_QuantizeLinear, axis = 0, No matching rules found for input operand types
[12/17/2024-08:28:29] [E] Error[9]: Error Code: 9: Skipping tactic 0x0000000000000000 due to exception [type.cpp:infer_type:145] Could not infer output types for operation: 43: quantize: onnx__Conv_247_QuantizeLinear_Output'-(fp8[512,512,3,3][]so[], mem_prop=0) | onnx__Conv_247_constantFloat-{-0.00535583, -0.00385475, 0.00428009, 0.00337029, -0.00452423, 0.00846863, 0.00889587, 0.00968933, ...}(f32[512,512,3,3][4608,9,3,1]so[3,2,1,0], mem_prop=0)<entry>, onnx__Conv_247_QuantizeLinear scale weightsHalf-{0.00160885, 0.00256729, 0.001647, 0.00158596, 0.00154018, 0.00148392, 0.00161552, 0.00206375, ...}(f16[512][1]so[0], mem_prop=0)<entry>, stream = 0 // onnx::Conv_247_QuantizeLinear, axis = 0, No matching rules found for input operand types
[12/17/2024-08:28:30] [E] Error[9]: Error Code: 9: Skipping tactic 0x0000000000000000 due to exception [type.cpp:infer_type:145] Could not infer output types for operation: 41: quantize: onnx__Conv_250_QuantizeLinear_Output'-(fp8[512,512,3,3][]so[], mem_prop=0) | onnx__Conv_250_constantFloat-{0.00472641, 0.0701294, -0.0333557, 0.00321007, 0.0401306, -0.0964966, 0.0737305, 0.142456, ...}(f32[512,512,3,3][4608,9,3,1]so[3,2,1,0], mem_prop=0)<entry>, onnx__Conv_250_QuantizeLinear scale weightsHalf-{0.0312805, 0.0317688, 0.0265808, 0.0308075, 0.0382996, 0.0361633, 0.0439758, 0.0685425, ...}(f16[512][1]so[0], mem_prop=0)<entry>, stream = 0 // onnx::Conv_250_QuantizeLinear, axis = 0, No matching rules found for input operand types
[12/17/2024-08:28:30] [E] Error[9]: Error Code: 9: Skipping tactic 0x0000000000000000 due to exception [type.cpp:infer_type:145] Could not infer output types for operation: 80: quantize: fc_weight_QuantizeLinear_Output'-(fp8[1000,512][]so[], mem_prop=0) | fc_weight_constantFloat-{-0.0184784, -0.0704346, -0.0517578, -0.00983429, 0.014679, -0.0132065, -0.0382385, 0.250732, ...}(f32[1000,512][512,1]so[1,0], mem_prop=0)<entry>, fc_weight_QuantizeLinear scale weightsHalf-{0.00839233, 0.0100479, 0.0089798, 0.00801849, 0.00840759, 0.00848389, 0.00928497, 0.0119247, ...}(f16[1000][1]so[0], mem_prop=0)<entry>, stream = 0 // fc.weight_QuantizeLinear, axis = 0, No matching rules found for input operand types
[12/17/2024-08:28:30] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[12/17/2024-08:28:30] [I] [TRT] Total Host Persistent Memory: 149104 bytes
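For reference, the per-channel scale tensors printed in the errors above are consistent with the usual FP8 (E4M3) recipe, where each channel's scale is its absolute maximum divided by 448 (the largest magnitude E4M3 can represent). A minimal pure-Python sketch of that math, with illustrative names only (this is not modelopt's or TensorRT's actual implementation):

```python
# FP8 E4M3's largest representable magnitude.
E4M3_MAX = 448.0

def fp8_scale(weights_per_channel):
    """Per-channel scale: amax / E4M3_MAX, matching the shape of the
    scale tensors (e.g. f16[64]) shown in the build log."""
    return [max(abs(w) for w in row) / E4M3_MAX for row in weights_per_channel]

def quantize_dequantize(x, scale):
    """Simulate a QuantizeLinear -> DequantizeLinear pair:
    divide by scale, round, clamp to the FP8 range, multiply back."""
    q = max(-E4M3_MAX, min(E4M3_MAX, round(x / scale)))
    return q * scale
```

This is only meant to illustrate where the small scale values (e.g. 0.00673676) in the log come from; the TensorRT errors themselves are about type inference for the fp8 quantize ops, not about the scale values.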