Adding a MatMulInteger-like contrib/custom op #22991
-
Hello,

We've recently added Transformers.js + onnxruntime (WASM) into Firefox as our inference runtime. It works great, but we want to improve its speed, in particular when running quantized int8 models, since int8 is a good tradeoff in terms of download size for our use cases. Right now it's slower than fp32 and much slower than the native runtime.

In Firefox Translations, which uses another WASM runtime (Bergamot), we got a significant performance boost by plugging into the WASM a JS import hook that calls https://github.com/mozilla/gemmology, which is vendored into Firefox. My goal is to try the same thing with onnxruntime, by writing a new contrib op that will replace MatMulInteger.

The first step I took was to take a model that we use and rewrite its graph. See for instance our fine-tuned mobilebert here: https://huggingface.co/Mozilla/mobilebert-uncased-finetuned-LoRA-intent-classifier/tree/main/onnx. My rewrite script is here: https://gist.github.com/tarekziade/08404a890a3cf56878042892f6ce58aa

From there I added a first version of the custom op, which mimics MatMulInteger for now: tarekziade@bcfe7cb (I'm planning to do the WASM hook as a second step).

The op seems to be correctly registered in the ops list, but I get the following error:

```python
>>> import onnxruntime as ort
>>> sess_opt = ort.SessionOptions()
>>> sess_opt.log_severity_level = 0
>>> ort_sess = ort.InferenceSession('model_quantized.firefox.onnx', sess_opt)
```

```
onnxruntime.capi.onnxruntime_pybind11_state.NotImplemented: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Could not find an implementation for FirefoxMatMulInteger8(1) node with name '/distilbert/transformer/layer.0/attention/q_lin/MatMul_quant'
```

This suggests that the schema part is not well implemented. What I have done so far is add

```cpp
class ONNX_OPERATOR_KERNEL_CLASS_NAME(kCpuExecutionProvider, kMSDomain, 1, FirefoxMatMulInteger8);
```

in `onnxruntime/contrib_ops/cpu/cpu_contrib_kernels.cc`.

I am not entirely sure how to solve this; any help would be appreciated, thanks!
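For reference, a MatMulInteger-like contrib op has to reproduce the ONNX `MatMulInteger` semantics: the int8/uint8 inputs are widened, the optional zero points are subtracted, and the product is accumulated in int32. A minimal dependency-free Python sketch of that reference behaviour (list-of-lists matrices, names are illustrative, not from the linked branch):

```python
def matmul_integer(a, b, a_zero_point=0, b_zero_point=0):
    """Reference MatMulInteger: int32 accumulation of (a - a_zp) @ (b - b_zp)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0  # int32 accumulator in the real kernel
            for k in range(inner):
                acc += (a[i][k] - a_zero_point) * (b[k][j] - b_zero_point)
            out[i][j] = acc
    return out

# Example: 1x2 @ 2x1, uint8 inputs with per-tensor zero points.
print(matmul_integer([[3, 5]], [[2], [4]], a_zero_point=1, b_zero_point=2))  # -> [[8]]
```

A SIMD backend such as gemmology would replace the inner loop, but the custom op's outputs should match this reference to stay interchangeable with the stock kernel.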
-
You also need to add it by appending a new line of `BuildKernelCreateInfo` in the function `RegisterCpuContribKernels` in `onnxruntime/contrib_ops/cpu/cpu_contrib_kernels.cc`. |
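Concretely, declaring the kernel class is not enough; the kernel also has to appear in the table that `RegisterCpuContribKernels` builds. A sketch of the two additions in `onnxruntime/contrib_ops/cpu/cpu_contrib_kernels.cc` (following the pattern used for the other contrib kernels in that file; `FirefoxMatMulInteger8` is the op from the question):

```cpp
// Forward declaration, alongside the other contrib kernel declarations:
class ONNX_OPERATOR_KERNEL_CLASS_NAME(kCpuExecutionProvider, kMSDomain, 1, FirefoxMatMulInteger8);

// ...and inside RegisterCpuContribKernels(), append an entry to the table:
BuildKernelCreateInfo<ONNX_OPERATOR_KERNEL_CLASS_NAME(kCpuExecutionProvider, kMSDomain, 1, FirefoxMatMulInteger8)>,
```

Without the `BuildKernelCreateInfo` entry the schema is visible (so the op shows up in the ops list) but no kernel implementation is registered, which matches the `NOT_IMPLEMENTED` error above.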
-
Thanks so much, that worked as expected: I was able to replace MatMulInteger with my own class, and the rewritten graph works. I pushed a demo here: https://github.com/tarekziade/onnx-custom-op and will now look at how to plug in JavaScript as an extern. @fs-eire is this something that you have already done in onnxruntime? I am planning to expose the JS API as an … |
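For anyone following along, the graph rewrite itself is just a node-level substitution: every `MatMulInteger` node is re-pointed at the contrib op and moved into the `com.microsoft` domain. The linked gist does this with the `onnx` Python API; the dict form below is only a dependency-free sketch of the same transformation:

```python
def rewrite_graph(nodes):
    """Replace MatMulInteger nodes with the contrib op in the com.microsoft domain."""
    for node in nodes:
        if node["op_type"] == "MatMulInteger":
            node["op_type"] = "FirefoxMatMulInteger8"
            node["domain"] = "com.microsoft"  # kMSDomain on the C++ side
    return nodes

graph = [
    {"op_type": "MatMulInteger", "domain": ""},
    {"op_type": "Add", "domain": ""},
]
rewrite_graph(graph)
print(graph[0])  # -> {'op_type': 'FirefoxMatMulInteger8', 'domain': 'com.microsoft'}
```

The domain string has to match the domain the kernel is registered under, otherwise the session falls back to the same `NOT_IMPLEMENTED` error.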