Questions about conversion of Hugging Face transformer to ONNX #7051
Unanswered · Matthieu-Tinycoaching asked this question in Other Q&A
Hi community,
I have tried the `convert_graph_to_onnx.py` script (https://huggingface.co/transformers/serialization.html) to convert a transformer model from PyTorch to ONNX format. I have a few questions:

1. I have installed `onnxruntime-gpu`. Will the model generated by the script work only with the GPU runtime, or will it also work with the CPU ONNX Runtime? In other words, do I have to generate one ONNX model per device? (The snippet below shows how I convert and load the model.)
2. Is the ONNX model dependent on the hardware it has been generated on, or do I have to generate the ONNX model on the target hardware where inference will be run?
3. Are the outputs of the ONNX model identical regardless of the hardware the inference is run on? That is, can I reuse the embeddings generated by the ONNX model across different hardware platforms?
4. How can I apply quantization to an ONNX model for both CPU and GPU devices? (My tentative attempt is below.)
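
For context, here is roughly how I convert the model and load it back. The model name and paths are just placeholders from my setup, and I am not sure I am choosing the execution providers correctly:

```python
# Conversion command I ran, adapted from the serialization docs
# ("bert-base-cased" and the output path are placeholders):
#   python convert_graph_to_onnx.py --framework pt --model bert-base-cased onnx/bert-base-cased.onnx

import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")  # placeholder model
inputs = tokenizer("Hello, world!", return_tensors="np")

# My understanding: the provider list selects the device at inference time,
# and the same .onnx file is loaded either way. CUDAExecutionProvider needs
# onnxruntime-gpu; the session falls back to CPU when CUDA is unavailable.
sess = ort.InferenceSession(
    "onnx/bert-base-cased.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
outputs = sess.run(None, dict(inputs))
print([o.shape for o in outputs])
```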
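
And for question 4, this is my tentative CPU-side attempt based on the `onnxruntime.quantization` module (paths are again placeholders); I don't know whether this is the recommended approach, or what the equivalent would be for GPU:

```python
from onnxruntime.quantization import QuantType, quantize_dynamic

# Dynamic quantization of the exported model's weights to int8.
# Both paths are placeholders from my setup.
quantize_dynamic(
    model_input="onnx/bert-base-cased.onnx",
    model_output="onnx/bert-base-cased-quant.onnx",
    weight_type=QuantType.QInt8,  # quantize weights to signed int8
)
```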
Thanks!