ggml inference time is significantly slower than onnxruntime #841
Did you build in Release?
Thanks for your reply. I built from the master branch.
I tested mobilenetv2 inference with the release branch code, and the inference time was about the same.
By Release I mean building with `cmake -DCMAKE_BUILD_TYPE=Release`.
I built with `cmake -DCMAKE_BUILD_TYPE=Release`.
Make sure you are building with AVX2 support and ramp up the threads a bit:

```c
const int n_threads = 4;
ggml_graph_compute_with_ctx(ctx0, gf, n_threads);
```
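For what it's worth, one way to confirm that the binary was actually compiled with AVX2 is ggml's own CPU feature queries (a minimal sketch; `ggml_cpu_has_avx2()` is the relevant call, and similar `ggml_cpu_has_*` checks exist for other instruction sets):

```c
#include <stdio.h>
#include "ggml.h"

int main(void) {
    // each query returns 1 if ggml was compiled with that
    // instruction set enabled, 0 otherwise
    printf("AVX2: %d\n", ggml_cpu_has_avx2());
    printf("FMA:  %d\n", ggml_cpu_has_fma());
    return 0;
}
```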
I use ggml to deploy the mobilenetv2 model, and compared with a deployment using onnxruntime, I found that ggml's inference time is nearly 100 times that of onnxruntime. The relevant part of my ggml inference code is as follows:
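(The poster's actual snippet is not reproduced here; below is a minimal sketch of the general pattern of driving a ggml graph with `ggml_graph_compute_with_ctx`, using illustrative tensor names and shapes rather than the original mobilenetv2 code.)

```c
#include <stdio.h>
#include "ggml.h"

int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 32 * 1024 * 1024,  // arena for tensors, graph, and work buffer
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx0 = ggml_init(params);

    // stand-in for a real model layer: y = W * x (shapes are illustrative)
    struct ggml_tensor * W = ggml_set_f32(ggml_new_tensor_2d(ctx0, GGML_TYPE_F32, 1024, 1024), 1.0f);
    struct ggml_tensor * x = ggml_set_f32(ggml_new_tensor_1d(ctx0, GGML_TYPE_F32, 1024), 1.0f);
    struct ggml_tensor * y = ggml_mul_mat(ctx0, W, x);

    // build the forward graph and compute it with several threads
    struct ggml_cgraph * gf = ggml_new_graph(ctx0);
    ggml_build_forward_expand(gf, y);

    const int n_threads = 4;
    ggml_graph_compute_with_ctx(ctx0, gf, n_threads);

    // prints 1024.0, since every element of W and x is 1
    printf("y[0] = %f\n", ggml_get_f32_1d(y, 0));

    ggml_free(ctx0);
    return 0;
}
```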
Is there something wrong with how I build the model? Do you have any suggestions? Thanks in advance.