
Slow CUDA inference speed #763

Open
ShelbyJenkins opened this issue Sep 8, 2024 · 2 comments
ShelbyJenkins commented Sep 8, 2024

This reports mistral.rs as being faster than llama.cpp: #612

But I'm seeing much slower speeds with the same prompt and settings:

Mistral.rs
Usage { completion_tokens: 501, prompt_tokens: 28, total_tokens: 529, avg_tok_per_sec: 16.980707, avg_prompt_tok_per_sec: 76.08695, avg_compl_tok_per_sec: 16.27416, total_time_sec: 31.153, total_prompt_time_sec: 0.368, total_completion_time_sec: 30.785 }

llama.cpp
timings: {"predicted_ms": 4007.64, "prompt_per_token_ms": 0.7041786, "predicted_per_token_ms": 8.01528, "prompt_ms": 19.717, "prompt_per_second": 1420.0944, "predicted_n": 500.0, "prompt_n": 28.0, "predicted_per_second": 124.7617}
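The two tools report timings in different units (mistral.rs in seconds, llama.cpp in milliseconds), so here is a quick sketch putting both logs above on a tokens-per-second footing, using only the numbers they report:

```python
# Normalize both decode-throughput reports to tokens/sec.
# mistral.rs: completion_tokens / total_completion_time_sec (seconds)
mistral_rs_tok_per_sec = 501 / 30.785

# llama.cpp: predicted_n / predicted_ms (milliseconds)
llama_cpp_tok_per_sec = 500 / (4007.64 / 1000)

print(f"mistral.rs: {mistral_rs_tok_per_sec:.2f} tok/s")  # ~16.27
print(f"llama.cpp:  {llama_cpp_tok_per_sec:.2f} tok/s")   # ~124.76
print(f"gap:        {llama_cpp_tok_per_sec / mistral_rs_tok_per_sec:.1f}x")
```

These match the `avg_compl_tok_per_sec` and `predicted_per_second` fields the tools themselves print, so the roughly 7.7x decode gap is consistent within each log, not a units mix-up.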

The code I'm using to init mistral.rs:
https://github.com/ShelbyJenkins/llm_client/blob/b1edca89bbdc34b884907fd39be6eedabf10d81b/src/llm_backends/mistral_rs/builder.rs#L110

I'm using the basic completion tests here:
https://github.com/ShelbyJenkins/llm_client/blob/b1edca89bbdc34b884907fd39be6eedabf10d81b/src/basic_completion.rs#L158

Testing on Ubuntu, inside an Ubuntu Docker container (FROM nvidia/cuda:12.3.2-cudnn9-devel-ubuntu22.04). I've tried loading all layers onto a single GPU using the dummy device map, and splitting them across both GPUs using the device mapper. The GPUs are 3090s, and testing is done with Phi 3 mini.
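As a sanity check on the setup above (a minimal sketch, assuming the NVIDIA Container Toolkit is installed on the host), it's worth confirming that the same base image actually sees both 3090s and the expected driver/CUDA versions before blaming the inference code:

```shell
# Run nvidia-smi inside the same CUDA base image the Dockerfile uses;
# both 3090s and a CUDA 12.x-capable driver should be listed.
docker run --rm --gpus all \
  nvidia/cuda:12.3.2-cudnn9-devel-ubuntu22.04 nvidia-smi
```

If only one GPU appears, or the driver's CUDA version is older than the image's toolkit, device mapping inside the container won't behave as expected.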

@ShelbyJenkins ShelbyJenkins added the bug Something isn't working label Sep 8, 2024
@EricLBuehler EricLBuehler added optimization and removed bug Something isn't working labels Sep 8, 2024
@ShelbyJenkins (Author) commented:

I need to test with the CUDA version specified in the Docker container, and if that doesn't work I will run the benchmark following the instructions from the announcement linked above.

@ShelbyJenkins (Author) commented:

Updated to the same Docker image (though not the same Dockerfile). No change in speed.
