Testing on Ubuntu, running an Ubuntu Docker container (FROM nvidia/cuda:12.3.2-cudnn9-devel-ubuntu22.04). I've tried loading all layers onto a single GPU using the dummy device map, and splitting them across both GPUs using the device mapper. The GPUs are 3090s and testing is done with Phi 3 mini.
I need to test with the version of CUDA specified in the Docker container, and if that doesn't work I will re-run the benchmark following the instructions from the announcement linked above.
This reports mistral.rs as being faster than llama.cpp: #612
But I'm seeing much slower speeds for the same prompt/settings.
Mistral.rs
Usage { completion_tokens: 501, prompt_tokens: 28, total_tokens: 529, avg_tok_per_sec: 16.980707, avg_prompt_tok_per_sec: 76.08695, avg_compl_tok_per_sec: 16.27416, total_time_sec: 31.153, total_prompt_time_sec: 0.368, total_completion_time_sec: 30.785 }
llama.cpp
timings: {"predicted_ms": 4007.64, "prompt_per_token_ms": 0.7041786, "predicted_per_token_ms": 8.01528, "prompt_ms": 19.717, "prompt_per_second": 1420.0944, "predicted_n": 500.0, "prompt_n": 28.0, "predicted_per_second": 124.7617}
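To put the two reports on the same footing, both can be reduced to completion tokens per second. A quick sketch, using only the numbers from the outputs above (mistral.rs reports seconds, llama.cpp milliseconds):

```python
# mistral.rs: completion_tokens / total_completion_time_sec
mistralrs_tok_per_sec = 501 / 30.785

# llama.cpp: predicted_n / (predicted_ms converted to seconds)
llamacpp_tok_per_sec = 500.0 / (4007.64 / 1000)

print(f"mistral.rs: {mistralrs_tok_per_sec:.2f} tok/s")
print(f"llama.cpp:  {llamacpp_tok_per_sec:.2f} tok/s")
print(f"gap: {llamacpp_tok_per_sec / mistralrs_tok_per_sec:.1f}x")
```

This reproduces the reported avg_compl_tok_per_sec (≈16.27) and predicted_per_second (≈124.76), so the gap on this run is roughly 7.7x in llama.cpp's favor.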
The code I'm using to init mistral.rs:
https://github.com/ShelbyJenkins/llm_client/blob/b1edca89bbdc34b884907fd39be6eedabf10d81b/src/llm_backends/mistral_rs/builder.rs#L110
I'm using the basic completion tests here:
https://github.com/ShelbyJenkins/llm_client/blob/b1edca89bbdc34b884907fd39be6eedabf10d81b/src/basic_completion.rs#L158