running inference in parallel in multiple threads #565
-
Some context about this:

I suppose my question here is: is it possible that there is some sort of internal contention going on when attempting to run multiple embeddings across threads (in the same process)? I should mention that this is a checkout of ggml from before GGUF. I've noticed this happen in the following configurations:

And if that is the case, could it be fixed by running separate processes altogether?
-
Most (all?) of the synchronization is done through spin locks, so using more threads than are physically available can have disastrous effects on performance. You are likely to get better performance if you serialize the requests. I also suggest looking into batched decoding in llama.cpp; that should be the best way to process multiple sequences simultaneously.
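As a minimal sketch of the serialization suggestion: guard the inference call with a single mutex so only one ggml graph is in flight at a time. Here `compute_embedding()` is a hypothetical stand-in for whatever embedding call the application makes, not a real ggml/llama.cpp function.

```cpp
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for the application's embedding call;
// not a real ggml/llama.cpp function.
std::vector<float> compute_embedding(const std::string & text) {
    return {}; // real inference would happen here
}

static std::mutex g_infer_mutex;

// Worker threads can still prepare inputs and consume results in parallel;
// only the inference call itself is serialized.
std::vector<float> embed_serialized(const std::string & text) {
    std::lock_guard<std::mutex> lock(g_infer_mutex);
    return compute_embedding(text);
}
```

This keeps the spin-lock compute workers from ever being oversubscribed, because at most one graph's worth of threads is running at any moment.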
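For the batched-decoding route, here is a rough sketch against the `llama_batch` API in `llama.h` (available in llama.cpp builds after the batching refactor; field names may differ in older checkouts). It packs two prompts into one batch as two sequences so a single `llama_decode` call processes both; model and context setup are elided.

```cpp
#include "llama.h"

// Feed two prompts as two sequences of a single llama_batch, so one
// llama_decode call processes both. Model/context setup is elided.
static bool decode_two_sequences(llama_context * ctx,
                                 const llama_token * a, int n_a,
                                 const llama_token * b, int n_b) {
    llama_batch batch = llama_batch_init(n_a + n_b, /*embd =*/ 0, /*n_seq_max =*/ 1);
    batch.n_tokens = 0;

    // sequence 0: prompt a
    for (int i = 0; i < n_a; ++i) {
        const int j = batch.n_tokens++;
        batch.token   [j]    = a[i];
        batch.pos     [j]    = i;              // position within sequence 0
        batch.n_seq_id[j]    = 1;
        batch.seq_id  [j][0] = 0;
        batch.logits  [j]    = (i == n_a - 1); // output only for the last token
    }
    // sequence 1: prompt b
    for (int i = 0; i < n_b; ++i) {
        const int j = batch.n_tokens++;
        batch.token   [j]    = b[i];
        batch.pos     [j]    = i;              // position within sequence 1
        batch.n_seq_id[j]    = 1;
        batch.seq_id  [j][0] = 1;
        batch.logits  [j]    = (i == n_b - 1);
    }

    const bool ok = llama_decode(ctx, batch) == 0;
    llama_batch_free(batch);
    return ok;
}
```

The design point is that the sequences share one context and one thread pool: the per-token `seq_id` keeps their KV-cache entries separate, so the prompts are processed together without two graphs contending for the same cores.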