Hi, I'm not familiar with llama-cpp-python (or with C++, actually), but I have to use a GGUF model for my project.
I want to generate an answer from pre-computed embedding vectors (a torch.Tensor of shape (1, n_tokens, 4096)) rather than from query text, just like the inputs_embeds argument of the generate() function in transformers models.
Is this feature already implemented? If not, could anyone point me to where I should begin?
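
For context, here is a minimal sketch of the transformers behavior I want to replicate (the model name is only an illustration; any Llama-family model with hidden size 4096 would match my tensor shape, and the random tensor stands in for my real pre-computed embeddings):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model choice; any decoder-only model with hidden size 4096 fits
model_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Placeholder for my pre-computed embeddings, shape (1, n_tokens, 4096)
inputs_embeds = torch.randn(1, 16, model.config.hidden_size)

# transformers' generate() accepts inputs_embeds directly instead of input_ids
output_ids = model.generate(inputs_embeds=inputs_embeds, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

I'm looking for the equivalent entry point in llama-cpp-python (or llama.cpp itself) that would let me feed embeddings like this into decoding.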