Replies: 4 comments 1 reply
-
It currently only supports "in-house" loras. https://github.com/ggerganov/llama.cpp/tree/master/examples/finetune
I missed this one. So much going on in this project. 😅 https://github.com/ggerganov/llama.cpp/tree/master/examples/export-lora
Don't be shy. Poke around the examples. That's the whole point.
-
Here is a copy/paste from my terminal session...
chris@Chris-Mac-mini llama.cpp % ./main --help | grep LoRA
-
Thanks for the response, guys! I have been creating LoRA adapters with mlx_lm.lora, but the output is in safetensors, and since the convert-lora-to-ggml.py script has been dropped from the project, I opened a request in the mlx project to export LoRAs as ggml. They were asking if llama.cpp supports LoRA in gguf, I think because they can already merge/fuse base models and LoRA adapters in the gguf format. I was able to modify an old copy of convert-lora-to-ggml.py to export the safetensors as ggml; now I see why it was dropped: it looks like layer names and formats keep changing.
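For anyone trying the same thing, here is a rough sketch of what that kind of conversion involves: load the safetensors adapter, remap the Hugging Face/PEFT-style tensor names to llama.cpp's naming, and serialize each tensor into a ggla-style container. The name patterns, magic/version values, and alignment below are assumptions based on an old copy of convert-lora-to-ggml.py and will likely need adjusting as layer names change:

```python
# Rough sketch of a safetensors -> ggml LoRA conversion. The tensor-name mapping
# and the ggla header layout are ASSUMPTIONS taken from an old convert-lora-to-ggml.py
# and may not match what current llama.cpp expects.
import re
import struct
import numpy as np
from safetensors.numpy import load_file

# Assumed mapping from HF/PEFT projection names to llama.cpp tensor names.
NAME_MAP = {
    "q_proj": "attn_q", "k_proj": "attn_k", "v_proj": "attn_v",
    "o_proj": "attn_output", "gate_proj": "ffn_gate",
    "down_proj": "ffn_down", "up_proj": "ffn_up",
}

def translate_name(hf_name: str) -> str:
    # e.g. "...layers.0.self_attn.q_proj.lora_A.weight" -> "blk.0.attn_q.weight.loraA"
    m = re.search(r"layers\.(\d+)\.(?:self_attn|mlp)\.(\w+)\.lora_(A|B)\.weight", hf_name)
    if m is None:
        raise ValueError(f"unhandled tensor name: {hf_name}")
    layer, proj, ab = m.groups()
    return f"blk.{layer}.{NAME_MAP[proj]}.weight.lora{ab}"

def write_ggla(tensors: dict, out_path: str, lora_r: int, lora_alpha: int) -> None:
    with open(out_path, "wb") as f:
        f.write(struct.pack("<I", 0x67676C61))            # magic "ggla" (assumed)
        f.write(struct.pack("<I", 1))                     # format version (assumed)
        f.write(struct.pack("<ii", lora_r, lora_alpha))
        for hf_name, data in tensors.items():
            name = translate_name(hf_name).encode("utf-8")
            data = data.astype(np.float32)
            f.write(struct.pack("<iii", data.ndim, len(name), 0))  # 0 = F32
            for dim in reversed(data.shape):
                f.write(struct.pack("<i", dim))
            f.write(name)
            f.write(b"\x00" * (-f.tell() % 32))           # pad data to 32-byte alignment
            data.tofile(f)

if __name__ == "__main__":
    # "adapters.safetensors" and the r/alpha values are hypothetical; take them
    # from the adapter_config.json of the actual mlx_lm.lora run.
    tensors = load_file("adapters.safetensors")
    write_ggla(tensors, "adapters.ggla", lora_r=16, lora_alpha=32)
```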
-
I've run into more or less all of the problems mentioned here. I can load a ggla into memory, but I don't think it's being properly applied. I'm still going through all of the LoRA code to see whether the problem is with inference or with ggml misapplying the LoRAs at load time. As an aside, I ended up writing my own npz/safetensors -> ggla converter, but I think one of the problems there is that the attn_q layers need to be transposed/permuted correctly for llama3 because of the layout ggml expects. This is going to be more or less the same for LoRAs with any model that llama.cpp expects to be in a particular format.
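For reference, this is a sketch of the head-interleaving row permutation that llama.cpp's HF-to-GGUF converters apply to attn_q/attn_k weights. If that is what an adapter is missing, it should only affect the lora_B factor, since delta_W = B @ A and the permutation acts on output rows. The shapes and the exact call sites are illustrative assumptions:

```python
# Sketch of the attn_q/attn_k row permutation used by llama.cpp HF converters
# (rotary-embedding layout difference). For a LoRA, permuting the rows of
# delta_W = B @ A is the same as permuting the rows of B; lora_A is untouched.
import numpy as np

def permute_rows(w: np.ndarray, n_head: int) -> np.ndarray:
    # Reorder output rows from (head, half, rot) to (head, rot, half) ordering.
    return (w.reshape(n_head, 2, w.shape[0] // n_head // 2, *w.shape[1:])
             .swapaxes(1, 2)
             .reshape(w.shape))

# Hypothetical shapes for a q_proj adapter.
n_head, d_model, r = 8, 512, 8
lora_a = np.random.randn(r, d_model).astype(np.float32)   # (r, in)
lora_b = np.random.randn(d_model, r).astype(np.float32)   # (out, r)
lora_b_permuted = permute_rows(lora_b, n_head)
# Sanity check: permuting B then multiplying equals permuting the full delta.
assert np.allclose(lora_b_permuted @ lora_a, permute_rows(lora_b @ lora_a, n_head))
```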
-
I know llama.cpp supports GGML, but I was wondering if GGUF is supported.