How to approach 4-Bit LoRAs #1101
mcmonkey4eva started this conversation in General
Replies: 1 comment
For anyone following this: ooba didn't reply here, but did in one of the linked threads -- the answer is Option 1, apply the forks and patches the janky-but-immediately-working way -- and pushed #1200. And it does work, immediately and perfectly! (At least on Linux; I've seen users complain about trouble on Windows.) I've also got 4-bit training working in this PR: #1334
This is pretty much a direct request for comment from @oobabooga, just, uh, in public discussion post format.
As it currently stands, I see three primary routes to achieving 4-bit (or 3-bit) LoRA support (inference & training).
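Quick context for anyone who hasn't dug into the internals: all three routes do the same thing mechanically. The quantized base weights stay frozen, and you train two small full-precision low-rank matrices on top of each target layer. A minimal plain-PyTorch sketch of the idea, with names of my own rather than from any of the repos below:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """LoRA over a frozen base linear layer.

    In the 4-bit case, `base` would be a quantized GPTQ matmul; a plain
    nn.Linear stands in for it here. Only lora_A / lora_B get gradients.
    """
    def __init__(self, base: nn.Module, in_features: int, out_features: int,
                 r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # base stays frozen (and quantized)
        # Low-rank update delta_W = B @ A, with r << min(in_features, out_features)
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))  # zero init: adapter starts as a no-op
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = base(x) + (alpha / r) * x @ A^T @ B^T
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling
```

The three options below are really just arguing about where that frozen quantized base matmul comes from, and who maintains it.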
Option 1
Option 1 is to follow the example of https://github.com/Ph0rk0z/text-generation-webui-testing/tree/DualModel and use a pile of highly specific forks as dependencies.
Pros: can get it working quickly and directly, and it should Just Work (TM) from there.
Cons: not a good long-term plan (fork maintenance), and messy (probably painful for users to install).
Option 2
Option 2 is to wait on johnsmith0031/alpaca_lora_4bit#13: convince the people developing support for these things to PR their work upstream (to peft and GPTQ in particular) so that we can just use it from there.
Pros: dirt simple on the text-gen-webui side of things (see the sketch after this option), and more beneficial to the broader community outside of here.
Cons: seems to be going slowly, and at least one of the applicable devs is explicitly refusing to make a proper PR out of it right now.
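To illustrate why the webui side would be dirt simple: if the quantized matmul support lived upstream, attaching adapters would reduce to the standard peft pattern. A rough sketch of what I mean -- the peft calls below are real today, but the `load_gptq_model` loader is hypothetical; it's exactly the piece that doesn't exist upstream yet:

```python
from peft import LoraConfig, get_peft_model

# Hypothetical: assumes upstream GPTQ grows a standard loader for 4-bit
# checkpoints. This function does not exist today; it's the missing piece.
model = load_gptq_model("models/llama-7b-4bit.safetensors")

config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# If peft knew how to wrap quantized Linear layers, this one call would
# attach trainable fp16 adapters over the frozen 4-bit weights.
model = get_peft_model(model, config)
model.print_trainable_parameters()
```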
Option 3
Option 3 is xturing #1088, an external project with its own tooling for this.
Pros: their code at a glance looks really high-quality and clean, and int4 LoRAs work out-of-the-box (see the sketch after this option).
Cons: they have their whole own system and format. We'd have to either essentially replace GPTQ with it, or support both GPTQ and xturing side by side with different impl details and make a mess of it. It also doesn't look to have 3-bit, only 4-bit. Also, their repo is a bit... odd. The only 4-bit code they appear to have is a Python notebook that just downloads an external Docker image? Something isn't adding up there. Their website appears to be a paid remote-access-only AI web platform, not the innocent university research group AI project their GitHub pretends to be.
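For reference, their advertised interface looks roughly like the below. This is paraphrased from the examples in their README; I haven't run it myself, so treat the exact model key and dataset path as their claims, not mine:

```python
from xturing.datasets.instruction_dataset import InstructionDataset
from xturing.models import BaseModel

# Paths and model keys here are from xturing's own examples, unverified.
dataset = InstructionDataset("./alpaca_data")
model = BaseModel.create("llama_lora_int4")  # 4-bit LLaMA + LoRA, per their docs
model.finetune(dataset=dataset)
print(model.generate(texts=["Why are LLMs becoming so important?"]))
```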
Personally, I really want Option 2 to come to fruition. I wish I could just slap people and yell "your code works! That's good enough! Open a PR! Get upstream eyes on it at least!" AAAH!!
Option 1 is my second-favorite, but I'm not even going to attempt that unless ooba really wants the codebase mess that comes with it.
Option 3 looks really promising in theory, but the major rewrites needed to support it are likely to be painful, and considering its lack of active development relative to GPTQ, it might be harmful in the long term. Not to mention the suspicious oddities with that repo.