
How long did it take? #3

Open
fareesh opened this issue Mar 25, 2023 · 7 comments

Comments

@fareesh

fareesh commented Mar 25, 2023

Assuming Lambda Labs 8xA100 (80GB), which is about $12 an hour, you can get a reasonable dollar estimate that way.
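
A back-of-envelope version of that estimate, assuming ~$12/hour for an 8xA100 80GB instance; the run time below is just a placeholder taken from the Alpaca figure cited later in this thread:

```python
# Rough cost estimate, assuming Lambda Labs 8xA100 80GB at ~$12/hour.
# The run time is a placeholder; see the timing discussion below.
hourly_rate_usd = 12
training_hours = 3            # e.g. the Alpaca-7B figure cited in this thread
print(f"~${hourly_rate_usd * training_hours} total")   # -> ~$36 total
```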

@teknium1

Are you talking about fine-tuning or generating the dataset?

You use the GPT APIs to generate the dataset; it doesn't require running anything on a GPU cloud VPS.

@MarkSchmidty

MarkSchmidty commented Mar 25, 2023

For comparison, Alpaca-7B took 3 hours on 3xA100, and LoRA/PEFT reduces compute requirements by roughly two orders of magnitude for similar results.

So likely only a couple of hours, and that can likely be reduced to a few minutes on a single consumer GPU by swapping the full fine-tuning process for LoRA using the peft library (see the sketch below).

Even Alpaca-30B can be trained in a few hours on a single 3090 using 4-bit peft (not officially supported in the peft library yet, but it has been done).
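
For readers who want to see what that swap looks like, here is a minimal sketch of wrapping a causal LM in a LoRA adapter with the peft library; the model ID, rank, and target modules are illustrative assumptions, not this repo's exact configuration:

```python
# Minimal LoRA fine-tuning setup via peft (illustrative, not this repo's exact config).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

model_id = "huggyllama/llama-7b"  # placeholder LLaMA checkpoint
base = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                   # low-rank dimension
    lora_alpha=16,                         # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the small LoRA matrices are trainable
# ...then train with the usual Trainer / training loop on the instruction dataset...
```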

@vgoklani

Hey @MarkSchmidty, do you have a link for the 4-bit peft? I'd like to see those results.

I think this is one of the few repos that actually fine-tuned LLaMA as opposed to just using LoRA. I personally find LoRA suspicious: how is it possible that we can just freeze the model, add low-rank tensors to the query/key projections in the attention layers, and get results comparable to full fine-tuning (which is much more expensive)? I did see the results in the Microsoft paper, but I'm still finding it hard to believe...
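
For context on the trick being questioned here, a rough sketch of the LoRA idea from the Microsoft paper: the pretrained weight stays frozen and only a rank-r update B·A is learned on top of it. Class and parameter names below are illustrative:

```python
# Illustrative LoRA layer: frozen base weight W plus a trainable low-rank update B @ A.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                          # freeze the pretrained weight
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)   # trainable, r << d
        self.B = nn.Parameter(torch.zeros(d_out, r))         # trainable, starts at zero
        self.scale = alpha / r

    def forward(self, x):
        # y = W x + (alpha/r) * B A x  -- only ~r * (d_in + d_out) extra parameters per layer
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```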

@MarkSchmidty

@vgoklani Generally you must merge the 16-bit peft into a 16-bit model and then quantize the resulting merged model down to 4-bit if you want 4-bit inference. The quality of the peft part falls apart at this point, so native fine-tuning does have a benefit over LoRA/peft if you're planning to quantize down to 4-bit.

That said, this is the project that fine-tunes LoRAs in 4-bit directly. This avoids the quality loss of quantizing after fine-tuning, producing 4-bit pefts about as good as native fine-tunes: https://github.com/johnsmith0031/alpaca_lora_4bit. There are fine-tunes of all sizes mentioned in its Issues section.
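
A sketch of the merge-then-quantize workflow described above, assuming current peft/transformers APIs; paths are placeholders, and the 4-bit step here uses bitsandbytes rather than the GPTQ-style quantization that alpaca_lora_4bit uses:

```python
# Merge a 16-bit LoRA adapter into its base model, then reload the merged
# checkpoint in 4-bit; this post-merge quantization is where the quality loss
# described above can appear. Paths are placeholders.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("path/to/llama-7b", torch_dtype=torch.float16)
merged = PeftModel.from_pretrained(base, "path/to/lora-adapter").merge_and_unload()
merged.save_pretrained("path/to/merged-fp16")

quantized = AutoModelForCausalLM.from_pretrained(
    "path/to/merged-fp16",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
```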

@teknium1

> Hey @MarkSchmidty, do you have a link for the 4-bit peft? I'd like to see those results.
>
> I think this is one of the few repos that actually fine-tuned LLaMA as opposed to just using LoRA. I personally find LoRA suspicious: how is it possible that we can just freeze the model, add low-rank tensors to the query/key projections in the attention layers, and get results comparable to full fine-tuning (which is much more expensive)? I did see the results in the Microsoft paper, but I'm still finding it hard to believe...

I've tried a 7B full fine-tune of Alpaca and a 7B LoRA, and I find the LoRA to be greatly lacking.

@MarkSchmidty

> I've tried a 7B full fine-tune of Alpaca and a 7B LoRA, and I find the LoRA to be greatly lacking.

But was the LoRA created in 16-bit or in 4-bit, and were you running inference in 16-bit or in 4-bit? A LoRA made in 16-bit with inference in 16-bit is quite good, and the same goes for a LoRA made in 4-bit with inference in 4-bit.

It's LoRAs made in 16-bit with inference in 4-bit that I find "greatly lacking".

@teknium1

> > I've tried a 7B full fine-tune of Alpaca and a 7B LoRA, and I find the LoRA to be greatly lacking.
>
> But was the LoRA created in 16-bit or in 4-bit, and were you running inference in 16-bit or in 4-bit? A LoRA made in 16-bit with inference in 16-bit is quite good, and the same goes for a LoRA made in 4-bit with inference in 4-bit.
>
> It's LoRAs made in 16-bit with inference in 4-bit that I find "greatly lacking".

I haven't run inference on any LLaMA-based model in 4-bit, so I can't comment on that.
