I understand that LLMs are typically undertrained and that simply having a larger model does not mean the results will be superior to those of smaller models. That being said, has there been an attempt to train or fine-tune the 30B or 65B LLaMA-based models? If not, what is preventing the effort, other than access to physical hardware?

Cody

Replies: 1 comment

-

Hi Cody, thank you for your interest in our work. Currently, the main reason is compute, but we are planning to scale it up once the resources are available. In addition, we are also looking into other efficient tuning / optimization methods such as bitsandbytes.
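For context, bitsandbytes is commonly used for 8-bit weight loading and 8-bit optimizers. Below is a minimal sketch of loading a larger LLaMA checkpoint in 8-bit through the Hugging Face transformers integration with bitsandbytes; the checkpoint ID is a placeholder assumption, and this is not this project's actual training setup.

```python
# Minimal sketch (assumption): 8-bit loading of a larger LLaMA checkpoint via
# the Hugging Face transformers + bitsandbytes integration. The model ID is a
# placeholder, not this project's released checkpoint or training recipe.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "huggyllama/llama-30b"  # placeholder 30B checkpoint (assumption)

# Quantize the linear-layer weights to 8-bit at load time, roughly halving
# GPU memory relative to fp16, which is the main appeal at 30B/65B scale.
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # shard layers across the available GPUs
)

# Quick sanity check: generate a short completion from the quantized model.
inputs = tokenizer("Scaling instruction tuning to 30B is limited by", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

A quantized base model like this is typically paired with a parameter-efficient method such as LoRA adapters for fine-tuning, since full-parameter updates at 30B/65B scale still require substantially more hardware.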