Why is no one releasing 70b models? (alien.top)

submitted 2 years ago by Longjumping-Bake-557@alien.top to c/localllama@poweruser.forum

35 comments

There has been a lot of movement at and below the 13b parameter bracket in the last few months, but it's wild to think the best 70b models are still Llama 2-based. Why is that?

We now have 13b models, like the 8-bit bartowski/Orca-2-13b-exl2, approaching or even surpassing the best 70b models.

[–] Armym@alien.top 1 points 2 years ago (1 children)

Do you think finetuning models with more parameters requires more data to actually have an effect?

[–] thereisonlythedance@alien.top 1 points 2 years ago

With a full finetune, I don't think so: the LIMA paper showed that 1,000 high-quality samples are enough with a 65B model. With QLoRA and LoRA, I don't know. The number of parameters you're affecting is set by the rank you choose. It's important to get the balance between the rank, dataset size, and learning rate right. Style and structure are easy to impart, but other things not so much. I often wonder how clean the merge process actually is. I'm still learning.
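
To put rough numbers on the rank point, here's a back-of-the-envelope sketch. It assumes the standard LoRA setup, where a frozen d_out x d_in weight gets two small trainable matrices, B (d_out x rank) and A (rank x d_in). The function names are made up for illustration, and 8192 is just an example layer width, not a claim about any particular finetune.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA trains B (d_out x rank) and A (rank x d_in); the base weight is frozen.
    """
    return rank * (d_in + d_out)


def full_params(d_in, d_out):
    """Parameters in the frozen base weight matrix itself."""
    return d_in * d_out


# Illustrative (assumed) square projection of width 8192
d = 8192
for r in (8, 64, 256):
    n = lora_trainable_params(d, d, r)
    total = full_params(d, d)
    print(f"rank={r}: {n:,} trainable vs {total:,} frozen ({n / total:.2%})")
```

The trainable count grows linearly with rank, so doubling the rank doubles what the adapter can change per layer. That's only per adapted matrix, though; the real total also depends on which layers you attach adapters to.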