this post was submitted on 20 Nov 2023
LocalLLaMA
Community to discuss about Llama, the family of large language models created by Meta AI.
Is there code for distillation?
Oh yeah, it be busted.
I had okay-ish results removing layers from a 70B... but messing with the first or last 20% of the layers lobotomizes the model, and I didn't snip more than a couple of layers from any one place. By the time I got the model far enough down in size that a q2_K quant could load in 24 GB of VRAM, it fell apart, so I didn't consider mergekit all that useful as a distillation/parameter-reduction process.
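For anyone wanting to try the same thing without mergekit, here's a minimal sketch of the layer-snipping idea in PyTorch. The `prune_layers` helper and the toy 80-layer stack are my own illustration (80 decoder layers matches Llama-2-70B); on a real model you'd apply this to `model.model.layers` and update `config.num_hidden_layers` to match.

```python
import torch.nn as nn

def prune_layers(layers: nn.ModuleList, drop: set) -> nn.ModuleList:
    """Return a new ModuleList with the given layer indices removed."""
    return nn.ModuleList(l for i, l in enumerate(layers) if i not in drop)

# Toy stand-in for a 70B Llama decoder stack (80 layers).
stack = nn.ModuleList(nn.Identity() for _ in range(80))

# Snip a couple of layers from the middle, leaving the first/last
# 20% of the stack alone (touching those lobotomizes the model).
pruned = prune_layers(stack, {39, 40})
print(len(pruned))  # 78
```

After pruning you'd typically want at least a light fine-tune (or a healing pass on a small dataset) before quantizing, since the remaining layers were never trained to feed into each other directly.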