this post was submitted on 16 Nov 2023
LocalLLaMA
Community to discuss Llama, the family of large language models created by Meta AI.
I'm guessing GQA helped. Llama 2 70B and 34B used Grouped Query Attention, but it wasn't used for the Llama 2 7B and 13B models.
https://preview.redd.it/je2q9vhllq0c1.png?width=871&format=png&auto=webp&s=d23b1cdd307dfa54fb4dd788a0f6ea90ee23fa94
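For anyone unfamiliar with what GQA actually changes, here's a minimal sketch, assuming PyTorch: each key/value head is shared by a group of query heads, which shrinks the KV cache compared with full multi-head attention. The head counts below are illustrative, not the real Llama 2 configuration.

```python
# Minimal Grouped Query Attention sketch (illustrative head counts).
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v, n_q_heads=8, n_kv_heads=2):
    # q: (batch, seq, n_q_heads, head_dim)
    # k, v: (batch, seq, n_kv_heads, head_dim)
    group = n_q_heads // n_kv_heads
    # Repeat each KV head so one KV head serves a whole group of query heads.
    k = k.repeat_interleave(group, dim=2)
    v = v.repeat_interleave(group, dim=2)
    # Move to (batch, heads, seq, head_dim) for scaled dot-product attention.
    q, k, v = (t.transpose(1, 2) for t in (q, k, v))
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    return out.transpose(1, 2)  # back to (batch, seq, n_q_heads, head_dim)

batch, seq, head_dim = 1, 16, 64
q = torch.randn(batch, seq, 8, head_dim)
k = torch.randn(batch, seq, 2, head_dim)
v = torch.randn(batch, seq, 2, head_dim)
print(grouped_query_attention(q, k, v).shape)  # torch.Size([1, 16, 8, 64])
```

The point is that only 2 KV heads are stored per layer instead of 8, so the KV cache (and memory bandwidth at inference time) drops accordingly while the query heads stay at full count.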
Knowledge is a strange goal for any model when we have the internet, IMO. Just connect your model to a web search.
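In case it helps, a rough sketch of what "connect your model to a web search" could look like in practice; `web_search` and `llm_generate` below are placeholder stubs, not real APIs, standing in for whatever search client and local model you actually run.

```python
# Rough sketch of search-augmented prompting.
# `web_search` and `llm_generate` are placeholders: swap in your own
# search API client and local LLaMA call (llama.cpp, vLLM, etc.).

def web_search(query: str, k: int = 3) -> list[str]:
    # Placeholder: return the top-k result snippets for the query.
    return [f"(snippet {i} for: {query})" for i in range(k)]

def llm_generate(prompt: str) -> str:
    # Placeholder: call your local model here.
    return f"(model answer based on a {len(prompt)}-char prompt)"

def answer_with_search(question: str) -> str:
    snippets = web_search(question)
    context = "\n".join(f"- {s}" for s in snippets)
    prompt = (
        "Answer the question using the search results below.\n"
        f"Search results:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return llm_generate(prompt)

print(answer_with_search("When was Grouped Query Attention introduced?"))
```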