LocalLLaMA

Community for discussing Llama, the family of large language models created by Meta AI.

When training an LLM how do you decide to use a 7b, 30b, 120b, etc model (assuming you can run them all)? (alien.top)

submitted 2 years ago by paradigm11235@alien.top to c/localllama@poweruser.forum

I guess the question is: what order of magnitude are we talking about before you need to step up to more parameters? I understand it's measured in billions of parameters, and that the parameters are basically weights learned from the training data that the model uses to predict words (I think of it as a big weight map), so you can expect "sharp sword" more often than "aspirin sword."
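To make the "weight map" intuition concrete, here is a toy sketch that estimates next-word probabilities from raw word-pair counts. The corpus is made up for illustration, and a real LLM does not store a count table like this; it learns weights in a neural network. The only point is to show why "sword" ends up more likely after "sharp" than after "aspirin."

```python
from collections import Counter

# Tiny made-up corpus, purely for illustration (an assumption, not real training data).
corpus = (
    "the knight drew a sharp sword . "
    "a sharp sword cuts deep . "
    "she took an aspirin for the headache . "
    "the sharp knife was dull compared to the sword ."
).split()

# Count how often each adjacent word pair occurs, and how often each word
# appears as the first word of a pair.
pair_counts = Counter(zip(corpus, corpus[1:]))
first_counts = Counter(corpus[:-1])


def next_word_prob(prev, word):
    """Estimate P(word | prev) from raw bigram counts."""
    if first_counts[prev] == 0:
        return 0.0
    return pair_counts[(prev, word)] / first_counts[prev]


print("P(sword | sharp)   =", next_word_prob("sharp", "sword"))
print("P(sword | aspirin) =", next_word_prob("aspirin", "sword"))
```

In this toy corpus "sharp" is followed by "sword" in two of its three occurrences, while "aspirin" is never followed by "sword", so the first probability is higher.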

Is there a limit on how much training data helps, to the point that you'll hit a plateau? Like, I imagine training on Shakespeare would be harder than on Poe because of all the made-up words Shakespeare uses. I'd probably train a Shakespeare model on his works plus wikis and discussions of his work.

I know that's kind of all over the place. I'm fumbling at the topic, trying to get a grasp on it so I can start prying it open.

[–] rvitor@alien.top 1 points 2 years ago

For training, it's sometimes better to pick a small model first to run some tests and get faster feedback. Then you can train a larger model if you want to and see how it goes.
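As a rough illustration of why the small model gives faster feedback, here is a back-of-the-envelope sketch. It assumes the common rule of thumb that training a dense transformer costs about 6 x parameters x tokens floating-point operations; the token count and hardware throughput below are hypothetical placeholders, not measurements, and real fine-tuning runs (especially with LoRA or quantization) can differ a lot.

```python
def train_flops(n_params, n_tokens):
    """Approximate training compute using the ~6 * N * D rule of thumb."""
    return 6 * n_params * n_tokens


def train_hours(n_params, n_tokens, flops_per_sec):
    """Rough wall-clock hours at a given sustained throughput."""
    return train_flops(n_params, n_tokens) / flops_per_sec / 3600


# Hypothetical values chosen only for illustration.
TOKENS = 1e9          # tokens in the training set (assumption)
THROUGHPUT = 1e14     # sustained FLOP/s across your hardware (assumption)

for name, n_params in [("7b", 7e9), ("30b", 30e9), ("120b", 120e9)]:
    hours = train_hours(n_params, TOKENS, THROUGHPUT)
    print(f"{name}: ~{hours:.0f} hours")
```

Under this approximation the cost scales linearly with parameter count, so on the same data and hardware a 120b run takes roughly 17 times as long as a 7b run. That is why it pays to debug the data and training setup on the small model first.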