this post was submitted on 28 Nov 2023

LocalLLaMA


Community to discuss Llama, the family of large language models created by Meta AI.


There has been a lot of movement at and below the 13B parameter bracket in the last few months, but it's wild to think the best 70B models are still Llama 2-based. Why is that?

We now have 13B models, such as the 8-bit bartowski/Orca-2-13b-exl2 quant, approaching or even surpassing the best 70B models.

[–] __JockY__@alien.top 1 points 11 months ago (7 children)

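Meta reports 3,311,616 A100 GPU-hours of pretraining for the whole Llama 2 family; the 70B base model alone accounts for 1,720,320 of those. Taking the full figure at $1/hour per A100, you’d spend just over $3.3M, and on a single GPU it would take approximately 380 years.

Scale that across 10,000 GPUs and you’re looking at about 2 weeks of wall-clock time for the same $3.3M or so, assuming near-perfect scaling.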
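
For anyone who wants to check the back-of-the-envelope math, here is a rough sketch in Python. The function name `training_estimate` is made up for illustration, the GPU-hour figure is the one quoted in this comment, and the calculation assumes perfect linear scaling across GPUs, which real clusters won't quite hit.

```python
# GPU-hour figure quoted in the comment; treat it as an input, not a measurement.
GPU_HOURS = 3_311_616


def training_estimate(gpu_hours, usd_per_gpu_hour, num_gpus):
    """Return (total cost in USD, wall-clock days).

    Assumes perfect linear scaling: N GPUs finish N times faster
    while the total number of billed GPU-hours stays the same.
    """
    cost = gpu_hours * usd_per_gpu_hour
    days = gpu_hours / num_gpus / 24
    return cost, days


for gpus in (1, 10_000):
    cost, days = training_estimate(GPU_HOURS, 1.0, gpus)
    print(f"{gpus:>6} GPUs: ${cost:,.0f}, {days:,.1f} days ({days / 365:,.1f} years)")
```

The hourly rate is just a parameter, so swapping in 4.0 instead of 1.0 puts the bill at about $13.2M for the same wall-clock time.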

Fine-tuning is much, much faster and cheaper.

[–] Exotic-Estimate8355@alien.top 1 points 11 months ago (2 children)

$1/hour for an A100? Where? I can barely get one on GCE, and it’s almost $4/hr.

[–] __JockY__@alien.top 1 points 11 months ago

Yes, but you don't have Meta's purchasing power to rent 10,000 GPUs for a month. Economies of scale, my friend!
