this post was submitted on 13 Nov 2023

LocalLLaMA


Community to discuss Llama, the family of large language models created by Meta AI.

[–] a_beautiful_rhind@alien.top 1 points 10 months ago

70B with a 2048-token context and a 128-token reply is about 303 t/s.

That sounds more reasonable, and that's assuming the models aren't quantized. The batch size is just a theoretical batch, I think.
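
For context, here's a minimal sketch of how an aggregate t/s figure like that typically falls out of batch size, reply length, and wall-clock time. The batch size and timing below are illustrative placeholders, not measurements from this thread:

```python
# Hypothetical example: aggregate-throughput arithmetic for batched generation.
# The batch size and wall-clock time are placeholders, not measured values.

def aggregate_throughput(batch_size: int, new_tokens_per_seq: int, wall_seconds: float) -> float:
    """Total generated tokens across the batch divided by wall-clock time."""
    return (batch_size * new_tokens_per_seq) / wall_seconds


if __name__ == "__main__":
    # e.g. a batch of 32 sequences, each producing a 128-token reply,
    # finishing in ~13.5 s works out to roughly 303 t/s aggregate.
    print(f"{aggregate_throughput(32, 128, 13.5):.0f} t/s")
```

The point being that a batched (or "theoretical") throughput number sums tokens across all sequences in the batch; per-stream speed would be that figure divided by the batch size.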