this post was submitted on 27 Nov 2023

LocalLLaMA

Hi all,

Just curious whether anybody knows how much compute power (hardware) it takes to run a Llama server that can serve multiple users at once.

Any discussion is welcome :)

Aggressive-Drama-899@alien.top 1 point 11 months ago

We run Llama 2 70B for around 20-30 active users using TGI (Hugging Face Text Generation Inference) on 4x A100 80GB GPUs, deployed on Kubernetes. If two users send a request at exactly the same time, the second user sees a delay of about 3-4 seconds. We haven't had any complaints about speed so far. We can spin up additional containers if it becomes a problem, though. This is all on-prem.
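For anyone sizing something similar, here is a rough back-of-the-envelope sketch of why 4x 80GB is a comfortable fit for a 70B model. It is not from the setup above: it assumes fp16 weights, Llama 2 70B's published architecture (80 layers, 8 KV heads via grouped-query attention, head dimension 128), a hypothetical 4096-token context per request, and a guessed 10% memory overhead. The function names are made up for illustration.

```python
# Rough GPU memory budget for serving Llama 2 70B.
# All defaults below are assumptions, not measurements from the thread.

GB = 1e9


def weight_bytes(n_params, bytes_per_param=2):
    """Memory for model weights (2 bytes per parameter assumes fp16/bf16)."""
    return n_params * bytes_per_param


def kv_cache_bytes_per_token(n_layers=80, n_kv_heads=8, head_dim=128,
                             bytes_per_value=2):
    """KV cache per token: one key and one value vector per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value


def max_concurrent_sequences(total_gpu_bytes, n_params, context_tokens,
                             overhead_fraction=0.1):
    """Upper bound on full-length sequences that fit next to the weights.

    overhead_fraction is a guessed reserve for activations, CUDA context,
    and fragmentation; real servers will differ.
    """
    usable = total_gpu_bytes * (1 - overhead_fraction)
    free = usable - weight_bytes(n_params)
    per_sequence = kv_cache_bytes_per_token() * context_tokens
    return max(0, int(free // per_sequence))


if __name__ == "__main__":
    total = 4 * 80 * GB  # 4x A100 80GB
    print("weights (GB):", weight_bytes(70e9) / GB)
    print("KV cache per token (bytes):", kv_cache_bytes_per_token())
    print("max 4096-token sequences:",
          max_concurrent_sequences(total, 70e9, 4096))
```

The weights alone come to roughly 140 GB in fp16, so under these assumptions more than half of the 320 GB is left for KV cache. This is only a memory ceiling; it says nothing about latency, so it does not explain the 3-4 second delay by itself.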

Appropriate-Tax-9585@alien.top 1 point 11 months ago

Thank you, this is really good to hear!