I need a bit more info from people who have installed Llama 2 locally and are using it to support web apps, or just for local information.

  • What is the ideal hardware for the 65B version?
  • How many tokens per second can that hardware process, for input and for output? (A rough way to measure this is sketched after this list.)
  • Regarding safety, since it will be used for business, what is the chance that this model will end up arguing with a customer 😊?
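
For the throughput question, here is a minimal sketch of how input and output tokens per second can be measured locally, assuming llama-cpp-python and a quantized GGUF model file (the model path is a placeholder, not something from this thread):

```python
# Rough tokens-per-second measurement, assuming llama-cpp-python is
# installed and a local GGUF model file exists (path is a placeholder).
import time
from llama_cpp import Llama

llm = Llama(model_path="./llama-2-70b.Q4_K_M.gguf", n_gpu_layers=-1)

prompt = "Summarize the return policy for a customer in two sentences."
start = time.perf_counter()
out = llm(prompt, max_tokens=256)
elapsed = time.perf_counter() - start

usage = out["usage"]  # OpenAI-style usage block with token counts
print(f"input (prompt) tokens:      {usage['prompt_tokens']}")
print(f"output (completion) tokens: {usage['completion_tokens']}")
# This lumps prompt processing and generation together; separating the
# two would require timing the prompt-eval and generation phases apart.
print(f"overall output tokens/s: {usage['completion_tokens'] / elapsed:.1f}")
```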
[–] Prudent-Artichoke-19@alien.top 1 points 9 months ago (1 children)

You need a load balancer of some sort, but an A6000 would be a good start; expect 15-20 tokens per second as a single user.
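
To make "load balancer of some sort" concrete, here is a minimal round-robin sketch in Python, assuming two local inference servers that expose an OpenAI-compatible /v1/completions endpoint (the ports and model name are illustrative assumptions, not details from this thread):

```python
# Round-robin dispatch across two local inference backends; ports and
# model name are illustrative assumptions, not from the thread.
import itertools
import requests

BACKENDS = itertools.cycle([
    "http://127.0.0.1:8001",
    "http://127.0.0.1:8002",
])

def complete(prompt: str, max_tokens: int = 256) -> str:
    backend = next(BACKENDS)  # alternate requests between the servers
    resp = requests.post(
        f"{backend}/v1/completions",
        json={"model": "llama-2", "prompt": prompt, "max_tokens": max_tokens},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["text"]

print(complete("Hello, how can I help you today?"))
```

In production you would more likely put nginx, HAProxy, or a request queue in front of the backends; the sketch just shows the idea of spreading single-user 15-20 t/s workers across machines or GPUs.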

In vanilla form, Llama 2 may do silly things. Instruction-tuned variants, fine-tuning, and so on will decrease the likelihood.
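
One common way to reduce that risk with the chat variants is a system prompt in Llama 2's documented [INST] / <<SYS>> template; the politeness rules below are illustrative assumptions, not something prescribed in this thread:

```python
# Build a Llama-2-chat style prompt that constrains the assistant's
# behavior; the system-prompt wording is an illustrative assumption.
SYSTEM = (
    "You are a customer-support assistant. Be concise and polite. "
    "Never argue with the customer; if you are unsure, offer to "
    "escalate to a human agent."
)

def build_prompt(user_message: str) -> str:
    return f"<s>[INST] <<SYS>>\n{SYSTEM}\n<</SYS>>\n\n{user_message} [/INST]"

print(build_prompt("My order is late and I want a refund."))
```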

If you are taking something to prod, I'd advise picking up a consultant to work with you.

[–] No-Activity-4824@alien.top 1 points 9 months ago
  1. Does it work well with other consumer graphics cards?
  2. Is the 15-20 t/s for output or input?
  3. Regarding fine-tuning, Meta is working on it anyway, so hopefully there will be another release of the same models, fine-tuned, at the beginning of 2024.