I need a bit more info from people who have installed Llama 2 locally and are using it to support web apps, or just for local information.

  • What is the ideal hardware for the 65B version?
  • How many tokens per second can that hardware process, for input and for output? (A rough way to measure this is sketched after this list.)
  • Regarding safety, since it will be used for business, what is the chance that this model will end up arguing with a customer 😊?
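
For the throughput question, here is a minimal sketch of how input and output tokens per second can be measured locally, assuming llama-cpp-python and a quantized GGUF model file (the model path is a placeholder, not something from this thread):

```python
# Rough tokens-per-second measurement, assuming llama-cpp-python is
# installed and a local GGUF model file exists (path is a placeholder).
import time
from llama_cpp import Llama

llm = Llama(model_path="./llama-2-70b.Q4_K_M.gguf", n_gpu_layers=-1)

prompt = "Summarize the return policy for a customer in two sentences."
start = time.perf_counter()
out = llm(prompt, max_tokens=256)
elapsed = time.perf_counter() - start

usage = out["usage"]  # OpenAI-style usage block with token counts
print(f"input (prompt) tokens:      {usage['prompt_tokens']}")
print(f"output (completion) tokens: {usage['completion_tokens']}")
# This lumps prompt processing and generation together; separating the
# two would require timing the prompt-eval and generation phases apart.
print(f"overall output tokens/s: {usage['completion_tokens'] / elapsed:.1f}")
```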
[–] Prudent-Artichoke-19@alien.top 1 points 9 months ago (1 children)

You need a load balancer of some sort, but an A6000 would be a good start; expect 15-20 tokens per second as a single user.
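
To make "load balancer of some sort" concrete, here is a minimal round-robin sketch in Python, assuming two local inference servers that expose an OpenAI-compatible /v1/completions endpoint (the ports and model name are illustrative assumptions, not details from this thread):

```python
# Round-robin dispatch across two local inference backends; ports and
# model name are illustrative assumptions, not from the thread.
import itertools
import requests

BACKENDS = itertools.cycle([
    "http://127.0.0.1:8001",
    "http://127.0.0.1:8002",
])

def complete(prompt: str, max_tokens: int = 256) -> str:
    backend = next(BACKENDS)  # alternate requests between the servers
    resp = requests.post(
        f"{backend}/v1/completions",
        json={"model": "llama-2", "prompt": prompt, "max_tokens": max_tokens},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["text"]

print(complete("Hello, how can I help you today?"))
```

In production you would more likely put nginx, HAProxy, or a request queue in front of the backends; the sketch just shows the idea of spreading single-user 15-20 t/s workers across machines or GPUs.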

In vanilla form, Llama 2 may do silly things. Instruction-tuned variants, fine-tuning, and so on will decrease the likelihood.
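
One common way to reduce that risk with the chat variants is a system prompt in Llama 2's documented [INST] / <<SYS>> template; the politeness rules below are illustrative assumptions, not something prescribed in this thread:

```python
# Build a Llama-2-chat style prompt that constrains the assistant's
# behavior; the system-prompt wording is an illustrative assumption.
SYSTEM = (
    "You are a customer-support assistant. Be concise and polite. "
    "Never argue with the customer; if you are unsure, offer to "
    "escalate to a human agent."
)

def build_prompt(user_message: str) -> str:
    return f"<s>[INST] <<SYS>>\n{SYSTEM}\n<</SYS>>\n\n{user_message} [/INST]"

print(build_prompt("My order is late and I want a refund."))
```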

If you are taking something to prod, I'd advise picking up a consultant to work with you.

[–] No-Activity-4824@alien.top 1 points 9 months ago
  1. Does it work well with other consumer graphics cards?
  2. Is the 15-20 t/s for output or input?
  3. Regarding fine-tuning, Meta is working on it anyway, so hopefully there will be another release of the same models, fine-tuned, at the beginning of 2024.