pablines@alien.top
overview
pablines
joined 1 year ago
What kind of specs to run a local LLM and serve, say, up to 20-50 users
in
c/localllama@poweruser.forum
pablines@alien.top
1 point
11 months ago
Hugging Face's Text Generation Inference can handle concurrency; you just need to power it with GPUs.
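For context, a minimal sketch of what that looks like in practice: text-generation-inference (TGI) batches concurrent requests on the server, so a single GPU-backed container can serve many simultaneous users. The model id, port, prompt, and user count below are illustrative assumptions, not taken from the comment.

# Hypothetical sketch: serving a model with Hugging Face
# text-generation-inference (TGI), which batches concurrent
# requests server-side so one GPU-backed endpoint handles many users.
#
# Launch the server first (shell), model id is a placeholder:
#   docker run --gpus all -p 8080:80 \
#     ghcr.io/huggingface/text-generation-inference:latest \
#     --model-id mistralai/Mistral-7B-Instruct-v0.2

import concurrent.futures
import requests

def generate(prompt: str) -> str:
    # TGI exposes a /generate endpoint that accepts a JSON payload
    # with "inputs" and optional "parameters".
    resp = requests.post(
        "http://localhost:8080/generate",
        json={"inputs": prompt, "parameters": {"max_new_tokens": 64}},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["generated_text"]

# Simulate 20 concurrent users; TGI's continuous batching
# multiplexes these requests onto the GPU.
with concurrent.futures.ThreadPoolExecutor(max_workers=20) as pool:
    prompts = [f"User {i}: summarize continuous batching." for i in range(20)]
    for reply in pool.map(generate, prompts):
        print(reply[:80])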
Rocket 🦝 - smol model that overcomes models much larger in size
in
c/localllama@poweruser.forum
pablines@alien.top
1 point
11 months ago
Woooooooow!