pablines@alien.top
overview
pablines
joined 1 year ago
What kind of specs to run a local LLM and serve, say, up to 20-50 users
in
c/localllama@poweruser.forum
pablines@alien.top
1 point
11 months ago
Hugging Face's Text Generation Inference can handle concurrency; you just need to power it with GPUs.
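For context, a minimal sketch of what that looks like in practice: text-generation-inference (TGI) batches concurrent requests on the server, so a single GPU-backed container can serve many simultaneous users. The model id, port, prompt, and user count below are illustrative assumptions, not taken from the comment.

# Hypothetical sketch: serving a model with Hugging Face
# text-generation-inference (TGI), which batches concurrent
# requests server-side so one GPU-backed endpoint handles many users.
#
# Launch the server first (shell), model id is a placeholder:
#   docker run --gpus all -p 8080:80 \
#     ghcr.io/huggingface/text-generation-inference:latest \
#     --model-id mistralai/Mistral-7B-Instruct-v0.2

import concurrent.futures
import requests

def generate(prompt: str) -> str:
    # TGI exposes a /generate endpoint that accepts a JSON payload
    # with "inputs" and optional "parameters".
    resp = requests.post(
        "http://localhost:8080/generate",
        json={"inputs": prompt, "parameters": {"max_new_tokens": 64}},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["generated_text"]

# Simulate 20 concurrent users; TGI's continuous batching
# multiplexes these requests onto the GPU.
with concurrent.futures.ThreadPoolExecutor(max_workers=20) as pool:
    prompts = [f"User {i}: summarize continuous batching." for i in range(20)]
    for reply in pool.map(generate, prompts):
        print(reply[:80])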
Rocket 🦝 - smol model that overcomes models much larger in size
in
c/localllama@poweruser.forum
pablines@alien.top
1 point
11 months ago
Woooooooow!