LocalLLaMA

3 readers

1 users here now

Community to discuss about Llama, the family of large language models created by Meta AI.

founded 1 year ago

MODERATORS

communick@poweruser.forum

Spread LLM to multiple hosts (alien.top)

submitted 1 year ago by erick-fear@alien.top to c/localllama@poweruser.forum

3 comments fedilink hide all child comments

Hi all I'm wondering if is there a possibility to spread load of localLLM on multiple hosts instead of adding gpu's to speed up responses. My host do not have gpu's since I want to be power effective, but they have decent ammont of ram 128. Thx for all ideas.

top 3 comments

sorted by: hot top controversial new old

[–] Feeling-Currency-360@alien.top 1 points 1 year ago

Check this out https://github.com/ggerganov/llama.cpp#mpi-build

[–] Feeling-Currency-360@alien.top 1 points 1 year ago

On another note these gpu manufactures must get their head out of their ass and start cranking out cards with much higher memory capacities. First one to do it cost effectively will gain massive market share and huge profits. Nvidia's A100 etc doesn't qualify for this as it's prohibitively expensive.

[–] dodo13333@alien.top 1 points 1 year ago

If i didnt missunderstood your question, answer is Petals:

https://research.yandex.com/blog/petals-decentralized-inference-and-finetuning-of-large-language-models