this post was submitted on 30 Nov 2023
1 points (100.0% liked)

LocalLLaMA


Community to discuss Llama, the family of large language models created by Meta AI.


Has anyone tried to combine a server with a moderately powerful GPU and a server with a lot of RAM to run inference? Especially with llama.cpp, where you can offload just some of the layers to the GPU?


https://github.com/Juice-Labs/Juice-Labs/wiki
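For the llama.cpp part of the question, partial offload is controlled by a layer count. A minimal sketch using the llama-cpp-python bindings (the model path and layer count below are placeholders, not from this post; it assumes a GPU-enabled build and a local GGUF file):

```python
# Minimal sketch of partial GPU offload via llama-cpp-python.
# Assumes llama-cpp-python was built with GPU support and a GGUF model
# file exists locally; the path below is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-70b.Q4_K_M.gguf",  # placeholder model path
    n_gpu_layers=40,  # offload only as many layers as fit in VRAM; the rest run on CPU from system RAM
    n_ctx=4096,       # context window
)

out = llm("Q: Why offload only some layers to the GPU? A:", max_tokens=64)
print(out["choices"][0]["text"])
```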

[–] Brave-Decision-1944@alien.top 1 points 11 months ago

I've seen something like that in the LoLLMs UI. It's called Petals, and it basically spreads the processing across the computers connected to that network. There are also other remote "bindings" from the same maker as the UI, but I haven't tried those.
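For reference, Petals exposes a Hugging Face-style Python API for that kind of distributed inference. A rough sketch, assuming the petals package is installed and the named model is being served by a public swarm (the model name is an example, not something confirmed in this thread):

```python
# Rough sketch of distributed inference with Petals: the model's transformer
# blocks are served by other machines in the swarm, and activations travel
# over the network. Assumes `pip install petals` and a reachable swarm.
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "petals-team/StableBeluga2"  # example model; assumption, not from the thread
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Distributed inference means", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))
```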