Hi all
I'm wondering if it's possible to spread the load of a local LLM across multiple hosts, instead of adding GPUs, to speed up responses.
My hosts don't have GPUs since I want to be power efficient, but they each have a decent amount of RAM (128 GB).
Thanks for any ideas.