TheTerrasque

[–] TheTerrasque@alien.top 1 points 2 years ago

A 70b? Q4 quant, llama.cpp, with some layers offloaded to the GPU.

You might need to run Linux to get the system RAM usage low enough.
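As a back-of-envelope sketch of the offload split (all the numbers here are assumptions, not measured: roughly 80 layers in a 70b, around 0.5 GB per layer at a Q4 quant, plus some fixed overhead for buffers and KV cache), you can estimate how many layers fit on the card and pass that to llama.cpp's `-ngl` / `--n-gpu-layers`; everything that doesn't fit stays in system RAM:

```python
def layers_on_gpu(vram_gb: float, n_layers: int = 80,
                  gb_per_layer: float = 0.5, overhead_gb: float = 1.5) -> int:
    """Rough estimate of how many transformer layers fit in VRAM.

    Assumed numbers: ~80 layers in a 70b model, ~0.5 GB per layer at a
    Q4 quant, and some fixed overhead for context/KV cache and compute
    buffers. Tune these for your actual model and quant.
    """
    usable = vram_gb - overhead_gb
    return max(0, min(n_layers, int(usable // gb_per_layer)))


# e.g. on a 24 GB card: pass the result to llama.cpp as --n-gpu-layers,
# and the remaining layers run from system RAM on the CPU.
print(layers_on_gpu(24))
```

This is only a sizing heuristic; in practice you'd nudge the number down until llama.cpp stops running out of VRAM at your chosen context size.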

[–] TheTerrasque@alien.top 1 points 2 years ago

I don't know of an alternative, but I did some experimenting with it. I kinda rewrote large parts of it, and I also used a custom build of the llama.cpp DLLs. I'm pretty sure it'll still work with the newest llama.cpp build; you might just need to update some native calls if they've been expanded or renamed.

My changes are at https://github.com/TheTerrasque/LLamaSharp/tree/feature/clblast - I haven't really documented them much, but maybe the git history will help.
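For illustration only (sketched in Python/ctypes rather than LLamaSharp's C# P/Invoke, and `bind_optional` is a made-up helper, not part of any library), the "update the native calls" part amounts to probing which symbols a given llama.cpp build actually exports before binding them:

```python
import ctypes


def bind_optional(lib, name):
    """Return the exported function `name` from `lib`, or None if the
    symbol is missing (e.g. renamed or removed between llama.cpp builds)."""
    try:
        return getattr(lib, name)
    except AttributeError:
        return None


# Against a real build you'd load the llama.cpp library, e.g.:
#   lib = ctypes.CDLL("./libllama.so")   # or llama.dll on Windows
# and probe for symbols such as "llama_backend_init". To keep this
# snippet self-contained, we demo the same probing against the
# process's own symbols (libc on Linux) instead.
lib = ctypes.CDLL(None)
print(bind_optional(lib, "printf") is not None)         # symbol exists
print(bind_optional(lib, "no_such_symbol_xyz") is None)  # symbol missing
```

The same probe-before-bind idea is what lets a wrapper keep limping along across llama.cpp versions instead of crashing at load time on the first renamed entry point.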

[–] TheTerrasque@alien.top 1 points 2 years ago (1 children)

Well, it gets posted a few times a week, so it kinda is.

[–] TheTerrasque@alien.top 1 points 2 years ago (1 children)

Transferring the state over the internet so the next card can take over is sloooow. You'd want cards that can each hold a lot of layers to minimize that.

In other words, you want a few big GPUs in the network, not a bunch of small ones.
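A rough sketch of why (every number here is an assumption: an 8192-wide fp16 hidden state for a 70b, a ~1 Gbps link, ~30 ms of latency per hop): the per-token activation payload is tiny, so the per-hop network latency dominates, and every extra GPU in the chain adds another hop:

```python
def per_token_network_ms(n_hops: int, hidden_dim: int = 8192,
                         bytes_per_value: int = 2,
                         bandwidth_mbps: float = 1000.0,
                         latency_ms: float = 30.0) -> float:
    """Estimated network cost per generated token for pipeline-style
    inference split across `n_hops` machine-to-machine transfers.

    Assumed numbers: an 8192-value fp16 hidden state (~16 KB) handed
    off per token, and each hop paying the full link latency."""
    payload_bits = hidden_dim * bytes_per_value * 8
    transfer_ms = payload_bits / (bandwidth_mbps * 1e6) * 1e3
    return n_hops * (latency_ms + transfer_ms)


# Two big GPUs (1 hop) vs. ten small ones (9 hops): at ~30 ms per hop
# that's roughly 30 ms vs. 270 ms of pure network overhead per token,
# before any actual compute happens.
print(per_token_network_ms(1))
print(per_token_network_ms(9))
```

Note the transfer term itself is a fraction of a millisecond; it's the per-hop latency multiplied by the hop count that kills throughput, which is exactly why fewer, bigger GPUs win.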