TheTerrasque

[–] TheTerrasque@alien.top 1 points 11 months ago

70B? Q4 quant, llama.cpp, with some layers offloaded to the GPU.

You might need to run Linux to get system RAM usage low enough.
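
Something like this, as a minimal sketch using the llama-cpp-python bindings (not mentioned above; the model path and layer count are placeholders you'd tune to your hardware):

```python
from llama_cpp import Llama

# Load a Q4-quantized 70B GGUF and offload part of the layers to the GPU.
# model_path and n_gpu_layers are placeholders - raise n_gpu_layers until
# you run out of VRAM; the remaining layers stay in system RAM.
llm = Llama(
    model_path="models/llama-2-70b.Q4_K_M.gguf",  # hypothetical path
    n_gpu_layers=40,   # layers offloaded to the GPU; 0 = CPU only
    n_ctx=4096,        # context window
)

out = llm("Q: What is the capital of France? A:", max_tokens=32)
print(out["choices"][0]["text"])
```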

[–] TheTerrasque@alien.top 1 points 11 months ago

I don't know of an alternative, but I did some experimenting with it. I kinda rewrote large parts of it, and I also used a custom build of the llama.cpp DLLs. I'm pretty sure it'll still work with the newest llama.cpp build; you might just need to update some native calls if they've been expanded or renamed.

My changes are at https://github.com/TheTerrasque/LLamaSharp/tree/feature/clblast - I haven't really documented it much, but maybe the git history will help.
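
To spot which native calls need updating, a quick hedged sketch (Python/ctypes here rather than C#, and the symbol names are just examples from llama.cpp's C API that have changed between versions, so treat the list as illustrative):

```python
import ctypes

# Load the native library (llama.dll on Windows, libllama.so on Linux).
lib = ctypes.CDLL("./libllama.so")

# Example entry points a binding might depend on - names and signatures
# have shifted across llama.cpp versions, which is exactly the problem.
expected = [
    "llama_backend_init",
    "llama_load_model_from_file",
]

for name in expected:
    status = "ok" if hasattr(lib, name) else "MISSING - update the native call"
    print(f"{name}: {status}")
```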

[–] TheTerrasque@alien.top 1 points 11 months ago (1 children)

Well, it gets posted a few times a week, so it kinda is...

[–] TheTerrasque@alien.top 1 points 1 year ago (1 children)

Transferring the state over the internet so the next card can take over is slow. You'd want cards that can each hold a lot of layers, to minimize how often that transfer happens.

In other words, you want a few big GPUs in the network, not a bunch of small ones.
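
A rough back-of-envelope makes the point (the numbers are assumptions, not from the thread: Llama-2-70B's hidden size of 8192, fp16 activations, a 20 Mbit/s uplink, and 50 ms round-trip per hop):

```python
# Back-of-envelope: network cost of pipeline-splitting a 70B model.
# Assumed figures: hidden size 8192 (Llama-2-70B), fp16 activations,
# 20 Mbit/s uplink, 50 ms round-trip latency per hop.
hidden_size = 8192
bytes_per_token = hidden_size * 2   # fp16 activations at one split point
uplink_bps = 20e6 / 8               # 20 Mbit/s in bytes per second
rtt_s = 0.050                       # round-trip latency per hop

def net_seconds_per_token(num_splits: int) -> float:
    # Each generated token crosses every split point once, sequentially,
    # so transfer time and latency both multiply by the number of splits.
    per_split = bytes_per_token / uplink_bps + rtt_s
    return num_splits * per_split

for splits in (1, 3, 7):
    print(f"{splits} split(s): ~{net_seconds_per_token(splits) * 1000:.0f} ms/token")
# ~57 ms/token at 1 split vs ~396 ms/token at 7 splits, before any compute:
# the latency per hop dominates, so fewer, bigger GPUs win.
```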