Pashax22

joined 10 months ago
[–] Pashax22@alien.top 1 points 10 months ago

Yes, of course. I'm running them on a 4070 Ti with only 12 GB of VRAM - sometimes I have to accept slower speeds, but I can still run more or less anything I might reasonably want to (and a few things that are distinctly UNreasonable).

[–] Pashax22@alien.top 1 points 10 months ago

Best? Goliath-120b. It's good, better than the 70b models I've used. It's currently available on KoboldHorde if you want to try it, or there are GGUFs etc. if you have the compute to run it locally. If that's just a little too rich for your tastes, then Xwin-70b is probably the go-to at high parameter counts.

Best with any sort of reasonable hardware requirements? Mlewd-20b is good, Xwin-Mlewd-13b is good, and some of the Mistral-7b merges are punching way above their weight. Check out Dolphin-2.2.1-Mistral-7b, and be amazed at the comparison with 7b models from 3 months back.
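For a rough sense of what "the compute to run it locally" means, here is a back-of-the-envelope sketch. It assumes the weights take about (parameters x bits per weight / 8) bytes, uses the nominal parameter counts from the model names, and picks 4.5 bits per weight as an assumed figure for a 4-bit-ish quant. None of these numbers are measured from actual files, and context/KV cache overhead is ignored.

```python
def approx_weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (10^9 bytes).

    params_billions * 1e9 weights * bits_per_weight / 8 bytes each,
    divided by 1e9 bytes per GB, simplifies to the expression below.
    """
    return params_billions * bits_per_weight / 8


# Nominal sizes read off the model names (an assumption, not exact counts).
MODELS = [
    ("Goliath-120b", 120),
    ("Xwin-70b", 70),
    ("Mlewd-20b", 20),
    ("Xwin-Mlewd-13b", 13),
    ("Dolphin-2.2.1-Mistral-7b", 7),
]

ASSUMED_BITS_PER_WEIGHT = 4.5  # hypothetical average for a 4-bit-style quant

for name, params in MODELS:
    size = approx_weight_gb(params, ASSUMED_BITS_PER_WEIGHT)
    print(f"{name}: ~{size:.1f} GB of weights")
```

Treat the output as a sanity check only: the real file size depends on the quant you download, and you need headroom on top of the weights for context.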

[–] Pashax22@alien.top 1 points 10 months ago

I get about 30 t/s on my 12 GB 4070 Ti with Zephyr, so something is definitely borked. 0.8 t/s is what I would expect from a 70b model running on CPU and system RAM. Make sure you're offloading as many layers to the GPU as your system can handle (in this case, all of them).
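If you want a quick way to guess how many layers "as many as your system can handle" is, something like this rough sketch works. It assumes all layers are the same size and that you keep some VRAM in reserve for context and the desktop. The file sizes, layer counts, and the 1.5 GB reserve below are illustrative assumptions, not measurements.

```python
def gpu_layers_that_fit(model_gb: float, n_layers: int,
                        vram_gb: float, reserve_gb: float = 1.5) -> int:
    """Estimate how many layers fit in VRAM, assuming equal-sized layers.

    reserve_gb is a hypothetical allowance for KV cache, scratch buffers,
    and whatever else is already using the GPU.
    """
    per_layer_gb = model_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    return min(n_layers, int(usable_gb // per_layer_gb))


# Assumed: a 7b quant of about 4.4 GB with 32 layers, on a 12 GB card.
print(gpu_layers_that_fit(model_gb=4.4, n_layers=32, vram_gb=12.0))   # 32 (all of them)

# Assumed: a 70b quant of about 40 GB with 80 layers, on the same card.
print(gpu_layers_that_fit(model_gb=40.0, n_layers=80, vram_gb=12.0))  # 21
```

Whatever number you land on goes into your loader's GPU-layers setting (for example `--n-gpu-layers` in llama.cpp). If the estimate says everything fits and you're still at 0.8 t/s, the layers probably aren't being offloaded at all.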