LocalLLaMA

Community to discuss Llama, the family of large language models created by Meta AI.

founded 2 years ago

MODERATORS

communick@poweruser.forum

Best model-setup for CPU-only? (alien.top)

submitted 2 years ago by andromedians@alien.top to c/localllama@poweruser.forum

I have some server hardware. One machine, which I use for games, has 18 cores/36 threads at 3.2 GHz and 128 GB RAM (with a GTX 970, so GPU processing is a no-go, I assume); the other is similar but will have 256 GB. What's best for these?

I'm only starting out and don't understand the terms and measurements yet, but I'm learning and preparing the software to try. I would like to focus on the best options available to me.

Thanks

top 11 comments

[–] pulse77@alien.top 1 points 2 years ago

A GPU, even one with low VRAM, will speed up your prompt evaluation...

[–] candre23@alien.top 1 points 2 years ago (1 children)

Yes, your GPU is too old to be useful for offloading, but you could still use it for prompt processing acceleration at least.

With your hardware, you want to use KoboldCpp. This uses models in GGML/GGUF format. You should have no issue running models up to 120b with that much RAM, but large models will be incredibly slow (like 10+ minutes per response) running on CPU only. I recommend sticking to 13b models unless you're incredibly patient.

[–] andromedians@alien.top 1 points 2 years ago (1 children)

Thanks, I was already preparing a bit. I have a few variations of the software, plus Wizard-Vicuna13B-Uncensored.Q5_K_M, which stood out to me for some reason, and lzlv (Q5_K_M), a 70b one that a review in this sub called the best. I can alternate depending on need. I also have all the ones from gpt4all, and 7b (+unfiltered) q4 LoRAs, which for some reason are the only ones hosted on torrent sites.

[–] candre23@alien.top 1 points 2 years ago (1 children)

70b models will be extremely slow on pure CPU, but you're welcome to try. There's no point in looking on "torrent sites" for LLMs; literally everything is hosted on Hugging Face.
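As a rough sketch of why 70b is slow on CPU: token generation is mostly limited by memory bandwidth, because every generated token has to read all the weights from RAM once. The numbers below are assumptions, not measurements: about 68 GB/s theoretical peak for quad-channel DDR4-2133, and about 5.5 bits per weight for a Q5_K_M-style quant. The function name is made up for illustration. Real speeds will be lower than this upper bound.

```python
def max_tokens_per_second(params_billion: float, bits_per_weight: float,
                          bandwidth_gb_s: float) -> float:
    """Upper bound on generation speed if each token reads every weight once."""
    # billions of parameters * bytes per parameter = model size in GB
    model_gb = params_billion * bits_per_weight / 8
    return bandwidth_gb_s / model_gb


# Assumed hardware figure: 4 channels * 2133 MT/s * 8 bytes, about 68 GB/s peak.
BANDWIDTH_GB_S = 68.0
# Assumed quantization density for a Q5_K_M-style file.
BITS_PER_WEIGHT = 5.5

for label, params_billion in [("13b", 13), ("70b", 70)]:
    bound = max_tokens_per_second(params_billion, BITS_PER_WEIGHT, BANDWIDTH_GB_S)
    print(f"{label}: at most ~{bound:.1f} tokens/second")
```

Under these assumptions a 13b model tops out at several tokens per second, while a 70b model lands a little above one token per second at best.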

[–] andromedians@alien.top 1 points 2 years ago

I know now. What I was looking for was whether some disallowed, more powerful versions might be on there... no. Time to start exploring soon.

[–] uti24@alien.top 1 points 2 years ago (1 children)

For now, the biggest models are the best, so there is no best model for CPU; there is only the best model whose answers you are ready and willing to wait for.

For example, Goliath-120B is great. I am running it on an i5-12400 at 0.4 tokens/second, and I don't want anything less now.
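To put that speed in perspective, here is a minimal sketch that converts tokens/second into waiting time. The 300-token response length is an assumption for illustration, the function name is made up, and prompt processing time is ignored.

```python
def wait_minutes(response_tokens: int, tokens_per_second: float) -> float:
    """Minutes spent generating a response of the given length."""
    return response_tokens / tokens_per_second / 60


# Assumed response length of 300 tokens at the 0.4 tokens/second quoted above.
print(f"{wait_minutes(300, 0.4):.1f} minutes")
```

At 0.4 tokens/second a 300-token answer takes about 12.5 minutes, which is why model choice on CPU comes down to patience.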

[–] andromedians@alien.top 1 points 2 years ago

All right - the wait emphasises the quality of the question then. Will have a go - thanks!

[–] uhuge@alien.top 1 points 2 years ago

Some 70b Llama in 8-bit GGUF would be cool; you can also play with Goliath 120b at under 8 bpw (bits per weight).
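A quick way to check what fits in RAM is to multiply the parameter count by the bits per weight. This is only a sketch: the parameter counts are nominal, the 10 GB allowance for context and the operating system is an assumption, and the function names are made up for illustration.

```python
def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size: billions of parameters * bytes per parameter."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float, ram_gb: float,
         overhead_gb: float = 10.0) -> bool:
    """True if the weights plus an assumed overhead fit in the given RAM."""
    return model_size_gb(params_billion, bits_per_weight) + overhead_gb <= ram_gb


for label, params_billion, bpw in [("70b @ 8 bpw", 70, 8),
                                   ("120b @ 8 bpw", 120, 8),
                                   ("120b @ 5 bpw", 120, 5)]:
    size = model_size_gb(params_billion, bpw)
    print(f"{label}: ~{size:.0f} GB, fits in 128 GB: {fits(params_billion, bpw, 128)}")
```

With these assumptions a 70b model at 8 bpw fits comfortably in 128 GB, while a 120b model needs either fewer bits per weight or the 256 GB machine.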

[–] NorthCryptographer39@alien.top 1 points 2 years ago (1 children)

Are you on the X99 platform?

[–] andromedians@alien.top 1 points 2 years ago

The Chinese variations, to be more specific, made to run all-core turbo. I have an E5-2696 V3 and a dual E5-2687 V4, plus twelve E5-2697 V3 for later, for an experimental project. They should all be great value, as good as it gets.

[–] andromedians@alien.top 1 points 2 years ago

Well, there is a good chance the household will have a system with powerful Nvidia cards at some point... by then I will have familiarized myself with the whole thing anyway. If you have any very practical guides, they would be appreciated.