overview for Helpful-Gene9733

Models Megathread #2 - What models are you currently using? in c/localllama@poweruser.forum

Helpful-Gene9733@alien.top 1 points 11 months ago

Yes, that’s the one from TheBloke. I imagine you could, but try it! I can run it on an old 3.4 GHz i5 with 8GB RAM, and it seems to run fine as long as I’m not keeping a bunch of other stuff open and using up RAM. I haven’t really used it much yet, so I can’t fully tell.
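A rough back-of-the-envelope check of why a quantized model can fit in 8GB as long as not much else is open. This is only a sketch: the 7B size is taken as an illustration (the comment does not name the model), the ~4.8 bits per weight is an assumed average for a 4-bit K-quant, and the 1.5 GB overhead for context and runtime is a guess.

```python
# Rough memory estimate for a quantized model (weights plus a guessed overhead).
# All constants here are assumptions for illustration, not measured values.

def weights_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    total_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits_in_ram(n_params_billion: float, bits_per_weight: float,
                ram_gb: float, overhead_gb: float = 1.5) -> bool:
    """True if weights plus an assumed overhead (context, runtime) fit in ram_gb."""
    return weights_gb(n_params_billion, bits_per_weight) + overhead_gb <= ram_gb


# Example: 7B and 3B models at an assumed ~4.8 bits per weight, 8 GB machine.
for name, params in [("7B", 7.0), ("3B", 3.0)]:
    size = weights_gb(params, 4.8)
    print(f"{name}: ~{size:.1f} GB of weights, fits in 8 GB: {fits_in_ram(params, 4.8, 8.0)}")
```

The estimate leaves out whatever the OS and other apps are using, which is why closing other programs matters on a machine like this.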

Models Megathread #2 - What models are you currently using? in c/localllama@poweruser.forum

Helpful-Gene9733@alien.top 1 points 11 months ago (2 children)

On a resource-limited machine (2017 i5 iMac, CPU only) I am getting very pleasing results with:

OpenHermes-2-Mistral (7B, 4-bit K_M quant) for general chat, desktop assistant, and some coding assistance - Ollama backend with my own front-end UI and a llama-index implementation. Haven’t tried 2.5 yet but may.

Synatra 7B Mistral fine-tune (4-bit K_M quant) - seems to produce longer and spicier responses with the same system prompt (same use case as above).

Deepseek-coder 6.7B (4-bit quant) as a coding-assistant alternative to GPT-3.5 - just started trying it out in the last week or so, and I’m building a personalized coding-assistant front-end UI for fun.

OrcaMini-3B for chat when I just want something smaller and faster to run on my machine - the 7B quants are about the max for the old iMac. But OrcaMini sometimes doesn’t give me great results.
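For anyone curious what a home-made front end on top of an Ollama backend might look like at its simplest, here is a minimal sketch that talks to Ollama's local HTTP API directly. It is not the setup described above (that one uses llama-index); the model tag in the commented usage line and the helper names `build_request` and `ask` are assumptions for illustration, and it presumes Ollama is running on its default port.

```python
import json
import urllib.request

# Default local endpoint of a running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str, system: str = "") -> urllib.request.Request:
    """Build a non-streaming generate request for the given model and prompt."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if system:
        # Optional system prompt, so different models can share the same one.
        payload["system"] = system
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str, system: str = "") -> str:
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(model, prompt, system), timeout=600) as resp:
        return json.loads(resp.read())["response"]


# Usage (model tag is an assumption; use whatever `ollama list` shows):
# print(ask("openhermes2-mistral", "Say hello in one sentence.", "You are a helpful desktop assistant."))
```

Keeping the system prompt as a parameter makes it easy to compare models under the same prompt, as with the two Mistral fine-tunes above.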

llama.cpp for normies: FreeChat is now live on the mac app store in c/localllama@poweruser.forum

Helpful-Gene9733@alien.top 1 points 11 months ago (1 children)

I just want to add, for those who might wonder: this will run at least up to 7B models (e.g. some nice newer Mistral models!) on a 2017 iMac with a 3.4 GHz Intel i5 and 8GB RAM. I get about 4.5-6 t/s on the old beast with a 7B model, and about 7-8 t/s running the 3B orca_mini. So there’s some hope even for old machines. Thanks for making the app and for running it through the App Store process too!
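To put those t/s numbers in perspective, here is a small sketch that turns tokens per second into wait time for a reply. The 300-token reply length is an assumption picked for illustration; the speeds are the ones quoted above.

```python
def seconds_for_reply(n_tokens: int, tokens_per_second: float) -> float:
    """Time to generate n_tokens at a steady generation speed."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return n_tokens / tokens_per_second


REPLY_TOKENS = 300  # assumed length of a medium-sized reply

for label, tps in [
    ("7B, slow end", 4.5),
    ("7B, fast end", 6.0),
    ("3B, slow end", 7.0),
    ("3B, fast end", 8.0),
]:
    print(f"{label}: {seconds_for_reply(REPLY_TOKENS, tps):.0f} s for {REPLY_TOKENS} tokens")
```

This ignores prompt processing time, which adds to the wait on a CPU-only machine.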