I've been checking out the latest models from people tweaking Goliath 120b. I found this one to be the best by far regarding that issue and the strange spelling stuff. Might be worth giving it a try to compare for yourself: https://huggingface.co/LoneStriker/Tess-XL-v1.0-4.85bpw-h6-exl2 (LoneStriker has other bpw sizes)
Check out turbo's project https://github.com/turboderp/exui
He just put it up not long ago, and he has speculative decoding working in it. I tried it with Goliath 120b 4.85bpw exl2 and was getting 11-13 t/s vs 6-8 t/s without it. It's barebones, but it works.
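If you're curious why speculative decoding buys that speedup, here's a toy sketch of the idea in Python. The `draft_next`/`target_next` functions are made-up stand-ins, not exui's or exllamav2's actual API: a cheap draft model guesses a few tokens ahead, the big model checks them, and every correct guess is a token you got without a separate big-model step.

```python
def target_next(tokens):
    # Stand-in for the big model's greedy next-token choice.
    return (tokens[-1] * 7 + 3) % 100

def draft_next(tokens, k=4):
    # Stand-in for the small draft model: mostly agrees with the target,
    # but deliberately flubs some tokens so rejection is visible.
    out, cur = [], list(tokens)
    for _ in range(k):
        t = target_next(cur)
        if len(cur) % 5 == 0:
            t = (t + 1) % 100  # a wrong guess
        out.append(t)
        cur.append(t)
    return out

def speculative_step(tokens, k=4):
    # Draft k tokens, then verify them against the target model. In a real
    # implementation the verification is one batched forward pass over all
    # k positions, which is where the speed comes from.
    proposed = draft_next(tokens, k)
    accepted = list(tokens)
    for tok in proposed:
        correct = target_next(accepted)
        if tok == correct:
            accepted.append(tok)       # free token: no extra big-model step
        else:
            accepted.append(correct)   # take the target's token and stop
            break
    else:
        accepted.append(target_next(accepted))  # bonus token when all k pass
    return accepted

seq = [1]
for _ in range(4):
    seq = speculative_step(seq)
print(seq)
```

When the draft model agrees often (as a small model paired with Goliath tends to), you get several tokens per big-model round instead of one.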
In the instructions on GitHub it said to use mono 24000 Hz WAV. Double-check the info though.
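If it helps, here's one way to get a file into that format, assuming ffmpeg is installed; the filenames are placeholders:

```python
# Convert any input audio to mono 24000 Hz WAV via ffmpeg.
import subprocess

subprocess.run([
    "ffmpeg", "-i", "input.wav",  # source file (placeholder name)
    "-ac", "1",                   # downmix to a single (mono) channel
    "-ar", "24000",               # resample to 24 kHz
    "output.wav",
], check=True)
```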
That series of Nvidia GPUs didn't have tensor cores yet; I believe those started with the 20xx series. I'm not sure how much that impacts inference vs. training/fine-tuning, but it's worth doing more research. From what I gathered, the answer is "no" unless you use a 10xx for something like monitor output, TTS, or other smaller co-LLM use that you don't want taking VRAM away from your main LLM GPUs.
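As a sketch of what I mean by that co-LLM setup, here's how you might pin a small helper model to the 10xx card in PyTorch so it never touches your main GPUs' VRAM. The device index and the tiny placeholder model are assumptions for illustration:

```python
import torch

# Assume the 10xx card enumerates as cuda:1 on this machine; check with
# torch.cuda.get_device_name() per index before relying on this.
side_device = torch.device("cuda:1")

# Placeholder for a small TTS/helper model; load yours the same way.
small_model = torch.nn.Linear(256, 256).to(side_device)

x = torch.randn(1, 256, device=side_device)
with torch.no_grad():
    y = small_model(x)  # runs entirely on the secondary GPU
print(y.shape)
```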
For comparison's sake, the EXL2 4.85bpw version runs around 6-8 t/s on 4x3090s; at 8k context it's at the lower end of that range.
4x3090s will run it at over 4 bits per weight.
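Rough napkin math on why that works, ignoring KV cache and activation overhead (so treat it as a lower bound):

```python
params = 120e9          # ~120B parameters
bpw = 4.85              # EXL2 bits per weight
weights_gib = params * bpw / 8 / 1024**3
total_vram_gb = 4 * 24  # four 3090s at 24 GB each
print(f"weights: ~{weights_gib:.1f} GiB of {total_vram_gb} GB total")
# ~67.8 GiB of weights, leaving room for 8k context
```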
If you learn AutoGen, you could assign each model to a different agent and have them interact. If using the same model and having multiple characters talk is your thing, then the SillyTavern group chat option is the way to go.
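For the AutoGen route, a minimal sketch of the one-model-per-agent idea looks like this. The endpoints and model names are placeholders for whatever OpenAI-compatible servers you're running locally, and the exact API may differ between AutoGen versions:

```python
import autogen

goliath_cfg = {"config_list": [{
    "model": "goliath-120b",                 # placeholder model name
    "base_url": "http://localhost:5000/v1",  # placeholder local endpoint
    "api_key": "not-needed",
}]}
small_cfg = {"config_list": [{
    "model": "small-model",                  # a different model per agent
    "base_url": "http://localhost:5001/v1",
    "api_key": "not-needed",
}]}

writer = autogen.AssistantAgent(
    name="writer", system_message="You write replies.", llm_config=goliath_cfg)
critic = autogen.AssistantAgent(
    name="critic", system_message="You critique the writer.", llm_config=small_cfg)
user = autogen.UserProxyAgent(
    name="user", human_input_mode="NEVER", code_execution_config=False)

chat = autogen.GroupChat(agents=[user, writer, critic], messages=[], max_round=6)
manager = autogen.GroupChatManager(groupchat=chat, llm_config=goliath_cfg)
user.initiate_chat(manager, message="Draft a short intro, then critique it.")
```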