Your issue is using Q8. Be real: you only have 6 GB of VRAM, not 24.
Your hardware can't run Q8 at a decent speed.
Use Q4_K_S instead, so you can offload many more layers to the GPU. There's some quality degradation, yes, but it's not that bad.
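If you're loading the model through the llama-cpp-python bindings, a minimal sketch of that setup is below. The model path and layer count are assumptions, not from this thread; raise `n_gpu_layers` until your 6 GB of VRAM is nearly full.

```python
from llama_cpp import Llama

# Hypothetical path and layer count; tune n_gpu_layers for your card.
llm = Llama(
    model_path="models/your-model.Q4_K_S.gguf",  # Q4_K_S quant instead of Q8_0
    n_gpu_layers=35,   # far more layers fit on a 6 GB card at Q4_K_S than at Q8
    n_ctx=4096,
)

print(llm("Hello,", max_tokens=32)["choices"][0]["text"])
```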
Huh, interesting weave. It did feel like it made fewer spelling and simple errors compared to Goliath.
Once again Euryale's included. The lack of Xwin makes it better imo; Xwin may be smart, but it has repetition issues at long context. That's just my opinion.
I'd honestly scale it down; there's really no need to go to 120B. From testing a while back, ~90-100B frankenmerges have the same effect.
Isn't OpenChat a fine-tune of Mistral?
Why would anyone fine-tune on top of that?
It's not a good idea.
Did you forget to unset the RoPE settings?
CodeLlama needs a different RoPE base than regular Llama does.
Also check your sampler settings.
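A sketch of both knobs with llama-cpp-python, assuming a GGUF CodeLlama file (the path and sampler values are illustrative, not from this thread): CodeLlama was trained with a RoPE base around 1e6, while regular Llama 2 uses 10000, so a leftover hard-coded 10000 will wreck its output.

```python
from llama_cpp import Llama

# Illustrative only: CodeLlama expects rope_freq_base ~= 1e6, not Llama's 10000.
# Passing 0.0 tells llama.cpp to read the value from the GGUF metadata instead.
llm = Llama(
    model_path="models/codellama-13b.Q4_K_S.gguf",  # hypothetical file name
    n_gpu_layers=35,
    rope_freq_base=1_000_000.0,  # or 0.0 to use whatever is baked into the model
)

# Conservative sampler settings for code; these values are a starting point, not gospel.
out = llm(
    "def fibonacci(n):",
    max_tokens=128,
    temperature=0.2,
    top_p=0.95,
    repeat_penalty=1.1,
)
print(out["choices"][0]["text"])
```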