Saofiqlord

joined 11 months ago
[–] Saofiqlord@alien.top 1 points 11 months ago (1 children)

Did you forget to unset the RoPE settings?

CodeLlama requires a different RoPE base than regular Llama.

Also check your sampler settings.
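For reference, a minimal llama-cpp-python sketch of loading CodeLlama with its native RoPE base; the model path, context size, and sampler values are assumptions for illustration:

```python
from llama_cpp import Llama

# Model path and context size are assumptions for illustration.
# CodeLlama was trained with rope_freq_base = 1e6 (vs 10000 for base
# Llama 2), so a stale RoPE override carried over from another model
# will produce garbage output.
llm = Llama(
    model_path="./codellama-13b-instruct.Q5_K_M.gguf",
    n_ctx=16384,
    rope_freq_base=1_000_000.0,  # CodeLlama's native value
    rope_freq_scale=1.0,         # no extra linear scaling
)

# Sampler settings matter too; for code, a low temperature is typical.
out = llm("Write a Python function that reverses a string.",
          temperature=0.2, top_p=0.95, max_tokens=256)
print(out["choices"][0]["text"])
```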

[–] Saofiqlord@alien.top 1 points 11 months ago (1 children)

Your issue is using Q8. Be real: you only have 6 GB of VRAM, not 24.

Your hardware can't run Q8 at a decent speed.

Use Q4_K_S instead; you can offload far more layers to the GPU. There's some quality degradation, yes, but it's not that bad.
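A quick llama-cpp-python sketch of what that looks like; the model path and layer count are assumptions, and you'd tune `n_gpu_layers` to whatever fits in 6 GB:

```python
from llama_cpp import Llama

# Path and layer count below are assumptions for illustration.
# A 13b Q8_0 file is ~14 GB, so barely any of it fits in 6 GB of VRAM
# and most layers run on the CPU. Q4_K_S is roughly half that size, so
# far more layers can be offloaded, which is where the speedup comes from.
llm = Llama(
    model_path="./model-13b.Q4_K_S.gguf",
    n_gpu_layers=28,  # raise until you hit out-of-memory, then back off
    n_ctx=4096,
)
```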

[–] Saofiqlord@alien.top 1 points 11 months ago (2 children)

Huh, interesting weave. It did feel like it made fewer spelling and simple errors compared to Goliath.

Once again, Euryale's included. The lack of Xwin makes it better, imo; Xwin may be smart, but it has repetition issues at long context. That's just my opinion.

I'd honestly scale it down; there's really no need to go 120b. From testing a while back, ~90-100b frankenmerges have the same effect.
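Rough back-of-the-envelope math on why scaling down is plausible; the slice plans below are hypothetical examples, not the actual Goliath recipe:

```python
# Illustrative layer math for passthrough frankenmerges. A 70b Llama-2
# model has 80 layers, and parameter count scales roughly linearly with
# stacked layers. The slice plans here are hypothetical, NOT the real
# Goliath-120b recipe.

def stacked_layers(slices):
    """Total layers in a passthrough merge from (start, end) layer ranges."""
    return sum(end - start for start, end in slices)

DONOR_LAYERS = 80    # layers in one 70b donor
DONOR_PARAMS = 70.0  # billions

plans = {
    "~120b-style": [(0, 40), (20, 60), (40, 72), (56, 80)],  # 136 layers
    "~90b-style":  [(0, 52), (28, 80)],                      # 104 layers
}

for name, plan in plans.items():
    layers = stacked_layers(plan)
    params = DONOR_PARAMS * layers / DONOR_LAYERS  # crude linear estimate
    print(f"{name}: {layers} layers ≈ {params:.0f}B params")
```

The point being that most of the duplicated-layer effect is already there by ~100 layers, so the extra slices a 120b adds buy very little.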

[–] Saofiqlord@alien.top 1 points 11 months ago

Isn't OpenChat a fine-tune of Mistral?

Why would anyone fine-tune on top of that?

It's not a good idea.