HvskyAI

[–] HvskyAI@alien.top 1 points 11 months ago (1 children)

I’m late to the party on this one.

I’ve been loving the 2.4BPW EXL2 quants from Lone Striker recently, specifically using Euryale 1.3 70B and LZLV 70B.

Even at the smaller quant, they’re very capable, and leagues ahead of smaller models in terms of comprehension and reasoning. Min-P sampling has been a big step forward, as well.
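For anyone who hasn’t tried it, the core idea behind min-P is simple - the cutoff scales with the top token’s probability, so the candidate pool shrinks when the model is confident and widens when it isn’t. A minimal sketch of the general technique in PyTorch (not any particular backend’s implementation):

```python
import torch

def min_p_sample(logits: torch.Tensor, min_p: float = 0.05) -> torch.Tensor:
    """Sample a token, keeping only candidates whose probability is at
    least min_p times the probability of the most likely token."""
    probs = torch.softmax(logits, dim=-1)
    top_prob = probs.max(dim=-1, keepdim=True).values
    # Confident step -> high top_prob -> aggressive pruning.
    # Uncertain step -> low top_prob -> more candidates survive.
    keep = probs >= min_p * top_prob
    filtered = torch.where(keep, probs, torch.zeros_like(probs))
    filtered = filtered / filtered.sum(dim=-1, keepdim=True)
    return torch.multinomial(filtered, num_samples=1)
```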

The only downside I can see is the limit on context length with a single 24GB VRAM card. Perhaps further testing of Nous-Capybara 34B at 4.65BPW on EXL2 is in order.

[–] HvskyAI@alien.top 1 points 11 months ago

Agreed - I’m personally using 70B models at 2.4BPW EXL2 quants, as well. They hold up great even at such a small quantization as long as sampling parameters are set correctly, and the prose is subjectively more pleasant (Euryale 1.3 and LZLV both come to mind).

At 2.4BPW, they fit into 24GB of VRAM and inference is extremely fast. EXL2 also appears to be very promising as a quantization method; I believe its potential upsides are yet to be fully leveraged.
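For a rough sense of why 2.4BPW is the sweet spot for a single 24GB card, a quick back-of-envelope (approximate figures only - exact parameter counts and cache overhead vary by model and context length):

```python
params = 70e9          # ~70B parameters (approximate)
bits_per_weight = 2.4  # EXL2 target bitrate

weight_bytes = params * bits_per_weight / 8
print(f"Weights alone: ~{weight_bytes / 1024**3:.1f} GiB")  # ~19.6 GiB
# That leaves only a few GiB of a 24 GiB card for the KV cache and
# activations, which is why context length becomes the limiting factor.
```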

[–] HvskyAI@alien.top 1 points 11 months ago

I am having better luck with 2.4BPW EXL2 quants of 70B models from Lone_Striker lately - Euryale 1.3, LZLV, etc.

Even at the smaller quants, they are quite strong with the correct settings - easily comparable to a 34B at Q4_K_M, in my experience.
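For reference, this is roughly what running one of these 2.4BPW quants looks like with the exllamav2 Python module. Treat it as a sketch from memory of the project’s example scripts - class and attribute names may differ between versions, and the model path is just a placeholder:

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

model_dir = "models/lzlv-70b-2.4bpw-exl2"  # hypothetical local path

config = ExLlamaV2Config()
config.model_dir = model_dir
config.prepare()
config.max_seq_len = 4096  # context is the main constraint at 24GB

model = ExLlamaV2(config)
model.load()  # single-GPU load; pass a GPU split list for multiple cards

tokenizer = ExLlamaV2Tokenizer(config)
cache = ExLlamaV2Cache(model)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

# Modest settings: min-P doing most of the truncation, plus a light
# repetition penalty.
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 1.0
settings.min_p = 0.05
settings.token_repetition_penalty = 1.1

print(generator.generate_simple("Once upon a time,", settings, 200))
```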

[–] HvskyAI@alien.top 1 points 11 months ago

I see - the model does tend to run a bit hot as-is. I’ll go ahead and try these settings out tomorrow.

[–] HvskyAI@alien.top 1 points 11 months ago

Yes, the BOS token is disabled in my parameters.

 

Messing around with Yi-34B-based models (Nous-Capybara, Dolphin 2.2) lately, I’ve been experiencing repetition in model output, where sections of previous outputs are included in later generations.

This appears to persist with both GGUF and EXL2 quants, and happens regardless of sampling parameters or Mirostat tau settings.
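For anyone following along, this is roughly what the Mirostat tau knob controls - a simplified sketch of Mirostat v2 as described in the paper, not the actual llama.cpp code: tau is the target surprise in bits, and the threshold mu is nudged each step to keep sampled tokens near that target.

```python
import math
import random

def mirostat_v2_step(probs, tau=5.0, eta=0.1, mu=None):
    """One (simplified) Mirostat v2 sampling step.

    probs: token probabilities sorted in descending order.
    tau:   target surprise (-log2 p) of sampled tokens, in bits.
    eta:   learning rate for the feedback update of mu.
    mu:    running truncation threshold, initialized to 2 * tau.
    """
    if mu is None:
        mu = 2.0 * tau
    # Drop tokens whose surprise exceeds mu (always keep at least the top token).
    kept = [p for p in probs if -math.log2(p) <= mu] or [probs[0]]
    total = sum(kept)
    kept = [p / total for p in kept]
    # Sample from the truncated, renormalized distribution.
    idx = random.choices(range(len(kept)), weights=kept, k=1)[0]
    # Feedback: if the sampled token was more surprising than tau, lower mu.
    mu -= eta * (-math.log2(kept[idx]) - tau)
    return idx, mu
```

Since the whole point of Mirostat is to hold perplexity near the target, the looping happening regardless of tau is part of what puzzles me.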

I was wondering if anyone else has experienced similar issues with the latest finetunes, and whether they were able to resolve them. The models look very promising in Wolfram’s evaluation, so I’m wondering what error I could be making.

Currently using Text Generation Web UI with SillyTavern as a front-end, running either Mirostat at tau values between 2 and 5, or the Midnight Enigma preset with repetition penalty at 1.0.
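And for completeness: as far as I know, the repetition penalty most backends apply is the CTRL-style rescaling of logits for tokens already in the context, which means a value of 1.0 is effectively a no-op and leaves Mirostat alone to discourage loops. A rough sketch of the general technique (not any specific backend’s code):

```python
import torch

def apply_repetition_penalty(logits: torch.Tensor,
                             generated_ids: list[int],
                             penalty: float = 1.15) -> torch.Tensor:
    """Penalize tokens that already appear in the generated context.
    A penalty of 1.0 leaves the logits untouched."""
    logits = logits.clone()
    for token_id in set(generated_ids):
        score = logits[token_id]
        # CTRL-style: shrink positive logits, push negative ones further down.
        logits[token_id] = score / penalty if score > 0 else score * penalty
    return logits
```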