Longjumping-Bake-557

joined 1 year ago
[–] Longjumping-Bake-557@alien.top 1 points 11 months ago (1 children)

Which is hilarious, seeing as he kept joking about "closedai" for a while

 

There has been a lot of movement around and below the 13b parameter bracket in the last few months, but it's wild that the best 70b models are still Llama 2 based. Why is that?

We have 13b models like bartowski/Orca-2-13b-exl2 at 8-bit approaching or even surpassing the best 70b models now
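Part of why the 13b bracket is attractive is the memory math: quantized weight footprint is roughly parameters × bits-per-weight ÷ 8. A quick sketch (the bpw figures below are illustrative, not measurements):

```python
def quant_weights_gb(n_params: float, bpw: float) -> float:
    """Approximate quantized weight footprint in decimal GB:
    parameters * bits-per-weight / 8 bits-per-byte."""
    return n_params * bpw / 8 / 1e9

# A 13b model at 8 bpw vs a 70b model at 4 bpw (illustrative bpw values):
print(quant_weights_gb(13e9, 8.0))  # 13.0 GB
print(quant_weights_gb(70e9, 4.0))  # 35.0 GB
```

So even a heavily quantized 70b needs well over twice the VRAM of an 8-bit 13b, before counting the KV cache and activations.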

 

And that's on a die only slightly bigger than the 4090's. Unless they increased the size compared to the H100?

 

The CPU is a Ryzen 7 3700X with 32 GB of DDR4-3000.

I loaded the model with ExLlamav2_HF and a 2048 sequence length. It spills into system RAM, a lot: 11.5 GB to be exact. But I've read that with the right specs I could expect 2-7 tokens/s, which would be more than bearable.
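To see where the spill comes from, here is a rough back-of-envelope for a 70b-class model's footprint, assuming Llama-2-70B's published shape (80 layers, 8 KV heads via GQA, head dim 128) and an fp16 KV cache; the ~4.5 bpw quant level is an assumed example, not a measured figure:

```python
def quant_weights_gb(n_params: float, bpw: float) -> float:
    """Approximate quantized weight footprint in decimal GB."""
    return n_params * bpw / 8 / 1e9

def kv_cache_gb(layers: int, seq_len: int, kv_heads: int,
                head_dim: int, bytes_per_el: int = 2) -> float:
    """K and V caches: 2 tensors of (layers, seq_len, kv_heads, head_dim),
    fp16 (2 bytes per element) by default."""
    return 2 * layers * seq_len * kv_heads * head_dim * bytes_per_el / 1e9

# Assumed: 70B params at ~4.5 bpw, Llama-2-70B-like KV shape, seq_len 2048
weights = quant_weights_gb(70e9, 4.5)      # ~39.4 GB for weights alone
kv = kv_cache_gb(80, 2048, 8, 128)         # ~0.67 GB of KV cache
print(weights, kv)
```

With weights alone near 40 GB under those assumptions, anything that doesn't fit in VRAM spills to system RAM, which is why the DDR4 bandwidth ends up dominating the tokens/s.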

Is there any way I could optimize it further?