Longjumping-Bake-557

joined 1 year ago
[–] Longjumping-Bake-557@alien.top 1 points 11 months ago (1 children)

Which is hilarious, seeing as he kept joking about "closedai" for a while

 

There has been a lot of movement around and below the 13b parameter bracket in the last few months, but it's wild that the best 70b models are still Llama 2 based. Why is that?

We have 13b models like bartowski/Orca-2-13b-exl2 at 8-bit approaching or even surpassing the best 70b models now
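Part of why the 13b bracket is attractive is the memory math: quantized weight footprint is roughly parameters × bits-per-weight ÷ 8. A quick sketch (the bpw figures below are illustrative, not measurements):

```python
def quant_weights_gb(n_params: float, bpw: float) -> float:
    """Approximate quantized weight footprint in decimal GB:
    parameters * bits-per-weight / 8 bits-per-byte."""
    return n_params * bpw / 8 / 1e9

# A 13b model at 8 bpw vs a 70b model at 4 bpw (illustrative bpw values):
print(quant_weights_gb(13e9, 8.0))  # 13.0 GB
print(quant_weights_gb(70e9, 4.0))  # 35.0 GB
```

So even a heavily quantized 70b needs well over twice the VRAM of an 8-bit 13b, before counting the KV cache and activations.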

 

And that's on a die only slightly bigger than the 4090's. Unless they increased the size compared to the H100?

 

The CPU is a Ryzen 7 3700X with 32 GB of DDR4-3000.

I loaded the model with ExLlamav2_HF and a 2048 sequence length. It spills into system RAM, a lot: 11.5 GB to be exact. But I've read that with the right specs I could expect 2-7 tokens/s, which would be more than bearable.
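To see where the spill comes from, here is a rough back-of-envelope for a 70b-class model's footprint, assuming Llama-2-70B's published shape (80 layers, 8 KV heads via GQA, head dim 128) and an fp16 KV cache; the ~4.5 bpw quant level is an assumed example, not a measured figure:

```python
def quant_weights_gb(n_params: float, bpw: float) -> float:
    """Approximate quantized weight footprint in decimal GB."""
    return n_params * bpw / 8 / 1e9

def kv_cache_gb(layers: int, seq_len: int, kv_heads: int,
                head_dim: int, bytes_per_el: int = 2) -> float:
    """K and V caches: 2 tensors of (layers, seq_len, kv_heads, head_dim),
    fp16 (2 bytes per element) by default."""
    return 2 * layers * seq_len * kv_heads * head_dim * bytes_per_el / 1e9

# Assumed: 70B params at ~4.5 bpw, Llama-2-70B-like KV shape, seq_len 2048
weights = quant_weights_gb(70e9, 4.5)      # ~39.4 GB for weights alone
kv = kv_cache_gb(80, 2048, 8, 128)         # ~0.67 GB of KV cache
print(weights, kv)
```

With weights alone near 40 GB under those assumptions, anything that doesn't fit in VRAM spills to system RAM, which is why the DDR4 bandwidth ends up dominating the tokens/s.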

Is there any way I could optimize it further?