LocalLLaMA

CPU is a Ryzen 7 3700X, with 32 GB of DDR4-3000 RAM.

I loaded the model with the ExLlamav2_HF loader and a 2048-token sequence length. It spills into system RAM, a lot: 11.5 GB, to be exact. But I've read that with the right settings I could expect 2-7 tokens/s, which would be more than bearable.
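For reference, the ExLlamav2_HF loader wraps the exllamav2 library; below is a minimal sketch of the equivalent direct setup with the exllamav2 Python API (as of late 2023), just to show which knobs are in play. The model path and GPU split are placeholder values, not taken from the post:

```python
# Minimal exllamav2 loading sketch (API as of late 2023).
# The model directory and gpu_split are placeholders for illustration.
from exllamav2 import (
    ExLlamaV2,
    ExLlamaV2Config,
    ExLlamaV2Cache,
    ExLlamaV2Tokenizer,
)

config = ExLlamaV2Config()
config.model_dir = "/path/to/model-exl2"  # placeholder model directory
config.prepare()                          # reads the model's config.json
config.max_seq_len = 2048                 # shorter context -> smaller KV cache

model = ExLlamaV2(config)
model.load(gpu_split=[11.5])              # GB of VRAM per GPU; placeholder value
cache = ExLlamaV2Cache(model)             # KV cache sized from max_seq_len
tokenizer = ExLlamaV2Tokenizer(config)
```

Capping `max_seq_len` and the GPU split is the main lever for keeping the weights and KV cache inside VRAM instead of spilling into system RAM.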

Is there any way I could optimize it further?

Aaaaaaaaaeeeee@alien.top (1 point, 1 year ago)

What BPW (bits per weight) is the exl2 model quantized at?
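For context, exl2 quants are described by their bits per weight, which largely determines how much VRAM the weights alone need. A rough back-of-the-envelope estimate (illustrative figures, not from the thread):

```python
# Rough size of the quantized weights alone; the KV cache and
# activations add more on top. Numbers below are hypothetical.
def weight_vram_gib(n_params: float, bpw: float) -> float:
    return n_params * bpw / 8 / 1024**3  # bits -> bytes -> GiB

# e.g. a hypothetical 34B-parameter model at 4.65 bits per weight:
print(f"{weight_vram_gib(34e9, 4.65):.1f} GiB")  # ~18.4 GiB
```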