LocalLLaMA

CPU is a Ryzen 7 3700X, with 32 GB of DDR4-3000 RAM.

I loaded the model with the ExLlamav2_HF loader and a 2048-token sequence length. It spills into system RAM, a lot: 11.5 GB, to be exact. But I've read that with the right settings I could expect 2-7 tokens/s, which would be more than bearable.
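For reference, the ExLlamav2_HF loader wraps the exllamav2 library; below is a minimal sketch of the equivalent direct setup with the exllamav2 Python API (as of late 2023), just to show which knobs are in play. The model path and GPU split are placeholder values, not taken from the post:

```python
# Minimal exllamav2 loading sketch (API as of late 2023).
# The model directory and gpu_split are placeholders for illustration.
from exllamav2 import (
    ExLlamaV2,
    ExLlamaV2Config,
    ExLlamaV2Cache,
    ExLlamaV2Tokenizer,
)

config = ExLlamaV2Config()
config.model_dir = "/path/to/model-exl2"  # placeholder model directory
config.prepare()                          # reads the model's config.json
config.max_seq_len = 2048                 # shorter context -> smaller KV cache

model = ExLlamaV2(config)
model.load(gpu_split=[11.5])              # GB of VRAM per GPU; placeholder value
cache = ExLlamaV2Cache(model)             # KV cache sized from max_seq_len
tokenizer = ExLlamaV2Tokenizer(config)
```

Capping `max_seq_len` and the GPU split is the main lever for keeping the weights and KV cache inside VRAM instead of spilling into system RAM.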

Is there any way I could optimize it further?

Aaaaaaaaaeeeee@alien.top (1 point, 1 year ago)

What BPW (bits per weight) is the exl2 model quantized at?
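For context, exl2 quants are described by their bits per weight, which largely determines how much VRAM the weights alone need. A rough back-of-the-envelope estimate (illustrative figures, not from the thread):

```python
# Rough size of the quantized weights alone; the KV cache and
# activations add more on top. Numbers below are hypothetical.
def weight_vram_gib(n_params: float, bpw: float) -> float:
    return n_params * bpw / 8 / 1024**3  # bits -> bytes -> GiB

# e.g. a hypothetical 34B-parameter model at 4.65 bits per weight:
print(f"{weight_vram_gib(34e9, 4.65):.1f} GiB")  # ~18.4 GiB
```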