overview for DustGrouchy1792

Any tricks to speed up 13B models on a 3090? in c/localllama@poweruser.forum

DustGrouchy1792@alien.top 1 point 2 years ago

Can I get koboldcpp working with SillyTavern without too much of a headache?


DustGrouchy1792@alien.top 1 point 2 years ago

I'm now using a 4-bit GPTQ version of the same model. After generation completes, VRAM usage goes up to 16.2 GB (out of 24 GB), and as best I can tell nothing else is using the GPU (no browser windows with YouTube, etc.).

I'm still only getting a bit under 4 tokens per second. Since VRAM isn't full, I don't think anything is getting offloaded to the CPU.
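For a rough sanity check on those VRAM numbers, here's a back-of-the-envelope sketch of how much memory the weights alone should take at different bit widths. It assumes a nominal 13e9 parameters and ignores the KV cache, activations, and quantization metadata, so actual usage will be higher. The function name is just something made up for illustration.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 13e9  # assumption: nominal "13B"; real checkpoints differ slightly
VRAM_GB = 24.0   # RTX 3090

for bits in (16, 8, 4):
    gb = weight_footprint_gb(N_PARAMS, bits)
    fits = gb < VRAM_GB
    print(f"{bits}-bit: ~{gb:.1f} GB of weights, fits in {VRAM_GB:.0f} GB: {fits}")
```

By this estimate the 4-bit weights come to roughly 6.5 GB and the 8-bit weights to roughly 13 GB, so both should fit on the card with room left for context, which would be consistent with nothing being offloaded.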


Any tricks to speed up 13B models on a 3090? (alien.top)

submitted 2 years ago by DustGrouchy1792@alien.top to c/localllama@poweruser.forum


Are there any tricks to speed up 13B models on a 3090?

I'm currently using the regular Hugging Face model, quantized to 8-bit by a GPTQ-capable fork of KoboldAI.

It's pretty slow, especially when the context limit changes, and nowhere near real time.