Are there any tricks to speed up 13B models on a 3090?

Currently I'm using the regular Hugging Face model, quantized to 8-bit by a GPTQ-capable fork of KoboldAI.

It's pretty slow, especially when the context limit changes, and nowhere near real time.
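For reference, the setup is roughly equivalent to loading a GPTQ quant directly through transformers, something like the sketch below (the repo name is just an example, and it assumes `optimum` and `auto-gptq` are installed; the KoboldAI fork does its own loading internally):

```python
# Rough sketch: load a GPTQ-quantized 13B model on a single GPU.
# Assumes `pip install transformers optimum auto-gptq` and an
# example repo name (TheBloke/Llama-2-13B-GPTQ) -- substitute your own.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Llama-2-13B-GPTQ"  # example GPTQ quant, not my exact model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",          # place the whole model on the 3090
    torch_dtype=torch.float16,  # fp16 activations on top of the quantized weights
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```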

[–] DustGrouchy1792@alien.top 1 points 10 months ago (1 children)

Can I get koboldcpp working with SillyTavern without too much of a headache?

[–] StaplerGiraffe@alien.top 1 points 10 months ago

Sure, koboldcpp exposes the same API as KoboldAI, so SillyTavern can connect to it the same way.
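If you want to sanity-check the endpoint before pointing SillyTavern at it, something like this should work (assuming koboldcpp is running on its default port 5001 and using the KoboldAI-style `/api/v1/generate` route):

```python
# Quick sanity check of the KoboldAI-compatible API that koboldcpp serves.
# Assumes koboldcpp is running locally on its default port (5001);
# SillyTavern talks to this same endpoint.
import requests

resp = requests.post(
    "http://localhost:5001/api/v1/generate",
    json={
        "prompt": "Once upon a time",
        "max_length": 80,    # number of tokens to generate
        "temperature": 0.7,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["results"][0]["text"])
```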