This post was submitted on 29 Nov 2023

LocalLLaMA


Community for discussing Llama, the family of large language models created by Meta AI.


First time testing a local text model, so I don't know much yet. I've seen people with 8 GB cards complaining that text generation is very slow, so I don't have much hope, but still... I think I need to do some configuration: when generating text, my SSD is at 100% utilization, reading 1-2 GB/s, while my GPU does not reach 15% usage.
Using an RTX 2060 (6 GB VRAM) and 16 GB of RAM.
This is the model I am testing (mythomax-l2-13b.Q8_0.gguf): https://huggingface.co/TheBloke/MythoMax-L2-13B-GGUF/tree/main
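
For context, the usual way to get the GPU involved with a GGUF model is to offload layers via llama.cpp or its Python binding. A minimal sketch, assuming llama-cpp-python installed with CUDA support; the model path and the n_gpu_layers value here are placeholders to tune for a 6 GB card:

```python
# Minimal sketch: load a GGUF model with llama-cpp-python and offload
# part of it to the GPU. Assumes the package was built with CUDA support;
# with n_gpu_layers=0 (the default) the GPU sits idle and everything runs
# from system RAM, which matches the low GPU usage described above.
from llama_cpp import Llama

llm = Llama(
    model_path="mythomax-l2-13b.Q8_0.gguf",  # placeholder local path
    n_gpu_layers=20,  # layers kept in VRAM; raise until ~6 GB is nearly full
    n_ctx=2048,       # context window
)

out = llm("Write one sentence about llamas.", max_tokens=64)
print(out["choices"][0]["text"])
```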

uti24@alien.top · 1 point · 11 months ago

> SSD is at 100% reading 1-2 GB/s

If your SSD is swapping, then the model does not fit into RAM.

Use a smaller quant, like Q4_K_M from your own link.
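
Rough numbers, as a sketch (the bits-per-weight figures are approximations for llama.cpp's quant formats, not exact):

```python
# Back-of-the-envelope RAM footprint of a 13B model at two quant levels.
# Bits-per-weight are approximate values for llama.cpp quant formats.
params = 13e9

bits_per_weight = {"Q8_0": 8.5, "Q4_K_M": 4.85}

for quant, bpw in bits_per_weight.items():
    gib = params * bpw / 8 / 1024**3
    print(f"{quant}: ~{gib:.1f} GiB")

# Q8_0:   ~12.9 GiB -> barely fits next to the OS in 16 GiB of RAM, so the
#         mmap'd weights keep getting evicted and re-read from the SSD.
# Q4_K_M: ~7.3 GiB  -> fits with room to spare, and more layers fit in 6 GB VRAM.
```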

OverallBit9@alien.top · 1 point · 11 months ago

I am using it now; it's much better than before.