this post was submitted on 29 Nov 2023
1 point (100.0% liked)

LocalLLaMA

Community to discuss about Llama, the family of large language models created by Meta AI.

First time testing a local text model, so I don't know much yet. I've seen people with 8 GB cards complaining that text generation is very slow, so I don't have much hope, but still... I think I need to do some configuration: when generating text my SSD is at 100%, reading 1-2 GB/s, while my GPU doesn't reach 15% usage.
Using an RTX 2060 6GB and 16GB RAM.
This is the model I'm testing (mythomax-l2-13b.Q8_0.gguf): https://huggingface.co/TheBloke/MythoMax-L2-13B-GGUF/tree/main

[–] aseichter2007@alien.top 1 point 11 months ago

So, you don't have enough RAM to fit that model. It's overrunning your RAM entirely and spilling into the wrong kind of "VRAM": virtual RAM, a.k.a. paged memory on your SSD. That's why the drive is pegged at 100% while the GPU barely gets used.

Idk what you're trying to do, but the best answer is OpenHermes 2.5 Mistral 7B at Q3 with 4k context, or something similar; Rocket 3B at Q6 would be even faster.

Hermes is king. I understand why you want that model, but a 13B at Q8 is huge, something like 17 GB of memory at 8k context.
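
A rough back-of-the-envelope sketch of where a number like that comes from (the bits-per-weight and fp16 KV-cache figures below are my own approximations, not exact llama.cpp accounting, so the total is only a ballpark):

```python
# Rough memory estimate for a 13B model at Q8_0 with 8k context.
# These are approximations, not llama.cpp's real accounting.

def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB."""
    return n_params * bits_per_weight / 8 / 1e9

def kv_cache_gb(n_layers: int, n_ctx: int, hidden_dim: int, bytes_per_elem: int = 2) -> float:
    """Approximate fp16 KV cache: keys + values for every layer and token."""
    return 2 * n_layers * n_ctx * hidden_dim * bytes_per_elem / 1e9

# Llama-2-13B shape: ~13B parameters, 40 layers, 5120 hidden dim.
weights = weight_gb(13e9, 8.5)   # Q8_0 is roughly 8.5 bits per weight
kv = kv_cache_gb(n_layers=40, n_ctx=8192, hidden_dim=5120)

print(f"weights ~{weights:.1f} GB, kv cache ~{kv:.1f} GB, total ~{weights + kv:.1f} GB")
# With little or no GPU offload, all of that has to live in 16 GB of system RAM,
# so the OS starts paging to the SSD.
```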

It will speed up if you at least get it off the hard drive; try a Q3_K_L if you're determined to run MythoMax.
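
If you do grab a smaller quant, a minimal llama-cpp-python sketch for loading it with partial GPU offload looks something like this (the file name, n_gpu_layers value, and prompt are placeholders to tune for a 6 GB card, and it assumes llama-cpp-python was installed with CUDA support):

```python
# Minimal sketch: load a smaller quant and push as many layers as fit
# onto the 6 GB GPU so the model stops spilling onto the SSD.
from llama_cpp import Llama

llm = Llama(
    model_path="mythomax-l2-13b.Q3_K_L.gguf",  # smaller quant instead of Q8_0
    n_ctx=4096,        # 4k context keeps the KV cache small
    n_gpu_layers=25,   # placeholder: raise/lower until it just fits in 6 GB VRAM
)

out = llm(
    "### Instruction:\nWrite a two-sentence story about a dragon.\n### Response:\n",
    max_tokens=128,
)
print(out["choices"][0]["text"])
```

The whole point is keeping the model in RAM + VRAM so the SSD stays out of the loop; if you run out of VRAM, lower n_gpu_layers rather than going back to paging.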