this post was submitted on 02 Nov 2023
1 point (100.0% liked)

LocalLLaMA


Community to discuss about Llama, the family of large language models created by Meta AI.

founded 10 months ago

Using GPTQ when there isn't enough video memory on the GPU. How do others do it?

I read somewhere that a video card can use system RAM to compensate for a lack of its own memory, but memory borrowed from RAM is roughly 10 times slower. How is this done? If I'm not mistaken, you need to install a specific version of the video card driver for this. I have a 3060 12GB and 64GB of RAM.
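Before relying on any RAM fallback, it helps to estimate whether a quantized model even fits in 12GB. A rough back-of-envelope sketch (the function and the fixed overhead figure are my own illustrative approximations, not exact loader accounting):

```python
def quantized_size_gb(n_params_billion: float, bits_per_weight: float,
                      overhead_gb: float = 1.5) -> float:
    """Rough VRAM estimate: quantized weights plus a flat allowance
    for KV cache and activations (the 1.5 GB figure is a guess)."""
    weight_gb = n_params_billion * 1e9 * bits_per_weight / 8 / 1024**3
    return weight_gb + overhead_gb

# A 13B model at ~4 bits per weight:
print(round(quantized_size_gb(13, 4.0), 1))  # ~7.6 GB -> fits in 12 GB
# A 33B model at ~4 bits per weight:
print(round(quantized_size_gb(33, 4.0), 1))  # ~16.9 GB -> needs offload
```

By this estimate a 4-bit 13B model fits comfortably on a 3060 12GB, while anything around 33B is where offloading to system RAM becomes unavoidable.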

Maybe this is not the smartest idea, considering that I can get decent speed with GGUF, but I heard that if I use exllamav2, generation is about 2x faster when the model runs on the video card.
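For reference, the GGUF route supports splitting a model between GPU and RAM out of the box. A minimal sketch using llama.cpp's CLI (the model filename and layer count are placeholders; `-ngl` sets how many layers are offloaded to the GPU, and the rest run from system RAM):

```shell
# Offload as many layers as fit in 12GB VRAM; remaining layers stay in RAM.
# Model path and -ngl value are placeholders to tune for your setup.
./main -m model.Q4_K_M.gguf -ngl 35 -c 4096 -p "Hello"
```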

Help me figure out what's what.

no comments (yet)