this post was submitted on 02 Nov 2023
1 point (100.0% liked)

LocalLLaMA


Community to discuss about Llama, the family of large language models created by Meta AI.

founded 10 months ago

Using GPTQ when there isn't enough video memory on the GPU. How do others do it?

I read somewhere that a video card can use system RAM to compensate for a lack of its own memory, but memory borrowed from RAM is roughly 10 times slower. How is this done? If I'm not mistaken, you need to install a specific version of the video card driver for this. I have a 3060 12GB and 64GB of RAM.
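Before relying on any RAM fallback, it helps to estimate whether a quantized model even fits in 12GB. A rough back-of-envelope sketch (the function and the fixed overhead figure are my own illustrative approximations, not exact loader accounting):

```python
def quantized_size_gb(n_params_billion: float, bits_per_weight: float,
                      overhead_gb: float = 1.5) -> float:
    """Rough VRAM estimate: quantized weights plus a flat allowance
    for KV cache and activations (the 1.5 GB figure is a guess)."""
    weight_gb = n_params_billion * 1e9 * bits_per_weight / 8 / 1024**3
    return weight_gb + overhead_gb

# A 13B model at ~4 bits per weight:
print(round(quantized_size_gb(13, 4.0), 1))  # ~7.6 GB -> fits in 12 GB
# A 33B model at ~4 bits per weight:
print(round(quantized_size_gb(33, 4.0), 1))  # ~16.9 GB -> needs offload
```

By this estimate a 4-bit 13B model fits comfortably on a 3060 12GB, while anything around 33B is where offloading to system RAM becomes unavoidable.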

Maybe this is not the smartest idea, considering that I can get decent speed with GGUF, but I heard that if I use exllamav2, generation is about 2x faster when the model runs on the video card.
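For reference, the GGUF route supports splitting a model between GPU and RAM out of the box. A minimal sketch using llama.cpp's CLI (the model filename and layer count are placeholders; `-ngl` sets how many layers are offloaded to the GPU, and the rest run from system RAM):

```shell
# Offload as many layers as fit in 12GB VRAM; remaining layers stay in RAM.
# Model path and -ngl value are placeholders to tune for your setup.
./main -m model.Q4_K_M.gguf -ngl 35 -c 4096 -p "Hello"
```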

Help me figure out what's what.

no comments (yet)