LocalLLaMA

Community to discuss Llama, the family of large language models created by Meta AI.

Local generation not utilising GPU? (alien.top)

submitted 2 years ago by AlternativeParfait47@alien.top to c/localllama@poweruser.forum

Hello, this has probably been asked a bazillion times, but I can't find an existing answer. I have installed Stable Diffusion and LLaMA on my new PC, but neither appears to be using my new RTX 4080 for generation. Generating text or images is very slow, and GPU utilisation stays between 0% and 4% throughout. Any idea how to fix this? I am no expert, so I have no idea what I should change.

It is on a laptop, by the way: an NVIDIA RTX 4080 (Laptop) with a 12th Gen Intel CPU.

Thanks in advance!
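
While generation is running, one way to confirm whether the NVIDIA GPU is actually busy is to ask the driver directly through `nvidia-smi`, which is normally installed alongside the NVIDIA driver. The Python sketch below assumes `nvidia-smi` is on the PATH; the function name `query_gpu_usage` is hypothetical, and it simply returns `None` if the tool is missing or fails.

```python
import subprocess
from typing import Optional


def query_gpu_usage() -> Optional[list[dict[str, str]]]:
    """Return one dict per GPU with name, utilisation (%) and memory use (MiB).

    Returns None if nvidia-smi cannot be run or exits with an error.
    """
    fields = ["name", "utilization.gpu", "memory.used", "memory.total"]
    try:
        result = subprocess.run(
            [
                "nvidia-smi",
                "--query-gpu=" + ",".join(fields),
                "--format=csv,noheader,nounits",
            ],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # nvidia-smi not found / not executable, or it reported an error.
        return None

    gpus = []
    for line in result.stdout.strip().splitlines():
        # Each line holds comma-separated values in the same order as `fields`.
        values = [v.strip() for v in line.split(",")]
        gpus.append(dict(zip(fields, values)))
    return gpus


if __name__ == "__main__":
    usage = query_gpu_usage()
    if usage is None:
        print("nvidia-smi is not available: check the NVIDIA driver installation")
    else:
        for gpu in usage:
            print(f"{gpu['name']}: {gpu['utilization.gpu']}% busy, "
                  f"{gpu['memory.used']}/{gpu['memory.total']} MiB used")
```

If utilisation stays near 0% and used memory barely changes while text or images are being generated, the work is most likely running on the CPU instead.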

cndvcndv@alien.top 1 point 2 years ago

I am not sure what "installing LLaMA" means; there are several different ways of running LLaMA. But if the program you installed is supposed to use the GPU, it could be a CUDA issue.
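
A quick way to test the "CUDA issue" theory, assuming the tool in question runs on PyTorch, is to run a check like the sketch below from that tool's own Python environment. It reports whether the installed PyTorch was built with CUDA support and whether it can see a GPU at runtime. The function name `describe_cuda_status` is hypothetical; `torch.version.cuda`, `torch.cuda.is_available()` and `torch.cuda.get_device_name()` are standard PyTorch calls.

```python
def describe_cuda_status() -> str:
    """Summarise whether PyTorch in the current environment can use a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this Python environment"

    if torch.version.cuda is None:
        # A PyTorch build without CUDA support (e.g. a CPU-only build)
        # cannot use an NVIDIA GPU, whatever the driver state.
        return f"PyTorch {torch.__version__} was built without CUDA support"

    if not torch.cuda.is_available():
        # Built with CUDA, but no usable device or driver was found at runtime.
        return (f"PyTorch {torch.__version__} was built for CUDA {torch.version.cuda}, "
                "but no CUDA device is available (check the NVIDIA driver)")

    name = torch.cuda.get_device_name(0)
    return f"PyTorch {torch.__version__} can use CUDA device 0: {name}"


if __name__ == "__main__":
    print(describe_cuda_status())
```

Either of the first two messages would be consistent with generation silently falling back to the CPU.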

Lup0Grigi0@alien.top 1 point 2 years ago

If you have installed or are using oobabooga's text-generation-webui, download a model that has been quantized for NVIDIA GPUs. Those are the models with the GPTQ and the newer AWQ suffixes.
On Hugging Face, the user "TheBloke" has aggregated dozens and dozens, maybe hundreds, of models.
The YouTube channel Aitrepreneur has a couple of good videos on installing oobabooga and on running the GPU-quantized models.
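
The naming rule in this comment can be expressed as a simple filter over repository names. This is a hypothetical helper, not part of text-generation-webui or the Hugging Face tooling: it only checks whether a repo id ends in `-GPTQ` or `-AWQ` (case-insensitively), and the example names are made up. A name match is only a heuristic; the model card remains the authoritative description of how a model was quantized.

```python
# Suffixes the comment above associates with GPU-oriented quantization.
GPU_QUANT_SUFFIXES = ("-gptq", "-awq")


def is_gpu_quantized(repo_id: str) -> bool:
    """Return True if a repo id ends in a GPTQ or AWQ suffix (case-insensitive)."""
    return repo_id.lower().endswith(GPU_QUANT_SUFFIXES)


def pick_gpu_models(repo_ids: list[str]) -> list[str]:
    """Keep only repo ids whose names suggest GPU (GPTQ/AWQ) quantization."""
    return [r for r in repo_ids if is_gpu_quantized(r)]


if __name__ == "__main__":
    # Hypothetical repository names, for illustration only.
    candidates = [
        "SomeUser/Example-7B-GPTQ",
        "SomeUser/Example-7B-AWQ",
        "SomeUser/Example-7B",
    ]
    print(pick_gpu_models(candidates))
    # prints ['SomeUser/Example-7B-GPTQ', 'SomeUser/Example-7B-AWQ']
```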

FlishFlashman@alien.top 1 point 2 years ago

What software are you using to run LLaMA and Stable Diffusion?

What version of the LLaMA model are you trying to run? How many parameters? What quantization?
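
The parameter count and quantization matter because together they roughly determine how much memory the weights need, and therefore whether the model can fit entirely in GPU memory. A common rough lower bound is parameters × bits per weight ÷ 8 bytes. The sketch below assumes that formula and ignores activations, the KV cache and framework overhead, so real usage will be higher; the `headroom` factor of 0.8 is an arbitrary assumption, and the function names are hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in gigabytes (1e9 bytes)."""
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 1e9


def fits_in_vram(num_params: float, bits_per_weight: float, vram_gb: float,
                 headroom: float = 0.8) -> bool:
    """Rough check: the weights must fit in a fraction (`headroom`) of VRAM.

    The default 0.8 is an arbitrary allowance for activations, the KV cache
    and runtime overhead, not a measured figure.
    """
    return weight_memory_gb(num_params, bits_per_weight) <= vram_gb * headroom


if __name__ == "__main__":
    for params, bits in [(7e9, 16), (7e9, 4), (13e9, 4)]:
        size = weight_memory_gb(params, bits)
        print(f"{params / 1e9:.0f}B @ {bits}-bit: about {size:.1f} GB of weights")
```

For example, a 7B model stored at 16 bits needs about 14 GB just for its weights, while the same model quantized to 4 bits needs about 3.5 GB, which is why the answers to these questions change what will run well on a given GPU.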