This post was submitted on 10 Nov 2023

LocalLLaMA


Community to discuss Llama, the family of large language models created by Meta AI.


I've tried everything at this point; I think I'm either doing something wrong or I've found some very strange bug. I was thinking of posting on their GitHub, but I'm not sure I'm not simply making a very stupid error.


In a fresh conda environment set up with Python 3.12, I first set

export LLAMA_CUBLAS=1

and then ran:

CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python

It runs without complaint and produces a working llama-cpp-python install, but without CUDA support. I know CUDA itself works in WSL because nvidia-smi shows CUDA version 12.

I have tried setting up multiple environments, and tried removing and reinstalling; I also tried backends other than CUDA, and those don't work either, so something seems to be off with the backend part, but I don't know what. My best guess is that I'm doing something very basic wrong, like not setting the environment variable correctly or something.

When I reinstalled, I used this option:

pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir

Also, it simply does not create the llama_cpp_cuda folder, so the problem described in the Stack Overflow question "llama-cpp-python not using NVIDIA GPU CUDA" does not seem to be what's happening here.
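For reference, the fullest rebuild I know of looks roughly like this (FORCE_CMAKE=1 is taken from the llama-cpp-python README; the exact wording of the log lines is just a guess). The build log should mention cuBLAS somewhere if the flag is really being picked up:

```
# force CMake to run again and keep the full compile log
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 \
  pip install llama-cpp-python --force-reinstall --no-cache-dir --verbose 2>&1 | tee build.log

# check whether the CUDA/cuBLAS backend shows up anywhere in the log
grep -iE "cublas|cuda" build.log
```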

Hardware:

Ryzen 5800H

RTX 3060

16 GB of DDR4 RAM

WSL2 Ubuntu

To test it, I run the following code and watch the GPU memory usage, which stays at about 0:

from llama_cpp import Llama

llm = Llama(model_path="/mnt/d/Maschine learning/llm models/llama_2_7b/llama27bchat.Q4_K_M.gguf",
            n_gpu_layers=20, n_threads=6, n_ctx=3584, n_batch=521, verbose=True)

output = llm("Q: Name the planets in the solar system? A: ", max_tokens=32, stop=["Q:", "\n"], echo=True)
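While the script loads the model, I keep nvidia-smi refreshing in a second terminal (plain WSL/Linux tooling, nothing specific to llama.cpp):

```
# refresh GPU stats every second; with working offload the python process
# should show well over 0 MiB of GPU memory while the model is loaded
watch -n 1 nvidia-smi
```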

So any help or ideas about what could be going on here would be greatly appreciated, because I am out of ideas. Thank you very much :)

top 7 comments
[–] labloke11@alien.top 1 points 1 year ago (1 children)

What does nvcc --version return?

[–] mrjackspade@alien.top 1 points 1 year ago (1 children)

/u/Noxusequal

I just went through this exact issue building llama.cpp inside an Ubuntu Docker container. If you don't have nvcc installed, it will compile without error but won't include CUDA support, regardless of what options you set. Check to make sure nvcc is installed on the machine.
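Roughly something like this (the package name is for stock Ubuntu/WSL; a CUDA toolkit installed from NVIDIA's own repo works just as well):

```
# is the CUDA compiler actually on the PATH?
which nvcc && nvcc --version

# one way to get nvcc on Ubuntu/WSL
sudo apt update && sudo apt install -y nvidia-cuda-toolkit

# rebuild afterwards so the CUDA backend actually gets compiled in
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python --force-reinstall --no-cache-dir
```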

[–] Noxusequal@alien.top 1 points 1 year ago

Yup, that was part of it :) It's working now, thank you.

[–] sdfgeoff@alien.top 1 points 1 year ago (1 children)

You may have to tell it to build with cuBLAS AND force the reinstall: CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir. Otherwise the build may decide "it's already done" and the install may just reinstall the non-GPU version.

Also, I have no idea about WSL; I've only tried this on actual Linux installs.

[–] Noxusequal@alien.top 1 points 1 year ago

I did this :) I should have specified that when reinstalling I set both flags as environment variables again.

[–] vec1nu@alien.top 1 points 1 year ago

I think you don't have CUDA properly set up. Use pip install --verbose to see the compilation messages when it tries to build llama.cpp with CUDA. You might need to manually set the CUDA_HOME environment variable.
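Something along these lines (the /usr/local/cuda path is just the usual default; adjust it to wherever your toolkit actually lives):

```
# point the build at the CUDA toolkit and make its binaries and libraries visible
export CUDA_HOME=/usr/local/cuda
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:$LD_LIBRARY_PATH"

# verbose install so the CMake CUDA-detection messages are visible
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python --force-reinstall --no-cache-dir --verbose
```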

[–] Noxusequal@alien.top 1 points 1 year ago

Okay, it's working now. I needed to install nvcc separately and change the CUDA_HOME environment variable. Also, to install nvcc I needed to get the symlinks working manually, but with 15 minutes of Google searching I got it to work :D Thank you all :)
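For anyone who finds this later, the symlink part was roughly the following (the exact toolkit version directory will differ on your machine, so treat the path as a placeholder):

```
# hypothetical example: point the generic /usr/local/cuda path at the installed toolkit version
sudo ln -sfn /usr/local/cuda-12.2 /usr/local/cuda
export CUDA_HOME=/usr/local/cuda
```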