this post was submitted on 10 Nov 2023

LocalLLaMA


I've tried everything at this point. I think I'm either doing something wrong or I've discovered some very strange bug. I was thinking of posting on their GitHub, but I'm not sure I'm not simply making a very stupid error.

In a fresh conda environment set up with Python 3.12, I first ran:

```
export LLAMA_CUBLAS=1
```

Then I copied this:

```
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
```

It runs without complaint and produces a working llama-cpp-python install, but without CUDA support. I know CUDA works in WSL because `nvidia-smi` shows CUDA version 12.
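One thing worth ruling out: in WSL2, `nvidia-smi` is provided by the Windows driver and reports the newest CUDA version that driver supports, not whether the CUDA toolkit is installed inside Ubuntu. Building llama-cpp-python with `-DLLAMA_CUBLAS=on` also needs the toolkit's `nvcc` compiler, and if CMake cannot find it the build may quietly end up CPU-only. A minimal sketch that checks whether both tools are on `PATH` (the function names are hypothetical helpers, not part of any library):

```python
from __future__ import annotations

import shutil
import subprocess


def find_cuda_tools() -> dict[str, str | None]:
    """Map each tool name to its location on PATH, or None if it is missing."""
    # nvidia-smi: in WSL2 this comes from the Windows driver.
    # nvcc: the CUDA toolkit compiler, needed to build the CUDA backend.
    return {tool: shutil.which(tool) for tool in ("nvidia-smi", "nvcc")}


def nvcc_release() -> str | None:
    """Return the line of `nvcc --version` that names the release, or None."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    for line in result.stdout.splitlines():
        if "release" in line:
            return line.strip()
    return None


if __name__ == "__main__":
    for tool, path in find_cuda_tools().items():
        print(f"{tool}: {path or 'not found on PATH'}")
    print("nvcc release:", nvcc_release())
```

If `nvcc` shows as not found, installing the CUDA toolkit for WSL-Ubuntu (or adding its `bin` directory to `PATH`) would be the first thing to try before rebuilding.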

I have tried setting up multiple environments, removing and reinstalling, and a few things besides CUDA that also don't work, so something seems to be off with the backend, but I don't know what. My best guess is that I'm doing something very basic wrong, like not setting the environment variable correctly.
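Since the environment variable is the prime suspect: `export LLAMA_CUBLAS=1` only lasts for the current shell session and does not set `CMAKE_ARGS`, while the `CMAKE_ARGS="..." pip install ...` form sets `CMAKE_ARGS` for that single pip command only. Run from the same terminal as pip, this sketch prints what a child process would inherit (the variable list simply mirrors the names used in this thread; `build_vars` is a hypothetical helper):

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Names used in this thread; pip's build subprocess inherits whatever the
# launching shell has exported.
BUILD_VARS = ("CMAKE_ARGS", "LLAMA_CUBLAS")


def build_vars(env: Mapping[str, str] = os.environ) -> dict[str, str | None]:
    """Return each build variable's value, or None if it is unset."""
    return {name: env.get(name) for name in BUILD_VARS}


if __name__ == "__main__":
    for name, value in build_vars().items():
        print(f"{name} = {value!r}")
```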

When I reinstalled, I used these options:

```
pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
```

Also, it simply doesn't create a llama_cpp_cuda folder, so the issue described in the Stack Overflow question "llama-cpp-python not using NVIDIA GPU CUDA" does not seem to be my problem.
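One way to tell whether the installed package was compiled with GPU support, without watching memory: llama.cpp exposes a system-info string, which llama-cpp-python's low-level bindings wrap as `llama_cpp.llama_print_system_info()` (returning bytes). The sketch below assumes the `KEY = 0/1 | ...` format that llama.cpp builds of this period used. In those builds `BLAS = 1` was reported for cuBLAS but also for other BLAS backends, so `BLAS = 0` rules cuBLAS out while `BLAS = 1` only suggests it. `has_blas` and `installed_system_info` are hypothetical helpers:

```python
from __future__ import annotations


def has_blas(system_info: str) -> bool:
    """True if a llama.cpp system-info string reports 'BLAS = 1'."""
    # Assumption: fields look like "AVX = 1 | AVX2 = 1 | ... | BLAS = 1 | ...".
    for field in system_info.split("|"):
        key, _, value = field.partition("=")
        if key.strip() == "BLAS":
            return value.strip() == "1"
    return False


def installed_system_info() -> str:
    """Ask the installed llama-cpp-python build for llama.cpp's system info."""
    import llama_cpp  # imported lazily so has_blas() works without it

    return llama_cpp.llama_print_system_info().decode("utf-8")


if __name__ == "__main__":
    try:
        info = installed_system_info()
    except ImportError:
        print("llama_cpp is not importable in this environment")
    else:
        print(info)
        print("BLAS compiled in:", has_blas(info))
```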

Hardware:

- Ryzen 5800H
- RTX 3060
- 16 GB DDR4 RAM
- WSL2 Ubuntu

To test it, I run the following code and watch the GPU memory usage, which stays at about 0:

```python
from llama_cpp import Llama

# n_gpu_layers=20 asks for 20 layers to be offloaded to the GPU (no effect in a CPU-only build)
llm = Llama(model_path="/mnt/d/Maschine learning/llm models/llama_2_7b/llama27bchat.Q4_K_M.gguf",
            n_gpu_layers=20, n_threads=6, n_ctx=3584, n_batch=521, verbose=True)

output = llm("Q: Name the planets in the solar system? A: ", max_tokens=32, stop=["Q:", "\n"], echo=True)
print(output["choices"][0]["text"])
```
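Instead of eyeballing a memory graph, the reading can be taken from `nvidia-smi` directly with its CSV query flags. A minimal sketch, with hypothetical helper names: call `gpu_memory_used()` before and after constructing `Llama(...)` in the test script, and a CUDA build offloading 20 layers should show a clear jump (how large depends on the model and settings), while a CPU-only build should show essentially none.

```python
from __future__ import annotations

import shutil
import subprocess

# memory.used is reported in MiB; nounits drops the "MiB" suffix.
QUERY = ["--query-gpu=index,memory.used", "--format=csv,noheader,nounits"]


def parse_memory_used(csv_text: str) -> dict[int, int]:
    """Map GPU index to used memory in MiB from nvidia-smi CSV output."""
    usage: dict[int, int] = {}
    for line in csv_text.strip().splitlines():
        index, used = (part.strip() for part in line.split(","))
        usage[int(index)] = int(used)
    return usage


def gpu_memory_used() -> dict[int, int] | None:
    """Query nvidia-smi once; None if it is missing or the query fails."""
    smi = shutil.which("nvidia-smi")
    if smi is None:
        return None
    result = subprocess.run([smi, *QUERY], capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return parse_memory_used(result.stdout)


if __name__ == "__main__":
    print(gpu_memory_used())
```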

Any help or idea about what could be going on here would be greatly appreciated, because I'm out of ideas. Thank you very much :)

sdfgeoff@alien.top, 1 year ago:

You may have to tell it to build with cuBLAS AND force the reinstall:

```
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
```

Otherwise the build may decide "it's already done" and the install may just reinstall the non-GPU version.
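A variant of this advice that removes two common conda pitfalls, sketched with hypothetical helper names: it runs pip as `sys.executable -m pip`, so the pip belonging to the currently running interpreter is the one that builds, and it adds pip's `-v` flag so the CMake output (including any message about CUDA not being found) is shown instead of hidden. The flags are the ones from this reply plus `-v`:

```python
from __future__ import annotations

import os
import subprocess
import sys


def install_command() -> list[str]:
    """The reinstall command from this reply, plus -v to show CMake's output."""
    return [
        sys.executable, "-m", "pip", "install", "llama-cpp-python",
        "--force-reinstall", "--upgrade", "--no-cache-dir", "-v",
    ]


def install_env(base: dict[str, str] | None = None) -> dict[str, str]:
    """Copy of the environment with CMAKE_ARGS set explicitly for the build."""
    env = dict(os.environ if base is None else base)
    env["CMAKE_ARGS"] = "-DLLAMA_CUBLAS=on"
    return env


def run_install() -> int:
    """Run the rebuild with the explicit environment; returns pip's exit code."""
    return subprocess.run(install_command(), env=install_env()).returncode
```

Calling `run_install()` from a script started with the conda environment's `python` performs the rebuild; the verbose log can then be searched for CUDA-related lines.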

Also, I have no idea about WSL; I've only tried this on actual Linux installs.

Noxusequal@alien.top, 1 year ago:

I did this :) I should have specified, but when reinstalling I set both flags as environment variables again.