turamura

joined 11 months ago
[–] turamura@alien.top 1 point 11 months ago

Got it to work! Thank you!!

[–] turamura@alien.top 1 point 11 months ago (1 children)

Hi, thanks for your comment!

I saw the "inference.py" script in the repo, which I think I could use; it actually looks fairly simple. However, I'm struggling with what to provide as the "model directory". Should I just download a Huggingface model (for example, I'd like to work with TheBloke/Llama-2-70B-GPTQ) and then pass that folder as the model directory? Or does ExLlama expect a particular directory structure?
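For reference, here's my current understanding of what that directory should contain: the Huggingface snapshot with a `config.json`, a `tokenizer.model`, and the quantized weights as a `.safetensors` file. The exact file names are my assumption from skimming the repo's example scripts, not something I've confirmed, so a small sanity check like this is what I'd run before pointing ExLlama at a folder:

```python
import glob
import os

def check_exllama_model_dir(model_dir):
    """Return a list of files that appear to be missing from model_dir.

    Assumed layout (from skimming ExLlama's example scripts, unconfirmed):
      - config.json        (model config from the HF repo)
      - tokenizer.model    (SentencePiece tokenizer)
      - *.safetensors      (the GPTQ-quantized weights)
    An empty return value means the directory looks usable."""
    missing = []
    for name in ("config.json", "tokenizer.model"):
        path = os.path.join(model_dir, name)
        if not os.path.exists(path):
            missing.append(path)
    # The quantized weights can have any stem, so match by extension.
    if not glob.glob(os.path.join(model_dir, "*.safetensors")):
        missing.append(os.path.join(model_dir, "*.safetensors"))
    return missing
```

If that layout is right, then downloading the full model repo from Huggingface and passing the snapshot folder as the model directory would presumably be enough; please correct me if ExLlama wants something else.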


I want to use the ExLlama models because they let me run the Llama 70b version on my two RTX 4090s. I managed to get it working pretty easily via text generation webui, and inference is really fast! So far so good...

However, I need the model in Python to do some large-scale analyses, and I can't find any guide or tutorial explaining how to use ExLlama in the usual Python/Huggingface setup.

Is this just not possible? If it is possible, can someone point me to some example code that uses ExLlama in Python?

Much appreciated!