this post was submitted on 30 Nov 2023
LocalLLaMA
Community for discussing Llama, the family of large language models created by Meta AI.
Hi, thanks for your comment!
I saw, for example, the "inference.py" script in the repo, which I think I could use. It actually looks fairly simple. However, I'm struggling with what to provide as the "model directory". Should I just download a Hugging Face model (for example, I'd like to work with TheBloke/Llama-2-70B-GPTQ) and then specify that as the model directory? Or what kind of structure does ExLlama expect the model directory to have?
Yes, the model directory is just all the files from a HF model, in one folder. You can download them directly from the "Files" tab of a HF model by clicking all the little download arrows, or there's `huggingface-cli`. Also, `git` can be used to clone models if you've got `git-lfs` installed.

It specifically needs the following files:
But it may utilize other files in the future, such as tokenizer_config.json, so it's best to just download all the files and keep them in one folder.
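Since the model directory is just a plain folder of HF files, a quick sanity check can confirm a download is complete before pointing ExLlama at it. Here's a minimal sketch; note the file names below are the usual HF/Llama artifacts I'd expect in a GPTQ repo, not ExLlama's definitive requirements list, so adjust to whatever the repo actually documents:

```python
from pathlib import Path

# Typical files in a Hugging Face Llama-family model folder (a rough
# guess -- check the ExLlama repo for the exact files it requires).
EXPECTED = [
    "config.json",      # model architecture / hyperparameters
    "tokenizer.model",  # SentencePiece tokenizer used by Llama models
]

def missing_files(model_dir: str) -> list[str]:
    """Return the expected files that are absent from model_dir."""
    d = Path(model_dir)
    missing = [name for name in EXPECTED if not (d / name).is_file()]
    # GPTQ repos usually ship the weights as one or more .safetensors files
    if not list(d.glob("*.safetensors")):
        missing.append("*.safetensors")
    return missing
```

If `missing_files("/path/to/model")` comes back empty, the folder should be ready to pass to ExLlama as the model directory.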