Download koboldcpp and grab a GGUF version of any model you want, preferably a 7B from our pal TheBloke. Only get Q4 quantization or higher; Q6 is a bit slower but works well. In koboldcpp.exe, select CuBLAS and set the GPU layers to 35-40.
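If you'd rather skip the GUI, koboldcpp also accepts flags on the command line. A rough sketch of the equivalent launch (the model filename here is just a placeholder, and it's worth double-checking the flag names against your koboldcpp version's --help):

koboldcpp.exe --model mistral-7b.Q4_K_M.gguf --usecublas --gpulayers 35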
You should get about 5 T/s or more.
From my testing, this is the simplest way to run LLMs.