this post was submitted on 18 Nov 2023

LocalLLaMA


Community to discuss about Llama, the family of large language models created by Meta AI.


Looking for any model that can run with 20 GB VRAM. Thanks!

[–] BriannaBromell@alien.top 1 points 10 months ago (3 children)

I'm using this and it's shockingly great:
https://huggingface.co/TheBloke/Xwin-MLewd-7B-V0.2-GPTQ

I'm just now discovering TheBloke/Xwin-MLewd-13B-v0.2-GPTQ.

[–] zumba75@alien.top 1 points 10 months ago (1 children)

What app are you using it in? I tried the 13B in Oobabooga and wasn't able to make it work consistently (after a short while it goes on and replies as me instead).
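The runaway-reply symptom described above can also be mitigated outside any particular frontend by trimming the output at the first point where the model starts writing the user's turn. A minimal sketch; the marker strings are illustrative assumptions, not settings from this thread:

```python
def trim_impersonation(text: str,
                       stop_strings=("\nUser:", "\nUSER:", "\n### Instruction:")) -> str:
    """Cut a model reply at the first user-impersonation marker.

    If the model keeps generating and begins a turn as the user, everything
    from that marker onward is discarded.
    """
    cut = len(text)
    for marker in stop_strings:
        i = text.find(marker)
        if i != -1:
            cut = min(cut, i)
    return text[:cut].rstrip()
```

Most frontends (Oobabooga included) expose an equivalent "custom stopping strings" option, which stops generation earlier rather than trimming after the fact.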

[–] BriannaBromell@alien.top 1 points 9 months ago

I recently wrote my own pure Python/ChromaDB program, but before that I had great success with Oobabooga and this model. Maybe there's an overlooked setting I enabled in Oobabooga, or one of the generation kwargs just happens to work flawlessly. The model does have trouble keeping itself separate from the user, so take care with the wording of your system message too.
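For reference, a sketch of the kind of generation kwargs and system message being described; every value here is a hypothetical illustration, not the commenter's actual configuration:

```python
# Hypothetical generation settings of the kind Oobabooga exposes.
# The exact values that "just seem to work" are unknown; these are
# illustrative defaults only.
generation_kwargs = {
    "max_new_tokens": 512,
    "do_sample": True,
    "temperature": 0.7,
    "top_p": 0.9,
    "repetition_penalty": 1.15,
    # Stopping strings help keep the model from continuing as the user.
    "stopping_strings": ["User:", "USER:"],
}

# A system message worded to keep the model's identity separate from the user's.
system_message = (
    "You are the assistant. Reply only as the assistant; "
    "never write lines on behalf of the user."
)
```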

Having seen the model's tokenizer.default_chat_template, that isn't surprising: it's a real mess, with impossible conditions.
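One workaround for a broken default_chat_template is to format the prompt by hand instead of relying on the tokenizer. Xwin-based models are generally documented as using a Vicuna-style prompt format, but verify against the model card; a minimal sketch under that assumption:

```python
def build_prompt(system: str,
                 turns: list[tuple[str, str]],
                 user_msg: str) -> str:
    """Format a Vicuna-style prompt by hand, sidestepping the chat template.

    `turns` is a list of (user, assistant) exchanges from earlier in the chat.
    """
    parts = [system]
    for user, assistant in turns:
        parts.append(f"USER: {user} ASSISTANT: {assistant}</s>")
    # Leave the final "ASSISTANT:" open for the model to complete.
    parts.append(f"USER: {user_msg} ASSISTANT:")
    return "\n".join(parts)
```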

My health is keeping me from writing a better response, but if you're dead set on using it, message me and we'll work it out together. I like this model the most.
