What are you using to run them?
In any case, larger context models require *a lot* more RAM/VRAM.
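For a rough sense of scale: the KV cache alone grows linearly with context length. A minimal back-of-envelope sketch, assuming a Llama-7B-like layout (32 layers, 32 KV heads, head dim 128, fp16 cache) — the exact numbers vary by model and quantization, so treat this as illustrative only:

```python
# Rough KV-cache size estimate. The defaults assume a Llama-7B-like
# architecture (32 layers, 32 KV heads, head dim 128, fp16 cache);
# they are illustrative assumptions, not exact for any specific build.

def kv_cache_bytes(ctx_len, n_layers=32, n_kv_heads=32,
                   head_dim=128, bytes_per_elem=2):
    # 2x for separate K and V tensors, one entry per layer per token
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

for ctx in (2048, 4096, 8192, 32768):
    print(f"{ctx:>6} tokens -> {kv_cache_bytes(ctx) / 2**30:.1f} GiB KV cache")
```

That comes out to about 1 GiB at 2k context but 16 GiB at 32k, on top of the model weights themselves.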
I'm using ooba; I haven't bothered much with KoboldCPP because I'm not really running GGUF models.
What kind of performance do you get on this rig with a 7B 8-bit model like Mistral?