LocalLLaMA

submitted 2 years ago by abandonedexplorer@alien.top to c/localllama@poweruser.forum

I am talking about this particular model:

I specifically use: goliath-120b.Q4_K_M.gguf

I can run it on runpod.io on this A100 instance at a "humane" speed, but it is way too slow for generating long-form text.

These are my settings in text-generation-webui:

Any advice? Thanks

[–] panchovix@alien.top 1 points 2 years ago (1 children)

I tested 4K context and it worked fine at 4.5bpw. The max will probably be about 6K. I didn't use the 8-bit cache.

That said, 4.5bpw is kinda overkill. About 4.12bpw is roughly equivalent to 4-bit 128g GPTQ, and that would let you use a lot more context.
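
To put rough numbers on that trade-off, here is a back-of-the-envelope sketch in Python. It assumes a nominal 120e9 parameters (taken from the model name, not measured) and counts only the quantized weights, so the KV cache, activations, and loader overhead are ignored. The helper name `weight_gb` is made up for illustration.

```python
def weight_gb(n_params: float, bpw: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return n_params * bpw / 8 / 1e9


# Assumption: nominal parameter count implied by the "120b" in the model name.
N_PARAMS = 120e9

for bpw in (4.5, 4.12):
    print(f"{bpw} bpw -> about {weight_gb(N_PARAMS, bpw):.1f} GB of weights")

# VRAM freed by dropping from 4.5 bpw to 4.12 bpw, which could go to context.
saved_gb = weight_gb(N_PARAMS, 4.5) - weight_gb(N_PARAMS, 4.12)
print(f"freed: about {saved_gb:.1f} GB")
```

How much extra context the freed memory buys depends on the model's KV cache size per token and on whether the 8-bit cache is enabled, so treat this only as an estimate.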

[–] Dead_Internet_Theory@alien.top 1 points 2 years ago

That is awesome. What kind of platform do you use for that 3-GPU setup?