a_beautiful_rhind
[–] a_beautiful_rhind@alien.top 1 points 11 months ago

Good luck. Centrism is not allowed. You would have to skip the last decade of internet data. Social engineering works much the same on people and on language models.

[–] a_beautiful_rhind@alien.top 1 points 11 months ago

From the issue about this in the exllamav2 repo, quip was using more memory and running slower than exl. How much context can you fit?

[–] a_beautiful_rhind@alien.top 1 points 11 months ago

I'm not getting a super huge jump with the bigger models yet, just a mild bump. I got a P100 so I can load models in the low 100Bs and still have exllama work. That's 64GB of VRAM at FP16.

For bigger models I can run FP32 and put the 2 P40s back in. That's 120GB of VRAM. Also 6 vidya cards :P

It required building for this type of system from the start. I'm not made of money either; I just upgrade it over time.

[–] a_beautiful_rhind@alien.top 1 points 11 months ago

It really is Christmas.

[–] a_beautiful_rhind@alien.top 1 points 11 months ago

I got a P100 for like $150 to see how well it will work with exllama + 3090s and if it is any faster at SD.

These guys are all gone already.

[–] a_beautiful_rhind@alien.top 1 points 11 months ago

Would be cool to see this in a 34b and 70b.

[–] a_beautiful_rhind@alien.top 1 points 11 months ago

Aren't there people selling such services to companies here? Implementing RAG, etc.

[–] a_beautiful_rhind@alien.top 1 points 11 months ago

Heh, 72b with 32k and GQA seems reasonable. Will make for interesting tunes if it's not super restricted.

[–] a_beautiful_rhind@alien.top 1 points 11 months ago

That's a good sign if anything.

[–] a_beautiful_rhind@alien.top 1 points 11 months ago

one is not enough

[–] a_beautiful_rhind@alien.top 1 points 11 months ago

Does it give refusals on the base model? 67B sounds like a full foundation train.

[–] a_beautiful_rhind@alien.top 1 points 11 months ago

Something is wrong with your environment. Even P40s give more than that.

The other option is that you're not generating enough tokens to get a proper t/s reading. What was the total inference time?
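Since the point is that a too-short run can make t/s look terrible, here's a minimal sketch of how I'd sanity-check the measurement. It assumes you're timing the generation yourself; `generate`, the variable names, and the 50-token cutoff are just placeholders, not anything from exllama or the original setup.

```python
import time

def measure_tps(generate, prompt, max_new_tokens):
    """Time a single generation and return (tokens/sec, total seconds).

    `generate` is any callable returning the generated token ids;
    it stands in for whatever backend is being benchmarked.
    """
    start = time.perf_counter()
    tokens = generate(prompt, max_new_tokens)
    elapsed = time.perf_counter() - start

    n = len(tokens)
    tps = n / elapsed
    # With very short generations, fixed costs (prompt processing,
    # warm-up) dominate `elapsed`, so tps reads far below the card's
    # real decode speed.
    if n < 50:  # arbitrary illustrative threshold
        print(f"warning: only {n} tokens generated; t/s figure is unreliable")
    return tps, elapsed
```

If the total inference time is dominated by the prompt pass, generate a longer response and the t/s number should come back up.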
