Secret_Joke_2262

joined 10 months ago
 

I can't figure out how to install this. There are no step-by-step instructions for noobs like me. If anyone can help me, please post your Discord in the comments or explain here how to install it.

[–] Secret_Joke_2262@alien.top 1 points 9 months ago

Which benchmarks have you tested this on?

I'm very interested in storytelling and RP

[–] Secret_Joke_2262@alien.top 1 points 9 months ago (1 children)

What do these benchmarks mean for an LLM? There are many scores, and I see that in most cases Qwen is better than GPT-4; in others it is worse or much worse.

[–] Secret_Joke_2262@alien.top 1 points 9 months ago

Right now everyone is most interested in how much better it is than Llama 70B.

[–] Secret_Joke_2262@alien.top 1 points 9 months ago

70B Storytelling, Q5_K_M

[–] Secret_Joke_2262@alien.top 1 points 9 months ago (1 children)

A friend told me that with a 70B model, using Q4 drops performance by about 10%. The larger the model, the less it suffers from weight quantization.

 

I want to download the Goliath model, but I can only afford Q3_K_M. It is described as having high quality loss. How much quality is actually lost?

I have heard that the larger the model, the less it suffers intellectually when quantized. I usually use 70B Q5_K_M. Can I expect 120B Q3_K_M to be significantly better than 70B Q5_K_M, so that the time spent downloading is worth it?

https://preview.redd.it/1dvpq4bq8c0c1.png?width=1148&format=png&auto=webp&s=79588237d01a66643cfdb12cc13b84866df4bf68
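One way to judge the quality loss for yourself, rather than trusting the quant tables, is to run the same storytelling prompt through both files and compare the output side by side. A minimal sketch with llama-cpp-python, assuming both models fit in 64 GB of RAM (the GGUF filenames below are placeholders for your own files):

```python
# Side-by-side comparison sketch; the model filenames are hypothetical.
from llama_cpp import Llama

PROMPT = "Write the opening scene of a fantasy story about a lighthouse keeper."

for path in ("goliath-120b.Q3_K_M.gguf", "llama2-70b.Q5_K_M.gguf"):
    llm = Llama(model_path=path, n_ctx=2048, verbose=False)
    out = llm(PROMPT, max_tokens=200, temperature=0.8)
    print(f"--- {path} ---")
    print(out["choices"][0]["text"])
    del llm  # release memory before loading the next model
```

For a storytelling/RP use case, this kind of blind comparison on your own prompts is usually more informative than a single benchmark number.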

[–] Secret_Joke_2262@alien.top 1 points 10 months ago

120 thousand rubles.

I made a mistake when assembling the PC and for some inexplicable reason focused on the processor, while the video card is quite weak. After a while, though, I realized it was for the better. I can run a 70B model at 1 token per second. Maybe in the future I will buy another video card so that offloading more layers speeds up processing.

RTX 3060 12 GB & i5-13600K
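For context, offloading layers to the GPU with a GGUF model usually looks something like this in llama-cpp-python; this is only a sketch, and the filename and layer count are assumptions to tune until the 12 GB card is nearly full:

```python
# Partial GPU offload sketch; the model filename is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="llama2-70b.Q5_K_M.gguf",  # placeholder path
    n_gpu_layers=20,  # raise until the card's 12 GB is nearly full
    n_ctx=4096,
)
out = llm("Once upon a time,", max_tokens=64)
print(out["choices"][0]["text"])
```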

 

Using GPTQ when there is not enough video memory on the GPU. How do others handle it?

I read somewhere that a video card can use system RAM to compensate for a lack of its own memory, but memory borrowed from RAM is about 10 times slower. How do I set this up? If I'm not mistaken, you need a specific version of the video card driver for this. I have a 3060 12GB and 64GB of RAM.

Maybe this is not the smartest idea, considering that I can already get good speed with GGUF, but I heard that with exllama2 generation is about twice as fast when it runs on the video card.

Help me figure out what's what.
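One practical way to settle the speed question is to measure it directly: time tokens per second at a few different offload levels and see where your hardware tops out. A rough sketch under the same llama-cpp-python assumption as above (the model filename is a placeholder):

```python
# Rough tokens-per-second benchmark across offload levels; path is hypothetical.
import time
from llama_cpp import Llama

for ngl in (0, 10, 20):  # increase until 12 GB of VRAM runs out
    llm = Llama(model_path="model.Q5_K_M.gguf", n_gpu_layers=ngl, verbose=False)
    t0 = time.time()
    out = llm("Tell me a short story.", max_tokens=128)
    n_tokens = out["usage"]["completion_tokens"]
    print(f"n_gpu_layers={ngl}: {n_tokens / (time.time() - t0):.2f} tok/s")
    del llm  # free the model before the next run
```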