Secret_Joke_2262

joined 10 months ago
 

I can't figure out how to install this. There are no step-by-step instructions for noobs like me. If anyone can help me, please post your Discord in the comments or explain here how to install it.

[–] Secret_Joke_2262@alien.top 1 points 9 months ago

Which benchmarks have you tested this on?

I'm very interested in storytelling and RP

[–] Secret_Joke_2262@alien.top 1 points 9 months ago (1 children)

What do these benchmarks mean for an LLM? There are many scores, and I see that in most cases Qwen is better than GPT-4; in others it is worse or much worse.

[–] Secret_Joke_2262@alien.top 1 points 9 months ago

Right now everyone is most interested in how much better it is than Llama 70B.

[–] Secret_Joke_2262@alien.top 1 points 9 months ago

70B Storytelling, Q5_K_M

[–] Secret_Joke_2262@alien.top 1 points 9 months ago (1 children)

A friend told me that with a 70B model, using Q4 drops performance by about 10%. The larger the model, the less it suffers from weight quantization.

 

I want to download the Goliath model, but I can only afford Q3_K_M. It is described as having high quality loss. How much quality is actually lost?

I have heard that the larger the model, the less it suffers intellectually when quantized. I usually use 70B Q5_K_M. Can I expect 120B Q3_K_M to be significantly better than 70B Q5_K_M, so that the time spent downloading is worth it?

https://preview.redd.it/1dvpq4bq8c0c1.png?width=1148&format=png&auto=webp&s=79588237d01a66643cfdb12cc13b84866df4bf68
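One way to judge the quality loss for yourself, rather than trusting the quant tables, is to run the same storytelling prompt through both files and compare the output side by side. A minimal sketch with llama-cpp-python, assuming both models fit in 64 GB of RAM (the GGUF filenames below are placeholders for your own files):

```python
# Side-by-side comparison sketch; the model filenames are hypothetical.
from llama_cpp import Llama

PROMPT = "Write the opening scene of a fantasy story about a lighthouse keeper."

for path in ("goliath-120b.Q3_K_M.gguf", "llama2-70b.Q5_K_M.gguf"):
    llm = Llama(model_path=path, n_ctx=2048, verbose=False)
    out = llm(PROMPT, max_tokens=200, temperature=0.8)
    print(f"--- {path} ---")
    print(out["choices"][0]["text"])
    del llm  # release memory before loading the next model
```

For a storytelling/RP use case, this kind of blind comparison on your own prompts is usually more informative than a single benchmark number.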

[–] Secret_Joke_2262@alien.top 1 points 10 months ago

120 thousand rubles.

I made a mistake when assembling the PC and for some inexplicable reason focused on the processor, while the video card is quite weak. After a while, though, I realized it was for the better. I can run a 70B model at 1 token per second. Maybe in the future I will buy another video card so that offloading more layers speeds up processing.

RTX 3060 12 GB & i5-13600K
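For context, offloading layers to the GPU with a GGUF model usually looks something like this in llama-cpp-python; this is only a sketch, and the filename and layer count are assumptions to tune until the 12 GB card is nearly full:

```python
# Partial GPU offload sketch; the model filename is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="llama2-70b.Q5_K_M.gguf",  # placeholder path
    n_gpu_layers=20,  # raise until the card's 12 GB is nearly full
    n_ctx=4096,
)
out = llm("Once upon a time,", max_tokens=64)
print(out["choices"][0]["text"])
```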

 

Using GPTQ when there is not enough video memory on the GPU. How do others handle it?

I read somewhere that a video card can use system RAM to compensate for a lack of its own memory, but memory borrowed from RAM is about 10 times slower. How do I set this up? If I'm not mistaken, you need a specific version of the video card driver for this. I have a 3060 12GB and 64GB of RAM.

Maybe this is not the smartest idea, considering that I can already get good speed with GGUF, but I heard that with exllama2 generation is about twice as fast when it runs on the video card.

Help me figure out what's what.
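One practical way to settle the speed question is to measure it directly: time tokens per second at a few different offload levels and see where your hardware tops out. A rough sketch under the same llama-cpp-python assumption as above (the model filename is a placeholder):

```python
# Rough tokens-per-second benchmark across offload levels; path is hypothetical.
import time
from llama_cpp import Llama

for ngl in (0, 10, 20):  # increase until 12 GB of VRAM runs out
    llm = Llama(model_path="model.Q5_K_M.gguf", n_gpu_layers=ngl, verbose=False)
    t0 = time.time()
    out = llm("Tell me a short story.", max_tokens=128)
    n_tokens = out["usage"]["completion_tokens"]
    print(f"n_gpu_layers={ngl}: {n_tokens / (time.time() - t0):.2f} tok/s")
    del llm  # free the model before the next run
```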