Tacx79

joined 1 year ago
[–] Tacx79@alien.top 1 points 11 months ago

I tried the base Yi-34B-Chat yesterday and it felt like the golden times of character.ai again. I imported my c.ai character card with 3-4k tokens, extended the context to 8k, and it's just the right model for the job. It even followed the short hints about how the character should behave, unlike the original c.ai model. Sure, fine-tuning on RP chats could make it even better, but I don't think I'll move away from it in the near future.
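
A minimal sketch of the setup described above, assuming llama-cpp-python as the runner; the GGUF filename and layer count are placeholders, not anything confirmed by the comment:

```python
# Sketch: loading a Yi-34B-Chat GGUF with an 8k context window.
# Filename and n_gpu_layers are assumptions -- adjust for your quant/hardware.
from llama_cpp import Llama

llm = Llama(
    model_path="yi-34b-chat.Q4_K_M.gguf",  # hypothetical filename
    n_ctx=8192,        # extend context from the default to 8k
    n_gpu_layers=35,   # offload what fits in VRAM; tune for your card
)

# The 3-4k token character card goes into the prompt, leaving the
# remaining ~4-5k tokens of the 8k window for chat history.
out = llm("You are ...", max_tokens=256)
print(out["choices"][0]["text"])
```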

[–] Tacx79@alien.top 1 points 11 months ago

You can also drive on a Formula 1 track in a Toyota Corolla and it will be decent too.

[–] Tacx79@alien.top 1 points 11 months ago

From what I see, the RTX 8000 is a bit slower than the P40 in inference and a bit faster in training. The only speed-up would come from running 2 cards instead of 6. Out of curiosity, what speeds did you get with the P40s?

[–] Tacx79@alien.top 1 points 1 year ago (1 children)

GGUF? Even on a GTX 1080 you get around 4 t/s with Q8, which is almost as fast as the average person's reading speed; with 16 GB it should be 4-5x faster.
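
A back-of-the-envelope check of the reading-speed comparison, assuming the common ~0.75 words-per-token rule of thumb for English (the ratio is an assumption, not from the comment):

```python
# 4 t/s at ~0.75 words/token works out to roughly average reading pace.
tokens_per_sec = 4        # rough Q8 speed on a GTX 1080, per the comment
words_per_token = 0.75    # rule-of-thumb ratio for English text (assumption)
wpm = tokens_per_sec * words_per_token * 60
print(f"~{wpm:.0f} words/min")  # ~180 wpm, close to a typical 200-250 wpm reading pace
```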

[–] Tacx79@alien.top 1 points 1 year ago (3 children)

What about Cat 13B 1.0? It slipped through here without much attention, but it looks really good, and with 16 GB you could run Q8.

[–] Tacx79@alien.top 1 points 1 year ago

You're loading it in fp32, which requires ~28 GB of memory. Try koboldcpp or oobabooga with GGUF models from TheBloke instead.
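
A rough sketch of where the ~28 GB figure comes from, assuming a 7B-parameter model: weight memory scales with bytes per parameter, and these numbers ignore activations and KV cache, so treat them as lower bounds:

```python
# Approximate weight memory by precision for a 7B model (assumption).
params = 7e9

for name, bytes_per_weight in [("fp32", 4), ("fp16", 2), ("q8", 1)]:
    gb = params * bytes_per_weight / 1024**3
    print(f"{name}: ~{gb:.0f} GB")
# fp32: ~26 GB (plus overhead -> ~28 GB), fp16: ~13 GB, q8: ~7 GB
```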

[–] Tacx79@alien.top 1 points 1 year ago

As an owner of an R7 1700 and an R5 4600H, I tested it: you don't get any speed benefit from using more than 5 threads. Even if you use all 12+ cores, they will all spike to 100%, but the speed will be the same as with 5 threads, because memory bandwidth is the bottleneck here.
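
A minimal benchmark sketch for reproducing the thread-scaling test, assuming llama-cpp-python and a placeholder model path (both are assumptions; the original test may have used a different runner):

```python
# Time the same short generation at different thread counts.
import time
from llama_cpp import Llama

for n_threads in (2, 4, 5, 8, 12):
    llm = Llama(model_path="model.Q8_0.gguf",  # placeholder path
                n_threads=n_threads, verbose=False)
    start = time.time()
    out = llm("Once upon a time", max_tokens=64)
    tokens = out["usage"]["completion_tokens"]
    print(f"{n_threads} threads: {tokens / (time.time() - start):.1f} t/s")
# Past ~5 threads the t/s figure plateaus: generation is limited by
# RAM bandwidth, not compute, so extra cores just spin at 100%.
```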