You can also drive a Toyota Corolla on a Formula 1 track and it will be decent too
From what I see, the RTX 8000 is a bit slower than the P40 in inference and a bit faster in training. The only speed-up would come from running 2 cards instead of 6. Out of curiosity, what speeds were you getting with the P40s?
GGUF? Even on a GTX 1080 you get around 4 t/s with Q8, which is almost as fast as the average person's reading speed; with 16 GB of VRAM it should be 4-5x faster
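For scale, a quick back-of-envelope conversion (the ~0.75 words-per-token figure is a common rule of thumb for English, not something from the comment above):

```python
# Rough sanity check: how does 4 t/s compare to human reading speed?
# Silent reading is often quoted around 200-300 words/min.
tokens_per_second = 4
words_per_token = 0.75  # common rule of thumb for English text
wpm = tokens_per_second * words_per_token * 60
print(f"{tokens_per_second} t/s ~ {wpm:.0f} words/min")  # ~ 180 words/min
```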
What about Cat 13B 1.0? It slipped through here without much attention, but it looks really good, and with 16 GB you could run Q8
As an owner of an R7 1700 and an R5 4600H, I tested it, and you don't get any speed benefit from using more than 5 threads. Even if you use all 12+ threads, they will all spike to 100%, but the speed will be the same as with 5 threads, because memory bandwidth is the bottleneck here (see the sketch below)
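A minimal sketch of that kind of thread-scaling test, assuming llama-cpp-python is installed and `model.q8_0.gguf` is a hypothetical local GGUF file; this is not necessarily the exact method used, just one way to reproduce the measurement:

```python
import time
from llama_cpp import Llama

PROMPT = "Write a short story about a lighthouse."
MAX_TOKENS = 128

for n_threads in (1, 3, 5, 8, 12):
    # Reload the model with a different thread count each run
    llm = Llama(model_path="model.q8_0.gguf", n_threads=n_threads, verbose=False)
    start = time.perf_counter()
    out = llm(PROMPT, max_tokens=MAX_TOKENS)
    elapsed = time.perf_counter() - start
    # Use the actual generated token count, since generation may stop at EOS early
    generated = out["usage"]["completion_tokens"]
    print(f"{n_threads:2d} threads: {generated / elapsed:.1f} t/s")
    del llm  # free memory before the next run
```

On a memory-bandwidth-bound CPU, the t/s column should plateau well before the thread count reaches the core count.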
I tried the base Yi-34B-Chat yesterday and it felt like the golden times of character.ai again. I imported my c.ai character card with 3-4k tokens, extended the context to 8k, and it's just the right model for the job. It even followed the short hints about how the character should behave, unlike the original c.ai model. Sure, finetuning on RP chats could make it even better, but I don't think I'll move away from it in the near future
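A sketch of that setup, assuming llama-cpp-python; the GGUF filename and card path are hypothetical, and `rope_freq_scale=0.5` is one common way (linear RoPE scaling) to stretch a 4k-trained model to an 8k window, not necessarily how it was done above:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="yi-34b-chat.Q4_K_M.gguf",  # hypothetical local quant
    n_ctx=8192,            # extended context window
    rope_freq_scale=0.5,   # linear RoPE scaling: 4k trained -> 8k effective
    n_gpu_layers=-1,       # offload all layers if VRAM allows
)

# Feed the 3-4k token character card as the system message
character_card = open("card.txt").read()
out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": character_card},
        {"role": "user", "content": "Hi! How was your day?"},
    ],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```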