Tacx79

joined 1 year ago
[–] Tacx79@alien.top 1 points 11 months ago

I tried the base Yi-34B-Chat yesterday and it felt like the golden times of character.ai again. I imported my c.ai character card with 3-4k tokens, extended the context to 8k, and it's just the right model for the job. It even followed the short hints about how the character should behave, unlike the original c.ai model. Sure, fine-tuning on RP chats could make it even better, but I don't think I'll move away from it in the near future.
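
A minimal sketch of the setup described above, assuming llama-cpp-python as the runner; the GGUF filename and layer count are placeholders, not anything confirmed by the comment:

```python
# Sketch: loading a Yi-34B-Chat GGUF with an 8k context window.
# Filename and n_gpu_layers are assumptions -- adjust for your quant/hardware.
from llama_cpp import Llama

llm = Llama(
    model_path="yi-34b-chat.Q4_K_M.gguf",  # hypothetical filename
    n_ctx=8192,        # extend context from the default to 8k
    n_gpu_layers=35,   # offload what fits in VRAM; tune for your card
)

# The 3-4k token character card goes into the prompt, leaving the
# remaining ~4-5k tokens of the 8k window for chat history.
out = llm("You are ...", max_tokens=256)
print(out["choices"][0]["text"])
```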

[–] Tacx79@alien.top 1 points 11 months ago

You can also drive on a Formula 1 track in a Toyota Corolla and it will be decent too.

[–] Tacx79@alien.top 1 points 11 months ago

From what I see, the RTX 8000 is a bit slower than the P40 in inference and a bit faster in training. The only speed-up would come from running 2 cards instead of 6. Out of curiosity, what speeds did you get with the P40s?

[–] Tacx79@alien.top 1 points 1 year ago (1 children)

GGUF? Even on a GTX 1080 you get around 4 t/s with Q8, which is almost as fast as the average person's reading speed; with 16 GB it should be 4-5x faster.
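
A back-of-the-envelope check of the reading-speed comparison, assuming the common ~0.75 words-per-token rule of thumb for English (the ratio is an assumption, not from the comment):

```python
# 4 t/s at ~0.75 words/token works out to roughly average reading pace.
tokens_per_sec = 4        # rough Q8 speed on a GTX 1080, per the comment
words_per_token = 0.75    # rule-of-thumb ratio for English text (assumption)
wpm = tokens_per_sec * words_per_token * 60
print(f"~{wpm:.0f} words/min")  # ~180 wpm, close to a typical 200-250 wpm reading pace
```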

[–] Tacx79@alien.top 1 points 1 year ago (3 children)

What about Cat 13B 1.0? It slipped through here without much attention, but it looks really good, and with 16 GB you could run Q8.

[–] Tacx79@alien.top 1 points 1 year ago

You're loading it in fp32, which requires ~28 GB of memory. Try koboldcpp or oobabooga with GGUF models from TheBloke instead.
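
A rough sketch of where the ~28 GB figure comes from, assuming a 7B-parameter model: weight memory scales with bytes per parameter, and these numbers ignore activations and KV cache, so treat them as lower bounds:

```python
# Approximate weight memory by precision for a 7B model (assumption).
params = 7e9

for name, bytes_per_weight in [("fp32", 4), ("fp16", 2), ("q8", 1)]:
    gb = params * bytes_per_weight / 1024**3
    print(f"{name}: ~{gb:.0f} GB")
# fp32: ~26 GB (plus overhead -> ~28 GB), fp16: ~13 GB, q8: ~7 GB
```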

[–] Tacx79@alien.top 1 points 1 year ago

As an owner of an R7 1700 and an R5 4600H, I tested it: you don't get any speed benefit from using more than 5 threads. Even if you use all 12+ cores, they will all spike to 100%, but the speed will be the same as with 5 threads, because memory bandwidth is the bottleneck here.
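
A minimal benchmark sketch for reproducing the thread-scaling test, assuming llama-cpp-python and a placeholder model path (both are assumptions; the original test may have used a different runner):

```python
# Time the same short generation at different thread counts.
import time
from llama_cpp import Llama

for n_threads in (2, 4, 5, 8, 12):
    llm = Llama(model_path="model.Q8_0.gguf",  # placeholder path
                n_threads=n_threads, verbose=False)
    start = time.time()
    out = llm("Once upon a time", max_tokens=64)
    tokens = out["usage"]["completion_tokens"]
    print(f"{n_threads} threads: {tokens / (time.time() - start):.1f} t/s")
# Past ~5 threads the t/s figure plateaus: generation is limited by
# RAM bandwidth, not compute, so extra cores just spin at 100%.
```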