I’ve had the same experience. Are you using GGUF? I am, and I’ve heard that Yi models may degrade in GGUF format. So EXL2 might be better… I need to try it and see.
TobyWonKenobi
I honestly haven’t tried the 6.7b version of Deepseek yet, but I’ve heard great things about it!
You can run 34B models in a Q4_K_M quant because it’s only ~21 GB. I run it on a single 3090.
Deepseek coder 34b for code
OpenHermes 2.5 for general chat
Yi-34b chat is ok too, but I am a bit underwhelmed when I use it vs Hermes. Hermes seems to be more consistent and hallucinate less.
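For anyone wondering where the ~21 GB figure comes from, here is a rough back-of-the-envelope sketch. It assumes Q4_K_M averages about 4.85 bits per weight (an approximation, the real value varies a bit by architecture) and that you want roughly 2 GB of headroom for context/KV cache, which is just a guess. The function names are made up for illustration.

```python
def estimate_quant_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def fits_in_vram(model_gb: float, vram_gb: float, overhead_gb: float = 2.0) -> bool:
    """Rough check: weights plus an assumed overhead must fit in VRAM."""
    return model_gb + overhead_gb <= vram_gb


# 34B parameters at an assumed ~4.85 bits per weight for Q4_K_M
size = estimate_quant_size_gb(34e9, 4.85)
print(f"{size:.1f} GB")          # about 20.6 GB of weights
print(fits_in_vram(size, 24.0))  # 24 GB card such as a 3090
```

Actual usage depends on context length and backend, so treat this as an estimate only.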
It’s amazing that I am still using 7b when there are finally decent 34b models.
If you are using it on LM Studio, I think you need to upgrade to the latest Beta, which includes a fix.
I ran into the same issues with the Deepseek GGUF.
LM Studio - very clean UI and easy to use with GGUF.
Agreed - This is the best conversational model I have tried yet.
34B is the largest model size that I prefer running on my GPU, and this along with Nous-Capybara are fantastic.
Has anyone tried out TheBloke's quants of the 7B OpenHermes-2.5-neural-chat-v3-1 merge?
7B OpenHermes 2.5 was really good by itself, but the merge with neural-chat seems REALLY good so far, based on my limited chats with it.
https://huggingface.co/TheBloke/OpenHermes-2.5-neural-chat-7B-v3-1-7B-GGUF