TobyWonKenobi

[–] TobyWonKenobi@alien.top 1 points 11 months ago (3 children)

Has anyone tried out TheBloke's quants for OpenHermes-2.5-neural-chat-7B-v3-1?

OpenHermes 2.5 7B was really good by itself, but the merge with Neural Chat seems REALLY good so far, based on my limited chats with it.

https://huggingface.co/TheBloke/OpenHermes-2.5-neural-chat-7B-v3-1-7B-GGUF
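If anyone wants a quick way to try it, here's a minimal sketch using huggingface_hub and llama-cpp-python. The exact .gguf filename below is an assumption; check the repo's file list for the real quant names.

```python
# Minimal sketch: grab one of TheBloke's GGUF quants and chat with it.
# Needs `pip install huggingface_hub llama-cpp-python`. The filename below
# is an assumption -- check the repo's "Files" tab for the actual quant names.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="TheBloke/OpenHermes-2.5-neural-chat-7B-v3-1-7B-GGUF",
    filename="openhermes-2.5-neural-chat-7b-v3-1-7b.Q4_K_M.gguf",  # assumed
)

# OpenHermes-style models use ChatML prompts; llama-cpp-python supports
# that via chat_format="chatml".
llm = Llama(model_path=model_path, n_ctx=4096, n_gpu_layers=-1,
            chat_format="chatml")

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello in one sentence."}]
)
print(out["choices"][0]["message"]["content"])
```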

[–] TobyWonKenobi@alien.top 1 points 11 months ago

I’ve had the same experience. Are you using GGUF? I am, and I’ve heard that Yi may suffer from GGUF quantization, so EXL2 might be better… I need to try it and see.

[–] TobyWonKenobi@alien.top 1 points 11 months ago

I honestly haven’t tried the 6.7B version of DeepSeek Coder yet, but I’ve heard great things about it!

You can run 34B models in a Q4_K_M quant because it’s only ~21 GB. I run one on a single 3090.
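As a rough sanity check on that figure (a sketch, assuming Q4_K_M averages about 4.85 bits per weight, which is the commonly quoted llama.cpp number):

```python
# Back-of-envelope size for a 34B model at Q4_K_M. The ~4.85 bits/weight
# average is the commonly quoted llama.cpp figure, so this is approximate.
params = 34e9              # 34B parameters
bits_per_weight = 4.85     # approximate Q4_K_M average
size_gb = params * bits_per_weight / 8 / 1e9
print(f"~{size_gb:.1f} GB")  # -> ~20.6 GB, before KV cache
```

That leaves roughly 3 GB of a 3090’s 24 GB for the KV cache and context.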

[–] TobyWonKenobi@alien.top 1 points 11 months ago (2 children)

DeepSeek Coder 34B for code

OpenHermes 2.5 for general chat

Yi-34B Chat is OK too, but I am a bit underwhelmed when I use it vs. Hermes. Hermes seems to be more consistent and hallucinates less.

It’s amazing that I’m still using a 7B when there are finally decent 34B models.

[–] TobyWonKenobi@alien.top 1 points 11 months ago (1 children)

If you are using it on LM Studio, I think you need to upgrade to the latest Beta, which includes a fix.

I ran into the same issues with the DeepSeek GGUF.

[–] TobyWonKenobi@alien.top 1 points 11 months ago

LM Studio - very clean UI and easy to use with GGUF.

[–] TobyWonKenobi@alien.top 1 points 1 year ago

Agreed - this is the best conversational model I have tried yet.

34B is the largest model size I’m willing to run on my GPU, and this one, along with Nous-Capybara, is fantastic.