dampflokfreund

joined 11 months ago
 

Gryphe, the creator of MythoMax, has basically merged the best Mistral models together. This should be a really fantastic model!

https://huggingface.co/Gryphe/MythoMist-7b (links to quantized models by TheBloke can be found there)

Edit: oof, messed up the title. It's MythoMist, not Mix.


[–] dampflokfreund@alien.top 1 points 10 months ago

Yes, it's system-wide. You can set your preferred behaviour in the Nvidia Control Panel -> Global Settings -> CUDA Sysmem Fallback Policy.

The driver default is "Prefer Sysmem Fallback", which means it will offload to system RAM instead of crashing when VRAM is full.

"Prefer No Sysmem Fallback" is basically the old memory management: it crashes once your VRAM is full.
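
If you want to check which behaviour your driver is currently giving you, here's a minimal sketch (assuming PyTorch and an NVIDIA GPU; the chunk size and the 1.5x cutoff are just arbitrary choices for the probe):

```python
import torch

# Probe what happens when allocations exceed physical VRAM.
# With "Prefer Sysmem Fallback" the loop keeps succeeding (but slows down as
# blocks spill into system RAM); with "Prefer No Sysmem Fallback" it raises
# a CUDA out-of-memory error instead.
assert torch.cuda.is_available(), "needs an NVIDIA GPU with CUDA"

total_vram = torch.cuda.get_device_properties(0).total_memory
chunk = 512 * 1024 * 1024  # allocate in 512 MiB chunks (arbitrary probe size)
blocks, allocated = [], 0

try:
    # Deliberately allocate past physical VRAM (1.5x is an arbitrary cutoff).
    while allocated < total_vram * 1.5:
        blocks.append(torch.empty(chunk, dtype=torch.uint8, device="cuda"))
        allocated += chunk
        print(f"allocated ~{allocated / 2**30:.1f} GiB")
    print("Went past VRAM without an error -> sysmem fallback is active")
except RuntimeError as err:
    print(f"CUDA out of memory -> old 'No Sysmem Fallback' behaviour: {err}")
```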

[–] dampflokfreund@alien.top 1 points 10 months ago (3 children)

Great test!

Unfortunately, the Llama 2 Chat template is completely broken in SillyTavern. It not only uses a newline as the separator instead of the correct one, but if you are using vector storage or an example dialogue it also ends the system prompt with the input sequence [INST] instead of closing it with [/INST]. You can see for yourself by comparing the output to what the format should look like.
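
For reference, here's a rough sketch of how the Llama 2 Chat prompt is supposed to be laid out, so you can diff it against what SillyTavern actually sends (the helper name and example messages below are made up for illustration):

```python
def build_llama2_prompt(system: str, turns: list[tuple[str, str]]) -> str:
    """turns = [(user, assistant), ...]; leave the last assistant reply empty."""
    prompt = ""
    for i, (user, assistant) in enumerate(turns):
        if i == 0:
            # The system prompt lives inside the *first* [INST] block.
            user = f"<<SYS>>\n{system}\n<</SYS>>\n\n{user}"
        prompt += f"<s>[INST] {user} [/INST]"  # closed with [/INST], not [INST]
        if assistant:
            prompt += f" {assistant} </s>"
    return prompt

print(build_llama2_prompt(
    "You are a helpful assistant.",
    [("Hello!", "Hi there!"), ("Summarize our chat.", "")],
))
```

The key things to check are that the system block is wrapped inside the first [INST] ... [/INST] pair and that every user turn is closed with [/INST].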

So these Airoboros 3.1.2 tests are unfortunately borked. Still, interesting results for the other models.

 

This will make the people happy who reverted to older drivers because they suffered from lower performance due to RAM swapping. If you follow this simple guide: https://nvidia.custhelp.com/app/answers/detail/a_id/5490, the old memory management will return, where it just crashes instead of slowing down.

Personally I prefer the new memory management, but I'm glad the option is there now for people who don't. Thank you Nvidia for listening! :D