this post was submitted on 25 Nov 2023

LocalLLaMA

Community to discuss Llama, the family of large language models created by Meta AI.

So I'm considering building a good LLM rig, and the M2 Ultra seems like a strong option for large memory, with much lower power usage and heat than 2 to 8 3090s or 4090s, albeit at lower speeds.

I want to know if anyone is using one and what it's like. I've read that it's less well supported by software, which could be an issue. Also, is it good for Stable Diffusion?
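
From what I've read, how well it works mostly hinges on whether the framework has a Metal (MPS) backend. The kind of thing I'd want to run is roughly the sketch below (PyTorch plus the diffusers library; the model ID and settings are just examples), so I'm curious how well this path works in practice on an M2 Ultra:

```python
# Minimal sketch: run a Stable Diffusion pipeline on Apple Silicon via PyTorch's MPS backend.
# Model ID and step count are examples only, not a recommendation.
import torch
from diffusers import StableDiffusionPipeline

device = "mps" if torch.backends.mps.is_available() else "cpu"

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # example model; swap in whatever you actually use
    torch_dtype=torch.float16,
)
pipe = pipe.to(device)

image = pipe("a watercolor painting of a llama", num_inference_steps=25).images[0]
image.save("llama.png")
```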

Another question is about memory and context length. Does a large memory let you increase the context length with smaller models whose parameters don't fill it? I feel a big context would be useful for writing books and the like.
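
My rough understanding is that the weights take a fixed chunk of memory and the KV cache grows linearly with context, so whatever the parameters leave free is what bounds the context. This is the back-of-envelope math I've been using (Llama-2-7B-ish attention shapes and an fp16 cache assumed; please correct me if it's off):

```python
# Back-of-envelope: extra memory needed for the KV cache at a given context length.
# Assumes Llama-2-7B-like shapes (32 layers, 32 KV heads, head_dim 128) and fp16 (2 bytes/element).
def kv_cache_gb(context_len, n_layers=32, n_kv_heads=32, head_dim=128, bytes_per_elem=2):
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem  # 2x for keys and values
    return context_len * per_token / 1e9

weights_gb = 7e9 * 2 / 1e9  # ~14 GB for 7B params in fp16; much less when quantized
for ctx in (4096, 16384, 32768, 65536):
    print(f"{ctx:>6} tokens: ~{kv_cache_gb(ctx):.1f} GB KV cache (+ ~{weights_gb:.0f} GB weights)")
```

If that math holds, a quantized 7B on a 128 GB or 192 GB machine leaves a lot of headroom for context (and models with grouped-query attention, like Mistral, use far fewer KV heads, so their cache is smaller still).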

Is there anything else to consider? Thanks.

aikitoria@alien.top · 11 months ago

Hmm, I didn't notice a major quality loss when I swapped from mistral-7b-openorca.Q8_0.gguf (running in koboldcpp) to Mistral-7B-OpenOrca-8.0bpw-h6-exl2 (running in text-gen-webui). Maybe I should try again. Are you sure you were using comparable sampling settings for both? I noticed, for example, that SillyTavern has entirely different presets per backend.
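
For reference, when I compare backends I try to pin the samplers explicitly rather than trust each UI's presets — roughly this set (names follow the usual llama.cpp / transformers conventions; the exact request fields differ per backend, and the values here are just the ones I happened to use):

```python
# Sampling settings I try to keep identical across backends when comparing quants.
# Field names follow common llama.cpp / transformers conventions; adapt per API.
sampling = {
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "repetition_penalty": 1.1,
    "max_new_tokens": 512,
    "seed": 42,  # not every backend honors a fixed seed, but it helps when it does
}
```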

Still need to try the new NeuralChat myself too; I was just going to go for the exl2, so this could be a good tip!