Honestly the M1 is probably the cheapest solution you have. Get yourself LM Studio and try out a 7B `K_M` model; you're going to struggle with anything larger than that. But that will let you experience what we are all playing with.
3Bs work amazingly and super smoothly, but 7B models, while running at a fair 15 tokens per second, prevent me from using any other application at the same time and occasionally freeze my mouse and screen until the response is finished.
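If you'd rather poke at the same thing from code instead of the LM Studio GUI, here's a minimal sketch using llama-cpp-python, which wraps the same llama.cpp engine LM Studio runs on. The model path is just a placeholder for whatever Q4_K_M GGUF you actually downloaded:

```python
# Minimal sketch: load a quantized 7B GGUF with llama-cpp-python.
# The file path below is an assumption — point it at your own download.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-7b.Q4_K_M.gguf",  # hypothetical path
    n_ctx=2048,       # context window; larger costs more RAM
    n_gpu_layers=-1,  # offload every layer to the GPU (Metal on an M1)
)

out = llm("Q: What is a k-quant? A:", max_tokens=64)
print(out["choices"][0]["text"])
```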
What's the difference between the `K_M` models and the others? Also, why is `Q4_0` legacy but not `Q4_1`? It would be great if someone could explain that lol
Not sure about the K, but the M means medium loss of info during the quantisation phase, afaik.
Q4_0 and Q4_1 would both be legacy.
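To make the "legacy" formats concrete: both quantize weights in blocks of 32, but Q4_0 stores only a per-block scale while Q4_1 also stores a per-block minimum. Here's a rough NumPy sketch of the idea (an illustration, not the byte-exact ggml layout):

```python
import numpy as np

def q4_0(block):
    # Scale only: x ~ d * q, with 4-bit q in [-8, 7]
    d = np.max(np.abs(block)) / 8.0
    d = d if d > 0 else 1.0
    q = np.clip(np.round(block / d), -8, 7)
    return d * q  # dequantized approximation

def q4_1(block):
    # Scale + minimum: x ~ d * q + m, with 4-bit q in [0, 15]
    m, mx = block.min(), block.max()
    d = (mx - m) / 15.0 if mx > m else 1.0
    q = np.round((block - m) / d)
    return d * q + m

block = np.random.default_rng(0).standard_normal(32).astype(np.float32)
for name, fn in [("Q4_0", q4_0), ("Q4_1", q4_1)]:
    err = np.abs(block - fn(block)).mean()
    print(f"{name}: mean abs reconstruction error = {err:.4f}")
```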
The `K_M` is the newer "k-quant" (I guess it's not that new anymore; it's been around for months now).
The idea is that the more important layers are done at a higher precision, while the less important layers are done at a lower precision.
It seems to work well, which is why it has become the new standard for the most part.
Q4_K_M does the most important tensors at a higher precision (6-bit, for parts of the attention and feed-forward weights) and the less important ones at 4-bit.
It is closer in quality/perplexity to Q5_0, while being closer in size to Q4_0.
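Here's a toy illustration of that "more bits where it matters" trade-off. The per-layer importance flags are made up for the demo — real k-quants pick specific tensor types rather than random layers, and the actual bit allocation differs:

```python
import numpy as np

def quantize(x, bits):
    # Uniform scale+min quantization to `bits` bits, then dequantize
    levels = 2 ** bits - 1
    m, mx = x.min(), x.max()
    d = (mx - m) / levels if mx > m else 1.0
    q = np.round((x - m) / d)
    return q * d + m

rng = np.random.default_rng(0)
layers = [rng.standard_normal(4096).astype(np.float32) for _ in range(8)]
important = [i < 2 for i in range(8)]  # pretend the first two layers matter most

total_bits = 0
total_err = 0.0
n = sum(x.size for x in layers)
for x, imp in zip(layers, important):
    bits = 6 if imp else 4  # spend more bits on the "important" layers
    total_bits += bits * x.size
    total_err += np.abs(x - quantize(x, bits)).sum()

print(f"average bits per weight: {total_bits / n:.2f}")
print(f"mean abs error:          {total_err / n:.4f}")
```

With two of eight layers at 6-bit and the rest at 4-bit, the average comes out around 4.5 bits per weight, which is why the file size stays close to a plain 4-bit quant while the quality lands nearer a 5-bit one.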