this post was submitted on 26 Nov 2023

LocalLLaMA

Community to discuss Llama, the family of large language models created by Meta AI.

Not super knowledgeable about all the different specs of the various Orange Pi and Raspberry Pi models. I'm looking for something relatively cheap that can connect to WiFi and USB. I want to be able to run at least 13B models at a decent tok/s.

Also open to other solutions. I have a Mac M1 (8 GB RAM), and upgrading the computer itself would be cost-prohibitive for me.
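
For a rough sense of why 13B is tough on 8 GB, here's a quick back-of-envelope sketch in Python. The bits-per-weight figures are approximate values for common llama.cpp/GGUF quant types (block scales add overhead on top of the nominal bit width), not exact numbers:

```python
# Back-of-envelope weight-memory estimate for a 13B model at common
# GGUF quantisation levels. Bits-per-weight values are approximate.

PARAMS = 13e9  # 13B parameters

BPW = {
    "Q4_0": 4.5,     # 4-bit values + per-block scale
    "Q4_K_M": 4.85,  # mixed k-quant, mostly 4-bit
    "Q5_0": 5.5,
    "Q8_0": 8.5,
}

for name, bpw in BPW.items():
    gib = PARAMS * bpw / 8 / 2**30
    print(f"{name}: ~{gib:.1f} GiB of weights")
```

Whatever board or machine you pick needs at least that much free RAM just to hold the weights, before the KV cache and OS overhead.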

ClassroomGold6910@alien.top 1 point 11 months ago

What's the difference between `K_M` models? Also, why is `Q_4` legacy but not `Q_4_1`? It would be great if someone could explain that lol

ThinkExtension2328@alien.top 1 point 11 months ago

Not sure about the K, but the M means medium loss of info during the quantisation phase, afaik.

Sea_Particular_4014@alien.top 1 point 11 months ago

Q4_0 and Q4_1 would both be legacy.
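
The difference between the two legacy formats is just the per-block maths. A minimal sketch of the dequantisation, assuming llama.cpp's usual formulas (one scale per 32-weight block for Q4_0, plus a per-block minimum for Q4_1) and NumPy for illustration:

```python
import numpy as np

# Both legacy formats quantise weights in blocks of 32.
BLOCK_SIZE = 32

def dequant_q4_0(d: float, q: np.ndarray) -> np.ndarray:
    # Q4_0: one scale per block; 4-bit values (0..15) centred on 8.
    # weight = d * (q - 8)
    return d * (q.astype(np.float32) - 8.0)

def dequant_q4_1(d: float, m: float, q: np.ndarray) -> np.ndarray:
    # Q4_1: a scale *and* a minimum per block, which handles
    # asymmetric weight ranges better.
    # weight = d * q + m
    return d * q.astype(np.float32) + m

q = np.array([0, 8, 15])           # example 4-bit values
print(dequant_q4_0(0.1, q))        # [-0.8  0.   0.7]
print(dequant_q4_1(0.1, -0.8, q))  # [-0.8  0.   0.7]
```

The extra per-block minimum is why Q4_1 costs roughly half a bit more per weight than Q4_0.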

The `K_M` is the new "k-quant" format (I guess it's not that new anymore; it's been around for months now).

The idea is that the more important layers are done at a higher precision, while the less important layers are done at a lower precision.

It seems to work well, which is why it has become the new standard for the most part.

Q4_K_M does the most important layers at a higher precision (6-bit, in llama.cpp's mix) and the less important ones at 4-bit.

It's closer in quality/perplexity to Q5_0, while being closer in size to Q4_0.
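
To make the mixing concrete, here's a toy sketch. This is not llama.cpp's actual selection logic; the tensor names and the "important" rule below are simplified stand-ins, but the shape of the idea is the same:

```python
# Toy illustration of the k-quant mixing idea: assign a
# higher-precision type to tensors that hurt quality most when
# quantised, and the cheap 4-bit type to everything else.

def pick_quant(tensor_name: str) -> str:
    # Hypothetical "important" set for illustration only.
    important = ("attn_v", "ffn_down", "output")
    if any(part in tensor_name for part in important):
        return "Q6_K"  # higher precision for sensitive tensors
    return "Q4_K"      # 4-bit for everything else

for name in ["blk.0.attn_q.weight", "blk.0.attn_v.weight",
             "blk.0.ffn_down.weight", "blk.0.ffn_up.weight"]:
    print(f"{name} -> {pick_quant(name)}")
```

Averaged over all tensors, the effective bits per weight land between Q4_0 and Q5_0, which is where that size/quality trade-off comes from.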