watkykjynaaier

joined 10 months ago
[–] watkykjynaaier@alien.top 1 points 9 months ago (1 children)

Given my M1 Max's 400GB/s memory bandwidth, what would be the bottleneck for this on Apple Silicon? Disk speed? Is it possible to get this running on Metal?

[–] watkykjynaaier@alien.top 1 points 9 months ago (1 children)

I'm on an M1 Max with 32GB. With GGUF in LM Studio you can run the 34B Yi finetunes well, but that's as high as you can go for now. The 3-bit 70B quants will technically run, but not in any useful way. As others have noted, RAM is the make-or-break factor here. Get as much as you can; the processor generation is much less important.
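
For a rough sense of why RAM decides what fits, here's a back-of-the-envelope sketch. The function name and the bits-per-weight figures are my own assumptions for illustration (real GGUF quants vary in effective bits per weight), and it only counts the weights, not the context cache or anything else the system needs.

```python
def approx_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in GB (10^9 bytes).

    Hypothetical helper: ignores KV cache, runtime buffers, and OS overhead.
    """
    # billions of parameters * bits each / 8 bits per byte = billions of bytes
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    # Bits-per-weight values are nominal assumptions, not measured file sizes.
    for label, params, bits in [
        ("34B at ~4.5 bits/weight", 34, 4.5),
        ("70B at ~3.0 bits/weight", 70, 3.0),
    ]:
        print(f"{label}: about {approx_weight_gb(params, bits):.1f} GB of weights")
```

Under those assumptions the 34B weights land around 19 GB and the 70B weights around 26 GB, which leaves very little of a 32GB machine for everything else. That roughly matches the "technically runs, but not usefully" experience.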

[–] watkykjynaaier@alien.top 1 points 10 months ago (1 children)

I've completely fixed gibberish output on Yi-based and other models by setting the RoPE Frequency Scale to a number less than one; one seems to be the default. I have no idea why that works, but it does.

What I find even stranger is that the models often keep working after I set the frequency scale back to 1.
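
For anyone wondering what the setting actually changes: as far as I understand it, the frequency scale linearly multiplies the token position before the rotary angles are computed, so a value below 1 makes positions look closer together to the model. Below is a minimal sketch of the standard RoPE angle formula with that scale applied. The function name and the default base of 10000 are illustrative assumptions, not LM Studio's code, and this doesn't explain why the gibberish goes away.

```python
import math


def rope_angles(position: int, head_dim: int,
                freq_base: float = 10000.0,
                freq_scale: float = 1.0) -> list[float]:
    """Rotation angle for each pair of dimensions at one token position.

    Hypothetical helper: assumes linear scaling, where freq_scale simply
    multiplies the position index before the usual RoPE frequencies apply.
    """
    return [
        freq_scale * position * freq_base ** (-2 * i / head_dim)
        for i in range(head_dim // 2)
    ]


if __name__ == "__main__":
    # With scale 0.5, position 100 gets the same angles as position 50 at scale 1.
    scaled = rope_angles(100, head_dim=8, freq_scale=0.5)
    unscaled = rope_angles(50, head_dim=8, freq_scale=1.0)
    print(all(math.isclose(a, b) for a, b in zip(scaled, unscaled)))
```

So under this assumption a scale of 0.5 effectively halves every position, which is the usual trick for stretching a model past its trained context length.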