overview for watkykjynaaier

Fitting 70B models in a 4gb GPU, The whole model, no quants or distil or anything! in c/localllama@poweruser.forum

watkykjynaaier@alien.top 1 points 9 months ago (1 children)

Given my M1 Max's 400GB/s memory bandwidth, what would be the bottleneck for this on Apple Silicon? Disk speed? Is it possible to get this running on Metal?
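
A rough back-of-envelope sketch of that question, assuming the approach re-reads every layer's weights from disk on each forward pass and keeps them in fp16. The 5 GB/s SSD read speed is a hypothetical placeholder, not a measured figure; only the 400 GB/s number comes from the comment above.

```python
def seconds_per_pass(model_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Lower bound on the time to stream all weights once at a given bandwidth."""
    return model_bytes / bandwidth_bytes_per_s

GB = 1e9
model_bytes = 70e9 * 2      # 70B parameters at 2 bytes each (fp16, assumed)
ssd_bandwidth = 5 * GB      # hypothetical SSD read speed, not measured
mem_bandwidth = 400 * GB    # M1 Max memory bandwidth quoted above

ssd_time = seconds_per_pass(model_bytes, ssd_bandwidth)
mem_time = seconds_per_pass(model_bytes, mem_bandwidth)
print(f"disk-bound: {ssd_time:.1f} s per pass, memory-bound: {mem_time:.2f} s per pass")
```

Under these assumptions the disk read dominates by roughly two orders of magnitude, so disk speed would likely be the bottleneck as long as the weights cannot stay resident in RAM.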

Best bang for buck for MacBP? in c/localllama@poweruser.forum

watkykjynaaier@alien.top 1 points 9 months ago (1 children)

I'm on an M1 Max with 32 GB. With GGUF in LM Studio you can run the 34B Yi finetunes well, but that's as high as you can go for now. The 3-bit 70B quants will technically run, but not in any useful way. As others have noted, RAM is the make-or-break factor here. Get as much as you can; the processor generation is much less important.
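
To make the RAM point concrete, here is a minimal sketch of the arithmetic, assuming weight size is simply parameters times bits per weight. Real GGUF quants tend to use somewhat more bits per weight than their nominal figure, and the context cache and the OS need memory too, so treat these as lower bounds. The function name `weight_gb` is made up for illustration.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

ram_gb = 32  # the machine described above

for name, params, bits in [("34B at 4-bit", 34, 4), ("70B at 3-bit", 70, 3)]:
    size = weight_gb(params, bits)
    print(f"{name}: about {size:.2f} GB of weights, {ram_gb - size:.2f} GB left over")
```

By this estimate the 34B model leaves comfortable headroom on 32 GB, while a 3-bit 70B leaves only a few GB for everything else, which is consistent with it running but not usefully.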

Yi-23B-Llama: Distil version of Yi-34B-Llama in c/localllama@poweruser.forum

watkykjynaaier@alien.top 1 points 10 months ago (1 children)

I've completely fixed gibberish output on Yi-based and other models by setting the RoPE Frequency Scale to a value below 1 (1 seems to be the default). I have no idea why that works, but it does.

What I find even stranger is that the models often keep working after setting the frequency scale back to 1.
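
For context, a minimal sketch of what a linear frequency scale does in the usual RoPE formulation. It assumes the LM Studio setting corresponds to linearly scaling the rotation angles, which is not confirmed here. The base of 10000 is the common textbook value, not necessarily what Yi uses, and `rope_angles` is a hypothetical helper. It does not explain why the fix works or why it persists.

```python
def rope_angles(position, head_dim, freq_base=10000.0, freq_scale=1.0):
    """Rotation angle for each pair of dimensions at one token position."""
    return [
        freq_scale * position * freq_base ** (-2.0 * i / head_dim)
        for i in range(head_dim // 2)
    ]

# With freq_scale = 0.5, position 100 gets the same angles as position 50
# would get unscaled, so positions are effectively compressed.
scaled = rope_angles(position=100, head_dim=8, freq_scale=0.5)
unscaled = rope_angles(position=50, head_dim=8)
print(all(abs(a - b) < 1e-12 for a, b in zip(scaled, unscaled)))
```

Under this assumption, a scale below 1 makes every token look closer to the start of the context than it really is.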