BangkokPadang

joined 11 months ago
[–] BangkokPadang@alien.top 1 points 9 months ago

If people started doing this with any regularity, Nvidia would intentionally bork the drivers.

[–] BangkokPadang@alien.top 1 points 9 months ago

Honestly, a 4-bit quantized version of the 220B model should run on a 192GB M2 Studio, assuming these models could even work with a current transformers loader.
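
Rough back-of-the-envelope math (a sketch, not a benchmark; the ~0.5 bytes per parameter and the overhead figure are assumptions):

```python
# Rough memory estimate for a 4-bit quantized 220B-parameter model.
# Assumes ~0.5 bytes per parameter at 4-bit, plus an assumed allowance
# for KV cache and runtime overhead. Illustrative numbers only.
params = 220e9
bytes_per_param = 0.5              # 4-bit quantization
weights_gb = params * bytes_per_param / 1e9
overhead_gb = 20                   # assumed KV cache + runtime overhead
print(f"weights: ~{weights_gb:.0f} GB, total: ~{weights_gb + overhead_gb:.0f} GB")
# ~110 GB of weights, ~130 GB total, which fits under 192 GB of unified memory
```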

[–] BangkokPadang@alien.top 1 points 10 months ago (1 children)

What makes this any different than the “base” Yi-34B-200k model?

Where can we see a description of what the model has been finetuned on (datasets used, LoRAs used, etc.) and/or your methods for doing so? I'm not finding any of this information in the model card or the Substack link.

[–] BangkokPadang@alien.top 1 points 10 months ago

Text gen web UI. It lets me use all model formats, depending on what I want to test at that moment.

[–] BangkokPadang@alien.top 1 points 10 months ago

For Llama 2 models, set your alpha to 2.65 when loading them at 8k context.

The general suggestion is 2.5, but if you plot the formula on a graph, 8192 context aligns with 2.642, so 2.65 is more accurate than 2.5.
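
For reference, a minimal sketch of how loaders such as ExLlama / text gen web UI typically apply the alpha value in NTK-aware RoPE scaling: the rotary base gets scaled by alpha^(d/(d-2)), with head dimension d = 128 for Llama 2. The exact alpha-to-context mapping referenced above isn't reproduced here, just the base scaling:

```python
# NTK-aware RoPE scaling sketch: the rotary base is multiplied by
# alpha ** (head_dim / (head_dim - 2)). Head dim is 128 for Llama 2.
def scaled_rope_base(alpha: float, head_dim: int = 128, base: float = 10000.0) -> float:
    return base * alpha ** (head_dim / (head_dim - 2))

print(scaled_rope_base(2.5))   # ~25,400 -> the commonly suggested value
print(scaled_rope_base(2.65))  # ~26,900 -> the value suggested above for 8k context
```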