BangkokPadang

joined 11 months ago
[–] BangkokPadang@alien.top 1 points 9 months ago

If people started doing this with any regularity, Nvidia would intentionally bork the drivers.

[–] BangkokPadang@alien.top 1 points 9 months ago

Honestly, a 4-bit quantized version of the 220B model should run on a 192GB M2 Studio, assuming these models could even work with a current transformers loader.
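
Rough back-of-the-envelope math (a sketch, not a benchmark; the ~0.5 bytes per parameter and the overhead figure are assumptions):

```python
# Rough memory estimate for a 4-bit quantized 220B-parameter model.
# Assumes ~0.5 bytes per parameter at 4-bit, plus an assumed allowance
# for KV cache and runtime overhead. Illustrative numbers only.
params = 220e9
bytes_per_param = 0.5              # 4-bit quantization
weights_gb = params * bytes_per_param / 1e9
overhead_gb = 20                   # assumed KV cache + runtime overhead
print(f"weights: ~{weights_gb:.0f} GB, total: ~{weights_gb + overhead_gb:.0f} GB")
# ~110 GB of weights, ~130 GB total, which fits under 192 GB of unified memory
```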

[–] BangkokPadang@alien.top 1 points 10 months ago (1 children)

What makes this any different than the “base” Yi-34B-200k model?

Where can we see a description of what the model has been finetuned on (datasets used, LoRAs used, etc.) and/or your methods for doing so? I'm not finding any of this information in the model card or the Substack link.

[–] BangkokPadang@alien.top 1 points 10 months ago

Text gen web UI. It lets me use all model formats, depending on what I want to test at that moment.

[–] BangkokPadang@alien.top 1 points 10 months ago

For Llama 2 models, set your alpha to 2.65 when loading them at 8k context.

The general suggestion is 2.5, but if you plot the formula on a graph, 8192 context aligns with 2.642, so 2.65 is more accurate than 2.5.
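
For reference, a minimal sketch of how loaders such as ExLlama / text gen web UI typically apply the alpha value in NTK-aware RoPE scaling: the rotary base gets scaled by alpha^(d/(d-2)), with head dimension d = 128 for Llama 2. The exact alpha-to-context mapping referenced above isn't reproduced here, just the base scaling:

```python
# NTK-aware RoPE scaling sketch: the rotary base is multiplied by
# alpha ** (head_dim / (head_dim - 2)). Head dim is 128 for Llama 2.
def scaled_rope_base(alpha: float, head_dim: int = 128, base: float = 10000.0) -> float:
    return base * alpha ** (head_dim / (head_dim - 2))

print(scaled_rope_base(2.5))   # ~25,400 -> the commonly suggested value
print(scaled_rope_base(2.65))  # ~26,900 -> the value suggested above for 8k context
```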