Acceptable_Can5509

joined 1 year ago
[–] Acceptable_Can5509@alien.top 1 points 11 months ago (1 children)

Basically GPT-4 Turbo.

[–] Acceptable_Can5509@alien.top 1 points 1 year ago (1 children)

Can you share the Colab so others can look at how it was done?

Probably heavily quantized and running a smaller GPT-3-class model.
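
For context, here's a minimal sketch of what "heavily quantized" usually means in practice: 4-bit loading through bitsandbytes in Hugging Face transformers. The model name and settings are illustrative only, not a claim about whatever service the thread is discussing.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit quantization config: weights stored in 4 bits,
# compute done in float16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

# Illustrative model ID; any causal LM on the Hub works the same way
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",  # place weights on the available GPU
)
```

Quantizing to 4 bits cuts VRAM to roughly a quarter of float16, which is how providers squeeze large models onto cheaper hardware.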

[–] Acceptable_Can5509@alien.top 1 points 1 year ago (2 children)

Wait, whose money is it? Can't you just rent as well?


I'm running Llama-2 7B using Google Colab on a 40 GB A100, but it's using 26.8 GB of VRAM. Is that normal? I tried the 13B version, but the system ran out of memory. Yes, I know the quantized versions are almost as good, but I specifically need unquantized.

https://colab.research.google.com/drive/10KL87N1ZQxSgPmS9eZxPKTXnobUR_pYT?usp=sharing
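
In case it helps anyone hitting the same thing: a minimal sketch (not the linked notebook) of loading Llama-2 7B in half precision with transformers. With no dtype passed, from_pretrained loads weights in float32 at ~4 bytes per parameter, which is roughly the 26.8 GB observed; float16 is still unquantized, just lower precision, and halves that to ~13-14 GB, leaving room for 13B on a 40 GB A100. The model ID and prompt below are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # assumes access to the gated HF repo

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # same weights, half the memory of float32
    device_map="auto",          # put the model on the A100
)

# Quick generation test
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```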