JoseConseco_

joined 1 year ago

ExLlamaV2: The Fastest Library to Run LLMs in c/localllama@poweruser.forum

JoseConseco_@alien.top 1 point 11 months ago

So how much VRAM would a 34B or a 14B model require? I assume there's no CPU offloading, right? With my 12 GB of VRAM, I guess I could only fit 14-billion-parameter models, and maybe not even that.
