JawGBoi


I'm thinking of upgrading to 64GB of RAM so I can load larger models on my RTX 3090.

If I want to run tigerbot-70b-chat-v2.Q5_K_M.gguf, which has a max RAM usage of 51.61GB, and assuming I load 23GB worth of layers into VRAM, that leaves 51.61 - 23 = 28.61GB to hold in system RAM. My operating system already uses up to 9.2GB of RAM, which means I need 28.61 + 9.2 = 37.81GB of RAM (hence 64GB).
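For anyone who wants to sanity-check that arithmetic, here it is as a tiny Python sketch (the numbers are just the figures above):

```python
# Back-of-envelope memory budget, all figures in GB (taken from the post above)
model_ram_total = 51.61  # reported max RAM usage of the Q5_K_M quant
vram_offload = 23.0      # layers offloaded to the RTX 3090's VRAM
os_overhead = 9.2        # RAM the operating system already uses

system_ram_needed = model_ram_total - vram_offload + os_overhead
print(f"System RAM needed: {system_ram_needed:.2f} GB")  # 37.81 GB -> a 64GB kit fits
```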

How many tokens/s can I expect with 23GB of the 51.61GB in VRAM and the remaining 28.61GB in system RAM on an RTX 3090? I'm mostly curious about the Q5_K_M quant, but I'm still interested in other quants.
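For context, here's a minimal sketch of how the partial offload could be set up with llama-cpp-python. The n_gpu_layers value is a guess on my part (assuming TigerBot 70B follows Llama-2-70B's 80 transformer layers, ~35 layers would be roughly 23GB at Q5_K_M) and would need tuning while watching VRAM usage:

```python
from llama_cpp import Llama

# Sketch only: n_gpu_layers=35 is an assumption (~23GB of an assumed 80
# total layers at Q5_K_M); tune it up or down until VRAM sits near 23GB.
llm = Llama(
    model_path="tigerbot-70b-chat-v2.Q5_K_M.gguf",
    n_gpu_layers=35,
    n_ctx=4096,
)

out = llm("Write one sentence about llamas.", max_tokens=32)
print(out["choices"][0]["text"])
```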

[–] JawGBoi@alien.top 1 points 10 months ago

RVC is definitely the best for this. Unlike most other methods, you don't have to provide text transcriptions for the training dataset, which makes RVC models really easy to train with no compromise in quality.