constanzabestest@alien.top · 9 months ago

This is absolutely amazing, but I have a question: is there a way to make it consistently generate less text? I enjoy my RPs most when the messages are a bit more on the simpler side (around 100 tokens), but these settings make the AI generate well past the 300-token target. I tried adding things like "around 100 words long", "no more than 100 words", or even "limit yourself to 100 tokens" to the last output sequence, but nothing seems to work.
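For what it's worth, prompt-level instructions like these only loosely shape length; the reliable way to bound a reply is to cap the number of tokens the sampler may emit. A minimal sketch of that idea against KoboldCpp's KoboldAI-compatible generate endpoint (the local URL and sampler values here are assumptions, not anything from the settings above):

```python
# Sketch: cap reply length via the request's max_length field rather than
# the prompt. Assumes a local KoboldCpp instance on its default port.
import json
import urllib.request


def build_payload(prompt: str, max_new_tokens: int = 100) -> dict:
    """Build a KoboldAI-style generate request with a hard token cap."""
    return {
        "prompt": prompt,
        "max_length": max_new_tokens,    # tokens to generate, not context size
        "max_context_length": 4096,      # assumed context window
        "temperature": 0.7,              # assumed sampler value
    }


def generate(prompt: str,
             url: str = "http://localhost:5001/api/v1/generate") -> str:
    """Send the request and return the generated text (requires a running server)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["results"][0]["text"]


if __name__ == "__main__":
    print(build_payload("Continue the scene.", max_new_tokens=100))
```

Note the trade-off: a hard cap can truncate a reply mid-sentence, whereas "no more than 100 words" in the prompt keeps sentences whole but is only a suggestion the model may ignore.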


Title, essentially. I'm currently running an RTX 3060 with 12GB of VRAM, 32GB of RAM, and an i5-9600K. I've been running 7B and 13B models effortlessly via KoboldCpp (I tend to offload all 35 layers to the GPU for 7Bs, and 40 for 13Bs) plus SillyTavern for role-playing purposes, but slowdown becomes noticeable at higher context with 13Bs (not too bad, so I deal with it). Is this setup capable of running bigger models like 20B, or potentially even 34B?
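A rough back-of-envelope helps here: weight size scales with parameter count times bits per weight. Assuming ~4.5 bits per weight (in the ballpark of a Q4_K_M GGUF quant; the exact figure varies by quant), the weights alone come out as:

```python
# Back-of-envelope GGUF weight size; ignores KV cache and context overhead,
# which also consume VRAM and grow with context length.
def approx_model_gib(n_params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate weight footprint in GiB for a quantized model."""
    total_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / (1024 ** 3)


for size in (7, 13, 20, 34):
    print(f"{size}B @ ~4.5 bpw ≈ {approx_model_gib(size):.1f} GiB")
```

By this estimate a 20B quant (~10.5 GiB) is right at the edge of 12GB of VRAM once the KV cache is added, so partial offload would likely be needed, and a 34B (~18 GiB) would lean heavily on system RAM with a corresponding speed hit. These are assumptions for illustration, not benchmarks.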