Ravenpest

joined 1 year ago
[–] Ravenpest@alien.top 1 points 11 months ago

I got a 4090 and 128 GB of RAM. 70b runs fine at quant 5: about 280 seconds to generate a message with full reprocessing, and around 100 seconds less on a normal message. So I'd say you'd be fine with that.

[–] Ravenpest@alien.top 1 points 11 months ago

High temp does more harm than good. I would suggest looking into what the other settings do before raising it, no matter the model.

[–] Ravenpest@alien.top 1 points 11 months ago (3 children)

No issues here, just a lot of confidence on certain tokens but overall very little repetition. I use Koboldcpp, Q5_K_M. Don't abuse temp; the model seems to be exceedingly sensitive, and the smallest imbalance breaks its flow. Try temp 0.9, rep pen 1.11, top k 0, min-p 0.1, typical 1, tfs 1.
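Those sampler values map directly onto a Koboldcpp generate-API payload. A minimal sketch, assuming a local koboldcpp instance on its default port 5001 (the prompt text and `max_length` are illustrative, not from the comment):

```python
# Sampler settings from the comment above, packaged as a payload for
# koboldcpp's /api/v1/generate endpoint. Prompt and max_length are
# hypothetical placeholders; the sampler values are the ones recommended.
payload = {
    "prompt": "### Instruction:\nWrite a short scene.\n### Response:\n",
    "max_length": 300,
    "temperature": 0.9,   # keep temp modest; the model is sensitive to it
    "rep_pen": 1.11,      # mild repetition penalty
    "top_k": 0,           # 0 disables top-k filtering
    "min_p": 0.1,         # min-p sampling at 0.1
    "typical": 1,         # 1 disables typical sampling
    "tfs": 1,             # 1 disables tail-free sampling
}

# To actually send it to a running instance (requires `requests`):
# import requests
# r = requests.post("http://localhost:5001/api/v1/generate", json=payload)
# print(r.json()["results"][0]["text"])
```

Values of 0 or 1 on the unused samplers effectively turn them off, so min-p and temperature are doing almost all the work here.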

[–] Ravenpest@alien.top 1 points 11 months ago

At present, the simpler and narrower the scope of the instruction, the better. These models cannot understand complex tasks. Work out how the model thinks in general, then focus on one thing in particular and arrange your prompt accordingly.

[–] Ravenpest@alien.top 1 points 11 months ago

LLMs are not able to "claim" anything, they're just roleplaying nonsense. Relax.

[–] Ravenpest@alien.top 1 points 11 months ago

Speed + quality: Nous-Capybara 34b. Offload 13 layers to system RAM and get a Q5_K_M. If you have enough system RAM and a decent CPU you won't even feel it. For pure quality: Euryale 1.3 70b. It will be slow, up to 200 seconds for a single message at Q5_K_M, but it will deliver.
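The layer split can be reasoned about with back-of-the-envelope arithmetic: divide the GGUF file size by the model's layer count to get GB per layer, then see how many layers fit in your VRAM budget. A rough sketch (all numbers are illustrative assumptions, not measured values for these models):

```python
# Rough VRAM budgeting for GPU offload. Every number here is an
# illustrative assumption; check your actual GGUF file size and
# the layer count reported when the model loads.
model_size_gb = 24.3   # e.g. a 34b Q5_K_M GGUF, approximate
n_layers = 60          # transformer layers in a typical 34b model
vram_budget_gb = 20.0  # leave headroom on a 24 GB card for KV cache

gb_per_layer = model_size_gb / n_layers
gpu_layers = min(n_layers, int(vram_budget_gb / gb_per_layer))
cpu_layers = n_layers - gpu_layers

print(f"~{gb_per_layer:.2f} GB/layer -> {gpu_layers} on GPU, {cpu_layers} on CPU")
```

With koboldcpp the GPU count then goes to its `--gpulayers` flag; shrink the budget if generation runs out of memory at long contexts, since the KV cache grows with context length.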

[–] Ravenpest@alien.top 1 points 1 year ago

If you are old enough to recall Commodore programming, I would suggest a better use of your remaining time on this Earth. No, really: this tech is not yet at the level you seem to be desiring.

[–] Ravenpest@alien.top 1 points 1 year ago (1 children)

7b is way too dumb to be able to roleplay right now. 13b is the bare minimum for that specific task.